Quantify approximation error and calibration of a PN Method

This notebook explains how to use the functions in ProbNum-Evaluation to quantify the approximation error, as well as the uncertainty calibration, of the output of a probabilistic numerical method.

[1]:
from probnum import quad, diffeq
import numpy as np

np.random.seed(1)
[2]:
from probnumeval import multivariate, timeseries

One-dimensional example

The first example is Bayesian quadrature, which computes a probabilistic numerical solution of an integral such as

\[F = \int_0^\pi \sin(x) \,\text{d}x.\]

The true solution of this integral is \(F = 2.0\). Begin by setting the problem up.
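Before running the probabilistic solver, a quick sanity check of the reference value with plain NumPy (nothing from ProbNum is needed here):

```python
import numpy as np

# The antiderivative of sin is -cos, so
# F = -cos(pi) - (-cos(0)) = 1 + 1 = 2.
F_true = -np.cos(np.pi) + np.cos(0.0)
print(F_true)  # 2.0
```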

[3]:
input_dim = 1
domain = (0, np.pi)

# Define the integrand
def f(x):
    return np.sin(x)


# True solution of \int_0^\pi sin(x) dx is 2.0
sol = 2.0

The solution of this integration problem is computed with probnum.quad (https://probnum.readthedocs.io/en/latest/public_api/quad.html). Let us use only a few function evaluations, so that the solution remains visibly inaccurate and the covariance of the output is clearly \(\gg 0\).

[4]:
F, _ = quad.bayesquad(fun=f, input_dim=input_dim, domain=domain, nevals=5)

In ProbNum-Evaluation, finite-dimensional outputs (such as \(F\)) are quantified with probnumeval.multivariate. Among other things, this module contains error measures such as root mean-square errors (RMSEs), mean absolute errors, or their relative counterparts. For instance, the RMSE and relative RMSE are computed as follows.

[5]:
rmse = multivariate.rmse(F.mean, reference_solution=sol)
relative_rmse = multivariate.relative_rmse(F.mean, reference_solution=sol)
print(rmse, relative_rmse)
0.06127865361488327 0.030639326807441636
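For intuition, here is a minimal sketch of these two measures, assuming the relative RMSE normalises by the RMS of the reference solution (a plausible reading, not necessarily probnumeval's exact definition). The numbers are illustrative, not the notebook's output:

```python
import numpy as np

# RMSE: root of the mean squared error between approximation and reference.
# Relative RMSE (assumption): RMSE divided by the RMS of the reference.
approx = np.array([2.061])  # hypothetical estimate
ref = np.array([2.0])       # reference solution

rmse = np.sqrt(np.mean((approx - ref) ** 2))
relative_rmse = rmse / np.sqrt(np.mean(ref ** 2))
print(rmse, relative_rmse)
```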

We can also check how well the covariance quantifies the approximation error, for instance with the averaged normalised estimation error squared (ANEES). A value close to 1 indicates good calibration.

[6]:
anees = multivariate.anees(F, reference_solution=sol)
print(anees)
0.12591126045245504
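For intuition, here is a minimal sketch of what the ANEES measures for a single Gaussian estimate, assuming the standard definition (squared estimation error, whitened by the reported covariance, normalised by the dimension). This is a sketch, not ProbNum-Evaluation's implementation:

```python
import numpy as np

# The whitened squared error should average to the dimension d if the
# reported uncertainty is well calibrated; dividing by d targets 1.
def anees_sketch(mean, cov, reference):
    err = mean - reference
    return err @ np.linalg.solve(cov, err) / mean.size

mean = np.array([1.9])    # hypothetical estimate
cov = np.array([[0.01]])  # reported covariance
ref = np.array([2.0])     # reference solution
print(anees_sketch(mean, cov, ref))  # 1.0 -> well calibrated here
```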

Time-series example

The same functionality can be used for a more complex posterior. The next example is a probabilistic numerical ODE solution of

\[\dot y(t) = y(t)\, (1 - y(t)), \quad y(0) = 0.5\]

for \(t\in (0.0, 5.0)\). This logistic ODE has the analytic solution

\[y(t) = \frac{y_0 e^t}{1 + y_0 (e^t - 1)}.\]
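The closed form can be verified quickly with a finite-difference check (plain NumPy; t = 1.0 is an arbitrary test point):

```python
import numpy as np

y0 = 0.5
y = lambda t: y0 * np.exp(t) / (1 + y0 * (np.exp(t) - 1))

# Initial condition holds.
assert np.isclose(y(0.0), y0)

# Central difference approximates y'(t); compare with y(t) * (1 - y(t)).
h, t = 1e-6, 1.0
dy = (y(t + h) - y(t - h)) / (2 * h)
print(np.isclose(dy, y(t) * (1 - y(t)), atol=1e-6))  # True
```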

Let us begin by setting up the problem.

[7]:
def f(t, x):
    return x * (1 - x)


y0 = np.array([0.5])


t0 = 0.0
tmax = 5.0


def sol(t):
    return (y0 * np.exp(t)) / (1 + y0 * (np.exp(t) - 1))

In ProbNum, ODEs are solved with probnum.diffeq. The output is usually a TimeSeriesPosterior: a function that carries the states of the solution on a grid and can also be evaluated at any input. When evaluating the posterior, let us exclude the zeroth point of the domain, which is error-free by construction and has covariance zero. If we kept it, the computation of calibration indices could suffer from numerical instability.

[8]:
pnsol = diffeq.probsolve_ivp(f, y0=y0, t0=t0, tmax=tmax)

approx_sol = pnsol.states[1:]
true_sol = sol(pnsol.locations[1:])[:, None]

On the given set of evaluations of the solution, we can compute any error measure. This time, we choose the mean absolute error (MAE) and its relative counterpart, the relative MAE.

[9]:
mae = multivariate.mae(approx_sol.mean, true_sol)
relative_mae = multivariate.relative_mae(approx_sol.mean, true_sol)
print(mae, relative_mae)
0.0006241640303712251 0.0006614154047129657

Since covariances of ODE solutions can be ill-conditioned, we make sure that all covariance computations use pseudo-inverses. This step is entirely optional.

[10]:
from probnumeval import config

config.set_covariance_inversion_parameters(
    strategy="pinv", symmetrize=True, damping=0.0
)
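To see why this helps, here is a small standalone illustration (plain NumPy, independent of probnumeval's internals): a singular covariance has no ordinary inverse, but its Moore-Penrose pseudo-inverse still exists.

```python
import numpy as np

# A singular covariance, e.g. zero uncertainty in one direction.
cov = np.array([[1.0, 0.0],
                [0.0, 0.0]])

try:
    np.linalg.inv(cov)
except np.linalg.LinAlgError:
    print("ordinary inverse fails")  # printed: cov is singular

cov_pinv = np.linalg.pinv(cov)  # pseudo-inverse still works
print(cov_pinv)
```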

We can also quantify the uncertainty calibration, for instance by calling the ANEES again. This time, however, we opt for the non-credibility index (NCI) and the inclination index (II). In contrast to the ANEES, the NCI and the II compare the calibration of the posterior to a “reference calibration”, and thus respond differently to optimism and pessimism. The difference between the two is that the NCI is an absolute number (the lower, the better), whereas the II also indicates the direction of mis-calibration, i.e. over- or underconfidence. For both values, the closer to zero, the better.

[11]:
nci = multivariate.non_credibility_index(approx_sol, true_sol)
ii = multivariate.inclination_index(approx_sol, true_sol)
print(nci, ii)
12.803778627298394 11.306190199210022

All of the above could also have been obtained by applying the timeseries module to the (callable) ODE posterior. While the multivariate module expects (lists of) random variables, the timeseries module acts directly on the ODE solution.

[12]:
# Reshape the reference solution to match it to the ODE solution
def sol_reshaped(*args, **kwargs):
    return sol(*args, **kwargs)[:, None]


# Extract the mean function from the PN solution.
def pnsol_mean(*args, **kwargs):
    return pnsol(*args, **kwargs).mean


mae = timeseries.mae(pnsol_mean, sol_reshaped, locations=pnsol.locations[1:])
relative_mae = timeseries.relative_mae(
    pnsol_mean, sol_reshaped, locations=pnsol.locations[1:]
)

nci = timeseries.non_credibility_index(
    pnsol, sol_reshaped, locations=pnsol.locations[1:]
)
ii = timeseries.inclination_index(pnsol, sol_reshaped, locations=pnsol.locations[1:])
print(mae, relative_mae, nci, ii)
0.0006241640303712251 0.0006614154047129657 12.803778627298394 11.306190199210022

Summary

ProbNum-Evaluation contains a wide range of measures for quantifying approximation error and uncertainty calibration. The measures for evaluating (finite-dimensional) random variables are in multivariate; the measures for evaluating stochastic processes directly are in timeseries.
