Estimating Model Parameters¶
Note
See One Compartment Model with Absorption estimating V and CL for the Tut Script used here.
The previous chapters show how the observed concentrations are influenced by the modelling choices that define them: the number of compartments and their function, how flows between compartments are specified, the shape of the dosing function, and the residual error model. We have explored these topics by taking models with given parameters and using them to generate datasets of synthetic observations. This is often known as the forward problem.
We now turn to the more difficult inverse problem where we are given a dataset of measurements and a parameterized model, and want to estimate the most likely parameters for the given data. To solve this problem, we first need to make three design decisions.
First, we need a way to quantify the goodness-of-fit for a set of parameters, i.e. a single number that enables us to compare one set of parameters with another and determine which agrees better with the data. A popular choice is the likelihood, which quantifies how likely the observed data are, given the model parameters.
Second, we must specify an algorithm for finding the “best” model parameters from the set of possible options. In practice, finding the globally best set of parameters is difficult or impossible, so we instead look for a locally optimal set that is better than all other nearby sets [DennisSchnabel1987] [NocedalWright2006].
Finally, we need a set of criteria that determine whether we have reached the best local solution, i.e. whether the algorithm has converged.
Likelihood and the Objective Function¶
Before we can find the “best” fit of a model to a set of observations, we first need a measure of what is “good”. A common measure of goodness-of-fit is a model’s likelihood [Millar2011]: how likely are the observations to arise if the given set of model parameters is correct? Although the likelihood is not a probability distribution over the parameters (its integral over the model parameters does not equal one), for a given set of observations and a given model structure a higher likelihood indicates a better fit.
In practice, we find the maximum likelihood estimate by minimizing the negative of the likelihood (because most optimization algorithms are written to find the minimum of an error or cost function). Moreover, when the likelihood is a member of the exponential family of distributions it is mathematically convenient to minimize -2 log(likelihood), which is commonly referred to as the objective function, ObjV.
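For example, if the residual errors are assumed to be independent and normally distributed (a simple additive Gaussian error model, used here purely as an illustration rather than a PoPy-specific definition), the objective function takes the familiar form

\[
\mathrm{ObjV}(\theta) \;=\; -2 \log L(\theta \mid y)
\;=\; \sum_{i=1}^{n} \left[ \frac{\bigl(y_i - f_i(\theta)\bigr)^2}{\sigma_i^2} + \log\bigl(2\pi\sigma_i^2\bigr) \right]
\]

where y_i are the observations, f_i(θ) the corresponding model predictions, and σ_i² the residual variances. Smaller values of ObjV indicate a better fit.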
Using this measure of cost, we can take a “bird’s eye view” of the cost surface in two dimensions by plotting it as a contour map over pairs of parameter values. In One Compartment Model with Absorption, for example, we looked at a model with three parameters: KA, CL and V. Fixing KA at its optimal value, we can see how the cost surface varies with respect to CL and V (Fig. 30).
In this example, we see that the cost function has a single minimum in this region and is at its lowest close to the true values: CL = 3 and V = 20. The lowest-cost parameters are close to, but not exactly at, the true values because of the noise added to the observations and the finite size of the data set.
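The sketch below shows one way to reproduce this kind of contour map outside of PoPy. It is a minimal illustration, not PoPy syntax: it assumes a one compartment model with first-order absorption, a single dose of 100, an additive Gaussian error model with known standard deviation, and a small set of simulated observation times, all chosen here purely for illustration.

```python
# Minimal sketch (not PoPy syntax): plot -2*log(likelihood) over a grid of
# CL and V values, with KA fixed, for a one compartment model with
# first-order absorption and additive Gaussian residual error.
import numpy as np
import matplotlib.pyplot as plt

def conc(t, ka, cl, v, dose=100.0):
    """Analytical concentration after a single oral dose (F = 1, no lag)."""
    ke = cl / v
    return dose * ka / (v * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

# Simulate noisy observations from the "true" parameters KA=1, CL=3, V=20.
rng = np.random.default_rng(0)
t_obs = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 12.0, 24.0])
sigma = 0.2                       # residual standard deviation (assumed known)
ka_true, cl_true, v_true = 1.0, 3.0, 20.0
y_obs = conc(t_obs, ka_true, cl_true, v_true) + rng.normal(0.0, sigma, t_obs.size)

def objv(cl, v):
    """-2*log(likelihood) for independent Gaussian residuals."""
    resid = y_obs - conc(t_obs, ka_true, cl, v)
    return np.sum(resid**2 / sigma**2 + np.log(2.0 * np.pi * sigma**2))

# Evaluate the objective on a CL x V grid and draw the contour map.
cl_grid = np.linspace(1.0, 6.0, 100)
v_grid = np.linspace(10.0, 35.0, 100)
z = np.array([[objv(cl, v) for cl in cl_grid] for v in v_grid])

plt.contour(cl_grid, v_grid, z, levels=30)
plt.xlabel("CL")
plt.ylabel("V")
plt.title("ObjV with KA fixed")
plt.show()
```

With this setup the contours form a single bowl whose lowest point lies close to the simulated true values, mirroring the behaviour described above.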
Minimization Algorithm¶
Once we have specified the cost function we want to minimize, we need an algorithm that will find the minimum. Because datasets can be large and models complex, with many parameters, it is not practical to find the minimum via an exhaustive search over every possible combination of parameter values.
We therefore adopt an incremental approach that takes an initial guess at the parameter values, then looks for a different set of values nearby that has a lower cost. Repeating this process takes us on a path across the likelihood surface until there are no nearby solutions that are better than the current one. This, by definition, is a local minimum.
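As a purely illustrative sketch (again not PoPy syntax, and not PoPy’s ND method discussed below), the same kind of simulated data and objective function used for the contour map can be handed to a general-purpose local optimizer, which walks downhill from an initial guess until no nearby improvement remains:

```python
# Minimal sketch (not PoPy's ND method): fit CL and V by local minimization
# of -2*log(likelihood), starting from a deliberately poor initial guess.
import numpy as np
from scipy.optimize import minimize

def conc(t, ka, cl, v, dose=100.0):
    ke = cl / v
    return dose * ka / (v * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

rng = np.random.default_rng(0)
t_obs = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 12.0, 24.0])
sigma, ka = 0.2, 1.0
y_obs = conc(t_obs, ka, 3.0, 20.0) + rng.normal(0.0, sigma, t_obs.size)

def objv(theta):
    cl, v = theta
    resid = y_obs - conc(t_obs, ka, cl, v)
    return np.sum(resid**2 / sigma**2 + np.log(2.0 * np.pi * sigma**2))

# Start a long way from the true values (CL=3, V=20) and walk downhill.
result = minimize(objv, x0=[1.0, 10.0], method="Nelder-Mead")
print("fitted CL, V:", result.x)     # close to (3, 20), but not exact
print("ObjV at minimum:", result.fun)
```

The Nelder–Mead simplex method is used here only as a convenient stand-in for a local, downhill search; it is not the algorithm PoPy uses.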
Depending on the shape of the cost surface and the particular algorithm used, finding a local minimum can take only a few steps or it can require many steps over a winding path. In the example given (Fig. 31), a local minimum was found by PoPy in a single step. PoPy’s search algorithm is very good at finding nearby minima in very few steps; however, it is impossible to guarantee that the minimum found is the global minimum. This is true for all local search algorithms.
The current recommended search algorithm used by PoPy is called ND; see ND Fitting Method for more details.
Initial Values and Convergence¶
When using a local optimization algorithm, the current estimate heads “down the hill” to a nearby point where the cost is lower, until there is nowhere lower to go. In some cases, the cost surface has more than one such locally lowest point. As a result, starting the algorithm from some initial values will lead to one local minimum, whereas starting from others will lead to different local minima.
The set of starting points that lead to a given local minimum is known as its basin of convergence. In simple models, this is rarely a major problem. However, for complex models, or if the fitted parameters are infeasible, it is worthwhile to test alternative starting values, especially if the initial guess was a long way from the fitted values.
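To make the idea concrete, here is a toy one-dimensional cost function (unrelated to any PK model, used only for illustration) with two local minima; running the same local optimizer from different starting values lands in different minima, i.e. the starting values lie in different basins of convergence:

```python
# Toy illustration of basins of convergence: a 1-D cost with two local minima.
import numpy as np
from scipy.optimize import minimize

def cost(x):
    """Tilted double-well cost with local minima near x = -1 and x = +1."""
    x = float(np.asarray(x).ravel()[0])
    return (x**2 - 1.0) ** 2 + 0.3 * x

for x0 in (-2.0, 0.5, 2.0):
    res = minimize(cost, x0=[x0], method="Nelder-Mead")
    print(f"start {x0:+.1f} -> local minimum at x = {res.x[0]:+.3f}, cost = {res.fun:.3f}")
```

Starting from -2.0 finds the left-hand minimum, whereas starting from 0.5 or 2.0 finds the right-hand one; only one of these is the global minimum. The same effect can occur on the ObjV surface of a complex model, which is why re-fitting from alternative starting values is a useful check.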