Fitting a Simple PopPK Model using PoPy¶
Here we are going to work with the simplest possible single compartment model and bolus dose, see Fig. 1:-
In this example, we will walk through fitting the one compartment model shown in Fig. 1 to a pre-existing data file using PoPy, explaining the commands, input files and output files at each step.
Note
See the Simple Fit Example obtained by the PoPy developers for this example, including input script and input data file.
Run the Fit Script¶
To fit a model in PoPy, you need a model file ending in .pyml and a data file in comma separated value format (.csv). See files in your PoPy ‘examples’ sub directory:-
c:\PoPy\examples\fit_example1.pyml
fit_example1_data.csv
Open a PoPy Command Prompt to setup the PoPy environment in this folder:-
c:\PoPy\examples\
With the PoPy environment enabled, open the script using:-
$ popy_edit fit_example1.pyml
then call popy_run on the Fit Script from the command line:-
$ popy_run fit_example1.pyml
While the script runs, you will see informative text regarding the progress of the fitting process.
You can observe the fitting process proceed through the text outputs in the command window. When completed, you can view the output using:-
$ popy_view fit_example1.pyml.html
Note the extra ‘.html’ extension in the above command. The command popy_view opens a local .html file in your web browser to summarise the result of the fitting.
You can compare your local html output with the pre-computed documentation output, see Simple Fit Example. You should expect some minor numerical differences when comparing results with the documentation. If you are concerned by any differences in results relative to the official PoPy documentation see Validate PoPy.
Summary of Fit Results¶
The results of running the fitting script are PoPy’s best estimate for the presumed unknown fixed effects variables:-
f[KE] = 0.0968
f[PNOISE] = 0.2135
f[KE_isv] = 0.0160
In PoPy fixed effects are denoted using the f[X]
notation, where ‘X’ is the name of the fixed effect.
The purpose of a Fit Script is to optimise the fixed effects and random effects by maximizing the likelihood of observing the input data given the model structure defined in ‘fit_example1.pyml’. The input data in this case, is the c[DV_CENTRAL]
column in ‘fit_example1_data.csv’, which contains 20 individuals each with 5 observations at random time points following a bolus dose event.
You can visually compare the PK curves using the fitted f[X]
outputs with the input data in Table 3.
In the graphs above the blue dots represent the observed data points. The solid blue line represents the model individual predictions based on the final f[X]
parameters and fitted r[X]
values for each individual. The dashed blue lines represent the model population predictions based on fitted f[X]
parameters and r[X]
values set to zero.
Note in this model a bolus dose is received by all individuals at time 1.0. Then the amount of dose follows a first order exponential decay curve as the drug is eliminated from the body over time.
The graphs illustrate how PoPy has optimized the f[X]
and r[X]
parameters to maximize the likelihood of the data under this model.
Syntax of Fit Script¶
This section explains the fitting script notation to represent the components of a mathematical model, such as fixed and random effects and the equation relating the parameters to the observed data. In this section, we will look more closely at how the model file works.
The data file included in this example is simulated from a first order PK model of the same form described in ‘fit_example1.pyml’. The population structure is defined in the EFFECTS section as follows:-
EFFECTS:
POP: |
f[KE] ~ unif(0.001, 100) 0.05
f[PNOISE] ~ unif(0.001, 100) 0.1
f[KE_isv] ~ unif(0.001, 100) 0.1
ID: |
r[KE] ~ norm(0, f[KE_isv])
There are three population fixed effects f[X]
parameters to be estimated and one r[X]
which can take a different value for each individual, sampled from the population distribution. There are 20 individuals in the data set, therefore this model is attempting to estimate 23 parameters in total (i.e 3 f[X] + 20 r[X]). The fixed effects are defined as follows:-
f[X] ~ unif(min_x, max_x) start_x
Here a uniform distribution is used to define a range of allowed values [min_x, max_x], as a kind of prior. Currently in PoPy, f[X]
are restricted to having a ~unif() distribution prior.
Note, it is quite common to require PK/PD model parameters be non-negative, in order to make physical sense. The ‘start_x’ value is the initial value for f[X]
used in the optimisation, which is usually an initial guess by the modeller. The r[X]
are here sampled from a zero-mean, univariate normal distribution with a variance f[KE_isv]
that is optimized for the population:-
r[KE] ~ norm(0, f[KE_isv])
Each individual has a unique set of r[X]
values, because the random effects are defined at the ID level. This has the effect of creating a single r[KE]
sample for each identity in the data file. For more info on the syntax above see EFFECTS.
The mapping from f[X]
and r[X]
to the m[X]
for each individual is defined in the MODEL_PARAMS section:-
MODEL_PARAMS: |
m[KE] = f[KE] * exp(r[KE])
m[PNOISE] = f[PNOISE]
m[ANOISE] = 0.001
This models the m[KE]
elimination rate for each individual as a log normally distribution with a median value of f[KE]
and variance parametrised by f[KE_isv]
. There is a shared proportional noise parameter f[PNOISE]
for all individuals. For more info on the syntax above see MODEL_PARAMS.
The DERIVATIVES section defines how the parameters and dosing history relate to the observed data. In this case, we have simple bolus dosing and first-order elimination:-
DERIVATIVES: |
d[CENTRAL] = @bolus{amt:c[AMT]} - m[KE]*s[CENTRAL]
The amount of the bolus dose is c[AMT]
, which is taken from the data file for each individual. In this example it is always 100 units and occurs at time point 1.0 for every individual. The m[KE]
elimination rate parameter is first order with respect to s[CENTRAL]
. Here s[CENTRAL]
is the amount in the single compartment. For more info on the syntax above see DERIVATIVES.
For each row of the data set, c[X]
values are compared with p[X]
variables predicted by the model, as defined below:-
PREDICTIONS: |
p[CEN] = s[CENTRAL]
var = (p[CEN]*m[PNOISE])**2 + m[ANOISE]**2
c[DV_CENTRAL] ~ norm(p[CEN], var)
This section shows that we are comparing model prediction p[CEN]
with c[DV_CENTRAL]
using a proportional noise model, where the standard deviation of the proportional noise is m[PNOISE]
. Here m[ANOISE]
is fixed to a small positive constant, in order to avoid zero variances when p[CEN]
is close to zero. For more detailed information on the syntax above see PREDICTIONS.
PoPy finds the best combination of the estimated parameters:-
f[KE]
- the median elimination rate - which roughly makes sure that the PK curves are of the correct shape to find the data.f[KE_isv]
- the magnitude of the variability inm[KE]
between individualsf[PNOISE]
- the proportional noise not explained by the model in thec[DV_CENTRAL]
data
The unexplained noise f[PNOISE]
and between subject variance f[KE_isv]
compete with each other to explain the data. For example, do measurements vary from the average model prediction due to measurements lacking precision (or some unknown mechanism) or because subjects just vary a lot in their physiology? This dual explanation for noisy data makes population mixed-effects models difficult to fit. However the population as a whole contains enough data to solve this problem using maximum likelihood [Sheiner1980].
In PoPy the likelihood is optimised iteratively, with the f[X]
and r[X]
being updated at each iteration. In this case, the likelihood (or objective function) progressed as shown in Table 4
Iteration | Time | OBJV |
0.00 | 4497.841579275584 | |
0.17 | 564.0664527231055 | |
0.85 | 564.0664527231055 | |
1.1 | 0.89 | 337.9370280058307 |
1.2 | 7.84 | 326.84044417435655 |
1.3 | 12.03 | 324.2624160410162 |
1.4 | 16.82 | 311.35327300915196 |
1.5 | 22.05 | 307.3397550215419 |
1.6 | 25.47 | 303.48857193780424 |
1.7 | 28.87 | 303.1302318883108 |
1.8 | 32.28 | 303.0605737503329 |
1.9 | 35.72 | 303.05543647770037 |
1.10 | 39.61 | 303.05479223958326 |
1.11 | 60.09 | 303.05478687244994 |
1.12 | 78.59 | 303.0547670793678 |
1.13 | 97.32 | 303.0547664682829 |
Note that the objective function is defined as -2 * the log likelihood (ignoring fixed proportionality constants). Therefore the lower the value of the objective function the better the estimated parameters fit the observed data. By default PoPy stops the fitting algorithm once the objective function has stopped decreasing.
Visual Predictive Check for Simple PopPK Model¶
Given the estimated parameter values, i.e. the optimised f[X]
variables, we can check whether the model and its estimate parameters are a good description of the observed data using a visual predictive check (VPC).
Running the MSim Script¶
When you run a PoPy Fit Script, it automatically generates several other scripts, including a ‘msim’ simulation script. For the simple model which we have already fitted, this script can be found in:-
fit_example1.pyml_output/
fit_example1_msim.pyml
To view or edit the MSim script, which runs simulation, navigate to:-
fit_example1.pyml_output/
Open a PoPy Command Prompt in the ‘msim’ sub folder then do:-
$ popy_edit fit_example1_msim.pyml
To view the MSim Script.
Then you can run the script with the following command:-
$ popy_run fit_example1_msim.pyml
Running the ‘fit_example1_msim.pyml’ script creates the following .svg file in the output directory:-
fit_example1.pyml_output/
fit_example1_msim.pyml_output/
fit_example1_vpc.pyml_output/
comb_quant_sim_vpc/
vpc.svg
This graphic should look something like Fig. 2:-
In the vpc graph the y axis is the amount in the single compartment and the x axis is the time since the last dose (TSLD). It’s common to use TSLD in a plot that combines all individuals, as different individuals may have been administered doses at different times in a real life analysis, so absolute times are not comparable.
The TSLD values are grouped into 5 bins along the x axis. Note you need a minimum number of data points in each bin and there are only 100 data points in this simple example, hence the small number of bins.
In Fig. 2, the pale blue dots represent the original data points. In each bin the 5%,50% and 95% quantiles are plotted for the original data set (see solid blue lines). Also in each bin, the same quantiles are computed for each of the 100 synthetic data samples. The 90% confidence interval for each of these quantiles, calculated across the 100 simulated data sets, is depicted by the shaded blue region.
The key result from the vpc graph is that the solid blue line (i.e. quantiles from the original data set) mostly lie within the shaded blue region (quantile ranges from the synthetic data sets). Since this is the case here, the model performs adequately on the VPC. Note that the solid blue line should be within the shaded region approx 90% of the time, because the synthetic quantile ranges are constructed as a 90% confidence interval. You can change the number of simulated data sets in the MSim Script. The quantiles of interest and the confidence intervals for those quantiles are specified in the Vpc Script.
The blue dots (original data) are mainly shown to give some visual corroboration of the quantiles (solid blue line). In this graph because there are only 5 time axis bins and therefore each time bin is quite wide, the data points on the left side of each time bin tend to have higher concentrations. This within-bin sample distortion is quite common. Only more bins, which in turn require more data, can address this issue.
Syntax of MSim Script¶
The MSim Script processes these three elements:-
- A data set
- The model - as automatically defined in the MSim Script file
- The estimated model parameters
For each individual in the original data set, new synthetic data sets are created by sampling new random effects r[X]
variables for each individual and new measurement noise for all data rows. i.e. The synthetic populations vary due to sampling the r[X]
for each individual here:-
EFFECTS:
ID: |
r[KE] ~ norm(0, f[KE_isv])
And adding measurement noise here:-
PREDICTIONS: |
p[CEN_sim] = s[CENTRAL]
var = (p[CEN_sim]*m[PNOISE])**2 + m[ANOISE]**2
c[DV_CENTRAL_sim] ~ norm(p[CEN_sim], var)
Note the simulated data c[DV_CENTRAL_sim] has a slightly different name from the original data set field c[DV_CENTRAL]
, in order to avoid name clashes when constructing graphs.
The ~ notation in the PREDICTIONS section of a PoPy script has two slightly different interpretations in fitting versus simulation scripts, in terms of how the operator compares the left hand side (lhs) and right hand side (rhs) of the expression:-
- In simulation scripts ~ means sample the lhs from the distribution on the rhs
- In fitting scripts ~ means evaluate the likelihood of the rhs given the lhs
In a MSim Script the former sampling definition is used. In a Fit Script the latter likelihood definition is employed.
This procedure creates a set of N new data sets, which can be compared with the original data set. In this case N=100 is defined in the OUTPUT_OPTIONS section:-
OUTPUT_OPTIONS:
n_pop_samples: 100
You can increase the number of samples, in order to estimate the percentiles, and their confidence intervals, more accurately. If the PK/PD model contains more parameters or the data file is more structured, you probably need 500-1000 population samples.
Conceptually, if the model is sensible and the fitted f[X]
parameters are well estimated then the artificial data sets generated by sampling the random variables should generate PK/PD curves that resemble the observed data PK/PD curves.
If your VPC curves do not look like the original data it may be possible to improve upon your model. The pattern of differences between your VPC predictions and the original data set, may give you some clues in how to improve your model.
Note that the VPC says nothing about your models ability to generalise, it only compares the model with the original data. For example, if you want to predict the response to much higher doses, than those present in the your original data set, the VPC provides no guarantee that predictions will be accurate.
Next Steps¶
You can see more complicated PoPy Example Models. Another Fit Script example walk through is Fitting a Two Compartment PopPK Model.
Alternatively read the PoPy Data Format description or see examples of using PoPy to generate synthetic data from a single script in Generate data and Fit using Simple PopPK Model.