Generate a Two Compartment PopPK Data Set
The Fitting a Two Compartment PopPK Model section showed how to fit a PK/PD model to a pre-existing data set. PoPy can also do the reverse: a Gen Script generates a data set from a model file, i.e. it is the opposite of a Fit Script.
In this example we will demonstrate how to generate new data from a two compartment model with absorption and bolus dosing, see Fig. 42:-
The ability to generate synthetic data from a model is especially useful if you wish to demonstrate a model but do not have access to a real data set. Real data is expensive to obtain and, even when it exists, may have issues with completeness, accuracy or confidentiality. A further disadvantage of real data is that the true underlying model is never known.
Note
See the First order absorption model with peripheral compartment example generated by the PoPy developers, including the input script and output data file.
Running the Gen Script
This example uses a single file:-
c:\PoPy\examples\builtin_gen_example.pyml
Open a PoPy Command Prompt to set up the PoPy environment in this folder:-
c:\PoPy\examples\
With the PoPy environment enabled, open the Gen Script in an editor as follows:-
$ popy_edit builtin_gen_example.pyml
Then run the script from the command line using popy_run:-
$ popy_run builtin_gen_example.pyml
When the Gen Script has completed, you can view the output of the generating process using popy_view by typing the following command:-
$ popy_view builtin_gen_example.pyml.html
Note the extra ‘.html’ extension in the above command. This command opens a local .html file in your web browser to summarise the result of the generating process.
You can compare your local html output with the pre-computed documentation output, see First order absorption model with peripheral compartment. You should expect some minor numerical differences when comparing results with the documentation.
Summary of Gen Results
The main inputs to the Gen Script are the fixed effect f[X] variables defined in its EFFECTS section. In this case the f[X] are all constant and are summarised here:-
f[KA] = 0.2000
f[CL] = 2.0000
f[V1] = 50.0000
f[Q] = 1.0000
f[V2] = 80.0000
f[KA_isv,CL_isv,V1_isv,Q_isv,V2_isv] = [
[ 0.1000, 0.0100, 0.0100, 0.0100, 0.0100 ],
[ 0.0100, 0.0300, -0.0100, 0.0200, 0.0200 ],
[ 0.0100, -0.0100, 0.0900, 0.0100, 0.0100 ],
[ 0.0100, 0.0200, 0.0100, 0.0700, 0.0100 ],
[ 0.0100, 0.0200, 0.0100, 0.0100, 0.0500 ],
]
f[PNOISE] = 0.1500
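The summary prints the covariance matrix in full, while the EFFECTS section of the Gen Script (see Syntax in the Gen Script) specifies only its lower triangle. As an illustration only (plain Python, not PoPy code; `expand_lower` is a hypothetical helper), this sketch mirrors each lower-triangle row across the diagonal to recover the symmetric matrix listed above:-

```python
# Lower-triangular rows, as entered in the Gen Script EFFECTS section
lower = [
    [0.1],
    [0.01, 0.03],
    [0.01, -0.01, 0.09],
    [0.01, 0.02, 0.01, 0.07],
    [0.01, 0.02, 0.01, 0.01, 0.05],
]


def expand_lower(lower):
    """Mirror a lower-triangular list of rows into a full symmetric matrix."""
    n = len(lower)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        if len(row) != i + 1:
            raise ValueError(f"row {i} should have {i + 1} entries")
        for j, value in enumerate(row):
            full[i][j] = value
            full[j][i] = value  # copy the entry across the diagonal
    return full


cov = expand_lower(lower)
for row in cov:
    print(", ".join(f"{v:7.4f}" for v in row))
```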
If the f[X] are random variables (defined in PoPy using a ~), then the Gen Script samples each f[X] variable once. Sampling the f[X] makes more sense, however, when you are creating multiple synthetic data sets, see MGen Script.
Given the population f[X] variables, the Gen Script then creates the requested number of individuals (in this case 50) and, for each individual, samples a set of observation time points (in this case 5) and creates the dose records (in this case a single bolus dose). This step defines the number of rows in the synthetic data set.
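A minimal Python sketch of this step, assuming one dose row plus five uniformly sampled observation rows per individual, sorted by time within each individual as in the example data file. It is not PoPy's implementation, and `make_rows` is a hypothetical helper:-

```python
import random

N_ID = 50        # c[ID] = sequential(50)
DOSE_TIME = 2.0  # t[DOSE] = 2.0
N_OBS = 5        # t[OBS] ~ unif(1.0, 50.0; 5)
AMT = 100.0      # c[AMT] = 100.0


def make_rows(rng):
    """Build one dose row and N_OBS observation rows for each individual."""
    rows = []
    for subject in range(1, N_ID + 1):
        rows.append({"ID": subject, "TIME": DOSE_TIME, "TYPE": "dose", "AMT": AMT})
        for t in (rng.uniform(1.0, 50.0) for _ in range(N_OBS)):
            rows.append({"ID": subject, "TIME": t, "TYPE": "obs", "AMT": AMT})
    # Order rows by individual, then by time (assumed file layout)
    rows.sort(key=lambda row: (row["ID"], row["TIME"]))
    return rows


rows = make_rows(random.Random(12345))
print(len(rows))
```

With 50 individuals, each contributing one dose row and five observation rows, the data set has 50 × 6 = 300 rows.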
The next stage is to sample any c[X] variables specified for each individual. In this example the only c[X] variables defined in the Gen Script are the c[ID] field and the c[AMT] value (which in this case is constant for all individuals). The c[TIME] and c[TYPE] fields are created by PoPy automatically. We now have most of a valid PoPy data set, but no observation values are defined yet.
To generate observations, the r[X] variables for each individual are sampled. Together with the dose times and the observation time period, this is enough to simulate smooth PK/PD curves from the MODEL_PARAMS, DERIVATIVES and PREDICTIONS sections defined in the script.
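The MODEL_PARAMS and DERIVATIVES sections are not reproduced here, so the following plain-Python sketch assumes a common formulation: individual parameters m[X] = f[X] * exp(r[X]), and linear first-order transfer between the Depot, Central and Peripheral compartments, with the bolus dose entering the Depot. Both the parameterisation and the helper `simulate_central_conc` are assumptions for illustration, not PoPy's solver; check the Gen Script for the actual equations:-

```python
import math


def simulate_central_conc(ka, cl, v1, q, v2, amt=100.0, dose_time=2.0,
                          t_end=50.0, dt=0.01):
    """Integrate an assumed depot/central/peripheral model with fixed-step RK4.

    Returns (times, conc), where conc is the Central amount divided by V1.
    """
    def derivs(s):
        depot, central, peri = s
        return (
            -ka * depot,                                           # absorption out of depot
            ka * depot - (cl + q) / v1 * central + q / v2 * peri,  # central balance
            q / v1 * central - q / v2 * peri,                      # peripheral exchange
        )

    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    state = (0.0, 0.0, 0.0)
    dose_step = int(round(dose_time / dt))
    times, conc = [], []
    for step in range(int(round(t_end / dt)) + 1):
        if step == dose_step:
            state = (state[0] + amt, state[1], state[2])  # bolus into the depot
        times.append(step * dt)
        conc.append(state[1] / v1)  # analogous to p[CEN] = s[CENTRAL]/m[V1]
        k1 = derivs(state)
        k2 = derivs(shifted(state, k1, dt / 2))
        k3 = derivs(shifted(state, k2, dt / 2))
        k4 = derivs(shifted(state, k3, dt))
        state = tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return times, conc


# Assumed m[X] = f[X] * exp(r[X]); r[X] = 0 gives a "typical" individual
f = {"KA": 0.2, "CL": 2.0, "V1": 50.0, "Q": 1.0, "V2": 80.0}
r = {"KA": 0.0, "CL": 0.0, "V1": 0.0, "Q": 0.0, "V2": 0.0}
m = {name: f[name] * math.exp(r[name]) for name in f}
times, conc = simulate_central_conc(m["KA"], m["CL"], m["V1"], m["Q"], m["V2"])
peak = max(range(len(conc)), key=conc.__getitem__)
print(f"peak concentration {conc[peak]:.3f} at time {times[peak]:.2f}")
```

Passing a different sampled r[X] vector for each individual gives a different smooth curve, which is why the curves vary between individuals even though the doses are identical.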
You can visualise the model prediction outputs (p[X] variables) by examining the plots for the first three individuals in the data set.
In Table 30 above, the dotted blue line represents the model predictions given the f[X] parameters and sampled r[X] values for each individual. No noise is added to this curve, and it is evaluated at regular unit time steps, so it appears smooth.
The solid blue dots represent the observations, with noise added, at randomly sampled time points for each individual. These are the values that end up in the synthetic data file under the c[DV_CENTRAL] field.
Note that in this model all individuals receive a bolus dose at time 2.0. After the dose, the drug concentration in the Central compartment rises as drug is absorbed from the Depot compartment, then falls as the drug is cleared. The decay curve is first order, with an inflection point due to the Peripheral compartment.
The doses are the same for all individuals, but the smooth curves generated by the model vary because each individual has a different r[X] vector.
Syntax in the Gen Script
The EFFECTS section defines the population structure that the Gen Script will create as follows:-
EFFECTS:
    POP: |
        c[AMT] = 100.0
        f[KA] = 0.2
        f[CL] = 2.0
        f[V1] = 50
        f[Q] = 1.0
        f[V2] = 80
        f[KA_isv,CL_isv,V1_isv,Q_isv,V2_isv] = [
            [0.1],
            [0.01, 0.03],
            [0.01, -0.01, 0.09],
            [0.01, 0.02, 0.01, 0.07],
            [0.01, 0.02, 0.01, 0.01, 0.05],
        ]
        f[PNOISE] = 0.15
    ID: |
        c[ID] = sequential(50)
        t[DOSE] = 2.0
        t[OBS] ~ unif(1.0, 50.0; 5)
        # t[OBS] = range(1.0, 50.0; 5)
        r[KA, CL, V1, Q, V2] ~ mnorm([0,0,0,0,0], f[KA_isv,CL_isv,V1_isv,Q_isv,V2_isv])
This EFFECTS structure is similar to that described in Syntax of Fit Script, with some additional lines to define new individuals, doses and observation times.
The number of individuals is defined by the following line:-
c[ID] = sequential(50)
This specifies a sequence where the first individual is ‘1’, the second is ‘2’, and so on up to ‘50’.
This line specifies a single dose record for each individual at time 2.0:-
t[DOSE] = 2.0
This line requests a sample of 5 time points uniformly distributed in the period [1.0, 50.0]:-
t[OBS] ~ unif(1.0, 50.0; 5)
Here the random effects are sampled from a zero-mean multivariate normal distribution, as follows:-
r[KA, CL, V1, Q, V2] ~ mnorm([0,0,0,0,0], f[KA_isv,CL_isv,V1_isv,Q_isv,V2_isv])
Note that the second parameter of mnorm, the square covariance matrix f[KA_isv, CL_isv, V1_isv, Q_isv, V2_isv], is a population parameter shared by all individuals. Each individual has a unique r[KA, CL, V1, Q, V2] vector, because the random effects are defined at the ID level. For more information on the syntax above, see EFFECTS.
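This section does not say how PoPy draws from mnorm internally. One standard approach, sketched here in plain Python with hypothetical helper names, factorises the covariance matrix as L·Lᵀ (a Cholesky factorisation) and sets r = L·z, where z is a vector of independent standard normal draws. One draw is made per individual, mirroring the ID-level definition:-

```python
import math
import random

# Full covariance matrix f[KA_isv,CL_isv,V1_isv,Q_isv,V2_isv]
COV = [
    [0.10, 0.01, 0.01, 0.01, 0.01],
    [0.01, 0.03, -0.01, 0.02, 0.02],
    [0.01, -0.01, 0.09, 0.01, 0.01],
    [0.01, 0.02, 0.01, 0.07, 0.01],
    [0.01, 0.02, 0.01, 0.01, 0.05],
]


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_mnorm(rng, L):
    """Zero-mean multivariate normal draw: r = L z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


rng = random.Random(12345)
L = cholesky(COV)
names = ["KA", "CL", "V1", "Q", "V2"]
# One r vector per individual, shared covariance across all 50 individuals
r_per_id = {i: dict(zip(names, sample_mnorm(rng, L))) for i in range(1, 51)}
```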
The MODEL_PARAMS and DERIVATIVES sections of this Gen Script are the same as in Syntax of Fit Script, so they are not discussed here.
The PREDICTIONS section in the Gen Script defines how the dependent c[X] variables are sampled given the p[X] model predictions:-
PREDICTIONS: |
    p[CEN] = s[CENTRAL]/m[V1]
    var = m[ANOISE]**2 + m[PNOISE]**2 * p[CEN]**2
    c[DV_CENTRAL] ~ norm(p[CEN], var)
PoPy samples c[DV_CENTRAL] for each row of the data set, creating a synthetic noisy measurement at each time point for each individual.
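The noise model above combines an additive and a proportional error term. Because the script passes `var` to norm, its second argument appears to be a variance, whereas Python's `random.gauss` expects a standard deviation, hence the square root below. This is a plain-Python sketch, not PoPy code; `sample_dv` is hypothetical, and the value used for m[ANOISE] is a placeholder because its definition is not shown in this section:-

```python
import math
import random


def sample_dv(rng, p_cen, anoise, pnoise):
    """Draw one noisy observation around the prediction p_cen.

    var mirrors m[ANOISE]**2 + m[PNOISE]**2 * p[CEN]**2; gauss() takes a
    standard deviation, so the variance is square-rooted first.
    """
    var = anoise ** 2 + pnoise ** 2 * p_cen ** 2
    return rng.gauss(p_cen, math.sqrt(var))


rng = random.Random(12345)
ANOISE = 0.001  # hypothetical placeholder for m[ANOISE]
PNOISE = 0.15   # f[PNOISE] from the EFFECTS section
dv = [sample_dv(rng, p, ANOISE, PNOISE) for p in (0.0, 0.5, 1.0)]
```

At a dose row, where the prediction is zero, only the additive term contributes, which is consistent with the very small DV_CENTRAL values on the dose rows of the output file.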
Structure of output synthetic data file
The c[X] variables are saved to disk. For an example data file see First order absorption model with peripheral compartment. The first few lines of ‘synthetic_data.csv’ are shown in Table 31 below:-
TIME | ID | TYPE | AMT | DV_CENTRAL | DV_CENTRAL_FLAG | orig_data_row |
---|---|---|---|---|---|---|
2 | 1 | dose | 100 | 0.00139195005429 | 0 | 0 |
10.0120217722 | 1 | obs | 100 | 0.767596122748 | 1 | 1 |
11.0234536491 | 1 | obs | 100 | 0.916342367898 | 1 | 2 |
16.5024021745 | 1 | obs | 100 | 1.10684285134 | 1 | 3 |
28.818526425 | 1 | obs | 100 | 0.49698816377 | 1 | 4 |
46.551188548 | 1 | obs | 100 | 0.337495686522 | 1 | 5 |
2 | 2 | dose | 100 | 0.000571825928645 | 0 | 0 |
30.181690446 | 2 | obs | 100 | 0.655361728134 | 1 | 1 |
33.0056777467 | 2 | obs | 100 | 0.549247391453 | 1 | 2 |
… | … | … | … | … | … | … |
This shows some typical properties of the PoPy Data Format, where the main fields are:-
- TYPE - Specifies either a dose or an observation row.
- ID - The identifier for a given subject.
- TIME - The time stamp of the row event.
- AMT - The size of the dose at a given time.
- DV_CENTRAL - The synthetic observed values.
- DV_CENTRAL_FLAG - Indicates valid measurement rows, 1=valid 0=ignore.
- orig_data_row - The data row number within an individual subject.
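A short plain-Python sketch for loading the output file and keeping only the valid measurement rows, assuming it is a comma-separated file with the header shown in Table 31. The sample text reproduces the rows above; for a real run, pass `open("synthetic_data.csv", newline="")` instead of the `io.StringIO` object. The helper `valid_observations` is hypothetical:-

```python
import csv
import io

SAMPLE = """TIME,ID,TYPE,AMT,DV_CENTRAL,DV_CENTRAL_FLAG,orig_data_row
2,1,dose,100,0.00139195005429,0,0
10.0120217722,1,obs,100,0.767596122748,1,1
11.0234536491,1,obs,100,0.916342367898,1,2
16.5024021745,1,obs,100,1.10684285134,1,3
28.818526425,1,obs,100,0.49698816377,1,4
46.551188548,1,obs,100,0.337495686522,1,5
2,2,dose,100,0.000571825928645,0,0
30.181690446,2,obs,100,0.655361728134,1,1
33.0056777467,2,obs,100,0.549247391453,1,2
"""


def valid_observations(f):
    """Yield (ID, TIME, DV_CENTRAL) for rows flagged as valid (flag == 1)."""
    for row in csv.DictReader(f):
        if row["DV_CENTRAL_FLAG"] == "1":
            yield int(row["ID"]), float(row["TIME"]), float(row["DV_CENTRAL"])


obs = list(valid_observations(io.StringIO(SAMPLE)))
```

The dose rows carry DV_CENTRAL_FLAG = 0, so they are skipped even though they contain a sampled DV_CENTRAL value.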
Controlling Random Seed in PoPy scripts
Note that the .csv data file generated by the Gen Script on your own machine will likely contain different values, due to the random sampling of the random effect realisations and of the noise added to each observation.
If you wish to obtain new random results each time you re-run the Gen Script, change the ‘rand_seed’ option to ‘auto’ as follows:-
METHOD_OPTIONS: {rand_seed: auto}
However, if ‘rand_seed’ is set to a fixed number, re-running the Gen Script should reproduce exactly the same results on your machine as before, due to this setting:-
METHOD_OPTIONS: {rand_seed: 12345}
Using a fixed number for the ‘rand_seed’ makes any sampling process in PoPy replicable.
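The same principle applies to any pseudo-random generator. The sketch below uses Python's `random` module rather than PoPy's internal generator, so its numbers will not match PoPy's output; it only shows that a fixed seed reproduces a sample exactly, while an unseeded generator (analogous to ‘auto’) gives different draws on each run. The helper `draw_obs_times` is hypothetical:-

```python
import random


def draw_obs_times(seed, n=5):
    """Return n sorted uniform draws on [1.0, 50.0] from a generator seeded with seed."""
    rng = random.Random(seed)  # seed=None seeds from system entropy
    return sorted(rng.uniform(1.0, 50.0) for _ in range(n))


first = draw_obs_times(12345)
second = draw_obs_times(12345)
print(first == second)  # True: same seed, same sample

varying = draw_obs_times(None)  # differs from run to run, like rand_seed: auto
```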