PoPy Data Format¶
The PoPy data file records observation and dosing regimens for each individual in a study.
The columns or fields in the data file are split into four main types in Table 5:-
Field | Comment |
---|---|
Required Fields | TYPE/ID/TIME |
Dosing Fields | dosing regime data |
Observation Fields | observed measurements |
Extra Fields | extra co-variate information |
The data file values for each field can be accessed using the c[X] notation in the PoPy script file.
Required Fields¶
A PoPy data set requires the following fields:-
- TYPE
- ID
- TIME
Note the names ‘TYPE’, ‘ID’ and ‘TIME’ are the default names of these three required fields. You can use other field names if you choose to redefine them in the script file DATA_FIELDS section.
TYPE¶
The ‘TYPE’ field specifies the event that is happening in each row of the data file. The different types of row are as follows:-
- obs - Measurements that contribute to the log likelihood as defined in the PREDICTIONS section.
- dose - Creates a dose according to the dosing functions in the DERIVATIVES section.
- pred - Extra prediction data points. PoPy will output extra p[X] data at these time points, but they do not contribute to the likelihood.
- reset - Set the s[X] compartment states back to the initial values (usually zero).
- reset+dose - A ‘reset’ combined with a ‘dose’ event.
Typically a drug trial data set consists mainly of ‘obs’ and ‘dose’ rows, with a few ‘reset’ rows per subject.
ID¶
The ‘ID’ field value defines the individual for a given row. As PoPy is a PopPK/PD system, the ‘ID’ field is required because the data is split over multiple individuals to form a population.
Note that non-population analysis can be performed in PoPy by assigning all rows the same ‘ID’ value.
TIME¶
The ‘TIME’ field defines the time stamp for each row.
The ‘TIME’ values are required to be non-decreasing from one row to the next (consecutive rows may share the same time), unless a TYPE = ‘reset’ or ‘reset+dose’ row is reached. Note that when the ‘ID’ value changes between rows, an implicit ‘reset’ occurs.
For an example of a valid combination of TYPE/ID/TIME data see Table 6.
TYPE | ID | TIME | comment |
---|---|---|---|
obs | Bob | 0.0 | observation at time zero |
dose | Bob | 4.0 | dose for Bob at time 4.0 |
obs | Bob | 4.0 | observation for Bob at time 4.0 |
obs | Bob | 8.0 | later observation |
obs | Ruth | 0.0 | time goes back, which is allowed because the ID is new |
dose | Ruth | 10.0 | dose for Ruth at time 10.0 |
obs | Ruth | 20.0 | later observation |
reset | Ruth | 30.0 | s[X] reset at time 30.0 |
obs | Ruth | 1.0 | observation following reset |
In Table 6 the time always increases or stays the same in consecutive rows, but time is allowed to go backwards after a new ID or a reset.
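The ordering rule can be checked mechanically before a data file is passed to PoPy. The following is a minimal sketch in plain Python, not part of PoPy itself; the function name `find_time_order_errors` is hypothetical. It assumes that a backwards step in time is acceptable on a row with a new ID, on a ‘reset’ or ‘reset+dose’ row, or on the row directly after one (as in Table 6).

```python
RESET_TYPES = {"reset", "reset+dose"}


def find_time_order_errors(rows):
    """Return indices of rows whose TIME illegally goes backwards.

    rows is a list of (TYPE, ID, TIME) tuples in data file order.
    """
    errors = []
    prev = None  # previous (TYPE, ID, TIME) row, if any
    for index, (row_type, row_id, time) in enumerate(rows):
        if prev is not None:
            prev_type, prev_id, prev_time = prev
            restart_allowed = (
                row_id != prev_id            # new individual: implicit reset
                or row_type in RESET_TYPES   # this row is itself a reset
                or prev_type in RESET_TYPES  # previous row was a reset
            )
            if time < prev_time and not restart_allowed:
                errors.append(index)
        prev = (row_type, row_id, time)
    return errors


# The TYPE/ID/TIME columns of Table 6.
table_6 = [
    ("obs", "Bob", 0.0),
    ("dose", "Bob", 4.0),
    ("obs", "Bob", 4.0),
    ("obs", "Bob", 8.0),
    ("obs", "Ruth", 0.0),
    ("dose", "Ruth", 10.0),
    ("obs", "Ruth", 20.0),
    ("reset", "Ruth", 30.0),
    ("obs", "Ruth", 1.0),
]
print(find_time_order_errors(table_6))  # prints []
```

An empty list means no row breaks the rule under these assumptions. A row that steps backwards in time without a new ID or a reset would be reported by its index.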
Dosing Fields¶
Dosing events are created in the data file using ‘dose’ values in the TYPE field.
There are two methods of associating data dose rows with the DERIVATIVES section in the PoPy script file. The first involves using just the ‘dose’ value; the second involves defining dose type names.
The amount of each dose is usually specified in an AMT field.
Note in PoPy AMT is not a keyword. It is just the conventional name for the dose amount field used in this documentation.
Single Dose Type¶
The simplest way to create doses at a set of fixed times is shown in Table 7.
TYPE | TIME | AMT | comment |
---|---|---|---|
dose | 1.0 | 100 | dose of 100 at time 1.0 |
dose | 2.0 | 200 | dose of 200 at time 2.0 |
dose | 3.0 | 100 | dose of 100 at time 3.0 |
Note that this creates 3 doses at times [1.0, 2.0, 3.0]. The script file loading this data set should have a DERIVATIVES section something like:-
DERIVATIVES: |
    d[DEPOT] = @bolus{amt: c[AMT]} - m[KE] * s[DEPOT]
Note that the @bolus dose has no name associated with it.
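To see what the three dose rows in Table 7 imply for the depot compartment, the sketch below evaluates the closed-form solution of first-order elimination with bolus inputs in plain Python. It is an illustration only and not how PoPy solves the model. The function `depot_amount` and the value of `KE` are hypothetical, and it assumes that each bolus adds its amount instantaneously and is already present at its own dose time.

```python
import math


def depot_amount(t, doses, ke):
    """Amount in the depot at time t under first-order elimination.

    doses is a list of (dose_time, amt) pairs. Each bolus is assumed to
    add its amount instantly and then decay at rate ke.
    """
    return sum(
        amt * math.exp(-ke * (t - dose_time))
        for dose_time, amt in doses
        if dose_time <= t
    )


# TIME and AMT columns of Table 7.
table_7_doses = [(1.0, 100.0), (2.0, 200.0), (3.0, 100.0)]
KE = 0.5  # hypothetical elimination rate constant

for t in (0.5, 1.0, 2.5, 4.0):
    print(t, round(depot_amount(t, table_7_doses, KE), 3))
```

Before the first dose the depot is empty. Each later dose adds to whatever remains of the earlier ones.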
Multiple Dose Types¶
If you have multiple types of dose in your analysis, e.g. two different drugs being prescribed, then you need to give each dose type a name, as shown in Table 8.
TYPE | TIME | AMT_DRUG1 | AMT_DRUG2 | comment |
---|---|---|---|---|
dose:drug1 | 1.0 | 100 | 0 | 100 units of drug1 |
dose:drug2 | 2.0 | 0 | 200 | 200 units of drug2 |
dose:drug1 | 3.0 | 50 | 0 | 50 units of drug1 |
The data file above creates 2 doses of drug1 and 1 dose of drug2. The script file loading this data set should have a DERIVATIVES section something like:-
DERIVATIVES: |
    dose[drug1] = @bolus{amt: c[AMT_DRUG1]}
    dose[drug2] = @bolus{amt: c[AMT_DRUG2]}
    d[DEPOT1] = dose[drug1] - m[KE1] * s[DEPOT1]
    d[DEPOT2] = dose[drug2] - m[KE2] * s[DEPOT2]
The important aspect here is that the @bolus doses are defined with names ‘drug1’ and ‘drug2’. These names also appear in the TYPE field in the data set as ‘dose:drug1’ and ‘dose:drug2’.
An alternative naming syntax is as follows:-
DERIVATIVES: |
    d[DEPOT1] = @bolus{amt: c[AMT_DRUG1], name: 'drug1'} - m[KE1] * s[DEPOT1]
    d[DEPOT2] = @bolus{amt: c[AMT_DRUG2], name: 'drug2'} - m[KE2] * s[DEPOT2]
Note that when creating a PoPy data set, you only need to specify a name for each type of dose. You can leave the modelling decision of where each dose appears in the compartment model to a later time.
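When preparing a data set with named doses, it can help to check which dose names occur in the TYPE column before writing the script. The following plain Python sketch is not a PoPy API; `parse_type` is a hypothetical helper. It assumes the dose name is whatever follows the first colon, as in ‘dose:drug1’.

```python
from collections import Counter


def parse_type(value):
    """Split a TYPE field value into (event, dose_name).

    dose_name is None when the value carries no ':name' suffix.
    """
    event, _, name = value.partition(":")
    return event, (name or None)


# TYPE column of Table 8.
table_8_types = ["dose:drug1", "dose:drug2", "dose:drug1"]

dose_counts = Counter(parse_type(value)[1] for value in table_8_types)
print(dict(dose_counts))  # prints {'drug1': 2, 'drug2': 1}
```

Each name found this way needs a matching named dose in the DERIVATIVES section.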
Observation Fields¶
Another important set of fields in the data file are the columns that define observed measurements. Observation rows are defined by setting TYPE = ‘obs’.
This section shows examples of the following:-
- Single Observed Field
- Observed Field with missing data
- Multiple Observed Fields
Note in each case the PREDICTIONS section of the PoPy script file is associated with observation fields in the data file in order to compute the likelihood correctly.
Single Observed Field¶
An example of a single observed field is shown in Table 9.
TYPE | DRUG_CONC |
---|---|
obs | 10.5 |
obs | 15.5 |
obs | 2.0 |
In this simple case the PREDICTIONS section may look something like:-
PREDICTIONS: |
    p[DRUG_CONC] = s[CEN]/m[V]
    c[DRUG_CONC] ~ norm(p[DRUG_CONC], m[ANOISE_var])
Note that the c[DRUG_CONC] references the ‘DRUG_CONC’ field of the data set. Here the likelihood is computed by comparing the model prediction p[DRUG_CONC] and the data file observation c[DRUG_CONC] for all rows of the data set, where TYPE = ‘obs’.
Therefore all values of the data column ‘DRUG_CONC’ have to be valid observations. If you have missing values then you need to use the data structure in Observed Field with missing data.
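As a rough illustration of how each ‘obs’ row enters the likelihood, the sketch below sums normal log densities over the Table 9 observations in plain Python. It is not PoPy's own objective function, which may include further terms. The predicted values and the variance are hypothetical, and it assumes that the second argument of norm is a variance, as the name ANOISE_var suggests.

```python
import math


def normal_log_density(observed, predicted, variance):
    """Log density of `observed` under a normal with mean `predicted`."""
    residual = observed - predicted
    return -0.5 * (math.log(2.0 * math.pi * variance) + residual ** 2 / variance)


observed = [10.5, 15.5, 2.0]   # c[DRUG_CONC] values from Table 9
predicted = [10.0, 14.0, 2.5]  # hypothetical p[DRUG_CONC] values
variance = 1.0                 # hypothetical m[ANOISE_var]

log_likelihood = sum(
    normal_log_density(c, p, variance) for c, p in zip(observed, predicted)
)
print(log_likelihood)
```

Every row contributes one term, which is why a placeholder value in ‘DRUG_CONC’ would distort the total unless it is excluded.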
Observed Field with missing data¶
An example of a single observed field with some missing data is shown in Table 10.
TYPE | DRUG_CONC | DRUG_CONC_FLAG | comment |
---|---|---|---|
obs | 10.5 | 1 | DRUG_CONC valid |
obs | 0.0 | 0 | DRUG_CONC invalid |
obs | -5.0 | 0 | DRUG_CONC invalid |
obs | 2.0 | 1 | DRUG_CONC valid |
In this case the PREDICTIONS section may still look something like:-
PREDICTIONS: |
    p[DRUG_CONC] = s[CEN]/m[V]
    c[DRUG_CONC] ~ norm(p[DRUG_CONC], m[ANOISE_var])
However, not all the TYPE = ‘obs’ rows contribute to the likelihood in this case. Only the rows that have both TYPE = ‘obs’ and DRUG_CONC_FLAG = 1 contribute.
It is similar to having the following ‘if’ statement in your PREDICTIONS section:-
PREDICTIONS: |
    p[DRUG_CONC] = s[CEN]/m[V]
    if c[DRUG_CONC_FLAG] > 0.5:
        c[DRUG_CONC] ~ norm(p[DRUG_CONC], m[ANOISE_var])
You can include the ‘if’ statement in your PREDICTIONS section if you like, but it is not required (or encouraged).
Note also that omitting the ‘DRUG_CONC_FLAG’ field from your data set has a similar effect to creating a ‘DRUG_CONC_FLAG’ field and setting all the values to 1, i.e. flags default to 1 in PoPy.
If you have multiple observation types in your data set then flag fields become more important, see the example data structure in Multiple Observed Fields.
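The flag behaviour described above can be mimicked in plain Python to preview which rows of a data set would count as valid observations. This is a sketch and not PoPy code. The helper `contributing_rows` is hypothetical, and the `<FIELD>_FLAG` naming pattern is inferred from the ‘DRUG_CONC_FLAG’ example.

```python
def contributing_rows(rows, field):
    """Yield the rows that would contribute to the likelihood for `field`.

    A row counts when TYPE is 'obs' and its flag is set. If the flag
    field is absent from the row, the flag is treated as 1.
    """
    flag_field = field + "_FLAG"
    for row in rows:
        flag = float(row.get(flag_field, 1))
        if row["TYPE"] == "obs" and flag > 0.5:
            yield row


# The rows of Table 10.
table_10 = [
    {"TYPE": "obs", "DRUG_CONC": 10.5, "DRUG_CONC_FLAG": 1},
    {"TYPE": "obs", "DRUG_CONC": 0.0, "DRUG_CONC_FLAG": 0},
    {"TYPE": "obs", "DRUG_CONC": -5.0, "DRUG_CONC_FLAG": 0},
    {"TYPE": "obs", "DRUG_CONC": 2.0, "DRUG_CONC_FLAG": 1},
]

valid = [row["DRUG_CONC"] for row in contributing_rows(table_10, "DRUG_CONC")]
print(valid)  # prints [10.5, 2.0]
```

Calling the same helper once per observed field gives the per-field selection needed when a data set has several observation columns.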
Multiple Observed Fields¶
An example of multiple observed fields is shown in Table 11.
TYPE | DRUG1 | DRUG1_FLAG | DRUG2 | DRUG2_FLAG | comment |
---|---|---|---|---|---|
obs | 10.5 | 1 | 0.2 | 1 | Both drugs valid |
obs | 10.5 | 1 | 0.0 | 0 | only drug1 valid |
obs | -4.1 | 0 | 0.0 | 0 | both drugs invalid |
obs | -4.1 | 0 | 0.5 | 1 | only drug2 valid |
In this case the PREDICTIONS section may look something like:-
PREDICTIONS: |
    p[DRUG1] = s[CEN1]/m[V1]
    c[DRUG1] ~ norm(p[DRUG1], m[ANOISE_var1])
    p[DRUG2] = s[CEN2]/m[V2]
    c[DRUG2] ~ norm(p[DRUG2], m[ANOISE_var2])
Here PoPy uses the ‘DRUG1_FLAG’ and ‘DRUG2_FLAG’ fields from the data set to only compute the likelihood from valid observations. You don’t have to use ‘if’ statements in the PREDICTIONS section to achieve this.
Extra Fields¶
The other columns of the PoPy data file are available to use in the verbatim sections of the script file.
For example, see below for a simple example of covariate modelling using the MODEL_PARAMS section:-
MODEL_PARAMS: |
    m[X] = f[X] + f[X_Y_EFFECT]*c[Y]
Here the m[X] parameter is modelled as having a linear relationship with the c[Y] covariate from the data file.
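The arithmetic of this linear covariate model can be illustrated in plain Python. The parameter values and covariate values below are hypothetical and chosen only to show how m[X] shifts with c[Y]. The function `model_param` is not a PoPy function.

```python
def model_param(f_x, f_x_y_effect, c_y):
    """Evaluate m[X] = f[X] + f[X_Y_EFFECT] * c[Y] for one data row."""
    return f_x + f_x_y_effect * c_y


F_X = 2.0            # hypothetical f[X], the value of m[X] when c[Y] is zero
F_X_Y_EFFECT = 0.25  # hypothetical f[X_Y_EFFECT], the slope with respect to c[Y]

for c_y in (0.0, 4.0, 8.0):
    print(c_y, model_param(F_X, F_X_Y_EFFECT, c_y))
```

With these values, each unit increase in c[Y] raises m[X] by 0.25.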
It is also possible to use c[X] variables in the other sections. One use case is when you already have PK parameters estimated (from a previous study) and wish to use these c[X] variables in the DERIVATIVES section, instead of estimating m[X] parameters for each individual.
Next Steps¶
You can use the information above to construct your own PoPy data sets from real data. If you have a previously constructed Nonmem data set then see Nonmem Data to PoPy Data File for guidance on how to convert such a data set to PoPy format.
See Generate data and Fit using Simple PopPK Model for an example of creating a synthetic PoPy data file from a single script. It is also possible to create multiple data sets, see Generate multiple data sets and Fit using Simple PopPK Model.