
PoPy Data Format

The PoPy data file records observation and dosing regimens for each individual in a study.

The columns (or fields) in the data file fall into four main types, listed in Table 5:-

Table 5 PoPy data fields
Field               Comment
Required Fields     TYPE/ID/TIME
Dosing Fields       dosing regime data
Observation Fields  observed measurements
Extra Fields        extra covariate information

The data file values for each field can be accessed using the c[X] notation in the PoPy script file.

Required Fields

A PoPy data set requires the following fields:-

  • TYPE - type of row
  • ID - identity
  • TIME - time field

Note the names ‘TYPE’, ‘ID’ and ‘TIME’ are the default names of these three required fields. You can use other field names if you choose to redefine them in the script file DATA_FIELDS section.


The ‘TYPE’ field specifies the event that is happening in each row of the data file. The different types of row are as follows:-

  • obs - Measurements that contribute to the log likelihood as defined in the PREDICTIONS section.
  • dose - Creates a dose according to the dosing functions in the DERIVATIVES section.
  • pred - Extra prediction data points. PoPy will output extra p[X] data at these time points, but they do not contribute to the likelihood.
  • reset - Set the s[X] compartment states back to the initial values (usually zero).
  • reset+dose - A ‘reset’ combined with a ‘dose’ event.

The row types above have direct equivalents in Nonmem in terms of the EVID integer values.

Typically a drug trial data set consists mainly of ‘obs’ and ‘dose’ rows, with a few ‘reset’ rows, per subject.


The ‘ID’ field value defines the individual for a given row. As PoPy is a PopPK/PD system, the ‘ID’ field is required because the data is split over multiple individuals to form a population.

Note that non-population analysis can be performed in PoPy by assigning all rows the same ‘ID’ value.


The ‘TIME’ field defines the time stamp for each row.

The time field is required to be non-decreasing from one row to the next, unless a TYPE = ‘reset’ or ‘reset+dose’ row is reached. Note that when the ID identifier changes between rows, an implicit ‘reset’ occurs.

For an example of a valid combination of TYPE/ID/TIME data see Table 6.

Table 6 PoPy time reset example
TYPE   ID    TIME  comment
obs    Bob   0.0   observation at time zero
dose   Bob   4.0   dose for Bob at time 4.0
obs    Bob   4.0   observation for Bob at time 4.0
obs    Bob   8.0   later observation
obs    Ruth  0.0   time goes back, OK because of new ID
dose   Ruth  10.0  dose for Ruth at time 10.0
obs    Ruth  20.0  later observation
reset  Ruth  30.0  s[X] reset at time 30.0
obs    Ruth  1.0   observation following reset

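In Table 6 the time increases or stays the same between consecutive rows for the same individual. The only places where time goes backwards are after a new ID (the first Ruth row) and after a reset (the final row), both of which are allowed.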
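The ordering rule can be checked mechanically before a data file is loaded. The Python sketch below is not part of PoPy. The function name `find_time_violations` is hypothetical, and the sketch assumes that a backwards step in TIME is acceptable either on a reset row itself or on the row immediately after one, which is one reading of Table 6.

```python
RESET_TYPES = {"reset", "reset+dose"}


def find_time_violations(rows):
    """Return indices of rows whose TIME goes backwards without a new ID or a reset.

    rows is a sequence of (TYPE, ID, TIME) tuples in data file order.
    Assumption: a decrease is tolerated on a reset row or on the row after one.
    """
    violations = []
    prev = None  # (TYPE, ID, TIME) of the previous row
    for index, (row_type, row_id, time) in enumerate(rows):
        if prev is not None:
            prev_type, prev_id, prev_time = prev
            same_individual = row_id == prev_id
            reset_involved = row_type in RESET_TYPES or prev_type in RESET_TYPES
            if same_individual and not reset_involved and time < prev_time:
                violations.append(index)
        prev = (row_type, row_id, time)
    return violations


# The TYPE/ID/TIME columns of Table 6.
table6 = [
    ("obs", "Bob", 0.0),
    ("dose", "Bob", 4.0),
    ("obs", "Bob", 4.0),
    ("obs", "Bob", 8.0),
    ("obs", "Ruth", 0.0),
    ("dose", "Ruth", 10.0),
    ("obs", "Ruth", 20.0),
    ("reset", "Ruth", 30.0),
    ("obs", "Ruth", 1.0),
]

print(find_time_violations(table6))
```

For the rows of Table 6 the returned list is empty. A pair of ‘obs’ rows for the same ID whose second TIME is earlier than the first would be reported by index.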

Dosing Fields

Dosing events are created in the data file using ‘dose’ values in the TYPE field.

There are two methods of associating dose rows in the data file with the DERIVATIVES section in the PoPy script file, as follows:-

  • Single Dose Type - uses just the ‘dose’ value in the TYPE field.
  • Multiple Dose Types - defines a name for each type of dose.

The amount of each dose is usually specified in an AMT field, as in the examples below.

Note that in PoPy, AMT is not a keyword. It is just the conventional name for the dose amount field used in this documentation. See AMT for the Nonmem keyword.

Single Dose Type

The simplest way to create doses at a set of fixed times is shown in Table 7.

Table 7 PoPy single dose type example
TYPE  TIME  AMT  comment
dose  1.0   100  dose of 100 at time 1.0
dose  2.0   200  dose of 200 at time 2.0
dose  3.0   100  dose of 100 at time 3.0

Note that this creates 3 doses at times [1.0, 2.0, 3.0]. The script file loading this data set should have a DERIVATIVES section something like:-

    d[DEPOT] = @bolus{amt: c[AMT]} - m[KE] * s[DEPOT]

Note that the @bolus dose has no name associated with it.
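If you build such a data set programmatically, a short script can write the dose rows. The Python sketch below assumes that the data file is stored as comma-separated text with a header row, which this section does not state. The ID value "1" and the helper name `to_csv` are hypothetical, and an ID column is added because ID is a required field.

```python
import csv
import io

# Dose rows from Table 7, with a hypothetical ID column added.
dose_rows = [
    {"TYPE": "dose", "ID": "1", "TIME": 1.0, "AMT": 100},
    {"TYPE": "dose", "ID": "1", "TIME": 2.0, "AMT": 200},
    {"TYPE": "dose", "ID": "1", "TIME": 3.0, "AMT": 100},
]


def to_csv(rows, fieldnames):
    """Serialise rows to comma-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(dose_rows, ["TYPE", "ID", "TIME", "AMT"])
print(csv_text)
```

The AMT column name matches the c[AMT] reference in the DERIVATIVES line above. Any other column name would work as long as the script refers to it consistently.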

Multiple Dose Types

If you have multiple types of dose in your analysis, e.g. two different drugs being prescribed, then you need to give each dose type a name, as shown in Table 8.

Table 8 PoPy multi dose type example
TYPE        TIME  AMT_DRUG1  AMT_DRUG2  comment
dose:drug1  1.0   100        0          100 units of drug1
dose:drug2  2.0   0          200        200 units of drug2
dose:drug1  3.0   50         0          50 units of drug1

The data file above creates 2 doses of drug1 and 1 dose of drug2. The script file loading this data set should have a DERIVATIVES section something like:-

    dose[drug1] = @bolus{amt: c[AMT_DRUG1]}
    dose[drug2] = @bolus{amt: c[AMT_DRUG2]}
    d[DEPOT1] = dose[drug1] - m[KE1] * s[DEPOT1]
    d[DEPOT2] = dose[drug2] - m[KE2] * s[DEPOT2]

The important aspect here is that the @bolus doses are defined with names ‘drug1’ and ‘drug2’. These names also appear in the TYPE field in the data set as ‘dose:drug1’ and ‘dose:drug2’.

An alternative naming syntax is as follows:-

    d[DEPOT1] = @bolus{amt: c[AMT_DRUG1], name: 'drug1'} - m[KE1] * s[DEPOT1]
    d[DEPOT2] = @bolus{amt: c[AMT_DRUG2], name: 'drug2'} - m[KE2] * s[DEPOT2]

Note that when creating a PoPy data set, you only need to specify a name for each type of dose. You can leave the modelling decision of where each dose appears in the compartment model to a later time.
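When preparing or checking a data set, it can help to split each TYPE value into its event and optional dose name. The Python sketch below illustrates the ‘dose:name’ convention from Table 8. The helper `split_type` is hypothetical and is not a PoPy function.

```python
from collections import Counter


def split_type(type_value):
    """Split a TYPE value such as 'dose:drug1' into (event, name).

    The name is None when the TYPE value carries no ':' suffix, e.g. 'dose' or 'obs'.
    """
    event, separator, name = type_value.partition(":")
    return event, (name if separator else None)


# TYPE column of Table 8.
table8_types = ["dose:drug1", "dose:drug2", "dose:drug1"]

# Count how many doses of each named type the data set contains.
dose_counts = Counter(
    name for event, name in map(split_type, table8_types) if event == "dose"
)
print(dict(dose_counts))
```

For Table 8 this counts two doses named ‘drug1’ and one named ‘drug2’, matching the description above.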

Observation Fields

Another important set of fields in the data file are the columns that define observed measurements. Observation rows are defined by setting TYPE = ‘obs’.

This section shows examples of the following:-

  • Single Observed Field
  • Observed Field with missing data
  • Multiple Observed Fields

Note that in each case the PREDICTIONS section of the PoPy script file is associated with observation fields in the data file in order to compute the likelihood correctly.

Single Observed Field

An example of a single observed field is shown in Table 9.

Table 9 PoPy single observed field example
TYPE  DRUG_CONC
obs   10.5
obs   15.5
obs   2.0

In this simple case the PREDICTIONS section may look something like:-

    p[DRUG_CONC] = s[CEN]/m[V]
    c[DRUG_CONC] ~ norm(p[DRUG_CONC], m[ANOISE_var])

Note that c[DRUG_CONC] references the ‘DRUG_CONC’ field of the data set. Here the likelihood is computed by comparing the model prediction p[DRUG_CONC] with the data file observation c[DRUG_CONC] for all rows of the data set where TYPE = ‘obs’.

Therefore all values of the data column ‘DRUG_CONC’ have to be valid observations. If you have missing values then you need to use the data structure in Observed Field with missing data.
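To make the likelihood comparison concrete, the Python sketch below sums a normal log-density over the ‘obs’ rows only. It is an illustration and not PoPy's implementation. It assumes that the second argument of norm is a variance, as the name ANOISE_var suggests, and the predicted values and function names are hypothetical.

```python
import math


def normal_log_density(observed, predicted, variance):
    """Log of the normal density with mean `predicted` and the given variance."""
    return -0.5 * (
        math.log(2.0 * math.pi * variance) + (observed - predicted) ** 2 / variance
    )


def total_log_likelihood(rows, variance):
    """Sum the log-density over rows with TYPE == 'obs'; other rows are skipped.

    rows is a sequence of (TYPE, observed, predicted) tuples.
    """
    return sum(
        normal_log_density(observed, predicted, variance)
        for row_type, observed, predicted in rows
        if row_type == "obs"
    )


# Observed values taken from Table 9; the predictions are made up for illustration.
example_rows = [
    ("obs", 10.5, 10.0),
    ("dose", 0.0, 0.0),  # not an observation, so it does not contribute
    ("obs", 2.0, 2.5),
]

print(total_log_likelihood(example_rows, variance=1.0))
```

Because every ‘obs’ row contributes a term, an invalid value in the observed column would distort the sum, which is why the column must contain only valid observations in this layout.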

Observed Field with missing data

An example of a single observed field with some missing data is shown in Table 10.

Table 10 PoPy single observed field missing data example
TYPE  DRUG_CONC  DRUG_CONC_FLAG  comment
obs   10.5       1               DRUG_CONC valid
obs   0.0        0               DRUG_CONC invalid
obs   -5.0       0               DRUG_CONC invalid
obs   2.0        1               DRUG_CONC valid

In this case the PREDICTIONS section may still look something like:-

    p[DRUG_CONC] = s[CEN]/m[V]
    c[DRUG_CONC] ~ norm(p[DRUG_CONC], m[ANOISE_var])

However, not all the TYPE = ‘obs’ rows contribute to the likelihood in this case. Only the rows that have both TYPE = ‘obs’ and DRUG_CONC_FLAG = 1 contribute.

It is similar to having the following ‘if’ statement in your PREDICTIONS section:-

    p[DRUG_CONC] = s[CEN]/m[V]
    if c[DRUG_CONC_FLAG] > 0.5:
        c[DRUG_CONC] ~ norm(p[DRUG_CONC], m[ANOISE_var])

You can include the ‘if’ statement in your PREDICTIONS section if you like, but it is not required (or encouraged).

Note also that omitting the ‘DRUG_CONC_FLAG’ field from your data set has a similar effect to creating a ‘DRUG_CONC_FLAG’ field and setting all the values to 1, i.e. flags default to 1 in PoPy.
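The flag rule, including the default of 1, can be mimicked outside PoPy when inspecting a data set. The Python sketch below is only an illustration. The function name `contributing_values` is hypothetical, and the 0.5 threshold mirrors the ‘if’ statement shown above.

```python
def contributing_values(rows, field):
    """Return the values of `field` that would contribute to the likelihood.

    A row contributes when TYPE is 'obs' and its <field>_FLAG is above 0.5.
    A missing flag column is treated as 1, matching the default described above.
    """
    flag_field = field + "_FLAG"
    values = []
    for row in rows:
        if row["TYPE"] != "obs":
            continue
        if float(row.get(flag_field, 1)) > 0.5:
            values.append(float(row[field]))
    return values


# Rows of Table 10 (with flags) and Table 9 (without a flag column).
table10 = [
    {"TYPE": "obs", "DRUG_CONC": 10.5, "DRUG_CONC_FLAG": 1},
    {"TYPE": "obs", "DRUG_CONC": 0.0, "DRUG_CONC_FLAG": 0},
    {"TYPE": "obs", "DRUG_CONC": -5.0, "DRUG_CONC_FLAG": 0},
    {"TYPE": "obs", "DRUG_CONC": 2.0, "DRUG_CONC_FLAG": 1},
]
table9 = [
    {"TYPE": "obs", "DRUG_CONC": 10.5},
    {"TYPE": "obs", "DRUG_CONC": 15.5},
    {"TYPE": "obs", "DRUG_CONC": 2.0},
]

print(contributing_values(table10, "DRUG_CONC"))
print(contributing_values(table9, "DRUG_CONC"))
```

With Table 10 only the two flagged rows are kept, while with Table 9 every ‘obs’ row is kept because no flag column is present.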

If you have multiple observation types in your data set then flag fields become more important; see the example data structure in Multiple Observed Fields.

Multiple Observed Fields

An example of multiple observed fields is shown in Table 11.

Table 11 PoPy multiple observed fields
TYPE  DRUG1  DRUG1_FLAG  DRUG2  DRUG2_FLAG  comment
obs   10.5   1           0.2    1           both drugs valid
obs   10.5   1           0.0    0           only drug1 valid
obs   -4.1   0           0.0    0           both drugs invalid
obs   -4.1   0           0.5    1           only drug2 valid

In this case the PREDICTIONS section may look something like:-

    p[DRUG1] = s[CEN1]/m[V1]
    c[DRUG1] ~ norm(p[DRUG1], m[ANOISE_var1])
    p[DRUG2] = s[CEN2]/m[V2]
    c[DRUG2] ~ norm(p[DRUG2], m[ANOISE_var2])

Here PoPy uses the ‘DRUG1_FLAG’ and ‘DRUG2_FLAG’ fields from the data set to only compute the likelihood from valid observations. You don’t have to use ‘if’ statements in the PREDICTIONS section to achieve this.
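The per-field selection can be illustrated with a short Python sketch that groups the valid observations of Table 11 by field. This is not PoPy code. The function name `valid_observations` is hypothetical, and flags are again assumed to default to 1 when a flag column is absent.

```python
def valid_observations(rows, fields):
    """Map each observed field to the values whose <field>_FLAG marks them as valid."""
    result = {field: [] for field in fields}
    for row in rows:
        if row["TYPE"] != "obs":
            continue
        for field in fields:
            # A missing flag column is treated as 1 (valid).
            if float(row.get(field + "_FLAG", 1)) > 0.5:
                result[field].append(row[field])
    return result


# Rows of Table 11.
table11 = [
    {"TYPE": "obs", "DRUG1": 10.5, "DRUG1_FLAG": 1, "DRUG2": 0.2, "DRUG2_FLAG": 1},
    {"TYPE": "obs", "DRUG1": 10.5, "DRUG1_FLAG": 1, "DRUG2": 0.0, "DRUG2_FLAG": 0},
    {"TYPE": "obs", "DRUG1": -4.1, "DRUG1_FLAG": 0, "DRUG2": 0.0, "DRUG2_FLAG": 0},
    {"TYPE": "obs", "DRUG1": -4.1, "DRUG1_FLAG": 0, "DRUG2": 0.5, "DRUG2_FLAG": 1},
]

print(valid_observations(table11, ["DRUG1", "DRUG2"]))
```

Each field is filtered by its own flag, so a single row can contribute to the likelihood of one observation type and not the other.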

Extra Fields

The other columns of the PoPy data file are available for use in the verbatim sections of the PoPy script file.

For example, a simple covariate model in the MODEL_PARAMS section might look like:-

    m[X] = f[X] + f[X_Y_EFFECT]*c[Y]

Here the m[X] parameter is modelled as having a linear relationship with the c[Y] covariate from the data file.
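As a numerical illustration of this linear relationship, the Python sketch below evaluates the same expression for a few covariate values. The parameter values and the function name `model_param` are hypothetical and are not taken from any PoPy model.

```python
def model_param(f_x, f_x_y_effect, c_y):
    """Evaluate m[X] = f[X] + f[X_Y_EFFECT] * c[Y] for one row of the data file."""
    return f_x + f_x_y_effect * c_y


# Hypothetical fixed effects and covariate values, for illustration only.
f_x = 2.0
f_x_y_effect = 0.5
covariate_values = [0.0, 4.0, 10.0]

m_x_values = [model_param(f_x, f_x_y_effect, c_y) for c_y in covariate_values]
print(m_x_values)
```

Each row's m[X] shifts away from the baseline f[X] in proportion to that row's covariate value c[Y].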

It is also possible to use c[X] variables in the other sections. One use case is when you already have PK parameters estimated (from a previous study) and wish to use these c[X] variables in the DERIVATIVES section, instead of estimating m[X] parameters for each individual.

Next Steps

You can use the information above to construct your own PoPy data sets from real data. If you have a previously constructed Nonmem data set then see Nonmem Data to PoPy Data File for guidance on how to convert such a data set to PoPy format.

See Generate data and Fit using Simple PopPK Model for an example of creating a synthetic PoPy data file from a single script. It is also possible to create multiple data sets, see Generate multiple data sets and Fit using Simple PopPK Model.
