
PREPROCESS

An optional verbatim section that creates extra c[X] variables after an input data file has been loaded, and that can also remove rows from the data. In effect, it is a flexible row filter written in Python.

The PREPROCESS section is available in the following scripts:-

i.e. those scripts that load a data file.

Example PREPROCESS section

PREPROCESS: |
    # exclude negative concentrations
    if c[CONC] < 0.0: return
    # create new OCCASION variable
    if c[DAY] <= 3:
        c[OCCASION] = 1
    elif 3 < c[DAY] <= 6:
        c[OCCASION] = 2
    elif 6 < c[DAY] <= 8:
        c[OCCASION] = 3
    else:
        c[OCCASION] = 4

The example above shows the two operations a PREPROCESS section can perform, namely:-

  • Exclude data rows
  • Create extra c[X] data columns

The line:-

if c[CONC] < 0.0: return

Removes all rows from the data set with CONC less than zero. A bare (null) return is the PoPy convention for excluding the current row.
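The exclusion rule can be mimicked in ordinary Python to check its behaviour on a few sample rows. The sketch below is an analogue, not PoPy code: rows are plain dicts with string keys ("CONC") instead of PoPy's c[CONC] notation, and `keep_row` is a hypothetical helper. It also shows an edge case worth knowing: if a concentration were ever read in as NaN, `NaN < 0.0` is False in Python, so that row would be kept rather than excluded.

```python
import math

def keep_row(c: dict) -> bool:
    # Hypothetical analogue of "if c[CONC] < 0.0: return":
    # True keeps the row, False drops it.
    return not (c["CONC"] < 0.0)

rows = [{"CONC": 2.5}, {"CONC": -0.3}, {"CONC": 0.0}, {"CONC": math.nan}]
kept = [c for c in rows if keep_row(c)]
print(len(kept))  # 3: the NaN row survives because NaN < 0.0 is False
```

If NaN values should also be dropped, the plain-Python condition can be widened to `math.isnan(c["CONC"]) or c["CONC"] < 0.0`.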

The remaining lines create a new c[OCCASION] variable, as follows:-

if c[DAY] <= 3:
    c[OCCASION] = 1
elif 3 < c[DAY] <= 6:
    c[OCCASION] = 2
elif 6 < c[DAY] <= 8:
    c[OCCASION] = 3
else:
    c[OCCASION] = 4

The simple Python assignment to c[OCCASION] creates the ‘OCCASION’ field. The if/elif/else statements are standard Python syntax and partition the data rows into occasions according to the existing c[DAY] data field.
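The same partition can be written as a plain Python function to check which occasion a given DAY value falls into. This is only a sketch, and the function name `occasion` is hypothetical. Because each elif branch is reached only when all earlier conditions were false, the lower bounds in the original (3 < c[DAY], 6 < c[DAY]) are redundant, so this version omits them without changing the result.

```python
def occasion(day: float) -> int:
    # Mirrors the if/elif/else chain above; each upper boundary is inclusive
    if day <= 3:
        return 1
    elif day <= 6:   # only reached when day > 3
        return 2
    elif day <= 8:   # only reached when day > 6
        return 3
    else:
        return 4

print([occasion(d) for d in (1, 3, 3.5, 6, 7, 8, 10)])  # [1, 1, 2, 2, 3, 3, 4]
```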

Note that the remaining sections of the script file, e.g. EFFECTS, DERIVATIVES etc., can use the new c[OCCASION] variable as though it already existed in the data file.

Because the section uses Python syntax, the above can be expanded in arbitrarily complex ways to add more c[X] variables or exclude other rows from the data set.

A common use of the PREPROCESS section is to remove an individual from the analysis, as follows:-

PREPROCESS: |
    # exclude an individual
    if c[ID] == '7': return

Or alternatively keep just one individual:-

PREPROCESS: |
    # exclude all individuals apart from 7
    if c[ID] != '7': return

Or potentially exclude multiple individuals:-

PREPROCESS: |
    # exclude multiple individuals
    if c[ID] in ['7','9','41']: return

Or retain only a few individuals:-

PREPROCESS: |
    # exclude all individuals apart from 1-3
    if c[ID] not in ['1','2','3']: return

Note that the ‘ID’ field here is a Python string, not a float or integer, so it must be compared against quoted values such as '7'.
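This matters because Python does not convert between strings and numbers when testing equality: comparing a string ID with an unquoted number is simply False, so a filter written that way would silently match no rows. A minimal illustration, using a plain dict `c` as a stand-in for PoPy's row variables:

```python
c = {"ID": "7"}          # stand-in row: the ID is held as a string

print(c["ID"] == 7)      # False: a str never compares equal to an int
print(c["ID"] == "7")    # True
print(c["ID"] in ["7", "9", "41"])  # True
```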

If you want to exclude individuals based on a numerical calculation, you can do this:-

PREPROCESS: |
    # exclude all individuals with an ID greater than 3
    if int(c[ID]) > 3: return

The code above assumes that all c[ID] values can be converted to an integer. This will not be the case if one of your individuals has the identifier ‘3A’, for example; in Python, int('3A') raises a ValueError.
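One way to guard against such identifiers is to attempt the conversion and decide explicitly what should happen to non-numeric IDs. The sketch below is plain Python and entirely hypothetical: the helper names `id_as_int` and `exclude_id` are illustrative, and helper functions like these may not be usable inside a PoPy PREPROCESS section itself. This version chooses to exclude any ID that cannot be converted.

```python
def id_as_int(id_str: str):
    # Return the integer value of the ID, or None if it is not an integer
    try:
        return int(id_str)
    except ValueError:
        return None

def exclude_id(id_str: str, max_id: int = 3) -> bool:
    # True means "drop this individual": a non-numeric ID or an ID above max_id
    value = id_as_int(id_str)
    return value is None or value > max_id

print([i for i in ["1", "2", "3", "3A", "7"] if not exclude_id(i)])  # ['1', '2', '3']
```

To keep non-numeric IDs instead, `exclude_id` would return False when `value is None`.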

Rules for PREPROCESS section

Like all verbatim sections, the PREPROCESS section of the config file accepts free-form pseudo-Python code, but there are some rules regarding which variables are allowed in a PREPROCESS section, as follows:-

  • Only c[X] variables and local Python variables are allowed
  • c[X] on the right hand side and within if statements must be previously defined on the left hand side or in the data file
  • c[X] declared on the left hand side must not already exist in the data file
  • return must always be null, i.e. a bare return with no value
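These rules are simple enough that a rough pre-flight check can be written as a text scan. The sketch below is not part of PoPy: `check_preprocess` is hypothetical, it inspects the text line by line with regular expressions rather than parsing it (so, for example, a '#' inside a string would confuse it), and it covers only the m, f, r and d prefixes named in this page plus the left-hand-side and null-return rules. It does not check that right-hand-side c[X] variables were defined earlier.

```python
import re

# Hypothetical list of non-data prefixes; PoPy may define others ("etc.")
FORBIDDEN = re.compile(r"\b([mfrd])\[(\w+)\]")
LHS_ASSIGN = re.compile(r"^\s*c\[(\w+)\]\s*=(?!=)")   # "c[X] =" but not "c[X] =="
RETURN_VALUE = re.compile(r"\breturn[ \t]+[^\s#]")    # "return" followed by a value

def check_preprocess(source: str, data_columns: set) -> list:
    """Return human-readable rule violations (an empty list if none are found)."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore comments
        for prefix, name in FORBIDDEN.findall(code):
            problems.append(f"line {lineno}: {prefix}[{name}] is not allowed")
        m = LHS_ASSIGN.match(code)
        if m and m.group(1) in data_columns:
            problems.append(f"line {lineno}: c[{m.group(1)}] already exists in the data file")
        if RETURN_VALUE.search(code):
            problems.append(f"line {lineno}: return must not return a value")
    return problems

src = """if c[CONC] < 0.0: return
c[OCCASION] = m[KA]
c[DAY] = 1
return 5"""
for problem in check_preprocess(src, {"CONC", "DAY"}):
    print(problem)
# line 2: m[KA] is not allowed
# line 3: c[DAY] already exists in the data file
# line 4: return must not return a value
```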

So you cannot use m[X], f[X], r[X], d[X] etc. variables in this section.

The PREPROCESS function is run once, shortly after the data file is loaded, so it is efficient to create required c[X] variables in this section rather than creating temporary variables in the MODEL_PARAMS or DERIVATIVES sections, which may be evaluated many times during model fitting.

As with all verbatim sections, it is possible to introduce syntax errors by writing malformed Python. Such errors should normally be reported when PoPy attempts to compile or run the PREPROCESS function as a temporary .py file.
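The general idea of wrapping the section in a function and compiling it can be reproduced in plain Python to catch syntax errors early. The sketch below is an assumption about the general approach, not PoPy's actual implementation: `compile_preprocess` and `run_preprocess` are hypothetical names, rows are dicts with string keys (c["CONC"] rather than c[CONC]), and a `return True` is appended so that reaching the end of the body (keep the row) can be told apart from a bare return (drop the row).

```python
import textwrap

def compile_preprocess(body: str):
    # Wrap the body in a function; compile() raises SyntaxError if it is malformed
    src = ("def _preprocess(c):\n"
           + textwrap.indent(textwrap.dedent(body), "    ")
           + "\n    return True\n")
    namespace = {}
    exec(compile(src, "<preprocess>", "exec"), namespace)
    return namespace["_preprocess"]

def run_preprocess(rows, body: str):
    fn = compile_preprocess(body)
    kept = []
    for row in rows:
        c = dict(row)        # copy, so new columns do not alter the input rows
        if fn(c) is True:    # a bare return yields None, i.e. drop the row
            kept.append(c)
    return kept

body = """
# exclude negative concentrations
if c["CONC"] < 0.0: return
c["OCCASION"] = 1 if c["DAY"] <= 3 else 2
"""
rows = [{"CONC": 1.2, "DAY": 1}, {"CONC": -0.1, "DAY": 2}, {"CONC": 0.5, "DAY": 5}]
print(run_preprocess(rows, body))
# [{'CONC': 1.2, 'DAY': 1, 'OCCASION': 1}, {'CONC': 0.5, 'DAY': 5, 'OCCASION': 2}]

try:
    compile_preprocess('if c["CONC"] < 0.0 return')  # missing colon
except SyntaxError as err:
    print("syntax error:", err.msg)
```

Because `exec` runs whatever the text contains, an approach like this is only appropriate for trusted script files.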
