Consolidating Configuration

In the last section we wrote out the following code in the flow.cylc file:

[runtime]
    [[get_observations_heathrow]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3772
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_camborne]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3808
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_shetland]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3005
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_aldergrove]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3917
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

In this code the script setting and the API_KEY environment variable are repeated for every task. This is bad practice: it makes the configuration longer and harder to maintain, because any change to either value has to be made in four places.

Likewise the graph relating to the get_observations tasks is highly repetitive:

[scheduling]
    [[graph]]
        T00/PT3H = """
            get_observations_aldergrove => consolidate_observations
            get_observations_camborne => consolidate_observations
            get_observations_heathrow => consolidate_observations
            get_observations_shetland => consolidate_observations
        """

Cylc offers three ways of consolidating configuration (families, Jinja2 and parameterization) to help improve the structure of a workflow and avoid duplication.

The cylc config Command

The cylc config command reads either the global.cylc file or a specific workflow’s flow.cylc file, and prints the parsed configuration to the terminal.

Throughout this section, as we introduce methods for consolidating the flow.cylc file, the cylc config command can be used to “expand” the file back to its full form.

Note

A primary use of cylc config is inspecting the [runtime] section of a workflow. However, the command does not expand parameterizations and families in the workflow graph. To see the expanded graph use the cylc graph command.

Call cylc config with the path of the workflow (or . if you are already in the source directory or the run directory).

cylc config <path>

To view the configuration of a particular section or setting, refer to it by name using the -i option (see The flow.cylc File Format for details), e.g.:

# Print the contents of the [scheduling] section.
cylc config <path> -i '[scheduling]'
# Print the contents of the get_observations_heathrow task.
cylc config <path> -i '[runtime][get_observations_heathrow]'
# Print the value of the script setting in the get_observations_heathrow task.
cylc config <path> -i '[runtime][get_observations_heathrow]script'

The Three Approaches

The next three sections cover the three consolidation approaches and how we could use them to simplify the workflow from the previous tutorial. Work through them in order!

Which Approach To Use

Each approach has its uses. Cylc permits mixing approaches, allowing us to use what works best for us. As a rule of thumb:

  • Families work best for consolidating runtime configuration by collecting tasks into broad groups, e.g. groups of tasks which run on a particular machine or groups of tasks belonging to a particular system.

  • Jinja2 is good at configuring settings which apply to the entire workflow rather than just a single task, as we can define variables then use them throughout the workflow.

  • Parameterization works best for describing tasks which are very similar but which have subtly different configurations (e.g. different arguments or environment variables).
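
As a rough illustration of how the approaches can be mixed, the fragment below sketches the get_observations tasks from the start of this section using all three at once. It is a sketch under assumptions, not a reference solution: the family name GET_OBSERVATIONS, the parameter name site and the Jinja2 variable API_KEY are hypothetical names chosen for this example, and the [task parameters] section assumes Cylc 8 syntax. Only the parts relevant to consolidation are shown, so this is not a complete flow.cylc file.

```cylc
#!Jinja2

{# Jinja2: a workflow-wide value, defined once (hypothetical variable name). #}
{% set API_KEY = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' %}

[task parameters]
    # Parameterization: one value per site (hypothetical parameter name).
    site = aldergrove, camborne, heathrow, shetland

[scheduling]
    [[graph]]
        T00/PT3H = """
            # One line stands in for the four near-identical graph lines.
            get_observations<site> => consolidate_observations
        """

[runtime]
    # Family: settings shared by every observation task (hypothetical name).
    [[GET_OBSERVATIONS]]
        script = get-observations
        [[[environment]]]
            API_KEY = {{ API_KEY }}

    # All parameterized tasks inherit the shared settings from the family.
    [[get_observations<site>]]
        inherit = GET_OBSERVATIONS

    # Only the setting that genuinely differs is given per site.
    [[get_observations<site=heathrow>]]
        [[[environment]]]
            SITE_ID = 3772
    [[get_observations<site=camborne>]]
        [[[environment]]]
            SITE_ID = 3808
    [[get_observations<site=shetland>]]
        [[[environment]]]
            SITE_ID = 3005
    [[get_observations<site=aldergrove>]]
        [[[environment]]]
            SITE_ID = 3917
```

With a layout like this, the script and API_KEY values each appear once. Running cylc config with -i '[runtime]' on such a workflow should show the [runtime] section expanded back into one entry per site, which is a convenient way to check that the consolidated file still describes the same tasks.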