Consolidating Configuration
In the last section we wrote out the following code in the flow.cylc file:
[runtime]
    [[get_observations_heathrow]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3772
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_camborne]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3808
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_shetland]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3005
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_aldergrove]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3917
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
In this code the script item and the API_KEY environment variable are repeated for each task. This is bad practice as it makes the configuration lengthy and harder to maintain.
Likewise, the graph relating to the get_observations tasks is highly repetitive:
[scheduling]
    [[graph]]
        T00/PT3H = """
            get_observations_aldergrove => consolidate_observations
            get_observations_camborne => consolidate_observations
            get_observations_heathrow => consolidate_observations
            get_observations_shetland => consolidate_observations
        """
Cylc offers three ways of consolidating configurations to help improve the structure of a workflow and avoid duplication.
The cylc config Command
The cylc config command reads in either the global.cylc file or a specific workflow's flow.cylc file, and prints the parsed configuration to the terminal.
Throughout this section, as we introduce methods for consolidating the flow.cylc file, the cylc config command can be used to "expand" the file back to its full form.
Note
A primary use of cylc config is inspecting the [runtime] section of a workflow. However, the command does not expand parameterizations and families in the workflow graph. To see the expanded graph, use the cylc graph command.
Call cylc config with the path of the workflow (or . if you are already in the source directory or the run directory).
cylc config <path>
To view the configuration of a particular section or setting, refer to it by name using the -i option (see The flow.cylc File Format for details), e.g.:
# Print the contents of the [scheduling] section.
cylc config <path> -i '[scheduling]'
# Print the contents of the get_observations_heathrow task.
cylc config <path> -i '[runtime][get_observations_heathrow]'
# Print the value of the script setting in the get_observations_heathrow task.
cylc config <path> -i '[runtime][get_observations_heathrow]script'
The Three Approaches
The next three sections cover the three consolidation approaches and how we could use them to simplify the workflow from the previous tutorial. Work through them in order!
Which Approach To Use
Each approach has its uses. Cylc permits mixing approaches, allowing us to use what works best for us. As a rule of thumb:
Families work best for consolidating runtime configuration by collecting tasks into broad groups, e.g. groups of tasks which run on a particular machine or groups of tasks belonging to a particular system.
Jinja2 is good at configuring settings which apply to the entire workflow rather than just a single task, as we can define variables then use them throughout the workflow.
Parameterization works best for describing tasks which are very similar but which have subtly different configurations (e.g. different arguments or environment variables).
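To give a feel for how the approaches can be mixed, here is a rough sketch of the get_observations configuration from the top of this section using all three together. It is an illustration rather than the finished tutorial workflow: the family name GET_OBSERVATIONS, the parameter name station and the Jinja2 variable API_KEY are names chosen for this example, and the fragment leaves out the rest of the workflow (for instance the consolidate_observations task definition), so it would not validate on its own.

```cylc
#!Jinja2
{# Illustrative workflow-wide variable (Jinja2); the key is a placeholder. #}
{% set API_KEY = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' %}

[task parameters]
    # Illustrative parameter: one value per observation site.
    station = aldergrove, camborne, heathrow, shetland

[scheduling]
    [[graph]]
        T00/PT3H = """
            # Expands to one get_observations_<station> task per value.
            get_observations<station> => consolidate_observations
        """

[runtime]
    # Illustrative family holding the settings shared by every site.
    [[GET_OBSERVATIONS]]
        script = get-observations
        [[[environment]]]
            API_KEY = {{ API_KEY }}

    # All parameterized tasks inherit the shared settings.
    [[get_observations<station>]]
        inherit = GET_OBSERVATIONS

    # Only the setting that differs is given per site.
    [[get_observations<station=heathrow>]]
        [[[environment]]]
            SITE_ID = 3772
    [[get_observations<station=camborne>]]
        [[[environment]]]
            SITE_ID = 3808
    [[get_observations<station=shetland>]]
        [[[environment]]]
            SITE_ID = 3005
    [[get_observations<station=aldergrove>]]
        [[[environment]]]
            SITE_ID = 3917
```

In a workflow this small, one approach alone would probably be enough; the point of the sketch is only that the script and API_KEY settings now appear once each. Running cylc config on such a workflow should show the [runtime] section expanded back to one entry per site.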