In the last section we wrote out the following code in the flow.cylc file:
    [runtime]
        [[get_observations_heathrow]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3772
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        [[get_observations_camborne]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3808
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        [[get_observations_shetland]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3005
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        [[get_observations_aldergrove]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3917
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
In this code the script item and the API_KEY environment variable have been repeated for each task. This is bad practice as it makes the configuration lengthy and harder to maintain.
Likewise, the graph relating to the get_observations tasks is highly repetitive:
    [scheduling]
        [[graph]]
            T00/PT3H = """
                get_observations_aldergrove => consolidate_observations
                get_observations_camborne => consolidate_observations
                get_observations_heathrow => consolidate_observations
                get_observations_shetland => consolidate_observations
            """
Cylc offers three ways of consolidating configurations to help improve the structure of a workflow and avoid duplication.
The cylc config Command
The cylc config command reads in either the global.cylc file or a specific workflow’s flow.cylc file, and prints the parsed configuration out to the terminal.
Throughout this section, as we introduce methods for consolidating the flow.cylc file, the cylc config command can be used to “expand” the file back to its full form.
A primary use of cylc config is inspecting the [runtime] section of a workflow. However, the command does not expand parameterizations and families in the workflow graph. To see the expanded graph, use the cylc graph command.
Call cylc config with the path of the workflow (or . if you are already in the source directory or the run directory):

    cylc config <path>
To view the configuration of a particular section or setting, refer to it by name using the -i option (see The flow.cylc File Format for details), e.g.:
    # Print the contents of the [scheduling] section.
    cylc config <path> -i '[scheduling]'

    # Print the contents of the get_observations_heathrow task.
    cylc config <path> -i '[runtime][get_observations_heathrow]'

    # Print the value of the script setting in the get_observations_heathrow task.
    cylc config <path> -i '[runtime][get_observations_heathrow]script'
The Three Approaches
The next three sections cover the three consolidation approaches and how we could use them to simplify the workflow from the previous tutorial. Work through them in order!
Which Approach To Use
Each approach has its uses. Cylc permits mixing approaches, allowing us to use what works best for us. As a rule of thumb:
Families work best consolidating runtime configuration by collecting tasks into broad groups, e.g. groups of tasks which run on a particular machine or groups of tasks belonging to a particular system.
Jinja2 is good at configuring settings which apply to the entire workflow rather than just a single task, as we can define variables then use them throughout the workflow.
Parameterization works best for describing tasks which are very similar but which have subtly different configurations (e.g. different arguments or environment variables).
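As a rough preview of the first rule of thumb, the sketch below shows how a family might absorb the repeated script and API_KEY settings from the start of this section. The family name GET_OBSERVATIONS is an assumption chosen for illustration, and only two of the four tasks are shown. The following sections work through each approach properly.

```cylc
[runtime]
    # Hypothetical family holding the settings shared by every task.
    [[GET_OBSERVATIONS]]
        script = get-observations
        [[[environment]]]
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

    # Each task inherits the shared settings and adds only its own SITE_ID.
    [[get_observations_heathrow]]
        inherit = GET_OBSERVATIONS
        [[[environment]]]
            SITE_ID = 3772
    [[get_observations_camborne]]
        inherit = GET_OBSERVATIONS
        [[[environment]]]
            SITE_ID = 3808
```

With a configuration along these lines, running cylc config <path> -i '[runtime][get_observations_heathrow]' should print the task with its inherited script and API_KEY settings expanded, which is a convenient way to check that the consolidated form still matches the original.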