Portable Workflows
A portable or interoperable workflow can run “out of the box” at different sites, or in different environments such as research and operations within a site. For convenience we just use the term site portability.
Lack of portability is a major barrier to collaborative development when sites need to run more or less the same workflow, because it is very difficult to translate changes manually between large, complicated workflows.
Most workflows are riddled with site-specific details such as local build configurations, file paths, host names, and batch scheduler directives, etc.; but it is possible to cleanly factor all this out to make a portable workflow. Significant variations in workflow structure can even be accommodated quite easily. If the site workflows are too different, however, you may decide that it is appropriate for each site to maintain separate workflows.
The recommended way to do this, which we expand on below, is:
Put all site-specific settings in include-files loaded at the end of a generic “core” workflow definition.
Use “optional” app config files for site-specific variations in the core workflow’s Rose apps.
(Make minimal use of inlined site switches too, if necessary).
When referencing files, reference them within the workflow structure and use an install task to link external files in.
The result should actually be tidier than the original in one respect: all the messy platform-specific resource directives etc., will be hidden away in the site include-files.
The Jinja2 SITE Variable
First a workflow Jinja2 variable called SITE
should be set to the site
name, either in rose-suite.conf
, or in the workflow definition itself
(perhaps automatically, by querying the local environment in some way).
#!Jinja2
{% set SITE = "niwa" %}
#...
This will be used to select site-specific configuration, as described below.
Site Include-Files
If a section heading in a flow.cylc
file is repeated the items
under it simply add to or override those defined under the same section earlier
in the file.
For example, this task definition:
[runtime]
[[foo]]
script = run-foo.sh
platform = niwa_hpc
can equally be written like this:
[runtime] # Part 1 (site-agnostic).
[[foo]]
script = run-foo.sh
[runtime] # Part 2 (site-specific).
[[foo]]
platform = niwa_hpc
Note
If Part 2 had also defined script
the new value would
override the original. It can sometimes be useful to set a widely used
default and override it in a few cases, but be aware that this can
make it more difficult to determine the origin of affected values.
In this way all site-specific [runtime]
settings, with their
respective sub-section headings, can be moved to the end of the file, and then
out into an include-file (file inclusion is essentially just literal inlining):
#...
{% set SITE = "niwa" %}
# Core site-agnostic settings:
#...
[runtime]
[[foo]]
script = run-foo.sh
#...
# Site-specific settings:
{% include 'site/' ~ SITE ~ '.cylc' %}
where the site include-file site/niwa.cylc
contains:
# site/niwa.cylc
[runtime]
[[foo]]
platform = niwa_hpc
Site-Specific Graphs
Repeated graph
strings under the same graph section headings are
always additive (graph strings are the only exception to the normal repeat item
override semantics). So, for instance, this graph:
[scheduling]
initial cycle point = 2025
[[graph]]
P1Y = "pre => model => post => niwa_archive"
can be written like this:
[scheduling]
initial cycle point = 2025
[[graph]]
P1Y = "pre => model => post"
P1Y = "post => niwa_archive"
and again, the site-specific part can be taken out to a site include-file:
#...
{% set SITE = "niwa" %}
# Core site-agnostic settings.
#...
[scheduling]
initial cycle point = 2025
[[graph]]
P1Y = "pre => model => post"
#...
# Site-specific settings:
{% include 'site/' ~ SITE ~ '.cylc' %}
where the site include-file site/niwa.cylc
contains:
# site/niwa.cylc
[scheduling]
[[graph]]
P1Y = "post => niwa_archive"
Note that the site-file graph needs to define the dependencies of the
site-specific tasks, and thus their points of connection to the core
workflow - which is why the core task post
appears in the graph here (if
post
had any site-specific runtime settings, to get it to run at
this site, they would also be in the site-file).
Inlined Site-Switching
It may be tempting to use inlined switch blocks throughout the workflow instead of site include-files, but this is not recommended - it is verbose and untidy (the greater the number of supported sites, the bigger the mess) and it exposes all site configuration to all users:
#...
[runtime]
[[model]]
script = run-model.sh
{# Site switch blocks not recommended:#}
{% if SITE == 'niwa' %}
platform = niwa_loadleveler_platform
[[[directives]]]
# NIWA Loadleveler directives...
{% elif SITE == 'metoffice' %}
platform = metoffice_pbs_platform
[[[directives]]]
# Met Office PBS directives...
{% elif SITE == ... %}
#...
{% else %}
{{raise('Unsupported site: ' ~ SITE)}}
{% endif %}
#...
Inlined switches can be used, however, to configure exceptional behaviour at one site without requiring the other sites to duplicate the default behaviour. But be wary of accumulating too many of these switches:
# (core flow.cylc file)
#...
{% if SITE == 'small' %}
{# We can't run 100 members... #}
{% set ENSEMBLE_SIZE = 25 %}
{% else %}
{# ...but everyone else can! #}
{% set ENSEMBLE_SIZE = 100 %}
{% endif %}
#...
Inlined switches can also be used to temporarily isolate a site-specific change to a hitherto non site-specific part of the workflow, thereby avoiding the need to update all site include-files before getting agreement from the workflow owner and collaborators.
Site-Specific Workflow Variables
It can sometimes be useful to set site-specific values of workflow variables that
aren’t exposed to users via rose-suite.conf
. For example, consider
a workflow that can run a special post-processing workflow of some kind at sites
where IDL is available. The IDL-dependence switch can be set per site like this:
#...
{% from SITE ~ '-vars.cylc' import HAVE_IDL, OTHER_VAR %}
R1 = """
pre => model => post
{% if HAVE_IDL %}
post => idl-1 => idl-2 => idl-3
{% endif %}
"""
where for SITE = niwa
the file niwa-vars.cylc
contains:
{# niwa-vars.cylc #}
{% set HAVE_IDL = True %}
{% set OTHER_VAR = "the quick brown fox" %}
Note we are assuming there are significantly fewer options (IDL or not, in this case) than sites, otherwise the IDL workflow should just go in the site include-files of the sites that need it.
Site-Specific Optional Workflow Configs
During development and testing of a portable workflow you can use an optional Rose
workflow config file to automatically set site-specific workflow inputs and thereby
avoid the need to make manual changes every time you check out and run a new
version. The site switch itself has to be set of course, but there may be other
settings too such as model parameters for a standard local test domain. Just
put these settings in opt/rose-suite-niwa.conf
(for site niwa
).
Site-Agnostic File Paths in App Configs
Where possible apps should be configured to reference files within the workflow structure itself rather than outside of it. This makes the apps themselves portable and it becomes the job of the install task to ensure all required source files are available within the workflow structure e.g. via symlink into the share directory. Additionally, by moving the responsibility of linking files into the workflow to an install task you gain the added benefit of knowing if a file is missing at the start of a workflow rather than part way into a run.
Site-Specific Optional App Configs
Typically a few but not all apps will need some site customization, e.g. for local archive configuration, local science options, or whatever. To avoid explicit site-customization of individual task-run command lines use Rose’s built-in optional app config capability:
[runtime]
[[root]]
script = rose task-run -v -O '({{SITE}})'
Normally a missing optional app config is considered to be an error, but the round parentheses here mean the named optional config is optional - i.e. use it if it exists, otherwise ignore.
With this setting in place we can simply add a opt/rose-app-niwa.conf
to
any app that needs customization at SITE = niwa
.
An Example
The following small workflow is not portable because all of its tasks are submitted to a NIWA HPC host; two task are entirely NIWA-specific in that they respectively install files from a local database and upload products to a local distribution system; and one task runs a somewhat NIWA-specific configuration of a model. The remaining tasks are site-agnostic apart from local job host and batch scheduler directives.
[scheduler]
UTC mode = True
[scheduling]
initial cycle point = 2017-01-01
[[graph]]
R1 = install_niwa => preproc
P1D = """
preproc & model[-P1D] => model => postproc => upload_niwa
postproc => idl-1 => idl-2 => idl-3
"""
[runtime]
[[root]]
script = rose task-run -v
[[HPC]] # NIWA job host and batch scheduler settings.
platform = niwa_loadleveler_platform
[[[directives]]]
account_no = NWP1623
class = General
job_type = serial # (most jobs in this workflow are serial)
[[install_niwa]] # NIWA-specific file installation task.
inherit = HPC
[[preproc]]
inherit = HPC
[[model]] # Run the model on a local test domain.
inherit = HPC
[[[directives]]] # Override the serial job_type setting.
job_type = parallel
[[[environment]]]
SPEED = fast
[[postproc]]
inherit = HPC
[[upload_niwa]] # NIWA-specific product upload.
inherit = HPC
To make this portable, refactor it into a core flow.cylc
file that
contains the clean site-independent workflow configuration and loads all
site-specific settings from an include-file at the end:
# flow.cylc: CORE SITE-INDEPENDENT CONFIGURATION.
{% set SITE = 'niwa' %}
{% from 'site/' ~ SITE ~ '-vars.cylc' import HAVE_IDL %}
[scheduler]
UTC mode = True
[scheduling]
initial cycle point = 2017-01-01
[[graph]]
P1D = """
preproc & model[-P1D] => model => postproc
{% if HAVE_IDL %}
postproc => idl-1 => idl-2 => idl-3
{% endif %}
"""
[runtime]
[[root]]
script = rose task-run -v -O '({{SITE}})'
[[preproc]]
inherit = HPC
[[preproc]]
inherit = HPC
[[model]]
inherit = HPC
[[[environment]]]
SPEED = fast
{% include 'site/' ~ SITE ~ '.cylc' %}
plus site files site/niwa-vars.cylc
:
# site/niwa-vars.cylc: NIWA SITE SETTINGS FOR THE EXAMPLE WORKFLOW.
{% set HAVE_IDL = True %}
and site/niwa.cylc
:
# site/niwa.cylc: NIWA SITE SETTINGS FOR THE EXAMPLE WORKFLOW.
[scheduling]
[[graph]]
R1 = install_niwa => preproc
P1D = postproc => upload_niwa
[runtime]
[[HPC]]
platform = niwa_loadleveler_platform
[[[directives]]]
account_no = NWP1623
class = General
job_type = serial # (most jobs in this workflow are serial)
[[install_niwa]] # NIWA-specific file installation.
[[model]]
[[[directives]]] # Override the serial job_type setting.
job_type = parallel
[[upload_niwa]] # NIWA-specific product upload.
and finally, an optional app config file for the local model domain:
app/model/rose-app.conf # Main app config.
app/model/opt/rose-app-niwa.conf # NIWA site settings.
Some points to note:
It is straightforward to extend support to a new site by copying an existing site file(s) and adapting it to the new job host and batch scheduler etc.
Batch system directives should be considered site-specific unless all supported sites have the same batch system and the same host architecture (including CPU clock speed and memory size etc.).
We’ve assumed that all tasks run on a single HPC host at both sites. If that’s not a valid assumption the
HPC
family inheritance relationships would have to become site-specific.Core task runtime configuration aren’t needed in site files at all if their job host and batch system settings can be defined in common families that are (
HPC
in this case).
Collaborative Development Model
Official releases of a portable workflow should be made from the workflow trunk.
Changes should be developed on feature branches so as not to affect other users of the workflow.
Site-specific changes shouldn’t touch the core flow.cylc
file,
just the relevant site include-file, and therefore should not need close
scrutiny from other sites.
Changes to the core flow.cylc
file should be agreed by all
stakeholders, and should be carefully checked for effects on site
include-files:
Changing the name of tasks or families in the core workflow may break sites that add configuration to the original runtime namespace.
Adding new tasks or families to the core workflow may require corresponding additions to the site files.
Deleting tasks or families from the core workflow may require corresponding parts of the site files to be removed. And also, check for site-specific triggering off of deleted tasks or families.
However, if the owner site has to get some changes into the trunk before all collaborating sites have time to test them, version control will of course protect those lagging behind from any immediate ill effects.
When a new feature is complete and tested at the developer’s site, the workflow owner should check out the branch, review and test it, and if necessary request that other sites do the same and report back. The owner can then merge the new feature to the trunk once satisfied.
All planning and discussion associated with the change should be documented on MOSRS Trac tickets associated with the workflow.
Research-To-Operations Transition
Under this collaborative development model it is possible to use the same workflow in research and operations, largely eliminating the difficult translation between the two environments. Where appropriate, this can save a lot of work.
Operations-specific parts of the workflow should be factored out (as for site portability) into include-files that are only loaded in the operational environment. Improvements and upgrades can be developed on feature branches in the research environment. Operations staff can check out completed feature branches for testing in the operational environment before merging to trunk or referring back to research if problems are found. After sufficient testing the new workflow version can be deployed into operations.
Note
This obviously glosses over the myriad complexities of the technical and scientific testing and validation of workflow upgrades; it merely describes what is possible from a workflow design and collaborative development perspective.