Task Configuration
The [runtime] section of the flow.cylc file defines what job each task should run, and where and how to submit each one to run.
It is an inheritance hierarchy that allows common settings to be factored out and defined once in task families (duplication of configuration is a maintenance risk in a complex workflow).
Task and Family Names
Task and family names must match in the graph and runtime sections of the workflow config file. They do not need to match the names of the external applications wrapped by the tasks.
- class cylc.flow.unicode_rules.TaskNameValidator
The rules for valid task and family names:
must start with: alphanumeric
can only contain: alphanumeric, -, +, %, @
cannot start with: _cylc
cannot be: root
Note
At runtime, tasks can access their own workflow task name as $CYLC_TASK_NAME in the job environment if needed.
The following runtime configuration defines one family called FAM and two member tasks fm1 and fm2 that inherit settings from it. Members can also override inherited settings and define their own private settings.
[runtime]
[[FAM]] # <-- a family
#... settings for all FAM members
[[fm1]] # <-- a task
inherit = FAM
#... fm1-specific settings
[[fm2]] # <-- a task
inherit = FAM
#... fm2-specific settings
Note that families are not nested in terms of the file sub-heading structure. A runtime subsection defines a family if others inherit from it, otherwise it defines a task.
The Root Family
All tasks inherit implicitly from a family called root that can provide default settings for all tasks in the workflow (non-root families require an explicit inherit statement).
For example, if all tasks are to run on the same platform, that can be specified once for all tasks under root:
[runtime]
[[root]]
# all tasks run on hpc1 (unless they override this setting)
platform = hpc1
Defining Multiple Tasks or Families at Once
Runtime sub-section headings can be a comma-separated list of task or family names, in which case the settings below it apply to each list member.
Here a group of three related tasks all run the same script on the same platform, but pass their own names to it on the command line:
[runtime]
[[ENSEMBLE]]
platform = hpc1
script = "run-model.sh $CYLC_TASK_NAME"
[[m1, m2, m3]]
inherit = ENSEMBLE
[[m1]]
#... m1-specific settings
Particular tasks (such as m1 above) can still be singled out to add task-specific settings.
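One way to picture how a comma-separated heading and a later single-name heading combine is to expand each heading into its member names and merge the settings per name. The sketch below is only an illustration of that idea: `expand_headings` and the `note` key are hypothetical, and this is not how Cylc parses configuration internally.

```python
def expand_headings(sections):
    """Expand comma-separated headings into one settings dict per name.

    `sections` is a list of (heading, settings) pairs in file order.
    Hypothetical helper for illustration only.
    """
    expanded = {}
    for heading, settings in sections:
        for name in heading.split(","):
            # Later sections add to (or replace) settings for the same name.
            expanded.setdefault(name.strip(), {}).update(settings)
    return expanded


sections = [
    ("ENSEMBLE", {"platform": "hpc1"}),
    ("m1, m2, m3", {"inherit": "ENSEMBLE"}),
    ("m1", {"note": "m1-specific"}),  # "note" is a made-up setting
]

for name, settings in expand_headings(sections).items():
    print(name, settings)
```

Here m1 ends up with both the shared `inherit` setting and its own extra item, while m2 and m3 only carry the shared one.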
Note
Task parameters or template processing (see Jinja2 and EmPy) can be used to programmatically generate family members and associated dependencies.
Families of Families
Families can inherit from other families, to any depth.
[runtime]
[[HPC1]]
platform = hpc1
[[BIG-HPC1]]
inherit = HPC1
#... add in high memory batch system directives
[[model]] # a big task that runs on hpc1
inherit = BIG-HPC1
If the same item is defined (and redefined) at several levels in the family tree, the definition closest to the task takes precedence.
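The precedence rule can be pictured as a series of dictionary updates applied from the most distant ancestor down to the task, so that the nearest definition is written last. This is a minimal sketch for single-parent chains only; `resolve` is a hypothetical helper and `memory` is a made-up setting name, not a Cylc configuration item.

```python
def resolve(namespace, runtime):
    """Merge settings along a single-parent chain, nearest definition last.

    Hypothetical helper: `runtime` maps each name to its own settings,
    with an optional "inherit" key naming one parent.
    """
    chain = []
    current = namespace
    while current is not None:
        chain.append(current)
        current = runtime[current].get("inherit")
    settings = {}
    for name in reversed(chain):  # ancestors first, the task itself last
        for key, value in runtime[name].items():
            if key != "inherit":
                settings[key] = value
    return settings


runtime = {
    "HPC1": {"platform": "hpc1", "memory": "small"},
    "BIG-HPC1": {"inherit": "HPC1", "memory": "large"},
    "model": {"inherit": "BIG-HPC1"},
}

print(resolve("model", runtime))
```

The task `model` keeps `platform` from HPC1 but takes `memory` from BIG-HPC1, the closer of the two families that define it.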
Inheriting from Multiple Parents
Sometimes a multi-level single-parent tree is not sufficient to avoid all duplication of settings. Fortunately tasks can inherit from multiple parents at once [1]:
[runtime]
[[HPC1]]
platform = hpc1
[[BIG]] # high memory batch system directives
#...
[[model]] # a big task that runs on hpc1
inherit = BIG, HPC1
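With several parents, some rule has to decide the order in which ancestors are consulted. Assuming that order follows a C3 linearisation (the scheme behind Python's own method resolution order), the example above can be mimicked with throwaway Python classes. The classes below are stand-ins for the runtime namespaces and carry no settings; this is an analogy, not Cylc code.

```python
class root:          # the implicit root family
    pass


class HPC1(root):
    pass


class BIG(root):
    pass


class model(BIG, HPC1):   # mirrors "inherit = BIG, HPC1"
    pass


def linearise(cls):
    """Return the names in Python's C3 resolution order, nearest first."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(linearise(model))
```

Under that assumption the first-listed parent (BIG) is consulted before the second (HPC1), and root comes last.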
Tip
Use cylc config to check exactly what settings a task or family ends up with after inheritance processing:
$ cylc config --item "[runtime][model]environment" <workflow-id>
First-parent Family Hierarchy for Visualization
Tasks can be collapsed into first-parent families in the Cylc GUI, so first parents should reflect the logical purpose of a task where possible, rather than (say) shared technical settings:
[runtime]
[[HPC]]
# technical platform settings
[[MODEL]]
# atmospheric model tasks
[[atmos]]
inherit = MODEL, HPC # (not HPC, MODEL)
If this is not what you want, given that the primary purpose of the family hierarchy is inheritance of runtime settings, a dummy first parent None can be used to disable the visualization usage without affecting inheritance:
[runtime]
[[BAR]]
#...
[[foo]]
# inherit from BAR but stay under root for visualization
inherit = None, BAR
Job Environment
Job scripts export various environment variables before running script blocks (see Job Submission and Management).
Scheduler-defined variables appear first to identify the workflow, the task, and log directory locations. These are followed by user-defined variables from [runtime][<namespace>][environment]. Order of variable definition is preserved so that new variable assignments can reference previous ones.
Note
Task environment variables are evaluated at runtime, by jobs, on the job platform. So $HOME in a task environment, for instance, evaluates at runtime to the home directory on the job platform, not on the scheduler platform.
In this example the task foo ends up with SHAPE=circle, COLOR=blue, and TEXTURE=rough in its environment:
[runtime]
[[root]]
[[[environment]]]
COLOR = red
SHAPE = circle
[[foo]]
[[[environment]]]
COLOR = blue # root override
TEXTURE = rough # new variable
Job access to Cylc itself is configured first so that variable assignment expressions (as well as scripting) can use Cylc commands:
[runtime]
[[foo]]
[[[environment]]]
REFERENCE_TIME = $(cylc cyclepoint --offset-hours=6)
Overriding Inherited Environment Variables
Warning
If you override an inherited task environment variable the parent config item gets replaced before it is ever used to define the shell variable in the job script. Consequently the job cannot see the parent value as well as the task value:
[runtime]
[[FOO]]
[[[environment]]]
COLOR = red
[[bar]]
inherit = FOO
[[[environment]]]
tmp = $COLOR # !! ERROR: $COLOR is undefined here
COLOR = dark-$tmp # !! as this overrides COLOR in FOO.
The compressed variant of this, COLOR = dark-$COLOR, is also an error for the same reason. To achieve the desired result, use a different name for the parent variable:
[runtime]
[[FOO]]
[[[environment]]]
FOO_COLOR = red
[[bar]]
inherit = FOO
[[[environment]]]
COLOR = dark-$FOO_COLOR # OK
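The reason the parent value is lost can be modelled as a dictionary update that happens at configuration time, before any shell variable exists. The sketch below assumes that model; `merge_environment` is a hypothetical helper and the dictionaries simply restate the failing example from the warning.

```python
def merge_environment(parent_env, task_env):
    """Task items replace same-named parent items at configuration time."""
    merged = dict(parent_env)
    merged.update(task_env)
    return merged


parent = {"COLOR": "red"}
task = {"tmp": "$COLOR", "COLOR": "dark-$tmp"}

merged = merge_environment(parent, task)
print(merged)
print("red" in merged.values())  # False: the parent value never reaches the job
```

Because only the merged items would be turned into shell assignments, nothing in the job ever sets COLOR to `red`, so `$COLOR` has no parent value to expand to.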
Job Script Variables
These variables provided by the scheduler are available to job scripts:
CYLC_VERSION # Version of cylc installation used
CYLC_VERBOSE # Verbose mode, true or false
CYLC_DEBUG # Debug mode (even more verbose), true or false
CYLC_CYCLING_MODE # Cycling mode, e.g. gregorian
ISODATETIMECALENDAR # Calendar mode for the `isodatetime` command,
# defined with the value of CYLC_CYCLING_MODE
# when in any datetime cycling mode
CYLC_WORKFLOW_FINAL_CYCLE_POINT # Final cycle point
CYLC_WORKFLOW_INITIAL_CYCLE_POINT # Initial cycle point
CYLC_WORKFLOW_ID # Workflow ID
# e.g. "a/b/c/run1"
CYLC_WORKFLOW_NAME # Workflow ID with the run name removed
# (use CYLC_WORKFLOW_ID for most purposes)
# e.g. "a/b/c"
CYLC_WORKFLOW_NAME_BASE # The basename of the workflow name
# (use CYLC_WORKFLOW_ID for most purposes)
# e.g. "c"
CYLC_UTC # UTC mode, True or False
TZ # Set to "UTC" in UTC mode or not defined
CYLC_WORKFLOW_RUN_DIR # Location of the run directory on the
# job host, e.g. ~/cylc-run/foo
CYLC_WORKFLOW_HOST # Host running the workflow process
CYLC_WORKFLOW_OWNER # User ID running the workflow process
CYLC_WORKFLOW_SHARE_DIR # Workflow (or task!) shared directory (see below)
CYLC_WORKFLOW_UUID # Workflow UUID string
CYLC_WORKFLOW_WORK_DIR # Workflow work directory (see below)
CYLC_TASK_JOB # Job identifier expressed as
# CYCLE-POINT/TASK-NAME/SUBMIT-NUMBER
# e.g. 20110511T1800Z/t1/01
CYLC_TASK_CYCLE_POINT # Cycle point, e.g. 20110511T1800Z
ISODATETIMEREF # Reference time for the `isodatetime` command,
# defined with the value of CYLC_TASK_CYCLE_POINT
# when in any datetime cycling mode
CYLC_TASK_NAME # Job's task name, e.g. t1
CYLC_TASK_ID # Task instance identifier CYCLE-POINT/TASK-NAME
# e.g. 20110511T1800Z/t1
CYLC_TASK_SUBMIT_NUMBER # Job's submit number, e.g. 1,
# increments with every submit
CYLC_TASK_TRY_NUMBER # Number of execution tries, e.g. 1
# increments with automatic execution retry delays.
CYLC_TASK_FLOW_NUMBERS # Flows this task belongs to, e.g. 1,2
CYLC_TASK_LOG_DIR # Location of the job log directory
# e.g. ~/cylc-run/foo/log/job/20110511T1800Z/t1/01/
CYLC_TASK_LOG_ROOT # The job script path
# e.g. ~/cylc-run/foo/log/job/20110511T1800Z/t1/01/job
CYLC_TASK_WORK_DIR # Location of task work directory (see below)
# e.g. ~/cylc-run/foo/work/20110511T1800Z/t1
CYLC_TASK_NAMESPACE_HIERARCHY # Linearised family namespace of the task,
# e.g. root postproc t1
CYLC_TASK_DEPENDENCIES # List of met dependencies that triggered the task
# e.g. 1/foo 1/bar
CYLC_TASK_COMMS_METHOD # Set to "ssh" if communication method is "ssh"
CYLC_TASK_SSH_LOGIN_SHELL # With "ssh" communication, if set to "True",
# use login shell on workflow host
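A job whose script calls a Python program can read these variables from the process environment. As a small sketch, the function below splits CYLC_TASK_JOB into its three documented parts (CYCLE-POINT/TASK-NAME/SUBMIT-NUMBER); the name `describe_job` is hypothetical, and the function accepts a plain dictionary so that it can be tried outside a real job.

```python
import os


def describe_job(environ=None):
    """Split CYLC_TASK_JOB into cycle point, task name and submit number."""
    if environ is None:
        environ = os.environ
    job = environ["CYLC_TASK_JOB"]
    # Split from the right so only the last two "/" separators are used.
    cycle_point, task_name, submit_number = job.rsplit("/", 2)
    return {
        "cycle_point": cycle_point,
        "task_name": task_name,
        "submit_number": int(submit_number),
    }


print(describe_job({"CYLC_TASK_JOB": "20110511T1800Z/t1/01"}))
```

Inside a running job, calling `describe_job()` with no argument would use the real environment instead of the sample dictionary.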
Some global shell variables are also defined in the job script, but not exported to subshells:
CYLC_FAIL_SIGNALS # List of signals trapped by the error trap
CYLC_VACATION_SIGNALS # List of signals trapped by the vacation trap
CYLC_TASK_MESSAGE_STARTED_PID # PID of the "cylc message" "job started" command
CYLC_TASK_WORK_DIR_BASE # Alternate task work directory,
# relative to the workflow work directory
Task Work Directories
Job scripts are executed from within work directories created automatically under the workflow run directory. A task can access its own work directory via $CYLC_TASK_WORK_DIR (or simply $PWD if it does not change to another location at runtime). By default the location contains task name and cycle point, to provide a unique workspace for every instance of every task.
The top level work directory location can be changed, e.g. to a large data area, by global config settings under global.cylc[install][symlink dirs].
Remote Task Hosting
Job platforms are defined in global.cylc[platforms].
If a task declares a different platform to that where the scheduler is running, Cylc uses non-interactive SSH to submit the job to the platform job runner on one of the platform hosts. Workflow source files will be installed on the platform, via the associated global.cylc[install targets], just before the first job is submitted to run there.
[runtime]
[[foo]]
platform = orca
For this to work:
Non-interactive SSH is required from the scheduler host to the platform hosts
Cylc must be installed on the hosts of the destination platform
If polling task communication is used, there is no other requirement
If SSH task communication is configured, non-interactive SSH is required from the job platform to the scheduler platform
If TCP (default) task communication is configured, the task platform should have access to the Cylc ports on the scheduler host
Platforms, like other runtime settings, can be declared globally in the root family, or in other families, or for individual tasks.
Note
The platform known as localhost is the platform where the scheduler is running, in many cases a dedicated server and not your desktop.
Internal Platform and Host Selection
The [runtime][<namespace>]platform item points to either a platform or a platform group.
Cylc platforms allow you to configure compute platforms you wish Cylc to run jobs on.
Platform groups allow you to group together platforms, any of which would be suitable for a given job.
Platform groups can improve robustness by allowing jobs to be submitted on any platform in the group, as well as providing an interface for basic load balancing.
Platforms are selected from a platform group once, when a job is submitted.
Hosts within a platform are re-selected each time the scheduler needs to communicate with a job.
See also
Platform Configuration: For details of how Platforms and Platform Groups are set up and in-depth examples.
External Platform Selection Scripts
Deprecated since version 8.0.0: Cylc 8 can select hosts from a group of suitable hosts listed in the platform config, so in many cases this logic should no longer be necessary.
Instead of hardwiring platform names into the workflow configuration you can give a command that prints a platform name, or an environment variable, as the value of [runtime][<namespace>]platform.
For example:
[runtime]
[[mytask]]
platform = $(script-which-returns-a-platform-name)
Job hosts are always selected dynamically, for the chosen platform or platform group.
Caution
If $(script-which-returns-a-platform-name) returns a non-zero exit code then the scheduler will assign the submit-failed state to this job.
If you have submit retries set up for the job, the scheduler will retry running your platform selection script in the same way as it would for any other submission failure.
Remote Job Log Directories
Job stdout and stderr streams are written to log files under the workflow run directory (see Task stdout and stderr Logs). For remote tasks the same directory is used, on the job host.
Implicit Tasks
An implicit task is one that appears in the graph but is not defined under flow.cylc[runtime].
Depending on the value of flow.cylc[scheduler]allow implicit tasks, Cylc can automatically create default task definitions for these, to submit local dummy jobs that just return the standard job status messages.
Implicit tasks can be used to mock up functional workflows very quickly. A default script can be added to the root family, e.g. to slow job execution down a little. Here is a complete workflow definition using implicit tasks:
[scheduler]
allow implicit tasks = True
[scheduling]
[[graph]]
R1 = "prep => run-a & run-b => done"
[runtime]
[[root]]
script = "sleep 10"
Warning
Implicit tasks are somewhat dangerous because they can easily be created by mistake: misspelling a task’s name divorces it from its runtime definition.
For this reason implicit tasks are not allowed by default, and if used they should be turned off once the real task definitions are complete.
You can get the convenience without the danger with a little more effort, by adding empty runtime placeholders instead of allowing implicit tasks:
[scheduling]
[[graph]]
R1 = "prep => run-a & run-b => done"
[runtime]
[[root]]
script = "sleep 10"
[[prep]]
[[run-a, run-b]]
[[done]]
Task Retry On Failure
Tasks can have a list of ISO8601 durations as retry intervals. If the job fails the task will return to the waiting state with a clock-trigger configured with the next retry delay.
Note
Tasks only enter the submit-failed state if job submission fails with no retries left. Otherwise they return to the waiting state, to wait on the next try.
Tasks only enter the failed state if job execution fails with no retries left. Otherwise they return to the waiting state, to wait on the next try.
In the following example, tasks bad and flaky each have 3 retries configured, with a 10 second delay between. On the final try, bad fails again and goes to the failed state, while flaky succeeds and triggers task whizz downstream. The scheduler will then stall with bad retained as an incomplete task.
[scheduling]
[[graph]]
R1 = """
bad => cheese
flaky => whizz
"""
[runtime]
[[bad]]
# retry 3 times then fail
script = """
sleep 10
false
"""
execution retry delays = 3*PT10S
[[flaky]]
# retry 3 times then succeed
script = """
sleep 10
test $CYLC_TASK_TRY_NUMBER -gt 3
"""
execution retry delays = 3*PT10S
[[cheese, whizz]]
script = "sleep 10"
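The try counting in this example can be checked with a few lines of Python. The sketch below expands the `3*PT10S` multiplier form into individual delays and then walks through the tries; `expand_delays` and `final_state` are hypothetical helpers that only mirror the arithmetic, not the scheduler's behaviour.

```python
def expand_delays(value):
    """Expand 'N*DURATION' items in a comma-separated retry delay list."""
    delays = []
    for item in value.split(","):
        item = item.strip()
        if "*" in item:
            count, duration = item.split("*", 1)
            delays.extend([duration.strip()] * int(count))
        else:
            delays.append(item)
    return delays


def final_state(first_good_try, delays):
    """Return (state, tries used); first_good_try=None means never succeeds."""
    max_tries = 1 + len(delays)  # the initial try plus one per retry delay
    for try_number in range(1, max_tries + 1):
        if first_good_try is not None and try_number >= first_good_try:
            return "succeeded", try_number
    return "failed", max_tries


delays = expand_delays("3*PT10S")
print(delays)
print("bad:", final_state(None, delays))
print("flaky:", final_state(4, delays))  # "-gt 3" first holds on try 4
```

Three retry delays allow four tries in total, which is why flaky, whose test passes only when the try number exceeds 3, succeeds on its last permitted attempt.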
Event Handling
See also
Task events: [runtime][<namespace>][events]
Workflow events: [scheduler][events]
Custom event handler scripts can be called when certain workflow and task events occur. Event handlers can be used to send a message, raise an alarm, or whatever you like. They can even call cylc commands to intervene in the workflow.
Note
Task event handlers are called by the scheduler, not by the task jobs that generate the events - so they do not see the job environment.
Event handlers can be stored in the workflow bin directory, or anywhere in $PATH in the scheduler environment.
They should return quickly to avoid tying up the scheduler process pool - see External Command Execution.
Event-Specific Handlers
Event-specific handlers are configured by <event> handlers under [runtime][<namespace>][events], where <event> can be:
Event | Description |
---|---|
submitted | job submitted |
submission retry | job submission failed but will retry later |
submission failed | job submission failed |
started | job started running |
retry | job failed but will retry later |
failed | job failed |
succeeded | job succeeded |
submission timeout | job timed out in the |
execution timeout | job timed out in the |
warning | scheduler received a message of severity WARNING from job |
critical | scheduler received a message of severity CRITICAL from job |
custom | scheduler received a message of severity CUSTOM from job |
expired | task expired and will not submit (too far behind) |
late | task running later than expected |
Values should be a list of commands, command lines, or command line templates (see below) to call if the specified event is triggered.
General Event Handlers
Alternatively you can configure handler events and handlers under [runtime][<namespace>][events], where the former is a list of events (as above), and the latter a list of commands, command lines, or command line templates (see below) to call if any of the specified events are triggered.
Task Event Template Variables
- event
Event name.
- workflow
Workflow ID.
- suite
Workflow ID.
Deprecated since version 8.0.0: Use “workflow”.
- uuid
The unique identification string for this workflow run.
This string is preserved for the lifetime of the scheduler and is restored from the database on restart.
- suite_uuid
The unique identification string for this workflow run.
Deprecated since version 8.0.0: Use ‘uuid_str’.
- point
The task’s cycle point.
- submit_num
The job’s submit number.
This starts at 1 and increments with each additional job submission.
- try_num
The job’s try number.
The number of execution attempts. It starts at 1 and increments with automatic flow.cylc[runtime][<namespace>]execution retry delays.
- id
The task ID (i.e. %(point)/%(name)).
- message
Event message, if any.
- job_runner_name
The job runner name.
- batch_sys_name
The job runner name.
Deprecated since version 8.0.0: Use “job_runner_name”.
- job_id
The job ID in the job runner.
I.e. the job submission ID. For background jobs this is the process ID.
- batch_sys_job_id
The job ID in the job runner.
Deprecated since version 8.0.0: Use “job_id”.
- submit_time
Date-time when the job was submitted, in ISO8601 format.
- start_time
Date-time when the job started, in ISO8601 format.
- finish_time
Date-time when the job finished, in ISO8601 format.
- platform_name
The name of the platform where the job is submitted.
- user@host
The name of the platform where the job is submitted.
Deprecated since version 8.0.0: Use “platform_name”.
Changed in version 8.0.0: This now provides the platform name rather than user@host.
- name
The name of the task.
- task_url
The URL defined in the task’s metadata.
Deprecated since version 8.0.0: Use URL from <task metadata>.
- workflow_url
The URL defined in the workflow’s metadata.
Deprecated since version 8.0.0: Use workflow_URL from workflow_<workflow metadata>.
- <task metadata>
Any task metadata defined in flow.cylc[runtime][<namespace>][meta] can be used, e.g.:
%(title)s: Task title
%(URL)s: Task URL
%(importance)s: Example custom task metadata
- workflow_<workflow metadata>
Any workflow metadata defined in flow.cylc[meta] can be used with the workflow_ prefix, e.g.:
%(workflow_title)s: Workflow title
%(workflow_URL)s: Workflow URL
%(workflow_rating)s: Example custom workflow metadata
The variables listed above are available to task event handlers.
They can be templated into event handlers with Python percent style string formatting e.g:
%(workflow)s is running on %(platform_name)s
The %(event)s string, for instance, will be replaced by the actual event name when the handler is invoked.
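The substitution itself is ordinary Python percent-style formatting with a mapping, so it can be tried in isolation. In the sketch below the handler name `notify.sh` and the sample values are made up, and the automatic quoting that Cylc applies to substituted values is not reproduced.

```python
def render_handler(template, event_data):
    """Fill %(name)s placeholders from a dict of template variables."""
    return template % event_data


# Sample values only; a real scheduler supplies these at event time.
event_data = {
    "event": "failed",
    "workflow": "a/b/c/run1",
    "id": "20110511T1800Z/t1",
    "message": "job failed",
}

print(render_handler("notify.sh %(event)s %(workflow)s %(id)s", event_data))
```

Keys that the template does not mention are simply ignored, while a placeholder with no matching key raises `KeyError`.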
If no templates or arguments are specified the following default command line will be used:
<event-handler> %(event)s %(workflow)s %(id)s %(message)s
Note
Substitution patterns should not be quoted in the template strings. This is done automatically where required.
For an explanation of the substitution syntax, see String Formatting Operations in the Python documentation.
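A handler invoked with the default command line receives four positional arguments: event, workflow, task ID and message. The following is a minimal sketch of such a handler written in Python; the function names and the output format are hypothetical, and a real handler would typically send the text somewhere rather than print it.

```python
import sys


def format_notification(event, workflow, task_id, message):
    """Build a one-line summary from the default handler arguments."""
    return f"[{workflow}] {task_id} {event}: {message}"


def main(argv):
    """Entry point: expects EVENT WORKFLOW TASK_ID MESSAGE after the name."""
    if len(argv) != 5:
        print("usage: handler EVENT WORKFLOW TASK_ID MESSAGE", file=sys.stderr)
        return 2
    print(format_notification(*argv[1:]))
    return 0
```

Saved as an executable script, it would end with `raise SystemExit(main(sys.argv))`. It should do its work quickly, since handlers run in the scheduler's process pool.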
Examples
The following flow.cylc snippets illustrate the two ways to configure event handlers (event-specific, then general):
[runtime]
[[foo]]
script = test ${CYLC_TASK_TRY_NUMBER} -eq 2
execution retry delays = PT0S, PT30S
[[[events]]] # event-specific handlers:
retry handlers = notify-retry.py
failed handlers = notify-failed.py
[runtime]
[[foo]]
script = """
test ${CYLC_TASK_TRY_NUMBER} -eq 2
cylc message -- "${CYLC_WORKFLOW_ID}" "${CYLC_TASK_JOB}" 'oopsy daisy'
"""
execution retry delays = PT0S, PT30S
[[[events]]] # general handlers:
handlers = notify-events.py
# Note: task output name can be used as an event in this method
handler events = retry, failed, oops
[[[outputs]]]
oops = oopsy daisy
Built-in Email Event Handler
To send an email on task events, configure relevant tasks with a list of events to handle by email. Custom task output names can also be used as event names, in which case the event triggers when the output message is received.
E.g. to send an email on task failed, retry, and a custom message event:
[runtime]
[[foo]]
script = """
test ${CYLC_TASK_TRY_NUMBER} -eq 3
cylc message -- "${CYLC_WORKFLOW_ID}" "${CYLC_TASK_JOB}" 'oopsy daisy'
"""
execution retry delays = PT0S, PT30S
[[[events]]]
mail events = failed, retry, oops
[[[outputs]]]
oops = oopsy daisy
By default, event emails will be sent to the current user with:
to: set as $USER
from: set as notifications@$(hostname)
SMTP server at localhost:25
These can be configured using the settings:
[mail]to (list of email addresses)
The scheduler batches events over a 5 minute interval, by default, to avoid flooding your inbox if many events occur in a short time. The batching interval can be configured with [scheduler][mail]task event batch interval.
Late Events
Warning
The scheduler can only check for lateness once a task has appeared in its active task window. In Cylc 8 this is usually when the task is actually ready to run, which severely limits the usefulness of late events as currently implemented.
If a real time (clock-triggered) workflow performs fairly consistently from one cycle to the next, you may want to be notified when certain tasks are running late with respect to the time they normally trigger in each cycle.
Cylc can generate a late event if a task has not triggered by a given offset from its cycle point in real time. For example, if a task forecast normally triggers at 30 minutes after cycle point, a late event could be configured like this:
[runtime]
[[forecast]]
script = run-model.sh
[[[events]]]
late offset = PT40M # allow a 10 minute delay
late handlers = my-handler %(message)s
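The comparison behind a late event is simple date arithmetic: the task counts as late once the wall-clock time passes the cycle point plus the late offset. The sketch below reproduces only that arithmetic with sample times; `is_late` is a hypothetical helper and does not model when the scheduler actually performs the check.

```python
from datetime import datetime, timedelta, timezone


def is_late(cycle_point, late_offset, now):
    """True once `now` is past the cycle point plus the late offset."""
    return now > cycle_point + late_offset


cycle_point = datetime(2011, 5, 11, 18, 0, tzinfo=timezone.utc)
late_offset = timedelta(minutes=40)  # corresponds to PT40M

on_time = datetime(2011, 5, 11, 18, 35, tzinfo=timezone.utc)
too_late = datetime(2011, 5, 11, 18, 45, tzinfo=timezone.utc)

print(is_late(cycle_point, late_offset, on_time))   # False
print(is_late(cycle_point, late_offset, too_late))  # True
```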
Warning
Late offset intervals are not computed automatically so be careful to update them after any workflow change that affects triggering times.