Scheduler Configuration
The flow.cylc[scheduler] section configures certain aspects of scheduler behaviour at the workflow level.
Many of these configurations can also be defined at the site or user level in the global.cylc[scheduler] section, where they apply to all workflows.
Workflow Events
There are two types of event in Cylc:
- workflow events, e.g. startup and shutdown, which pertain to the scheduler
- task events, e.g. submitted and failed, which pertain to tasks.
This section covers workflow events; for task events see Task Event Handling.
Event Handlers
Workflow events have “handlers” (i.e. hooks) which allow configured commands to run when workflow events occur. These can be configured by:
- flow.cylc[scheduler][events] (per workflow)
- global.cylc[scheduler][events] (user/site defaults)
Abort On Event
As well as event handlers, you can tell the scheduler to abort (i.e., shut down immediately with error status) on certain workflow events, using the abort on ... configurations.
Configuration
Some workflow events have related configurations e.g. for setting the timeout.
List of workflow events:
- startup
- Event Handler:
The scheduler was started or restarted.
E.g. using one of these commands: cylc play, cylc vip or cylc vr.
- shutdown
- Event Handler:
The scheduler was shut down.
E.g. using the cylc stop command.
- abort
- Event Handler:
The scheduler shut down early with error status, due to a fatal error condition or a configured timeout.
- workflow timeout
- Configuration:
- Event Handler:
- Abort On Event:
The workflow run timed out.
The timer starts counting down at scheduler startup. It resets on workflow restart.
Note, the abort event is not raised by “Abort On Event” handlers.
- stall
- Event Handler:
The workflow stalled (i.e. the scheduler cannot make any further progress due to runtime events).
E.g. a task failure is blocking the pathway through the graph.
- stall timeout
- Configuration:
- Event Handler:
- Abort On Event:
The workflow timed out after stalling.
- inactivity timeout
- Configuration:
- Event Handler:
- Abort On Event:
The workflow timed out with no activity (i.e. a period with no job submissions or task messages).
This can be useful for system administrators to help catch workflows which have become stalled on external conditions or system issues.
- restart timeout
- Configuration:
If a workflow that has run to completion is restarted, the scheduler will have nothing to do, so it will shut down. This timeout gives the user a grace period in which to trigger new tasks to continue the workflow run.
Mail Events
Cylc can send emails for workflow events; these are configured by flow.cylc[scheduler][events]mail events.
For example, with the following configuration, emails will be sent if a scheduler stalls or shuts down for an unexpected reason:
[scheduler]
    [[events]]
        mail events = stall, abort
Email addresses and servers are configured by global.cylc[scheduler][mail].
Workflow event emails can be customised using flow.cylc[scheduler][mail]footer, in which Workflow Event Template Variables can be used.
For example, to integrate with the Cylc 7 web interface (Cylc Review), the mail footer could be configured with a URL:
[scheduler]
    [[mail]]
        footer = http://cylc-review/taskjobs/%(owner)s/?suite=%(workflow)s
Custom Event Handlers
Cylc can also be configured to invoke scripts on workflow events.
Event handler scripts can be stored in the workflow bin directory, or anywhere in $PATH in the scheduler environment.
They should return quickly to avoid tying up the scheduler process pool - see External Command Execution.
Contextual information can be passed to the event handler via Workflow Event Template Variables.
For example, the following handler script and configuration will write some information to a file when a workflow is started:
#!/bin/bash
# my-handler: arguments are the workflow ID, scheduler host and scheduler port.
echo "Workflow $1 is running on $2:$3" > info

[scheduler]
    [[events]]
        startup handlers = my-handler %(workflow)s %(host)s %(port)s
Note
If you wish to use custom Python libraries in an event handler, you need to add these to CYLC_PYTHONPATH rather than PYTHONPATH.
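Handlers do not have to be shell scripts. As a rough sketch, the bash handler above might be written in Python as follows. The function name write_info and its path parameter are illustrative assumptions, not part of Cylc; only the three positional arguments and the info file come from the example above.

```python
#!/usr/bin/env python3
"""Sketch of a startup handler equivalent to the bash example above."""
import sys
from pathlib import Path


def write_info(workflow, host, port, path="info"):
    """Write a one-line summary of where the workflow is running."""
    text = f"Workflow {workflow} is running on {host}:{port}\n"
    Path(path).write_text(text)
    return text


if __name__ == "__main__":
    # Expected call (hypothetical script name): my-handler WORKFLOW HOST PORT
    if len(sys.argv) == 4:
        write_info(*sys.argv[1:4])
```

Like the bash version, this does a single small write and returns immediately, so it should not hold a process pool member for long.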
Workflow Event Template Variables
- event
The type of workflow event that has occurred, e.g. stall.
- message
Additional information about the event.
- workflow
The workflow ID.
- host
The host where the workflow is running.
- port
The port where the workflow is running.
- owner
The user account under which the workflow is running.
- uuid
The unique identification string for this workflow run.
This string is preserved for the lifetime of the scheduler and is restored from the database on restart.
- workflow_url
The URL defined in flow.cylc[meta]URL.
- suite
The workflow ID.
Deprecated since version 8.0.0: Use “workflow”.
- suite_uuid
The unique identification string for this workflow run.
Deprecated since version 8.0.0: Use “uuid”.
- suite_url
The URL defined in flow.cylc[meta]URL.
Deprecated since version 8.0.0: Use “workflow_url”.
The variables listed above are available to workflow event handlers.
They can be templated into event handlers with Python percent-style string formatting, e.g.:
%(workflow)s is running on %(host)s
Note
Substitution patterns should not be quoted in the template strings. This is done automatically where required.
For an explanation of the substitution syntax, see String Formatting Operations in the Python documentation.
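To make the substitution syntax concrete, here is a minimal Python sketch of percent-style formatting with a mapping. The values in the dictionary are hypothetical stand-ins for what the scheduler would supply, and the render helper is illustrative only. It mimics the string operation and is not the scheduler's own code (which also takes care of quoting).

```python
# Hypothetical values standing in for what the scheduler would supply.
variables = {
    "workflow": "my-workflow/run1",
    "host": "host_1",
    "port": 43001,
}

template = "%(workflow)s is running on %(host)s"


def render(template, variables):
    """Apply Python percent-style formatting using a mapping of names."""
    return template % variables


print(render(template, variables))  # my-workflow/run1 is running on host_1
```

Each pattern needs its trailing conversion character (the "s" in %(host)s); the name in parentheses alone is not a complete pattern.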
External Command Execution
Job submission commands, event handlers, and job poll and kill commands are executed by the scheduler in a subprocess pool. The pool size can be configured with global.cylc[scheduler]process pool size.
Event handlers should be lightweight and quick-running because they tie up a process pool member until complete, and the workflow will appear to stall if the pool is saturated with long-running processes.
To protect the scheduler, processes are killed on a timeout (global.cylc[scheduler]process pool timeout). This will be logged by the scheduler. If a job submission gets killed, the associated task goes to the submit-failed state.
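The effect of such a timeout can be pictured with the Python standard library. This is only an illustration of running a command under a time limit, not a description of how the scheduler is implemented; the function name run_with_timeout and the limits used are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd; return its exit code, or None if it was killed on timeout."""
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None


# A quick command finishes; a slow one is killed when the limit expires.
quick = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(quick, timeout=10))
print(run_with_timeout(slow, timeout=0.5))
```

A handler that behaves like the slow command here would occupy a pool member until it is killed, which is why handlers should return quickly.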
Submitting Workflows To a Pool Of Hosts
- Configured by:
By default, cylc play will run workflows on the machine where the command was invoked.
Cylc supports configuring a pool of hosts for workflows to run on; cylc play will then automatically pick a host and submit the workflow to it.
Host Pool
- Configured by:
The hosts must:
- Share a common $HOME directory and therefore a common file system (with each other and anywhere the cylc play command is run).
- Share a common Cylc global config (global.cylc).
- Be set up to allow passwordless SSH between them.
Example:
[scheduler]
    [[run hosts]]
        available = host_1, host_2, host_3
Load Balancing
- Configured by:
Cylc can balance the load on the configured “run hosts” by ranking them in order of available resource or by excluding hosts which fail to meet certain criteria.
Example:
[scheduler]
    [[run hosts]]
        available = host_1, host_2, host_3
        ranking = """
            # filter out hosts with high server load
            getloadavg()[2] < 5

            # pick the host with the most available memory
            virtual_memory().available
        """
For more information see global.cylc[scheduler][run hosts]ranking.
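The two kinds of ranking line in the example (a filter, then an ordering) can be sketched in plain Python. This only mirrors the intent stated in the example's comments; the metrics dictionary, its keys, and the pick_host function are hypothetical and are not how Cylc collects or evaluates host metrics.

```python
# Hypothetical metrics, as if collected from each host in the pool.
metrics = {
    "host_1": {"load_15min": 7.2, "available_memory": 8_000},
    "host_2": {"load_15min": 1.1, "available_memory": 4_000},
    "host_3": {"load_15min": 0.4, "available_memory": 16_000},
}


def pick_host(metrics, max_load=5):
    """Drop heavily loaded hosts, then pick the one with most free memory."""
    candidates = {
        host: values
        for host, values in metrics.items()
        if values["load_15min"] < max_load
    }
    if not candidates:
        return None  # no host passed the filter
    return max(candidates, key=lambda h: candidates[h]["available_memory"])


print(pick_host(metrics))  # host_3
```

Here host_1 is excluded by the load filter, and host_3 is chosen over host_2 because it has more available memory.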
Workflow Migration
- Configured by:
Cylc can automatically stop workflows running on a particular host and, optionally, restart them on a different host. This can be useful if a host needs to be taken off-line, e.g. for scheduled maintenance.
Example:
[scheduler]
    [[run hosts]]
        available = host_1, host_2, host_3
        # tell workflows on host_1 to move to another available host
        condemned = host_1
Note
This feature requires the [auto restart] plugin to be enabled, e.g. in the configured list of plugins.
For more information see: global.cylc[scheduler][run hosts]condemned.
Platform Configuration
From the perspective of a running scheduler, localhost is the scheduler host.
The localhost platform is configured by global.cylc[platforms][localhost].
It configures:
- Jobs that run on the localhost platform, i.e. any jobs which have [runtime][<namespace>]platform=localhost or which don’t have a platform configured.
- Connections to the scheduler hosts (e.g. the ssh command).