Scheduler Configuration

The flow.cylc[scheduler] section configures certain aspects of scheduler behaviour at the workflow level.

Many of these configurations can also be defined at the site or user level in the global.cylc[scheduler] section, where they apply to all workflows.

Workflow Events

There are two types of event in Cylc:

  • workflow events e.g. startup and shutdown, which pertain to the scheduler

  • task events e.g. submitted and failed, which pertain to tasks.

This section covers workflow events; for task events see Task Event Handling.

Event Handlers

Workflow events have “handlers” (i.e. hooks) which allow configured commands to run when workflow events occur. These are configured in the flow.cylc[scheduler][events] section, e.g. by the startup handlers setting.

Abort On Event

As well as event handlers, you can tell the scheduler to abort (i.e., shut down immediately with error status) on certain workflow events, using the abort on ... configurations.

Configuration

Some workflow events have related configurations e.g. for setting the timeout.

List of workflow events:

startup
Event Handler:

startup handlers

The scheduler was started or restarted.

E.g. using one of these commands: cylc play, cylc vip or cylc vr.

shutdown
Event Handler:

shutdown handlers

The scheduler was shut down.

E.g. using the cylc stop command.

abort
Event Handler:

abort handlers

The scheduler shut down early with error status, due to a fatal error condition or a configured timeout.

workflow timeout
Configuration:

workflow timeout

Event Handler:

workflow timeout handlers

Abort On Event:

abort on workflow timeout

The workflow run timed out.

The timer starts counting down at scheduler startup. It resets on workflow restart.

Note, the abort event is not raised by “Abort On Event” handlers.

stall
Event Handler:

stall handlers

The workflow stalled (i.e. the scheduler cannot make any further progress due to runtime events).

E.g. a task failure is blocking the pathway through the graph.

stall timeout
Configuration:

stall timeout

Event Handler:

stall timeout handlers

Abort On Event:

abort on stall timeout

The workflow timed out after stalling.

inactivity timeout
Configuration:

inactivity timeout

Event Handler:

inactivity timeout handlers

Abort On Event:

abort on inactivity timeout

The workflow timed out with no activity (i.e. a period with no job submissions or task messages).

This can be useful for system administrators to help catch workflows which have become stalled on external conditions or system issues.

restart timeout
Configuration:

restart timeout

If a workflow that has run to completion is restarted, the scheduler will have nothing to do so will shut down. This timeout gives the user a grace period in which to trigger new tasks to continue the workflow run.
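As an illustration, the timeout settings listed above might be combined as follows. This is a sketch rather than a recommended configuration: the durations are arbitrary ISO 8601 values chosen for the example, and notify-admin is a hypothetical command that would need to exist in the workflow bin directory or on $PATH.

```cylc
# flow.cylc (illustrative values only)
[scheduler]
    [[events]]
        # shut down with error status if the workflow stays stalled for 30 minutes
        stall timeout = PT30M
        abort on stall timeout = True

        # run a (hypothetical) notification command after 6 hours with
        # no job submissions or task messages, but keep the scheduler running
        inactivity timeout = PT6H
        inactivity timeout handlers = notify-admin %(workflow)s %(event)s
        abort on inactivity timeout = False

        # when a completed workflow is restarted, wait 5 minutes for the
        # user to trigger new tasks before shutting down again
        restart timeout = PT5M
```

Each timeout here pairs the three kinds of setting described above: the timeout itself, an optional handler, and an optional abort switch.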

Mail Events

Cylc can send emails for workflow events. These are configured by flow.cylc[scheduler][events]mail events.

For example, with the following configuration emails will be sent if a scheduler stalls or shuts down for an unexpected reason.

[scheduler]
    [[events]]
        mail events = stall, abort

Email addresses and servers are configured by global.cylc[scheduler][mail].
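A site-level mail configuration might look something like the sketch below. The to, from and smtp setting names are given here on the assumption that they match your Cylc version, and the addresses and server name are placeholders, so check them against the global.cylc[scheduler][mail] reference before use.

```cylc
# global.cylc (placeholder addresses and server)
[scheduler]
    [[mail]]
        # who receives workflow event emails
        to = workflow-admins@example.com
        # the address emails appear to come from
        from = cylc@example.com
        # the mail server used to send them
        smtp = smtp.example.com
```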

Workflow event emails can be customised using flow.cylc[scheduler][mail]footer. Workflow Event Template Variables can be used in the footer.

For example, to integrate with the Cylc 7 web interface (Cylc Review) the mail footer could be configured with a URL:

[scheduler]
    [[mail]]
        footer = http://cylc-review/taskjobs/%(owner)s/?suite=%(workflow)s

Custom Event Handlers

Cylc can also be configured to invoke scripts on workflow events.

Event handler scripts can be stored in the workflow bin directory, or anywhere in $PATH in the scheduler environment.

They should return quickly to avoid tying up the scheduler process pool - see External Command Execution.

Contextual information can be passed to the event handler via Workflow Event Template Variables.

For example the following configuration will write some information to a file when a workflow is started:

~/cylc-run/<workflow-id>/bin/my-handler

#!/bin/bash

# Arguments: $1 = workflow ID, $2 = scheduler host, $3 = scheduler port
echo "Workflow $1 is running on $2:$3" > info

flow.cylc or global.cylc

[scheduler]
    [[events]]
        startup handlers = my-handler %(workflow)s %(host)s %(port)s

Note

If you wish to use custom Python Libraries in an event handler you need to add these to CYLC_PYTHONPATH rather than PYTHONPATH.
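Where the same command should run for several events, the general handlers and handler events settings in flow.cylc[scheduler][events] can be used instead of one setting per event. A minimal sketch, in which log-event is a hypothetical script (not part of Cylc) that takes the event name and workflow ID as arguments:

```cylc
# flow.cylc or global.cylc
[scheduler]
    [[events]]
        # log-event is a hypothetical script in the workflow bin directory or on $PATH
        handlers = log-event %(event)s %(workflow)s
        # the events for which the general handlers above are run
        handler events = startup, shutdown, stall
```

Passing %(event)s lets a single script tell the events apart.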

Workflow Event Template Variables

The following variables are available to workflow event handlers.

They can be templated into event handlers with Python percent-style string formatting, e.g.:

%(workflow)s is running on %(host)s

Note

Substitution patterns should not be quoted in the template strings. This is done automatically where required.

For an explanation of the substitution syntax, see String Formatting Operations in the Python documentation.

event

The type of workflow event that has occurred e.g. stall.

message

Additional information about the event.

workflow

The workflow ID.

host

The host where the workflow is running.

port

The port where the workflow is running.

owner

The user account under which the workflow is running.

uuid

The unique identification string for this workflow run.

This string is preserved for the lifetime of the scheduler and is restored from the database on restart.

workflow_url

The URL defined in flow.cylc[meta]URL.

suite

The workflow ID.

Deprecated since version 8.0.0: Use “workflow”.

suite_uuid

The unique identification string for this workflow run.

Deprecated since version 8.0.0: Use “uuid”.

suite_url

The URL defined in flow.cylc[meta]URL.

Deprecated since version 8.0.0: Use “workflow_url”.
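As a sketch of how several of the variables above might be passed to one handler, the following assumes a hypothetical report-event script. The patterns are left unquoted, as noted earlier, and the comments list the deprecated names next to their replacements.

```cylc
# flow.cylc or global.cylc
[scheduler]
    [[events]]
        # report-event is a hypothetical script, not part of Cylc
        abort handlers = report-event %(event)s %(workflow)s %(uuid)s %(message)s

        # deprecated names and their replacements:
        #   %(suite)s       ->  %(workflow)s
        #   %(suite_uuid)s  ->  %(uuid)s
        #   %(suite_url)s   ->  %(workflow_url)s
```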

External Command Execution

Job submission commands, event handlers, and job poll and kill commands are executed by the scheduler in a subprocess pool. The pool size can be configured with global.cylc[scheduler]process pool size.

Event handlers should be lightweight and quick-running because they tie up a process pool member until complete, and the workflow will appear to stall if the pool is saturated with long-running processes.

To protect the scheduler, processes are killed on a timeout (global.cylc[scheduler]process pool timeout). This will be logged by the scheduler. If a job submission gets killed, the associated task goes to the submit-failed state.
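The two process pool settings named above could be set together as in this sketch. The values are arbitrary examples rather than recommendations or defaults; the timeout is written as an ISO 8601 duration.

```cylc
# global.cylc (illustrative values only)
[scheduler]
    # maximum number of commands the scheduler runs concurrently
    process pool size = 8
    # kill any pooled command still running after 5 minutes
    process pool timeout = PT5M
```

A larger pool allows more concurrent commands at the cost of more load on the scheduler host, while a shorter timeout frees pool members sooner but risks killing slow job submissions.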

Submitting Workflows To a Pool Of Hosts

Configured by:

global.cylc[scheduler][run hosts].

By default, cylc play will run workflows on the machine where the command was invoked.

Cylc supports configuring a pool of hosts for workflows to run on. When a pool is configured, cylc play will automatically pick a host and submit the workflow to it.

Host Pool

Configured by:

global.cylc[scheduler][run hosts]available.

The hosts must:

  1. Share a common $HOME directory and therefore a common file system (with each other and anywhere the cylc play command is run).

  2. Share a common Cylc global config (global.cylc).

  3. Be set up to allow passwordless SSH between them.

Example:

[scheduler]
    [[run hosts]]
        available = host_1, host_2, host_3

Load Balancing

Configured by:

global.cylc[scheduler][run hosts]ranking.

Cylc can balance the load on the configured “run hosts” by ranking them in order of available resource or by excluding hosts which fail to meet certain criteria.

Example:

[scheduler]
    [[run hosts]]
        available = host_1, host_2, host_3
        ranking = """
            # filter out hosts with high server load
            getloadavg()[2] < 5

            # pick the host with the most available memory
            virtual_memory().available
        """

For more information see global.cylc[scheduler][run hosts]ranking.

Workflow Migration

Configured by:

global.cylc[scheduler][run hosts]condemned.

Cylc can automatically stop workflows running on a particular host and, optionally, restart them on a different host. This can be useful if a host needs to be taken off-line, e.g. for scheduled maintenance.

Example:

[scheduler]
    [[run hosts]]
        available = host_1, host_2, host_3
        # tell workflows on host_1 to move to another available host
        condemned = host_1

Note

This feature requires the auto restart plugin to be enabled, e.g. in the configured list of plugins (global.cylc[scheduler][main loop]plugins).

For more information see: global.cylc[scheduler][run hosts]condemned.
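The plugin mentioned in the note above might be enabled with something like the following sketch. It assumes the plugin list is set through global.cylc[scheduler][main loop]plugins; health check is included only as an example of another plugin name, so check which plugins your site already lists before replacing the value.

```cylc
# global.cylc
[scheduler]
    [[main loop]]
        # "auto restart" is needed for workflows to migrate off condemned hosts;
        # "health check" is shown only as an example of another listed plugin
        plugins = health check, auto restart
```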

Platform Configuration

From the perspective of a running scheduler, localhost is the scheduler host.

The localhost platform is configured by global.cylc[platforms][localhost].

It configures: