Scheduler Configuration

The flow.cylc[scheduler] section configures certain aspects of scheduler behaviour at the workflow level.

Many of these settings can also be defined at the site or user level in the global.cylc[scheduler] section, where they apply to all workflows.

Event handlers

Note

Workflow event handlers are configured by flow.cylc[scheduler][events], or at the site or user level by global.cylc[scheduler][events].

Workflow event handlers allow configurable actions to be performed when workflow events occur.

Workflow Events

The list of events is:

startup

The scheduler started running the workflow.

shutdown

The workflow finished and the scheduler will shut down.

abort

The scheduler shut down early with error status, due to a fatal error condition or a configured timeout.

workflow timeout

The workflow run timed out.

stall

The workflow stalled.

stall timeout

The workflow timed out after stalling.

inactivity timeout

The workflow timed out with no activity.

You can tell the scheduler to abort (i.e., shut down immediately with error status) on certain workflow events, with the following settings:

  • abort on stall timeout

  • abort on inactivity timeout

  • abort on workflow timeout

Mail Events

Cylc can send emails for workflow events. These are configured by flow.cylc[scheduler][events]mail events.

For example, with the following configuration, emails will be sent if a scheduler stalls or shuts down for an unexpected reason.

[scheduler]
    [[events]]
        mail events = stall, abort

Email addresses and servers are configured by global.cylc[scheduler][mail].

Workflow event emails can be customised using flow.cylc[scheduler][mail]footer, which may contain Workflow Event Template Variables.

For example, to integrate with the Cylc 7 web interface (Cylc Review), the mail footer could be configured with a URL:

[scheduler]
    [[mail]]
        footer = http://cylc-review/taskjobs/%(owner)s/?suite=%(workflow)s

Custom Event Handlers

Cylc can also be configured to invoke scripts on workflow events.

Event handler scripts can be stored in the workflow bin directory, or anywhere in $PATH in the scheduler environment.

They should return quickly to avoid tying up the scheduler process pool - see External Command Execution.

Contextual information can be passed to the event handler via Workflow Event Template Variables.

For example, the following configuration will write some information to a file when a workflow is started:

~/cylc-run/<workflow-id>/bin/my-handler

#!/bin/bash

echo "Workflow $1 is running on $2:$3" > info

flow.cylc or global.cylc

[scheduler]
    [[events]]
        startup handlers = my-handler %(workflow)s %(host)s %(port)s
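
Handlers do not have to be shell scripts. As a sketch, assuming a Python 3 interpreter is available in the scheduler environment, the same handler could be written as below. The write_info function and its path parameter are names chosen for this illustration only.

```python
#!/usr/bin/env python3
import sys
from pathlib import Path


def write_info(workflow, host, port, path="info"):
    """Write a one-line status message, mirroring the shell handler."""
    message = f"Workflow {workflow} is running on {host}:{port}\n"
    Path(path).write_text(message)
    return message


# Expected invocation: my-handler %(workflow)s %(host)s %(port)s
if __name__ == "__main__" and len(sys.argv) == 4:
    write_info(*sys.argv[1:])
```

Like the shell version, this writes to a relative path and returns immediately, so it does not hold a process pool member for long.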

Workflow Event Template Variables

The following variables are available to workflow event handlers.

They can be templated into event handlers with Python percent-style string formatting, e.g.:

%(workflow)s is running on %(host)s

Note

Substitution patterns should not be quoted in the template strings. This is done automatically where required.

For an explanation of the substitution syntax, see String Formatting Operations in the Python documentation.
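
As a rough illustration of the substitution syntax, the sketch below applies Python's percent-style formatting to a template using a plain dictionary. The values in the dictionary are made up for the example; in practice the scheduler supplies them, and the automatic quoting mentioned above is not modelled here.

```python
# Hypothetical values; the scheduler provides the real ones at run time.
variables = {
    "workflow": "my-workflow/run1",
    "host": "host_1",
    "port": 43001,
}

# Each %(name)s pattern is replaced by the value stored under "name".
# The trailing "s" is the conversion type and is required.
template = "%(workflow)s is running on %(host)s:%(port)s"
rendered = template % variables
print(rendered)
```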

event

The type of workflow event that has occurred e.g. stall.

message

Additional information about the event.

workflow

The workflow ID.

host

The host where the workflow is running.

port

The port where the workflow is running.

owner

The user account under which the workflow is running.

uuid

The unique identification string for this workflow run.

This string is preserved for the lifetime of the scheduler and is restored from the database on restart.

workflow_url

The URL defined in flow.cylc[meta]URL.

suite

The workflow ID.

Deprecated since version 8.0.0: Use “workflow”.

suite_uuid

The unique identification string for this workflow run.

Deprecated since version 8.0.0: Use “uuid”.

suite_url

The URL defined in flow.cylc[meta]URL.

Deprecated since version 8.0.0: Use “workflow_url”.

External Command Execution

Job submission commands, event handlers, and job poll and kill commands are executed by the scheduler in a subprocess pool. The pool size can be configured with global.cylc[scheduler]process pool size.

Event handlers should be lightweight and quick-running because they tie up a process pool member until complete, and the workflow will appear to stall if the pool is saturated with long-running processes.

To protect the scheduler, processes are killed on a timeout (global.cylc[scheduler]process pool timeout). This will be logged by the scheduler. If a job submission gets killed, the associated task goes to the submit-failed state.
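
The effect of such a timeout can be illustrated with Python's standard subprocess module. This is only a sketch of the general idea, not Cylc's actual implementation; run_with_timeout is a hypothetical helper and the timeout values are arbitrary.

```python
import subprocess
import sys


def run_with_timeout(command, timeout):
    """Run a command; return its exit code, or None if it was killed on timeout."""
    try:
        return subprocess.run(command, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising this exception.
        return None


quick = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(30)"]

print(run_with_timeout(quick, timeout=10))   # exit code of the quick command
print(run_with_timeout(slow, timeout=0.5))   # None: killed after 0.5 seconds
```

A handler that behaves like the slow command would be terminated in the same way once the configured pool timeout elapses.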

Submitting Workflows To a Pool Of Hosts

Configured by global.cylc[scheduler][run hosts].

By default, cylc play runs workflows on the machine where the command was invoked.

Cylc supports configuring a pool of hosts for workflows to run on; cylc play will then automatically pick a host and submit the workflow to it.

Host Pool

Configured by global.cylc[scheduler][run hosts]available.

The hosts must:

  1. Share a common $HOME directory and therefore a common file system (with each other and anywhere the cylc play command is run).

  2. Share a common Cylc global config (global.cylc).

  3. Be set up to allow passwordless SSH between them.

Example:

[scheduler]
    [[run hosts]]
        available = host_1, host_2, host_3

Load Balancing

Configured by global.cylc[scheduler][run hosts]ranking.

Cylc can balance the load on the configured “run hosts” by ranking them in order of available resources or by excluding hosts which fail to meet certain criteria.

Example:

[scheduler]
    [[run hosts]]
        available = host_1, host_2, host_3
        ranking = """
            # filter out hosts with high server load
            getloadavg()[2] < 5

            # pick the host with the most available memory
            virtual_memory().available
        """

For more information see global.cylc[scheduler][run hosts]ranking.
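
To make the threshold expression concrete: assuming getloadavg() returns the 1-, 5- and 15-minute load averages as a tuple (as Python's os.getloadavg() does), index 2 selects the 15-minute average. The sketch below applies the same test to made-up figures for three hosts. The passes_threshold helper and the numbers are illustrative only and do not reflect how Cylc gathers this information.

```python
# Hypothetical (1 min, 5 min, 15 min) load averages for each run host.
load_averages = {
    "host_1": (7.2, 6.8, 6.1),
    "host_2": (0.9, 1.4, 1.2),
    "host_3": (5.5, 4.9, 4.7),
}


def passes_threshold(loadavg, limit=5):
    """Mirror the expression `getloadavg()[2] < 5` for one host."""
    return loadavg[2] < limit


# Keep only the hosts whose 15-minute load average is below the limit.
eligible = [host for host, avg in load_averages.items() if passes_threshold(avg)]
print(eligible)
```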

Workflow Migration

Configured by global.cylc[scheduler][run hosts]condemned.

Cylc can automatically stop workflows running on a particular host and, optionally, restart them on a different host. This can be useful if a host needs to be taken offline, e.g. for scheduled maintenance.

Example:

[scheduler]
    [[run hosts]]
        available = host_1, host_2, host_3
        # tell workflows on host_1 to move to another available host
        condemned = host_1

Note

This feature requires the [auto restart] plugin to be enabled, e.g. in the configured list of plugins.

For more information see: global.cylc[scheduler][run hosts]condemned.