Runtime Configuration

In the last section we associated tasks with scripts and ran a simple workflow. In this section we will look at how to configure these tasks.

Environment Variables

We can define environment variables in task [environment] sections, to be provided to task jobs at runtime.

[runtime]
    [[countdown]]
        script = seq $START_NUMBER
        [[[environment]]]
            START_NUMBER = 5
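
To see what the task script receives, the job's environment can be imitated in a plain shell. This is only a sketch: the export line stands in for what Cylc does with the [environment] section, and the countdown function is a hypothetical wrapper around the task's script setting.

```shell
# Stand-in for what Cylc does before running the task script:
# each [environment] setting becomes an exported shell variable.
export START_NUMBER=5

# The task's "script" setting, wrapped in a function for illustration.
countdown() {
    seq "$START_NUMBER"
}

countdown
```

Note that seq counts up from 1 to START_NUMBER. With GNU or BSD seq, seq "$START_NUMBER" -1 1 would count down instead.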

Each job is also provided with some standard environment variables, e.g.:

CYLC_WORKFLOW_RUN_DIR

The path to the run directory (e.g. ~/cylc-run/workflow).

CYLC_TASK_WORK_DIR

The path to the associated task’s work directory (e.g. run-directory/work/cycle/task).

CYLC_TASK_CYCLE_POINT

The cycle point for the associated task (e.g. 20171009T0950).

See also

There are many more environment variables - see Job Script Variables for more information.
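
As a rough sketch of how a task script might use these variables, the helper below writes a small file into the task's work directory. The write_report function and the report.txt file name are hypothetical, and the fallback values are sample data so that the sketch also runs outside a Cylc job.

```shell
# Hypothetical helper for a task script (not part of the tutorial).
# Inside a Cylc job these variables are already set; the fallbacks below
# are sample values so that the sketch also runs in a plain shell.
: "${CYLC_TASK_CYCLE_POINT:=20171009T0950}"
: "${CYLC_TASK_WORK_DIR:=$(mktemp -d)}"

write_report() {
    # Write a one-line report into the task's work directory
    # and print the path of the file that was written.
    report_file="${CYLC_TASK_WORK_DIR}/report.txt"
    echo "Report for cycle ${CYLC_TASK_CYCLE_POINT}" > "$report_file"
    echo "$report_file"
}

write_report
```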

Job Submission

By default Cylc runs jobs on the same machine as the scheduler. It can run them on other machines too if we set the platform like this:

[runtime]
    [[hello_computehost]]
        script = echo "Hello Compute Host"
        platform = powerful_computer

By default Cylc also executes jobs as background processes. We often want to submit jobs to a job runner instead, particularly on shared compute resources. Cylc supports the following job runners:

  • at

  • loadleveler

  • lsf

  • pbs

  • sge

  • slurm

  • moab

Job runners typically require directives in some form, to specify job requirements such as memory use and number of CPUs to run on. For example:

[runtime]
    [[big_task]]
        script = big-executable

        # Submit to the platform "slurm_platform".
        platform = slurm_platform

        # The job requires 500 MB of RAM and 4 CPUs.
        [[[directives]]]
            --mem = 500
            --ntasks = 4

Time Limits

We can specify an execution time limit, as an ISO8601 duration, after which a job will be terminated. Cylc automatically translates this to the correct job runner directives.

[runtime]
    [[some_task]]
        script = some-executable
        execution time limit = PT15M  # 15 minutes.
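
Cylc does this translation itself. The sketch below only illustrates what a duration such as PT15M amounts to in seconds. The iso_duration_to_seconds function is hypothetical and handles only simple PT#H#M#S durations.

```shell
# Hypothetical helper, for illustration only (not a Cylc command).
# Converts a simple ISO8601 time duration such as PT15M or PT1H30M
# to seconds. Dates (P1D...), fractions and leading zeros are not handled.
iso_duration_to_seconds() {
    rest=${1#PT}
    # The duration must start with "PT".
    [ "$rest" != "$1" ] || return 1
    total=0
    case $rest in
        *H*) total=$(( total + ${rest%%H*} * 3600 )); rest=${rest#*H} ;;
    esac
    case $rest in
        *M*) total=$(( total + ${rest%%M*} * 60 )); rest=${rest#*M} ;;
    esac
    case $rest in
        *S*) total=$(( total + ${rest%%S*} )); rest=${rest#*S} ;;
    esac
    # Anything left over means the input was not understood.
    [ -z "$rest" ] || return 1
    echo "$total"
}

iso_duration_to_seconds PT15M   # prints 900
```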

Retries

Jobs can fail for several reasons:

  • Something went wrong with job submission, e.g.:

    • A network problem;

    • The job host became unavailable or overloaded;

    • The job runner rejected your job directives.

  • Something went wrong with job execution, e.g.:

    • A bug;

    • A system error;

    • The job hitting the execution time limit.

We can configure Cylc to automatically retry tasks that fail, by setting submission retry delays and/or execution retry delays to a list of ISO8601 durations. Each duration in the list allows one retry, after the given delay. For example, setting execution retry delays = PT10M will cause the job to be retried once, 10 minutes after an execution failure.

Use a multiplier to repeat a delay, which also sets the number of retries:

[runtime]
   [[some-task]]
      script = some-script

      # On execution failure
      #   retry up to 3 times every 15 minutes.
      execution retry delays = 3*PT15M
      # On submission failure
      #   retry up to 2 times every 10 minutes,
      #   then one final time after 30 minutes.
      submission retry delays = 2*PT10M, PT30M

Note

Tasks only enter the submit-failed state if job submission fails with no retries left. Otherwise they return to the waiting state, to wait on the next try.

Tasks only enter the failed state if job execution fails with no retries left. Otherwise they return to the waiting state, to wait on the next try.
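
To make the multiplier syntax concrete, the sketch below expands a retry delay list into one delay per line. It assumes that N*DURATION stands for DURATION repeated N times. The expand_retry_delays function is hypothetical and only mimics the expansion; Cylc parses the setting itself.

```shell
# Hypothetical helper, for illustration only (not a Cylc command).
# Expands a retry delay list such as "2*PT10M, PT30M" into one
# delay per line, repeating "N*DURATION" items N times.
expand_retry_delays() {
    echo "$1" | tr ',' '\n' | while read -r item; do
        case $item in
            *\**)
                count=${item%%\**}      # text before the "*"
                duration=${item#*\*}    # text after the "*"
                i=0
                while [ "$i" -lt "$count" ]; do
                    echo "$duration"
                    i=$(( i + 1 ))
                done
                ;;
            ?*) echo "$item" ;;
        esac
    done
}

expand_retry_delays "2*PT10M, PT30M"
```

Each line of output corresponds to one retry, so this example allows three retries in total.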

Start, Stop, Restart

We have seen how to start and stop Cylc workflows with cylc play and cylc stop. By default cylc stop causes the scheduler to wait for running jobs to finish before it shuts down. There are several other stop options, however. For example:

cylc stop --kill

Kill all running jobs before stopping. (Cylc can kill jobs on remote hosts, via the configured job runner.)

cylc stop --now --now

Stop right now, leaving any jobs running.

Once a workflow has stopped you can restart it with cylc play. The scheduler will pick up where it left off, and carry on as normal.

# Run the workflow <id>.
cylc play <id>
# Stop the workflow <id>, killing any running jobs.
cylc stop <id> --kill
# Restart the workflow <id>, picking up where it left off.
cylc play <id>

Practical

In this practical we will add runtime configuration to the weather-forecasting workflow from the scheduling tutorial.

  1. Create A New Workflow.

    Create a new workflow by running the command:

    cylc get-resources tutorial/runtime-tutorial
    cd ~/cylc-src/runtime-tutorial
    

    You will now have a copy of the weather-forecasting workflow along with some executables and Python modules.

  2. Set The Initial And Final Cycle Points.

    We want the workflow to run for 6 hours, starting at least 7 hours ago, on the hour.

    We could work out the dates and times manually, or we could let Cylc do the maths for us.

    Set the initial cycle point:

    initial cycle point = previous(T-00) - PT7H
    
    • previous(T-00) returns the current time ignoring minutes and seconds.

      e.g. if the current time is 12:34 this will return 12:00

    • -PT7H subtracts 7 hours from this value.

    Set the final cycle point:

    final cycle point = +PT6H
    

    This sets the final cycle point six hours after the initial cycle point.
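
    The arithmetic behind these two settings can be sketched in plain shell, using seconds since the Unix epoch. This is an illustration only: Cylc evaluates the expressions itself, and the function names and the sample time are hypothetical.

    ```shell
    # Illustration only: Cylc evaluates these expressions itself.
    # Times are in seconds since the Unix epoch; 3600 seconds = 1 hour.
    initial_cycle_point() {
        on_the_hour=$(( $1 - $1 % 3600 ))    # like previous(T-00)
        echo $(( on_the_hour - 7 * 3600 ))   # like - PT7H
    }

    final_cycle_point() {
        echo $(( $1 + 6 * 3600 ))            # like +PT6H
    }

    # Sample "current time": 12:34:00 on 1970-01-01 (45240 seconds).
    start=$(initial_cycle_point 45240)
    end=$(final_cycle_point "$start")
    echo "initial: $(( start / 3600 )):00, final: $(( end / 3600 )):00"
    ```

    For the sample time of 12:34 this gives an initial cycle point of 05:00 and a final cycle point of 11:00.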

    Run cylc validate to check for any errors:

    cylc validate .
    
  3. Add Runtime Configuration For The get_observations Tasks.

    In the bin directory is a script called get-observations. This script gets weather data from the Met Office DataPoint service. It requires two environment variables:

    SITE_ID:

    A four-digit numerical code which is used to identify a weather station, e.g. 3772 is Heathrow Airport.

    API_KEY:

    An authentication key required for access to the service.

    Generate a DataPoint API key:

    cylc get-resources api-key
    

    Add the following lines to the bottom of the flow.cylc file replacing xxx... with your API key:

    [runtime]
        [[get_observations_heathrow]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3772
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    

    Add three more get_observations tasks, one for each of the remaining weather stations.

    You will need the codes for the other three weather stations, which are:

    • Camborne - 3808

    • Shetland - 3005

    • Aldergrove - 3917

    Solution

    [runtime]
        [[get_observations_heathrow]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3772
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        [[get_observations_camborne]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3808
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        [[get_observations_shetland]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3005
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        [[get_observations_aldergrove]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3917
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    

    Check the flow.cylc file is valid by running the command:

    cylc validate .
    
  4. Test The get_observations Tasks.

    Next we will test the get_observations tasks.

    Install the workflow:

    cylc install
    

    Open a user interface (The Cylc TUI or The Cylc GUI) to view your workflow.

    Hint

    cylc tui runtime-tutorial
    # or
    cylc gui  # If you haven't already got an instance running.
    

    Run the workflow either by:

    • GUI: selecting the runtime-tutorial workflow in the sidebar and pressing the play button.

    • TUI: pressing enter on the workflow and selecting “play”.

    • Command line: running cylc play runtime-tutorial.

    If all goes well, the workflow will start up and the tasks will run and succeed. Note that the tasks which do not have a [runtime] section will still run, though they will not do anything as they do not call any scripts.

    Once the workflow has reached the final cycle point and all tasks have succeeded the scheduler will shut down automatically.

    The get-observations script produces a file called wind.csv which specifies the wind speed and direction. This file is written in the task’s work directory.

    Try to open one of the wind.csv files. Note that the path to the work directory, relative to the run directory, is:

    work/<cycle-point>/<task-name>
    

    You should find a file containing four numbers:

    • The longitude of the weather station;

    • The latitude of the weather station;

    • The wind direction (the direction the wind is blowing towards) in degrees;

    • The wind speed in miles per hour.

    Hint

    If you run ls work you should see a list of cycles. Pick one of them and open the file:

    work/<cycle-point>/get_observations_heathrow/wind.csv
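
    Alternatively, a small helper can list every wind.csv file in a run. This is a sketch under assumptions: list_wind_files is a hypothetical function, and the path below assumes that the latest installed run is reachable as ~/cylc-run/runtime-tutorial/runN, so adjust it if your run directory differs.

    ```shell
    # Hypothetical helper: list all wind.csv files below the work
    # directory of the workflow run directory given as $1.
    list_wind_files() {
        find "$1/work" -name wind.csv | sort
    }

    # Assumption: "runN" points at the most recently installed run.
    run_dir="${HOME}/cylc-run/runtime-tutorial/runN"
    if [ -d "${run_dir}/work" ]; then
        list_wind_files "$run_dir"
    fi
    ```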
    
  5. Add Runtime Configuration For The Other Tasks.

    The runtime configuration for the remaining tasks has been written out for you in the runtime file which you will find in the run directory. Copy the code in the runtime file to the bottom of the flow.cylc file.

    Now that all the tasks have runtime sections, remove [scheduler]allow implicit tasks:

     [scheduler]
    -    allow implicit tasks = True  # TODO: remove at end of exercise
    

    Removing this ensures that any tasks in the graph without a corresponding runtime section will cause validation to fail (e.g. due to a typo).

    Check the flow.cylc file is valid by running the command:

    cylc validate .
    

    Now install a new run of the workflow:

    cylc install
    
  6. Run The Workflow.

    Open a user interface (The Cylc TUI or The Cylc GUI) and run the workflow.

    Tip

    Make sure you play the latest run that was newly installed.

  7. View The Forecast Summary.

    The post_process_exeter task will produce a one-line summary of the weather in Exeter, as forecast two hours ahead of time. This summary can be found in the summary.txt file in the work directory.

    Try opening the summary file - it will be in the last cycle. The path to the work directory is:

    work/<cycle-point>/<task-name>
    

    Hint

    • cycle-point - this will be the last cycle of the workflow, i.e. the final cycle point.

    • task-name - set this to “post_process_exeter”.

  8. View The Rainfall Data.

    The forecast task will produce an HTML page where the rainfall data is rendered on a map. This HTML file is called job-map.html and is saved alongside the job log.

    Try opening this file in a web browser, e.g. via:

    firefox <filename> &
    

    The path to the job log directory is:

    log/job/<cycle-point>/<task-name>/<submission-number>
    

    Hint

    • cycle-point - this will be the last cycle of the workflow, i.e. the final cycle point.

    • task-name - set this to “forecast”.

    • submission-number - set this to “01”.