Runtime Configuration
In the last section we associated tasks with scripts and ran a simple workflow. In this section we will look at how to configure these tasks.
Environment Variables
We can define environment variables in task [environment] sections, to be provided to task jobs at runtime.
[runtime]
    [[countdown]]
        script = seq $START_NUMBER
        [[[environment]]]
            START_NUMBER = 5
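As a rough illustration of what this configuration means for the job, the sketch below does by hand what Cylc arranges for the countdown job: it exports the variable and then runs the task's script. The variable name countdown_output is only used for illustration here and is not part of Cylc.

```shell
# Sketch only: emulate the environment of the "countdown" job by hand.
export START_NUMBER=5                        # from the [[[environment]]] section

# The task's script, with its output captured so we can inspect it.
countdown_output="$(seq "$START_NUMBER")"

# Prints the numbers 1 to 5, one per line.
echo "$countdown_output"
```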
Each job is also provided with some standard environment variables, e.g.:
CYLC_WORKFLOW_RUN_DIR
The path to the run directory (e.g. ~/cylc-run/workflow).
CYLC_TASK_WORK_DIR
The path to the associated task’s work directory (e.g. run-directory/work/cycle/task).
CYLC_TASK_CYCLE_POINT
The cycle point for the associated task (e.g. 20171009T0950).
See also
There are many more environment variables - see Job Script Variables for more information.
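A task script can use these variables like any other shell variable. The following is a minimal sketch of such a script; the fallback values are the example values quoted above and are only there so that the sketch can also be run outside a Cylc job. The function name describe_job is hypothetical.

```shell
# Fallback values so the sketch runs outside Cylc.
# Inside a real job Cylc sets these variables itself.
: "${CYLC_WORKFLOW_RUN_DIR:=$HOME/cylc-run/workflow}"
: "${CYLC_TASK_CYCLE_POINT:=20171009T0950}"
: "${CYLC_TASK_WORK_DIR:=$CYLC_WORKFLOW_RUN_DIR/work/$CYLC_TASK_CYCLE_POINT/task}"

# Hypothetical helper: report where and for which cycle the job is running.
describe_job() {
    echo "cycle point: $CYLC_TASK_CYCLE_POINT"
    echo "work directory: $CYLC_TASK_WORK_DIR"
}

describe_job
```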
Job Submission
By default Cylc runs jobs on the same machine as the scheduler. It can run them on other machines too if we set the platform like this:
[runtime]
    [[hello_computehost]]
        script = echo "Hello Compute Host"
        platform = powerful_computer
By default Cylc also executes jobs as background processes. We often want to submit jobs to a job runner instead, particularly on shared compute resources. Cylc supports the following job runners:
at
loadleveler
lsf
pbs
sge
slurm
moab
Job runners typically require directives in some form, to specify job requirements such as memory use and number of CPUs to run on. For example:
[runtime]
    [[big_task]]
        script = big-executable
        # Submit to the platform "slurm_platform".
        platform = slurm_platform
        # Job requires 500MB of RAM & 4 CPUs.
        [[[directives]]]
            --mem = 500
            --ntasks = 4
Time Limits
We can specify an execution time limit, as an ISO8601 duration, after which a job will be terminated. Cylc automatically translates this to the correct job runner directives.
[runtime]
    [[some_task]]
        script = some-executable
        execution time limit = PT15M  # 15 minutes.
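To get a feel for the duration syntax, the sketch below converts a simple ISO8601 duration into seconds. It is not how Cylc performs the translation; duration_to_seconds is a hypothetical helper that only understands the PTnHnMnS form, with whole numbers and no leading zeros.

```shell
# Hypothetical helper: convert a simple ISO8601 duration (PTnHnMnS) to seconds.
# Assumes whole numbers without leading zeros; dates and weeks are not handled.
duration_to_seconds() {
    d="${1#PT}"      # strip the leading "PT"
    total=0
    case "$d" in *H*) total=$(( total + ${d%%H*} * 3600 )); d="${d#*H}" ;; esac
    case "$d" in *M*) total=$(( total + ${d%%M*} * 60 ));   d="${d#*M}" ;; esac
    case "$d" in *S*) total=$(( total + ${d%%S*} )) ;; esac
    echo "$total"
}

duration_to_seconds PT15M   # the limit used above, 900 seconds
```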
Retries
Jobs can fail for several reasons:
Something went wrong with job submission, e.g.:
A network problem;
The job host became unavailable or overloaded;
The job runner rejected your job directives.
Something went wrong with job execution, e.g.:
A bug;
A system error;
The job hitting the execution time limit.
We can configure Cylc to automatically retry tasks that fail, by setting submission retry delays and/or execution retry delays to a list of ISO8601 durations. Each entry in the list is the delay before one retry.
For example, setting execution retry delays = PT10M will cause the job to be retried once, 10 minutes after an execution failure.
Use a multiplier to repeat a delay and so allow several retries:
[runtime]
    [[some-task]]
        script = some-script
        # On execution failure
        # retry up to 3 times, 15 minutes apart.
        execution retry delays = 3*PT15M
        # On submission failure
        # retry up to 2 times after 10 mins each,
        # then once more after 30 mins.
        submission retry delays = 2*PT10M, PT30M
Note
Tasks only enter the submit-failed state if job submission fails with no retries left. Otherwise they return to the waiting state, to wait on the next try.
Tasks only enter the failed state if job execution fails with no retries left. Otherwise they return to the waiting state, to wait on the next try.
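The multiplier in a retry delay list can be read as shorthand for repeating an entry. The sketch below spells this out by expanding a list into one delay per line; expand_retry_delays is a hypothetical helper written for illustration and is not a Cylc command.

```shell
# Hypothetical helper: expand "N*DURATION" entries in a retry delay list,
# printing one delay per line in the order they would be used.
expand_retry_delays() {
    printf '%s\n' "$1" | tr -d ' ' | tr ',' '\n' | while IFS= read -r item; do
        case "$item" in
            *'*'*) count="${item%%\**}"; delay="${item#*\*}" ;;  # e.g. 2*PT10M
            *)     count=1; delay="$item" ;;                      # e.g. PT30M
        esac
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '%s\n' "$delay"
            i=$(( i + 1 ))
        done
    done
}

# Prints PT10M, PT10M, PT30M on separate lines.
expand_retry_delays '2*PT10M, PT30M'
```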
Start, Stop, Restart
We have seen how to start and stop Cylc workflows with cylc play and cylc stop. By default cylc stop causes the scheduler to wait for running jobs to finish before it shuts down. There are several other stop options, however. For example:
cylc stop --kill
Kill all running jobs before stopping. (Cylc can kill jobs on remote hosts, via the configured job runner.)
cylc stop --now --now
Stop right now, leaving any jobs running.
Once a workflow has stopped you can restart it with cylc play. The scheduler will pick up where it left off, and carry on as normal.
# Run the workflow <id>.
cylc play <id>
# Stop the workflow <id>, killing any running jobs.
cylc stop <id> --kill
# Restart the workflow <id>, picking up where it left off.
cylc play <id>
Practical
In this practical we will add runtime configuration to the weather-forecasting workflow from the scheduling tutorial.
Create A New Workflow.
Create a new workflow by running the command:
cylc get-resources tutorial/runtime-tutorial
cd ~/cylc-src/runtime-tutorial
You will now have a copy of the weather-forecasting workflow along with some executables and python modules.
Set The Initial And Final Cycle Points.
We want the workflow to run for 6 hours, starting at least 7 hours ago, on the hour.
We could work out the dates and times manually, or we could let Cylc do the maths for us.
Set the initial cycle point:
initial cycle point = previous(T-00) - PT7H
previous(T-00)
returns the current time ignoring minutes and seconds, e.g. if the current time is 12:34 this will return 12:00.
-PT7H
subtracts 7 hours from this value.
Set the final cycle point:
final cycle point = +PT6H
This sets the final cycle point six hours after the initial cycle point.
Run cylc validate to check for any errors:
cylc validate .
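The arithmetic behind these two settings can be sketched in plain shell. This is only an illustration of the maths, not of how Cylc evaluates cycle points: both helper functions are hypothetical, they work in seconds since the Unix epoch, and rounding down to the hour this way assumes UTC (or a time zone with a whole-hour offset).

```shell
# Hypothetical helper: previous(T-00) - PT7H, in epoch seconds.
initial_cycle_epoch() {
    now="$1"
    on_the_hour=$(( now - now % 3600 ))   # drop minutes and seconds
    echo $(( on_the_hour - 7 * 3600 ))    # then go back 7 hours
}

# Hypothetical helper: +PT6H relative to the initial cycle point.
final_cycle_epoch() {
    echo $(( $1 + 6 * 3600 ))
}

# Example: 1507552440 is 2017-10-09T12:34:00Z.
initial="$(initial_cycle_epoch 1507552440)"   # 05:00 on the same day
final="$(final_cycle_epoch "$initial")"       # 11:00 on the same day
echo "$initial $final"
```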
Add Runtime Configuration For The get_observations Tasks.
In the bin directory is a script called get-observations. This script gets weather data from the MetOffice DataPoint service. It requires two environment variables:
SITE_ID
A four digit numerical code which is used to identify a weather station, e.g. 3772 is Heathrow Airport.
API_KEY
An authentication key required for access to the service.
Generate a DataPoint API key:
cylc get-resources api-key
Add the following lines to the bottom of the flow.cylc file, replacing xxx... with your API key:
[runtime]
    [[get_observations_heathrow]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3772
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Add three more get_observations tasks, one for each of the remaining weather stations. You will need the codes for the other three weather stations, which are:
Camborne - 3808
Shetland - 3005
Aldergrove - 3917
Solution
[runtime]
    [[get_observations_heathrow]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3772
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_camborne]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3808
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_shetland]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3005
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_aldergrove]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3917
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Check the flow.cylc file is valid by running the command:
cylc validate .
Test The get_observations Tasks.
Next we will test the get_observations tasks. Install the workflow:
cylc install
Open a user interface (The Cylc TUI or The Cylc GUI) to view your workflow. Run the workflow either from the UI or command line.
Hint
cylc tui runtime-tutorial
# or
cylc gui  # If you haven't already got an instance running.
Run the workflow either by:
GUI: selecting the runtime-tutorial workflow in the sidebar and pressing the play button.
TUI: pressing enter on the workflow and selecting “play”.
Command line: running cylc play runtime-tutorial.
If all goes well the workflow will start up and the tasks will run and succeed. Note that the tasks which do not have a [runtime] section will still run, though they will not do anything as they do not call any scripts.
Once the workflow has reached the final cycle point and all tasks have succeeded the scheduler will shut down automatically.
The get-observations script produces a file called wind.csv which specifies the wind speed and direction. This file is written in the task’s work directory.
Try to open one of the wind.csv files. Note that the path to the work directory is:
work/<cycle-point>/<task-name>
You should find a file containing four numbers:
The longitude of the weather station;
The latitude of the weather station;
The wind direction (the direction the wind is blowing towards) in degrees;
The wind speed in miles per hour.
Hint
If you run ls work you should see a list of cycles. Pick one of them and open the file:
work/<cycle-point>/get_observations_heathrow/wind.csv
Add Runtime Configuration For The Other Tasks.
The runtime configuration for the remaining tasks has been written out for you in the runtime file which you will find in the run directory. Copy the code in the runtime file to the bottom of the flow.cylc file.
Now that all the tasks have runtime sections, remove [scheduler]allow implicit tasks:
 [scheduler]
-    allow implicit tasks = True  # TODO: remove at end of exercise
Removing this ensures that any tasks in the graph without a corresponding runtime section will cause validation to fail (e.g. due to a typo).
Check the flow.cylc file is valid by running the command:
cylc validate .
Now install a new run of the workflow:
cylc install
Run The Workflow.
Open a user interface (The Cylc TUI or The Cylc GUI) and run the workflow.
Tip
Make sure you play the latest run that was newly installed.
View The Forecast Summary.
The post_process_exeter task will produce a one-line summary of the weather in Exeter, as forecast two hours ahead of time. This summary can be found in the summary.txt file in the work directory.
Try opening the summary file - it will be in the last cycle. The path to the work directory is:
work/<cycle-point>/<task-name>
Hint
cycle-point - this will be the last cycle of the workflow, i.e. the final cycle point.
task-name - set this to “post_process_exeter”.
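If you are unsure which cycle is the last one, it can be picked out from the directory names, because cycle points written in the same format sort in chronological order. The sketch below assumes this and uses a hypothetical helper, last_cycle, which takes the path to the run directory.

```shell
# Hypothetical helper: name of the last cycle found under <run-dir>/work.
# Relies on cycle points of the same format sorting chronologically as text.
last_cycle() {
    ls "$1/work" | sort | tail -n 1
}
```

Run from inside the run directory, something like cat "work/$(last_cycle .)/post_process_exeter/summary.txt" would then print the summary.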
View The Rainfall Data.
The forecast task will produce an HTML page where the rainfall data is rendered on a map. This HTML file is called job-map.html and is saved alongside the job log.
Try opening this file in a web browser, e.g. via:
firefox <filename> &
The path to the job log directory is:
log/job/<cycle-point>/<task-name>/<submission-number>
Hint
cycle-point - this will be the last cycle of the workflow, i.e. the final cycle point.
task-name - set this to “forecast”.
submission-number - set this to “01”.
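Putting the three parts of the hint together, the path to a file in the job log directory can be assembled as in the sketch below. The helper job_log_file is hypothetical and simply joins the pieces described above; the cycle point in the example is a made-up placeholder, so substitute your own final cycle point.

```shell
# Hypothetical helper: build the path to a file in a job log directory.
# Arguments: run directory, cycle point, task name, file name,
# and optionally the submission number (defaults to "01").
job_log_file() {
    run_dir="$1"; cycle_point="$2"; task_name="$3"; file_name="$4"
    submission="${5:-01}"
    echo "$run_dir/log/job/$cycle_point/$task_name/$submission/$file_name"
}

# Example with a placeholder cycle point, relative to the run directory.
job_log_file . 20171009T1100Z forecast job-map.html
```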