Parameterized Tasks

Parameterized tasks (see parameterization) provide a way of implicitly looping over tasks without the need for Jinja2.

Cylc Parameters

Parameters are defined in their own section, [task parameters], e.g.:

[task parameters]
    world = Mercury, Venus, Earth

They can then be referenced by writing the name of the parameter in angle brackets, e.g.:

[scheduling]
    [[graph]]
        R1 = start => hello<world> => end
[runtime]
    [[hello<world>]]
        script = echo 'Hello World!'

When Cylc reads the flow.cylc file, it expands the parameters. For example, the configuration above is equivalent to:

[scheduling]
    [[graph]]
        R1 = """
            start => hello_Mercury => end
            start => hello_Venus => end
            start => hello_Earth => end
        """
[runtime]
    [[hello_Mercury]]
        script = echo 'Hello World!'
    [[hello_Venus]]
        script = echo 'Hello World!'
    [[hello_Earth]]
        script = echo 'Hello World!'

We can refer to a specific parameter value by writing it after an = sign inside the angle brackets:

[runtime]
    [[hello<world=Earth>]]
        script = echo 'Greetings Earth!'
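A generic section and a value-specific section can be used together. This is a minimal sketch, assuming Cylc's usual behaviour of merging repeated runtime sections, with later settings overriding earlier ones. The get_observations example later in this section relies on the same merging. Here every task gets the generic script, and hello_Earth then has its script replaced:

[task parameters]
    world = Mercury, Venus, Earth
[runtime]
    # Generic section: applies to hello_Mercury, hello_Venus and hello_Earth
    [[hello<world>]]
        script = echo 'Hello World!'
    # Specific section: applies only to hello_Earth and, because it comes
    # later, overrides the script set above for that task
    [[hello<world=Earth>]]
        script = echo 'Greetings Earth!'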

Environment Variables

The value of the parameter is provided to the job as an environment variable called CYLC_TASK_PARAM_<parameter>, where <parameter> is the name of the parameter (in this case world). For example, the job for hello_Earth sees CYLC_TASK_PARAM_world=Earth:

[runtime]
    [[hello<world>]]
        script = echo "Hello ${CYLC_TASK_PARAM_world}!"
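The variable is part of the job environment, so the task script can branch on it like any other shell variable. This sketch assumes the default bash job script. It is an illustrative alternative to writing a separate hello<world=Earth> section:

[runtime]
    [[hello<world>]]
        # CYLC_TASK_PARAM_world holds this task's value, e.g. Venus
        script = """
            case "${CYLC_TASK_PARAM_world}" in
                Earth) echo 'Greetings Earth!' ;;
                *) echo "Hello ${CYLC_TASK_PARAM_world}!" ;;
            esac
        """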

Parameter Types

Parameters can be either strings or integers:

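[task parameters]
    # Integer range (inclusive): 1, 2, 3, 4, 5
    foo = 1..5
    # Integer range with a step of 2: 1, 3, 5
    bar = 1..5..2
    # Comma-separated names give string values
    baz = pub, qux, bol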

Hint

Remember that by default Cylc automatically inserts an underscore between the task name and the parameter value, e.g. the following lines are equivalent:

task<baz=pub>
task_pub

Hint

When using integer parameters, to prevent confusion, Cylc prefixes the parameter value with the parameter name. For example:

[scheduling]
    [[graph]]
        R1 = """
            # task<bar> would result in:
            task_bar1
            task_bar3
            task_bar5

            # task<baz> would result in:
            task_pub
            task_qux
            task_bol
        """
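Specific values work the same way for integer parameters. A sketch, assuming the bar parameter defined above and the same merging of repeated runtime sections:

[runtime]
    # Applies to task_bar1, task_bar3 and task_bar5
    [[task<bar>]]
        script = echo "bar is ${CYLC_TASK_PARAM_bar}"
    # Applies only to task_bar3
    [[task<bar=3>]]
        script = echo 'This is the middle task'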

Using parameters, the get_observations configuration could be written like so:

[scheduling]
    [[graph]]
        T00/PT3H = """
            get_observations<station> => consolidate_observations
        """
[runtime]
    [[get_observations<station>]]
        script = get-observations
        [[[environment]]]
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

    [[get_observations<station=aldergrove>]]
        [[[environment]]]
            SITE_ID = 3917
    [[get_observations<station=camborne>]]
        [[[environment]]]
            SITE_ID = 3808
    [[get_observations<station=heathrow>]]
        [[[environment]]]
            SITE_ID = 3772
    [[get_observations<station=shetland>]]
        [[[environment]]]
            SITE_ID = 3005
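The configuration above assumes that the station parameter has already been defined. It uses the same station names as the practical below:

[task parameters]
    # One get_observations_<station> task is generated per value
    station = aldergrove, camborne, heathrow, shetland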

For more information, see the Cylc User Guide.

Practical

This practical continues on from the Jinja2 practical.

  1. Use Parameterization To Consolidate The get_observations Tasks.

    Next we will parameterize the get_observations tasks.

    Add a parameter called station:

    +[task parameters]
    +    station = aldergrove, camborne, heathrow, shetland
    
     [scheduler]
         UTC mode = True
    

    Remove the four get_observations tasks and insert the following code in their place:

    [[get_observations<station>]]
        script = get-observations
        [[[environment]]]
            API_KEY = {{ API_KEY }}
    

    Using cylc config, you should see that Cylc replaces <station> with each of the stations in turn, creating a new task for each:

    cylc config . -i "[runtime]"
    

    The get_observations tasks are now missing the SITE_ID environment variable. Add a new section for each station with a SITE_ID:

    [[get_observations<station=heathrow>]]
        [[[environment]]]
            SITE_ID = 3772
    

    Hint

    The relevant IDs are:

    • Aldergrove - 3917

    • Camborne - 3808

    • Heathrow - 3772

    • Shetland - 3005

    Solution

    [[get_observations<station=aldergrove>]]
        [[[environment]]]
            SITE_ID = 3917
    [[get_observations<station=camborne>]]
        [[[environment]]]
            SITE_ID = 3808
    [[get_observations<station=heathrow>]]
        [[[environment]]]
            SITE_ID = 3772
    [[get_observations<station=shetland>]]
        [[[environment]]]
            SITE_ID = 3005
    

    Using cylc config, you should now see four get_observations tasks, each with a script, an API_KEY and a SITE_ID:

    cylc config . -i "[runtime]"
    

    Finally, we can use this parameterization to simplify the workflow’s graph. Replace the four get_observations lines in the graph with a single get_observations<station> line:

     # Repeat every three hours starting at the initial cycle point.
     PT3H = """
    -    get_observations_aldergrove => consolidate_observations
    -    get_observations_camborne => consolidate_observations
    -    get_observations_heathrow => consolidate_observations
    -    get_observations_shetland => consolidate_observations
    +    get_observations<station> => consolidate_observations
     """
    

    Hint

    The cylc config command does not expand parameters or families in the graph, so you must use cylc graph to inspect changes to the graph.

  2. Use Parameterization To Consolidate The post_process Tasks.

    At the moment we only have one post_process task (post_process_exeter), but suppose we wanted to add a second task for Edinburgh.

    Create a new parameter called site and set it to contain exeter and edinburgh. Parameterize the post_process task using this parameter.

    Hint

    The first argument to the post-process command is the name of the site. We can use the CYLC_TASK_PARAM_site environment variable to avoid having to write out this section twice.

    Solution

    First we must create the site parameter:

     [task parameters]
         station = aldergrove, camborne, heathrow, shetland
    +    site = exeter, edinburgh

     [scheduler]
         UTC mode = True
    

    Next we parameterize the task in the graph:

    -get_rainfall => forecast => post_process_exeter
    +get_rainfall => forecast => post_process<site>
    

    And also the runtime:

    -[[post_process_exeter]]
    +[[post_process<site>]]
    -    # Generate a forecast for Exeter 60 minutes in the future.
    +    # Generate a forecast for each site 60 minutes in the future.
    -    script = post-process exeter 60
    +    script = post-process $CYLC_TASK_PARAM_site 60
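
    Taken together, the parameterized post-processing can be sketched as below. Only the relevant sections are shown, and the rest of the workflow is unchanged. With site = exeter, edinburgh, Cylc generates post_process_exeter and post_process_edinburgh. In post_process_edinburgh, CYLC_TASK_PARAM_site is edinburgh, so the script runs post-process edinburgh 60:

    [task parameters]
        station = aldergrove, camborne, heathrow, shetland
        site = exeter, edinburgh

    [runtime]
        [[post_process<site>]]
            # Generate a forecast for each site 60 minutes in the future;
            # the site name comes from the CYLC_TASK_PARAM_site variable
            script = post-process $CYLC_TASK_PARAM_site 60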