.. _SuiteDefinition: Suite Configuration =================== Cylc suites are defined in structured, validated, *suite.rc* files that concisely specify the properties of, and the relationships between, the various tasks managed by the suite. This section of the User Guide deals with the format and content of the suite.rc file, including task definition. Task implementation - what's required of the real commands, scripts, or programs that do the processing that the tasks represent - is covered in :ref:`TaskImplementation`; and task job submission - how tasks are submitted to run - is in :ref:`TaskJobSubmission`. .. _SuiteDefinitionDirectories: Suite Configuration Directories ------------------------------- A cylc *suite configuration directory* contains: - **A suite.rc file**: this is the suite configuration. - And any include-files used in it (see below; may be kept in sub-directories). - **A** ``bin/`` **sub-directory** (optional) - For scripts and executables that implement, or are used by, suite tasks. - Automatically added to ``$PATH`` in task execution environments. - Alternatively, tasks can call external commands, scripts, or programs; or they can be scripted entirely within the suite.rc file. - **A** ``lib/python/`` **sub-directory** (optional) - For custom job submission modules (see :ref:`CustomJobSubmissionMethods`) and local Python modules imported by custom Jinja2 filters, tests and globals (see :ref:`CustomJinja2Filters`). - **Any other sub-directories and files** - documentation, control files, etc. (optional) - Holding everything in one place makes proper suite revision control possible. - Portable access to files here, for running tasks, is provided through ``$CYLC_SUITE_DEF_PATH`` (see :ref:`TaskExecutionEnvironment`). - Ignored by cylc, but the entire suite configuration directory tree is copied when you copy a suite using cylc commands. A typical example: .. 
code-block:: bash /path/to/my/suite # suite configuration directory suite.rc # THE SUITE CONFIGURATION FILE bin/ # scripts and executables used by tasks foo.sh bar.sh ... # (OPTIONAL) any other suite-related files, for example: inc/ # suite.rc include-files nwp-tasks.rc globals.rc ... doc/ # documentation control/ # control files ancil/ # ancillary files ... .. _SuiteRCFile: Suite.rc File Overview ---------------------- Suite.rc files are an extended-INI format with section nesting. Embedded template processor expressions may also be used in the file, to programmatically generate the final suite configuration seen by cylc. Currently the Jinja2 and EmPy template processors are supported; see :ref:`Jinja` and :ref:`EmPylabel` for examples. In the future cylc may provide a plug-in interface to allow use of other template engines too. .. _Syntax: Syntax ^^^^^^ The following defines legal suite.rc syntax: - **Items** are of the form ``item = value``. - **[Section]** headings are enclosed in square brackets. - **Sub-section [[nesting]]** is defined by repeated square brackets. - Sections are **closed** by the next section heading. - **Comments** (line and trailing) follow a hash character: ``#`` - **List values** are comma-separated. - **Single-line string values** can be single-, double-, or un-quoted. - **Multi-line string values** are triple-quoted (using single or double quote characters). - **Boolean values** are capitalized: True, False. - **Leading and trailing whitespace** is ignored. - **Indentation** is optional but should be used for clarity. - **Continuation lines** follow a trailing backslash: ``\`` - **Duplicate sections** add their items to those previously defined under the same section. - **Duplicate items** override, *except for dependency ``graph`` strings, which are additive*. - **Include-files** ``%include inc/foo.rc`` can be used as a verbatim inlining mechanism. 
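A minimal sketch that exercises several of these rules at once is shown below. The task names ``foo``, ``bar`` and ``baz``, the scripts, and the ``GREETING`` variable are hypothetical, chosen only for illustration:

.. code-block:: cylc

   [meta]
       title = "Syntax sketch"        # double-quoted single-line string
   [cylc]
       UTC mode = True                # boolean values are capitalized
   [scheduling]
       [[dependencies]]
           graph = "foo => bar"
   [runtime]
       [[foo]]
           # unquoted single-line string:
           script = echo first
           [[[environment]]]
               GREETING = 'hello'     # single-quoted string

   # A duplicate section adds its items to the earlier [runtime] section.
   [runtime]
       [[foo]]
           # duplicate item: overrides "echo first"
           script = echo second

   # Duplicate graph strings are the exception: they are additive.
   [scheduling]
       [[dependencies]]
           graph = "bar => baz"

Read against the rules listed above, this should amount to a single ``foo => bar => baz`` graph in which ``foo`` runs ``echo second`` and still has ``GREETING`` defined.
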
Suites that embed templating code (see :ref:`Jinja` and :ref:`EmPylabel`) must process to raw suite.rc syntax. Include-Files ^^^^^^^^^^^^^ Cylc has native support for suite.rc include-files, which may help to organize large suites. Inclusion boundaries are completely arbitrary - you can think of include-files as chunks of the suite.rc file simply cut-and-pasted into another file. Include-files may be included multiple times in the same file, and even nested. Include-file paths can be specified portably relative to the suite configuration directory, e.g.: .. code-block:: cylc # include the file $CYLC_SUITE_DEF_PATH/inc/foo.rc: %include inc/foo.rc Editing Temporarily Inlined Suites """""""""""""""""""""""""""""""""" Cylc's native file inclusion mechanism supports optional inlined editing: .. code-block:: bash $ cylc edit --inline SUITE The suite will be split back into its constituent include-files when you exit the edit session. While editing, the inlined file becomes the official suite configuration so that changes take effect whenever you save the file. See ``cylc prep edit --help`` for more information. Include-Files via Jinja2 """""""""""""""""""""""" Jinja2 (:ref:`Jinja`) also has template inclusion functionality. .. _SyntaxHighlighting: Syntax Highlighting For Suite Configuration ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Cylc comes with syntax files for a number of text editors: .. code-block:: bash /etc/syntax/cylc.vim # vim /etc/syntax/cylc-mode.el # emacs /etc/syntax/cylc.lang # gedit (and other gtksourceview programs) /etc/syntax/cylc.xml # kate Refer to comments at the top of each file to see how to use them. Gross File Structure ^^^^^^^^^^^^^^^^^^^^ Cylc suite.rc files consist of a suite title and description followed by configuration items grouped under several top level section headings: - **[cylc]** - *non task-specific suite configuration* - **[scheduling]** - *determines when tasks are ready to run* - tasks with special behaviour, e.g. 
clock-trigger tasks - the dependency graph, which defines the relationships between tasks - **[runtime]** - *determines how, where, and what to execute when tasks are ready* - script, environment, job submission, remote hosting, etc. - suite-wide defaults in the *root* namespace - a nested family hierarchy with common properties inherited by related tasks - **[visualization]** - suite graph styling .. _Validation: Validation ^^^^^^^^^^ Cylc suite.rc files are automatically validated against a specification that defines all legal entries, values, options, and defaults. This detects formatting errors, typographic errors, illegal items and illegal values prior to run time. Some values are complex strings that require further parsing by cylc to determine their correctness (this is also done during validation). All legal entries are documented in (:ref:`SuiteRCReference`). The validator reports the line numbers of detected errors. Here's an example showing a section heading with a missing right bracket: .. code-block:: bash $ cylc validate my.suite [[special tasks] 'Section bracket mismatch, line 19' If the suite.rc file uses include-files ``cylc view`` will show an inlined copy of the suite with correct line numbers (you can also edit suites in a temporarily inlined state with ``cylc edit --inline``). Validation does not check the validity of chosen batch systems. .. todo:: This is to allow users to extend cylc with their own job submission methods, which are by definition unknown to the suite.rc spec. .. _ConfiguringScheduling: Scheduling - Dependency Graphs ------------------------------ The ``[scheduling]`` section of a suite.rc file defines the relationships between tasks in a suite - the information that allows cylc to determine when tasks are ready to run. The most important component of this is the suite dependency graph. 
Cylc graph notation makes clear textual graph representations that are very concise because sections of the graph that repeat at different hours of the day, say, only have to be defined once. Here's an example with dependencies that vary depending on the particular cycle point: .. code-block:: cylc [scheduling] initial cycle point = 20200401 final cycle point = 20200405 [[dependencies]] [[[T00,T06,T12,T18]]] # validity (hours) graph = """ A => B & C # B and C trigger off A A[-PT6H] => A # Model A restart trigger """ [[[T06,T18]]] # hours graph = "C => X" :numref:`fig-dep-eg-1` shows the complete suite.rc listing alongside the suite graph. This is a complete, valid, runnable suite (it will use default task runtime properties such as ``script``). .. Need to use a 'container' directive to get centered image with left-aligned caption (as required for code block text). .. container:: twocol .. container:: image .. _fig-dep-eg-1: .. figure:: graphics/png/orig/dep-eg-1.png :align: center .. container:: caption Example Suite .. code-block:: cylc [meta] title = "Dependency Example 1" [cylc] UTC mode = True [scheduling] initial cycle point = 20200401 final cycle point = 20200405 [[dependencies]] [[[T00,T06,T12,T18]]] # validity (hours) graph = """ A => B & C # B and C trigger off A A[-PT6H] => A # Model A restart trigger """ [[[T06,T18]]] # hours graph = "C => X" [visualization] initial cycle point = 20200401 final cycle point = 20200401T06 [[node attributes]] X = "color=red" Graph String Syntax ^^^^^^^^^^^^^^^^^^^ Multiline graph strings may contain: - **blank lines** - **arbitrary white space** - **internal comments**: following the ``#`` character - **conditional task trigger expressions** - see below. Interpreting Graph Strings ^^^^^^^^^^^^^^^^^^^^^^^^^^ Suite dependency graphs can be broken down into pairs in which the left side (which may be a single task or family, or several that are conditionally related) defines a trigger for the task or family on the right. 
For instance the "word graph" *C triggers off B which triggers off A* can be deconstructed into pairs *C triggers off B* and *B triggers off A*. In this section we use only the default trigger type, which is to trigger off the upstream task succeeding; see :ref:`TriggerTypes` for other available triggers. In the case of cycling tasks, the triggers defined by a graph string are valid for cycle points matching the list of hours specified for the graph section. For example this graph: .. code-block:: cylc [scheduling] [[dependencies]] [[[T00,T12]]] graph = "A => B" implies that B triggers off A for cycle points in which the hour matches ``00`` or ``12``. To define inter-cycle dependencies, attach an offset indicator to the left side of a pair: .. code-block:: cylc [scheduling] [[dependencies]] [[[T00,T12]]] graph = "A[-PT12H] => B" This means B[time] triggers off A[time-PT12H] (12 hours before) for cycle points with hours matching ``00`` or ``12``. ``time`` is implicit because this keeps graphs clean and concise, given that the majority of tasks will typically depend only on others with the same cycle point. Cycle point offsets can only appear on the left of a pair, because pairs define triggers for the right task at cycle point ``time``. However, ``A => B[-PT6H]``, which is illegal, can be reformulated as a *future trigger* ``A[+PT6H] => B`` (see :ref:`InterCyclePointTriggers`). It is also possible to combine multiple offsets within a cycle point offset, e.g. .. code-block:: cylc [scheduling] [[dependencies]] [[[T00,T12]]] graph = "A[-P1D-PT12H] => B" This means that B[time] triggers off A[time-P1D-PT12H] (1 day and 12 hours before). Triggers can be chained together. This graph: .. code-block:: cylc graph = """A => B # B triggers off A B => C # C triggers off B""" is equivalent to this: .. code-block:: cylc graph = "A => B => C" *Each trigger in the graph must be unique* but *the same task can appear in multiple pairs or chains*. 
Separately defined triggers for the same task have an AND relationship. So this: .. code-block:: cylc graph = """A => X # X triggers off A B => X # X also triggers off B""" is equivalent to this: .. code-block:: cylc graph = "A & B => X" # X triggers off A AND B In summary, the branching tree structure of a dependency graph can be partitioned into lines (in the suite.rc graph string) of pairs or chains, in any way you like, with liberal use of internal white space and comments to make the graph structure as clear as possible. .. code-block:: cylc # B triggers if A succeeds, then C and D trigger if B succeeds: graph = "A => B => C & D" # which is equivalent to this: graph = """A => B => C B => D""" # and to this: graph = """A => B => D B => C""" # and to this: graph = """A => B B => C B => D""" # and it can even be written like this: graph = """A => B # blank line follows: B => C # comment ... B => D""" Splitting Up Long Graph Lines """"""""""""""""""""""""""""" It is not necessary to use the general line continuation marker ``\`` to split long graph lines. Just break at dependency arrows, or split long chains into smaller ones. This graph: .. code-block:: cylc graph = "A => B => C" is equivalent to this: .. code-block:: cylc graph = """A => B => C""" and also to this: .. code-block:: cylc graph = """A => B B => C""" .. _GraphTypes: Graph Types ^^^^^^^^^^^ A suite configuration can contain multiple graph strings that are combined to generate the final graph. One-off (Non-Cycling) """"""""""""""""""""" :numref:`fig-test1` shows a small suite of one-off non-cycling tasks; these all share a single cycle point (``1``) and don't spawn successors (once they're all finished the suite just exits). The integer ``1`` attached to each graph node is just an arbitrary label here. .. Need to use a 'container' directive to get centered image with left-aligned caption (as required for code block text). .. container:: twocol .. container:: image .. _fig-test1: .. 
figure:: graphics/png/orig/test1.png :align: center .. container:: caption One-off (Non-Cycling) Tasks. .. code-block:: cylc [meta] title = some one-off tasks [scheduling] [[dependencies]] graph = "foo => bar & baz => qux" Cycling Graphs """""""""""""" For cycling tasks the graph section heading defines a sequence of cycle points for which the subsequent graph section is valid. :numref:`fig-test2` shows a small suite of cycling tasks. .. Need to use a 'container' directive to get centered image with left-aligned caption (as required for code block text). .. container:: twocol .. container:: image .. _fig-test2: .. figure:: graphics/png/orig/test2.png :align: center .. container:: caption Cycling Tasks. .. code-block:: cylc [meta] title = some cycling tasks # (no dependence between cycle points) [scheduling] [[dependencies]] [[[T00,T12]]] graph = "foo => bar & baz => qux" Graph Section Headings ^^^^^^^^^^^^^^^^^^^^^^ Graph section headings define recurrence expressions; the graph within a graph section defines a workflow at each point of the recurrence. For example, in the following scenario: .. code-block:: cylc [scheduling] [[dependencies]] [[[ T06 ]]] # A graph section heading graph = foo => bar ``T06`` means "Run every day starting at 06:00 after the initial cycle point". Cylc allows you to start (or end) at any particular time, repeat at whatever frequency you like, and even optionally limit the number of repetitions. Graph section headings can also be used with integer cycling; see :ref:`IntegerCycling`. Syntax Rules """""""""""" Date-time cycling information is made up of a starting *date-time*, an *interval*, and an optional *limit*. The time is assumed to be in the local time zone unless you set ``[cylc]cycle point time zone`` or ``[cylc]UTC mode``. The calendar is assumed to be the proleptic Gregorian calendar unless you set ``[scheduling]cycling mode``. The syntax for representations is based on the ISO 8601 date-time standard. 
This includes the representations of *date-time* and *interval*. What we define for cylc's cycling syntax is our own optionally-heavily-condensed form of ISO 8601 recurrence syntax. The most common full form is: ``R[limit?]/[date-time]/[interval]``. However, we allow omitting information that can be guessed from the context (rules below). This means that it can be written as: .. code-block:: none R[limit?]/[date-time] R[limit?]//[interval] [date-time]/[interval] R[limit?] # Special limit of 1 case [date-time] [interval] with example graph headings for each form being: .. code-block:: cylc [[[ R5/T00 ]]] # Run 5 times at 00:00 every day [[[ R//PT1H ]]] # Run every hour (Note the R// is redundant) [[[ 20000101T00Z/P1D ]]] # Run every day starting at 00:00 1st Jan 2000 [[[ R1 ]]] # Run once at the initial cycle point [[[ R1/20000101T00Z ]]] # Run once at 00:00 1st Jan 2000 [[[ P1Y ]]] # Run every year .. note:: ``T00`` is an example of ``[date-time]``, with an inferred 1 day period and no limit. Where some or all *date-time* information is omitted, it is inferred to be relative to the initial date-time cycle point. For example, ``T00`` by itself would mean the next occurrence of midnight that follows, or is, the initial cycle point. Entering ``+PT6H`` would mean 6 hours after the initial cycle point. Entering ``-P1D`` would mean 1 day before the initial cycle point. Entering no information for the *date-time* implies the initial cycle point date-time itself. Where the *interval* is omitted and some (but not all) *date-time* information is omitted, it is inferred to be a single unit above the largest given specific *date-time* unit. For example, the largest given specific unit in ``T00`` is hours, so the inferred interval is 1 day (daily), ``P1D``. Where the *limit* is omitted, unlimited cycling is assumed. This will be bounded by the final cycle point's date-time if given. Another supported form of ISO 8601 recurrence is: ``R[limit?]/[interval]/[date-time]``. 
This form uses the *date-time* as the end of the cycling sequence rather than the start. For example, ``R3/P5D/20140430T06`` means: .. code-block:: none 20140420T06 20140425T06 20140430T06 This kind of form can be used for specifying special behaviour near the end of the suite, at the final cycle point's date-time. We can also represent this in cylc with a collapsed form: .. code-block:: none R[limit?]/[interval] R[limit?]//[date-time] [interval]/[date-time] So, for example, you can write: .. code-block:: cylc [[[ R1//+P0D ]]] # Run once at the final cycle point [[[ R5/P1D ]]] # Run 5 times, every 1 day, ending at the final # cycle point [[[ P2W/T00 ]]] # Run every 2 weeks ending at 00:00 following # the final cycle point [[[ R//T00 ]]] # Run every 1 day ending at 00:00 following the # final cycle point .. _referencing-the-initial-and-final-cycle-points: Referencing The Initial And Final Cycle Points """""""""""""""""""""""""""""""""""""""""""""" For convenience the caret and dollar symbols may be used as shorthand for the initial and final cycle points. Using this shorthand you can write: .. code-block:: cylc [[[ R1/^+PT12H ]]] # Repeat once 12 hours after the initial cycle point # R[limit]/[date-time] # Equivalent to [[[ R1/+PT12H ]]] [[[ R1/$ ]]] # Repeat once at the final cycle point # R[limit]/[date-time] # Equivalent to [[[ R1//+P0D ]]] [[[ $-P2D/PT3H ]]] # Repeat 3 hourly starting two days before the # [date-time]/[interval] # final cycle point .. note:: There can be multiple ways to write the same headings, for instance the following all run once at the final cycle point: .. code-block:: cylc [[[ R1/P0Y ]]] # R[limit]/[interval] [[[ R1/P0Y/$ ]]] # R[limit]/[interval]/[date-time] [[[ R1/$ ]]] # R[limit]/[date-time] .. _excluding-dates: Excluding Dates """"""""""""""" Date-times can be excluded from a recurrence with an exclamation mark; for example, ``[[[ P1D!20000101 ]]]`` means run daily except on the first of January 2000. 
This syntax can be used to exclude one or multiple date-times from a recurrence. Multiple date-times are excluded using the syntax ``[[[ P1D!(20000101,20000102,...) ]]]``. All date-times listed within the parentheses after the exclamation mark will be excluded. .. note:: The ``^`` and ``$`` symbols (shorthand for the initial and final cycle points) are both date-times so ``[[[ T12!$-P1D ]]]`` is valid. If using a run limit in combination with an exclusion, the heading might not run the number of times specified in the limit. For example, in the following suite ``foo`` will only run once as its second run has been excluded. .. code-block:: cylc [scheduling] initial cycle point = 20000101T00Z final cycle point = 20000105T00Z [[dependencies]] [[[ R2//P1D!20000102 ]]] graph = foo Advanced exclusion syntax """"""""""""""""""""""""" In addition to excluding isolated date-time points or lists of date-time points from recurrences, exclusions themselves may be date-time recurrence sequences. Any partial date-time or sequence given after the exclamation mark will be excluded from the main sequence. For example, partial date-times can be excluded using the syntax: .. code-block:: cylc [[[ PT1H ! T12 ]]] # Run hourly but not at 12:00 from the initial # cycle point. [[[ T-00 ! (T00, T06, T12, T18) ]]] # Run hourly but not at 00:00, 06:00, # 12:00, 18:00. [[[ PT5M ! T-15 ]]] # Run 5-minutely but not at 15 minutes past the # hour from the initial cycle point. [[[ T00 ! W-1T00 ]]] # Run daily at 00:00 except on Mondays. It is also valid to use sequences for exclusions. For example: .. code-block:: cylc [[[ PT1H ! PT6H ]]] # Run hourly from the initial cycle point but # not 6-hourly from the initial cycle point. [[[ T-00 ! PT6H ]]] # Run hourly on the hour but not 6-hourly # on the hour. # Same as [[[ T-00 ! T-00/PT6H ]]] (T-00 context is implied) # Same as [[[ T-00 ! (T00, T06, T12, T18) ]]] # Same as [[[ PT1H ! (T00, T06, T12, T18) ]]] Initial cycle point dependent [[[ T12 ! 
T12/P15D ]]] # Run daily at 12:00 except every 15th day. [[[ R/^/PT1H ! R5/20000101T00/P1D ]]] # Any valid recurrence may be used to # determine exclusions. This example # translates to: Repeat every hour from # the initial cycle point, but exclude # 00:00 for 5 days from the 1st January # 2000. You can combine exclusion sequences and single point exclusions within a comma-separated list enclosed in parentheses: .. code-block:: cylc [[[ T-00 ! (20000101T07, PT2H) ]]] # Run hourly on the hour but not at 07:00 # on the 1st Jan, 2000 and not 2-hourly # on the hour. .. _HowMultipleGraphStringsCombine: How Multiple Graph Strings Combine """""""""""""""""""""""""""""""""" For a cycling graph with multiple validity sections for different hours of the day, the different sections *add* to generate the complete graph. Different graph sections can overlap (i.e. the same hours may appear in multiple section headings) and the same tasks may appear in multiple sections, but individual dependencies should be unique across the entire graph. For example, the following graph defines a duplicate prerequisite for task C: .. code-block:: cylc [scheduling] [[dependencies]] [[[T00,T06,T12,T18]]] graph = "A => B => C" [[[T06,T18]]] graph = "B => C => X" # duplicate prerequisite: B => C already defined at T06, T18 This does not affect scheduling, but for the sake of clarity and brevity the graph should be written like this: .. code-block:: cylc [scheduling] [[dependencies]] [[[T00,T06,T12,T18]]] graph = "A => B => C" [[[T06,T18]]] # X triggers off C only at 6 and 18 hours graph = "C => X" .. _AdvancedCycling: Advanced Examples """"""""""""""""" The following examples show the various ways of writing graph headings in cylc. .. 
code-block:: cylc [[[ R1 ]]] # Run once at the initial cycle point [[[ P1D ]]] # Run every day starting at the initial cycle point [[[ PT5M ]]] # Run every 5 minutes starting at the initial cycle # point [[[ T00/P2W ]]] # Run every 2 weeks starting at 00:00 after the # initial cycle point [[[ +P5D/P1M ]]] # Run every month, starting 5 days after the initial # cycle point [[[ R1/T06 ]]] # Run once at 06:00 after the initial cycle point [[[ R1/P0Y ]]] # Run once at the final cycle point [[[ R1/$ ]]] # Run once at the final cycle point (alternative # form) [[[ R1/$-P3D ]]] # Run once three days before the final cycle point [[[ R3/T0830 ]]] # Run 3 times, every day at 08:30 after the initial # cycle point [[[ R3/01T00 ]]] # Run 3 times, every month at 00:00 on the first # of the month after the initial cycle point [[[ R5/W-1/P1M ]]] # Run 5 times, every month starting on Monday # following the initial cycle point [[[ T00!^ ]]] # Run at the first occurrence of T00 that isn't the # initial cycle point [[[ P1D!20000101 ]]] # Run every day excluding 1st Jan 2000 [[[ 20140201T06/P1D ]]] # Run every day starting at 20140201T06 [[[ R1/min(T00,T06,T12,T18) ]]] # Run once at the first instance # of either T00, T06, T12 or T18 # starting at the initial cycle # point .. _AdvancedStartingUp: Advanced Starting Up """""""""""""""""""" Dependencies that are only valid at the initial cycle point can be written using the ``R1`` notation (e.g. as in :ref:`initial-non-repeating-r1-tasks`). For example: .. code-block:: cylc [cylc] UTC mode = True [scheduling] initial cycle point = 20130808T00 final cycle point = 20130812T00 [[dependencies]] [[[R1]]] graph = "prep => foo" [[[T00]]] graph = "foo[-P1D] => foo => bar" In the example above, ``R1`` implies ``R1/20130808T00``, so ``prep`` only runs once at that cycle point (the initial cycle point). At that cycle point, ``foo`` will have a dependence on ``prep`` - but not at subsequent cycle points. 
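As a small sketch, and assuming the ``^`` shorthand from :ref:`referencing-the-initial-and-final-cycle-points` behaves as described there, the one-off section of the example could also name the initial cycle point explicitly. The two headings below are intended to be equivalent; a real suite would use only one of them:

.. code-block:: cylc

   # Implicit: R1 alone runs once at the initial cycle point.
   [[[R1]]]
       graph = "prep => foo"

   # Explicit alternative: ^ stands for the initial cycle point.
   [[[R1/^]]]
       graph = "prep => foo"
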
However, it is possible to have a suite that has multiple effective initial cycles - for example, one starting at ``T00`` and another starting at ``T12``. What if they need to share an initial task? Let's suppose that we add the following section to the suite example above: .. code-block:: cylc [cylc] UTC mode = True [scheduling] initial cycle point = 20130808T00 final cycle point = 20130812T00 [[dependencies]] [[[R1]]] graph = "prep => foo" [[[T00]]] graph = "foo[-P1D] => foo => bar" [[[T12]]] graph = "baz[-P1D] => baz => qux" We'll also say that there should be a starting dependence between ``prep`` and our new task ``baz`` - but we still want to have a single ``prep`` task, at a single cycle. We can write this using a special case of the ``task[-interval]`` syntax - if the interval is null, this implies the task at the initial cycle point. For example, we can write our suite like :numref:`fig-test4`. .. Need to use a 'container' directive to get centered image with left-aligned caption (as required for code block text). .. container:: twocol .. container:: image .. _fig-test4: .. figure:: graphics/png/orig/test4.png :align: center .. container:: caption Staggered Start Suite .. code-block:: cylc [cylc] UTC mode = True [scheduling] initial cycle point = 20130808T00 final cycle point = 20130812T00 [[dependencies]] [[[R1]]] graph = "prep" [[[R1/T00]]] # ^ implies the initial cycle point: graph = "prep[^] => foo" [[[R1/T12]]] # ^ is initial cycle point, as above: graph = "prep[^] => baz" [[[T00]]] graph = "foo[-P1D] => foo => bar" [[[T12]]] graph = "baz[-P1D] => baz => qux" [visualization] initial cycle point = 20130808T00 final cycle point = 20130810T00 [[node attributes]] foo = "color=red" bar = "color=orange" baz = "color=green" qux = "color=blue" This neatly expresses what we want - a task running at the initial cycle point that has one-off dependencies with other task sets at different cycles. .. 
Need to use a 'container' directive to get centered image with left-aligned caption (as required for code block text). .. container:: twocol .. container:: image .. _fig-test5: .. figure:: graphics/png/orig/test5.png :align: center .. container:: caption Restricted First Cycle Point Suite .. code-block:: cylc [cylc] UTC mode = True [scheduling] initial cycle point = 20130808T00 final cycle point = 20130808T18 [[dependencies]] [[[R1]]] graph = "setup_foo => foo" [[[+PT6H/PT6H]]] graph = """ foo[-PT6H] => foo foo => bar """ [visualization] initial cycle point = 20130808T00 final cycle point = 20130808T18 [[node attributes]] foo = "color=red" bar = "color=orange" A different kind of requirement is displayed in :numref:`fig-test5`. Usually, we want to specify additional tasks and dependencies at the initial cycle point. What if we want our first cycle point to be entirely special, with some tasks missing compared to subsequent cycle points? In :numref:`fig-test5`, ``bar`` will not be run at the initial cycle point, but will still run at subsequent cycle points. ``[[[+PT6H/PT6H]]]`` means start at ``+PT6H`` (6 hours after the initial cycle point) and then repeat every ``PT6H`` (6 hours). Some suites may have staggered start-up sequences where different tasks need running once but only at specific cycle points, potentially due to differing data sources at different cycle points with different possible initial cycle points. To allow this cylc provides a ``min( )`` function that can be used as follows: .. 
code-block:: cylc [cylc] UTC mode = True [scheduling] initial cycle point = 20100101T03 [[dependencies]] [[[R1/min(T00,T12)]]] graph = "prep1 => foo" [[[R1/min(T06,T18)]]] graph = "prep2 => foo" [[[T00,T06,T12,T18]]] graph = "foo => bar" In this example the initial cycle point is ``20100101T03``, so the ``prep1`` task will run once at ``20100101T12`` and the ``prep2`` task will run once at ``20100101T06`` as these are the first cycle points after the initial cycle point in the respective ``min( )`` entries. .. _IntegerCycling: Integer Cycling """"""""""""""" In addition to non-repeating and date-time cycling workflows, cylc can do integer cycling for repeating workflows that are not date-time based. To construct an integer cycling suite, set ``[scheduling]cycling mode = integer``, and specify integer values for the initial and (optional) final cycle points. The notation for intervals, offsets, and recurrences (sequences) is similar to the date-time cycling notation, except for the simple integer values. The full integer recurrence expressions supported are: - ``Rn/start-point/interval # e.g. R3/1/P2`` - ``Rn/interval/end-point # e.g. R3/P2/9`` But, as for date-time cycling, sequence start and end points can be omitted where suite initial and final cycle points can be assumed. Some examples: .. code-block:: cylc [[[ R1 ]]] # Run once at the initial cycle point # (short for R1/initial-point/?) 
[[[ P1 ]]] # Repeat with step 1 from the initial cycle point # (short for R/initial-point/P1) [[[ P5 ]]] # Repeat with step 5 from the initial cycle point # (short for R/initial-point/P5) [[[ R2//P2 ]]] # Run twice with step 2 from the initial cycle point # (short for R2/initial-point/P2) [[[ R/+P1/P2 ]]] # Repeat with step 2, from 1 after the initial cycle point [[[ R2/P2 ]]] # Run twice with step 2, to the final cycle point # (short for R2/P2/final-point) [[[ R1/P0 ]]] # Run once at the final cycle point # (short for R1/P0/final-point) Example ''''''' The tutorial illustrates integer cycling in :ref:`TutInteger`, and ``/etc/examples/satellite/`` is a self-contained example of a realistic use for integer cycling. It simulates the processing of incoming satellite data: each new dataset arrives after a random (as far as the suite is concerned) interval, and is labeled by an arbitrary (as far as the suite is concerned) ID in the filename. A task called ``get_data`` at the top of the repeating workflow waits on the next dataset and, when it finds one, moves it to a cycle-point-specific shared workspace for processing by the downstream tasks. When ``get_data.1`` finishes, ``get_data.2`` triggers and begins waiting for the next dataset at the same time as the downstream tasks in cycle point 1 are processing the first one, and so on. In this way multiple datasets can be processed at once if they happen to come in quickly. A single shutdown task runs at the end of the final cycle to collate results. The suite graph is shown in :numref:`fig-satellite`. .. _fig-satellite: .. figure:: graphics/png/orig/satellite.png :align: center The ``etc/examples/satellite`` integer suite. Advanced Integer Cycling Syntax ''''''''''''''''''''''''''''''' The same syntax used to reference the initial and final cycle points (introduced in :ref:`referencing-the-initial-and-final-cycle-points`) for use with date-time cycling can also be used for integer cycling. For example you can write: .. 
code-block:: cylc [[[ R1/^ ]]] # Run once at the initial cycle point [[[ R1/$ ]]] # Run once at the final cycle point [[[ R3/^/P2 ]]] # Run three times with step two starting at the # initial cycle point Likewise the syntax introduced in :ref:`excluding-dates` for excluding a particular point from a recurrence also works for integer cycling. For example: .. code-block:: cylc [[[ R/P4!8 ]]] # Run with step 4, to the final cycle point # but not at point 8 [[[ R3/3/P2!5 ]]] # Run with step 2 from point 3 but not at # point 5 [[[ R/+P1/P6!14 ]]] # Run with step 6 from 1 step after the # initial cycle point but not at point 14 Multiple integer exclusions are also valid in the same way as the syntax in :ref:`excluding-dates`. Integer exclusions may be a list of single integer points, an integer sequence, or a combination of both: .. code-block:: cylc [[[ R/P1!(2,3,7) ]]] # Run with step 1 to the final cycle point, # but not at points 2, 3, or 7. [[[ P1 ! P2 ]]] # Run with step 1 from the initial to final # cycle point, skipping every other step from # the initial cycle point. [[[ P1 ! +P1/P2 ]]] # Run with step 1 from the initial cycle point, # excluding every other step beginning one step # after the initial cycle point. [[[ P1 !(P2,6,8) ]]] # Run with step 1 from the initial cycle point, # excluding every other step, and also excluding # steps 6 and 8. .. _TriggerTypes: Task Triggering ^^^^^^^^^^^^^^^ A task is said to "trigger" when it submits its job to run, as soon as all of its dependencies (also known as its separate "triggers") are met. Tasks can be made to trigger off the state of other tasks (indicated by a ``:state`` qualifier on the upstream task (or family) name in the graph), off the clock, and off arbitrary external events. External triggering is relatively more complicated, and is documented separately in :ref:`External Triggers`.
Success Triggers """""""""""""""" The default, with no trigger type specified, is to trigger off the upstream task succeeding: .. code-block:: cylc # B triggers if A SUCCEEDS: graph = "A => B" For consistency and completeness, however, the success trigger can be explicit: .. code-block:: cylc # B triggers if A SUCCEEDS: graph = "A => B" # or: graph = "A:succeed => B" Failure Triggers """""""""""""""" To trigger off the upstream task reporting failure: .. code-block:: cylc # B triggers if A FAILS: graph = "A:fail => B" *Suicide triggers* can be used to remove task ``B`` here if ``A`` does not fail, see :ref:`SuicideTriggers`. Start Triggers """""""""""""" To trigger off the upstream task starting to execute: .. code-block:: cylc # B triggers if A STARTS EXECUTING: graph = "A:start => B" This can be used to trigger tasks that monitor other tasks once they (the target tasks) start executing. Consider a long-running forecast model, for instance, that generates a sequence of output files as it runs. A postprocessing task could be launched with a start trigger on the model (``model:start => post``) to process the model output as it becomes available. Note, however, that there are several alternative ways of handling this scenario: both tasks could be triggered at the same time (``foo => model & post``), but depending on external queue delays this could result in the monitoring task starting to execute first; or a different postprocessing task could be triggered off a message output for each data file (``model:out1 => post1`` etc.; see :ref:`MessageTriggers`), but this may not be practical if the number of output files is large or if it is difficult to add cylc messaging calls to the model. Finish Triggers """"""""""""""" To trigger off the upstream task succeeding or failing, i.e. finishing one way or the other: .. code-block:: cylc # B triggers if A either SUCCEEDS or FAILS: graph = "A | A:fail => B" # or graph = "A:finish => B" .. 
_MessageTriggers: Message Triggers """""""""""""""" Tasks can also trigger off custom output messages. These must be registered in the ``[runtime]`` section of the emitting task, and reported using the ``cylc message`` command in task scripting. The graph trigger notation refers to the item name of the registered output message. The example suite ``/etc/examples/message-triggers`` illustrates message triggering. .. literalinclude:: ../../etc/examples/message-triggers/suite.rc :language: cylc Job Submission Triggers """"""""""""""""""""""" It is also possible to trigger off a task submitting, or failing to submit: .. code-block:: cylc # B triggers if A submits successfully: graph = "A:submit => B" # D triggers if C fails to submit successfully: graph = "C:submit-fail => D" A possible use case for submit-fail triggers: if a task goes into the submit-failed state, possibly after several job submission retries, another task that inherits the same runtime but sets a different job submission method and/or host could be triggered to, in effect, run the same job on a different platform. Conditional Triggers """""""""""""""""""" AND operators (``&``) can appear on both sides of an arrow. They provide a concise alternative to defining multiple triggers separately: .. code-block:: cylc # 1/ this: graph = "A & B => C" # is equivalent to: graph = """A => C B => C""" # 2/ this: graph = "A => B & C" # is equivalent to: graph = """A => B A => C""" # 3/ and this: graph = "A & B => C & D" # is equivalent to this: graph = """A => C B => C A => D B => D""" OR operators (``|``) which result in true conditional triggers, can only appear on the left [1]_ : .. code-block:: cylc # C triggers when either A or B finishes: graph = "A | B => C" Forecasting suites typically have simple conditional triggering requirements, but any valid conditional expression can be used, as shown in :numref:`fig-conditional` (conditional triggers are plotted with open arrow heads). .. 
Need to use a 'container' directive to get centered image with left-aligned caption (as required for code block text). .. container:: twocol .. container:: image .. _fig-conditional: .. figure:: graphics/png/orig/conditional-triggers.png :align: center .. container:: caption Conditional triggers, which are plotted with open arrow heads. .. code-block:: cylc graph = """ # D triggers if A or (B and C) succeed A | B & C => D # just to align the two graph sections D => W # Z triggers if (W or X) and Y succeed (W|X) & Y => Z """ .. _SuicideTriggers: Suicide Triggers """""""""""""""" Suicide triggers take tasks out of the suite. This can be used for automated failure recovery. The suite.rc listing and accompanying graph in :numref:`fig-suicide` show how to define a chain of failure recovery tasks that trigger if they're needed but otherwise remove themselves from the suite (you can run the *AutoRecover.async* example suite to see how this works). The dashed graph edges ending in solid dots indicate suicide triggers, and the open arrowheads indicate conditional triggers as usual. Suicide triggers are ignored by default in the graph view, unless you toggle them on with *View* ``->`` *Options* ``->`` *Ignore Suicide Triggers*. .. Need to use a 'container' directive to get centered image with left-aligned caption (as required for code block text). .. container:: twocol .. container:: image .. _fig-suicide: .. figure:: graphics/png/orig/suicide.png :align: center .. container:: caption Automated failure recovery via suicide triggers. .. code-block:: cylc [meta] title = automated failure recovery description = """ Model task failure triggers diagnosis and recovery tasks, which take themselves out of the suite if model succeeds. Model post processing triggers off model OR recovery tasks. 
""" [scheduling] [[dependencies]] graph = """ pre => model model:fail => diagnose => recover model => !diagnose & !recover model | recover => post """ [runtime] [[model]] # UNCOMMENT TO TEST FAILURE: # script = /bin/false .. note:: Multiple suicide triggers combine in the same way as other triggers, so this: .. code-block:: cylc foo => !baz bar => !baz is equivalent to this: .. code-block:: cylc foo & bar => !baz i.e. both ``foo`` and ``bar`` must succeed for ``baz`` to be taken out of the suite. If you really want a task to be taken out if any one of several events occurs then be careful to write it that way: .. code-block:: cylc foo | bar => !baz .. warning:: A word of warning on the meaning of "bare suicide triggers". Consider the following suite: .. code-block:: cylc [scheduling] [[dependencies]] graph = "foo => !bar" Task ``bar`` has a suicide trigger but no normal prerequisites (a suicide trigger is not a task triggering prerequisite, it is a task removal prerequisite) so this is entirely equivalent to: .. code-block:: cylc [scheduling] [[dependencies]] graph = """ foo & bar foo => !bar """ In other words both tasks will trigger immediately, at the same time, and then ``bar`` will be removed if ``foo`` succeeds. If an active task proxy (currently in the submitted or running states) is removed from the suite by a suicide trigger, a warning will be logged. .. _FamilyTriggers: Family Triggers """"""""""""""" Families defined by the namespace inheritance hierarchy (:ref:`NIORP`) can be used in the graph to trigger whole groups of tasks at the same time (e.g. forecast model ensembles and groups of tasks for processing different observation types at the same time) and for triggering downstream tasks off families as a whole. Higher level families, i.e. families of families, can also be used, and are reduced to the lowest level member tasks. .. note:: Tasks can also trigger off individual family members if necessary. To trigger an entire task family at once: ..
code-block:: cylc [scheduling] [[dependencies]] graph = "foo => FAM" [runtime] [[FAM]] # a family (because others inherit from it) [[m1,m2]] # family members (inherit from namespace FAM) inherit = FAM This is equivalent to: .. code-block:: cylc [scheduling] [[dependencies]] graph = "foo => m1 & m2" [runtime] [[FAM]] [[m1,m2]] inherit = FAM To trigger other tasks off families we have to specify whether to trigger off *all members* starting, succeeding, failing, or finishing, or off *any* members (doing the same). Legal family triggers are thus: .. code-block:: cylc [scheduling] [[dependencies]] graph = """ # all-member triggers: FAM:start-all => one FAM:succeed-all => one FAM:fail-all => one FAM:finish-all => one # any-member triggers: FAM:start-any => one FAM:succeed-any => one FAM:fail-any => one FAM:finish-any => one """ Here's how to trigger downstream processing if one or more family members succeed, but only after all members have finished (succeeded or failed): .. code-block:: cylc [scheduling] [[dependencies]] graph = """ FAM:finish-all & FAM:succeed-any => foo """ .. _EfficientInterFamilyTriggering: Efficient Inter-Family Triggering """"""""""""""""""""""""""""""""" While cylc allows writing dependencies between two families it is important to consider the number of dependencies this will generate. In the following example, each member of ``FAM2`` has dependencies pointing at all the members of ``FAM1``. .. code-block:: cylc [scheduling] [[dependencies]] graph = """ FAM1:succeed-any => FAM2 """ Expanding this out, you generate ``N * M`` dependencies, where ``N`` is the number of members of ``FAM1`` and ``M`` is the number of members of ``FAM2``. This can result in high memory use as the number of members of these families grows, potentially rendering the suite impractical for running on some systems.
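
As a rough illustration of this expansion, assume two hypothetical families with two members each (the member names ``a1``, ``a2``, ``b1``, and ``b2`` are invented for this sketch). The family trigger should then behave like the explicit member graph shown in the comments, giving ``2 * 2 = 4`` dependencies:

.. code-block:: cylc

   [scheduling]
       [[dependencies]]
           graph = """
               FAM1:succeed-any => FAM2
               # roughly equivalent to writing out each member of FAM2:
               #   a1 | a2 => b1
               #   a1 | a2 => b2
           """
   [runtime]
       [[FAM1, FAM2]]
       [[a1, a2]]
           # hypothetical members of FAM1
           inherit = FAM1
       [[b1, b2]]
           # hypothetical members of FAM2
           inherit = FAM2

With 100 members in each family the same single graph line would expand to ``100 * 100 = 10000`` dependencies.
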
You can greatly reduce the number of dependencies generated in these situations by putting dummy tasks in the graphing to represent the state of the family you want to trigger off. For example, if ``FAM2`` should trigger off any member of ``FAM1`` succeeding you can create a dummy task ``FAM1_succeed_any_marker`` and place a dependency on it as follows: .. code-block:: cylc [scheduling] [[dependencies]] graph = """ FAM1:succeed-any => FAM1_succeed_any_marker => FAM2 """ [runtime] # ... [[FAM1_succeed_any_marker]] script = true # ... This graph generates only ``N + M`` dependencies, which takes significantly less memory and CPU to store and evaluate. .. _InterCyclePointTriggers: Inter-Cycle Triggers """""""""""""""""""" Typically most tasks in a suite will trigger off others in the same cycle point, but some may depend on tasks at other cycle points. This notably applies to warm-cycled forecast models, which depend on their own previous instances (see below); but other kinds of inter-cycle dependence are possible too [2]_ . Here's how to express this kind of relationship in cylc: .. code-block:: cylc [[dependencies]] [[[PT6H]]] # B triggers off A in the previous cycle point graph = "A[-PT6H] => B" Inter-cycle and trigger type (or message trigger) notation can be combined: .. code-block:: cylc # B triggers if A in the previous cycle point fails: graph = "A[-PT6H]:fail => B" At suite start-up inter-cycle triggers refer to a previous cycle point that does not exist. This does not cause the dependent task to wait indefinitely, however, because cylc ignores triggers that reach back beyond the initial cycle point. That said, the presence of an inter-cycle trigger does normally imply that something special has to happen at start-up. If a model depends on its own previous instance for restart files, for instance, then an initial set of restart files has to be generated somehow or the first model task will presumably fail with missing input files.
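
The start-up behaviour described above can be sketched as follows. The initial cycle point and the task names are illustrative assumptions, and the comments describe the behaviour stated in the text rather than output captured from a real run:

.. code-block:: cylc

   [scheduling]
       initial cycle point = 20100101T00
       [[dependencies]]
           [[[PT6H]]]
               graph = """
                   # A exists at every cycle point on this sequence
                   A
                   # At 20100101T00 this trigger points at A.20091231T18,
                   # which is before the initial cycle point, so it is
                   # ignored for the first instance of B only.
                   A[-PT6H] => B
               """

The first instance of ``B`` therefore runs without any input from an earlier ``A``; if it needs such input, something else has to supply it at start-up.
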
There are several ways to handle this in cylc using different kinds of one-off (non-cycling) tasks that run at suite start-up. They are illustrated in :ref:`TutInterCyclePointTriggers`; to summarize here briefly: - ``R1`` tasks (recommended): .. code-block:: cylc [scheduling] [[dependencies]] [[[R1]]] graph = "prep" [[[R1/T00,R1/T12]]] graph = "prep[^] => foo" [[[T00,T12]]] graph = "foo[-PT12H] => foo => bar" ``R1``, or ``R1/date-time`` tasks are the recommended way to specify unusual start up conditions. They allow you to specify a clean distinction between the dependencies of initial cycles and the dependencies of the subsequent cycles. Initial tasks can be used for real model cold-start processes, whereby a warm-cycled model at any given cycle point can in principle have its inputs satisfied by a previous instance of itself, *or* by an initial task with (nominally) the same cycle point. In effect, the ``R1`` task masquerades as the previous-cycle-point trigger of its associated cycling task. At suite start-up initial tasks will trigger the first cycling tasks, and thereafter the inter-cycle trigger will take effect. If a task has a dependency on another task in a different cycle point, the dependency can be written using the ``[offset]`` syntax such as ``[-PT12H]`` in ``foo[-PT12H] => foo``. This means that ``foo`` at the current cycle point depends on a previous instance of ``foo`` at 12 hours before the current cycle point. Unlike the cycling section headings (e.g. ``[[[T00,T12]]]``), dependencies assume that relative times are relative to the current cycle point, not the initial cycle point. However, it can be useful to have specific dependencies on tasks at or near the initial cycle point. You can switch the context of the offset to be the initial cycle point by using the caret symbol: ``^``. For example, you can write ``foo[^]`` to mean foo at the initial cycle point, and ``foo[^+PT6H]`` to mean foo 6 hours after the initial cycle point. 
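
A minimal sketch of the caret notation, assuming an illustrative initial cycle point of ``20100101T00`` (the task names are placeholders, and the cycle points in the comments follow from the rules described above rather than from a recorded run):

.. code-block:: cylc

   [scheduling]
       initial cycle point = 20100101T00
       [[dependencies]]
           [[[R1]]]
               # prep runs once, at the initial cycle point
               graph = "prep"
           [[[R1/T06]]]
               # foo.20100101T06 triggers off prep.20100101T00
               graph = "prep[^] => foo"
           [[[R1/T12]]]
               # bar.20100101T12 triggers off foo.20100101T06
               graph = "foo[^+PT6H] => bar"
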
Usually, this kind of dependency will only apply in a limited number of cycle points near the start of the suite, so you may want to write it in ``R1``-based cycling sections. Here's the example inter-cycle ``R1`` suite from above again. .. code-block:: cylc [scheduling] [[dependencies]] [[[R1]]] graph = "prep" [[[R1/T00,R1/T12]]] graph = "prep[^] => foo" [[[T00,T12]]] graph = "foo[-PT12H] => foo => bar" You can see there is a dependence on the initial ``R1`` task ``prep`` for ``foo`` at the first ``T00`` cycle point, and at the first ``T12`` cycle point. Thereafter, ``foo`` just depends on its previous (12 hours ago) instance. Finally, it is also possible to have a dependency on a task at a specific cycle point. .. code-block:: cylc [scheduling] [[dependencies]] [[[R1/20200202]]] graph = "baz[20200101] => qux" However, in a long running suite, a repeating cycle should avoid having a dependency on a task with a specific cycle point (including the initial cycle point) - as it can currently cause performance issues. In the following example, all instances of ``qux`` will depend on ``baz.20200101``, which will never be removed from the task pool: .. code-block:: cylc [scheduling] initial cycle point = 2010 [[dependencies]] # Can cause performance issues! [[[P1D]]] graph = "baz[20200101] => qux" .. _SequentialTasks: Special Sequential Tasks """""""""""""""""""""""" Tasks that depend on their own previous-cycle instance can be declared as *sequential*: .. code-block:: cylc [scheduling] [[special tasks]] # foo depends on its previous instance: sequential = foo # deprecated - see below! [[dependencies]] [[[T00,T12]]] graph = "foo => bar" *The sequential declaration is deprecated* however, in favor of explicit inter-cycle triggers which clearly expose the same scheduling behaviour in the graph: ..
code-block:: cylc [scheduling] [[dependencies]] [[[T00,T12]]] # foo depends on its previous instance: graph = "foo[-PT12H] => foo => bar" The sequential declaration is arguably convenient in one unusual situation though: if a task has a non-uniform cycling sequence then multiple explicit triggers, .. code-block:: cylc [scheduling] [[dependencies]] [[[T00,T03,T11]]] graph = "foo => bar" [[[T00]]] graph = "foo[-PT13H] => foo" [[[T03]]] graph = "foo[-PT3H] => foo" [[[T11]]] graph = "foo[-PT8H] => foo" can be replaced by a single sequential declaration, .. code-block:: cylc [scheduling] [[special tasks]] sequential = foo [[dependencies]] [[[T00,T03,T11]]] graph = "foo => bar" Future Triggers """"""""""""""" Cylc also supports inter-cycle triggering off tasks "in the future" (with respect to cycle point - which has no bearing on wall-clock job submission time unless the task has a clock trigger): .. code-block:: cylc [[dependencies]] [[[T00,T06,T12,T18]]] graph = """ # A runs in this cycle: A # B in this cycle triggers off A in the next cycle. A[PT6H] => B """ Future triggers present a problem at suite shutdown rather than at start-up. Here, ``B`` at the final cycle point wants to trigger off an instance of ``A`` that will never exist because it is beyond the suite stop point. Consequently Cylc prevents tasks from spawning successors that depend on other tasks beyond the final point. .. _ClockTriggerTasks: Clock Triggers """""""""""""" .. note:: Please read External Triggers (:ref:`External Triggers`) before using the older clock triggers described in this section. By default, date-time cycle points are not connected to the real time "wall clock". They are just labels that are passed to task jobs (e.g. to initialize an atmospheric model run with a particular date-time value). 
In real time cycling systems, however, some tasks - typically those near the top of the graph in each cycle - need to trigger at or near the time when their cycle point is equal to the real clock date-time. So *clock triggers* allow tasks to trigger at (or after, depending on other triggers) a wall clock time expressed as an offset from cycle point: .. code-block:: cylc [scheduling] [[special tasks]] clock-trigger = foo(PT2H) [[dependencies]] [[[T00]]] graph = foo Here, ``foo[2015-08-23T00]`` would trigger (other dependencies allowing) when the wall clock time reaches ``2015-08-23T02``. Clock-trigger offsets are normally positive, to trigger some time *after* the wall-clock time is equal to task cycle point. Clock-triggers have no effect on scheduling if a suite is running sufficiently far behind the clock (e.g. after a delay, or because it is processing archived historical data) that the trigger times, which are relative to task cycle point, have already passed. .. _ClockExpireTasks: Clock-Expire Triggers """"""""""""""""""""" Tasks can be configured to *expire* - i.e. to skip job submission and enter the *expired* state - if they are too far behind the wall clock when they become ready to run, and other tasks can trigger off this. As a possible use case, consider a cycling task that copies the latest of a set of files to overwrite the previous set: if the task is delayed by more than one cycle there may be no point in running it because the freshly copied files will just be overwritten immediately by the next task instance as the suite catches back up to real time operation. Clock-expire tasks are configured like clock-trigger tasks, with a date-time offset relative to cycle point (:ref:`ClockExpireRef`). The offset should be positive to make the task expire if the wall-clock time has gone beyond the cycle point. Triggering off an expired task typically requires suicide triggers to remove the workflow that runs if the task has not expired. 
Here a task called ``copy`` expires, and its downstream workflow is skipped, if it is more than one day behind the wall-clock (see also ``etc/examples/clock-expire``): .. code-block:: cylc [cylc] cycle point format = %Y-%m-%dT%H [scheduling] initial cycle point = 2015-08-15T00 [[special tasks]] clock-expire = copy(-P1D) [[dependencies]] [[[P1D]]] graph = """ model[-P1D] => model => copy => proc copy:expired => !proc""" External Event Triggers """"""""""""""""""""""" This is a substantial topic, documented in :ref:`External Triggers`. .. _ModelRestartDependencies: Model Restart Dependencies ^^^^^^^^^^^^^^^^^^^^^^^^^^ Warm-cycled forecast models generate *restart files*, e.g. model background fields, to initialize the next forecast. This kind of dependence requires an inter-cycle trigger: .. code-block:: cylc [scheduling] [[dependencies]] [[[T00,T06,T12,T18]]] graph = "A[-PT6H] => A" If your model is configured to write out additional restart files to allow one or more cycle points to be skipped in an emergency *do not represent these potential dependencies in the suite graph* as they should not be used under normal circumstances. For example, the following graph would result in task ``A`` erroneously triggering off ``A[T-24]`` as a matter of course, instead of off ``A[T-6]``, because ``A[T-24]`` will always be finished first: .. code-block:: cylc [scheduling] [[dependencies]] [[[T00,T06,T12,T18]]] # DO NOT DO THIS (SEE ACCOMPANYING TEXT): graph = "A[-PT24H] | A[-PT18H] | A[-PT12H] | A[-PT6H] => A" How The Graph Determines Task Instantiation ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ A graph trigger pair like ``foo => bar`` determines the existence and prerequisites (dependencies) of the downstream task ``bar``, for the cycle points defined by the associated graph section heading. In general it does not say anything about the dependencies or existence of the upstream task ``foo``. 
However *if the trigger has no cycle point offset* Cylc will infer that ``bar`` must exist at the same cycle points as ``foo``. This is a convenience to allow this: .. code-block:: cylc graph = "foo => bar" to be written as shorthand for this: .. code-block:: cylc graph = """foo foo => bar""" (where ``foo`` by itself means `` => foo``, i.e. the task exists at these cycle points but has no prerequisites - although other prerequisites may be defined for it in other parts of the graph). *Cylc does not infer the existence of the upstream task in offset triggers* like ``foo[-P1D] => bar`` because, as explained in :ref:`cylc-6-migration-implicit-cycling`, a typo in the offset interval should generate an error rather than silently creating tasks on an erroneous cycling sequence. As a result you need to be careful not to define inter-cycle dependencies that cannot be satisfied at run time. Suite validation catches this kind of error if the existence of the cycle offset task is not defined anywhere at all: .. code-block:: cylc [scheduling] initial cycle point = 2020 [[dependencies]] [[[P1Y]]] # ERROR graph = "foo[-P1Y] => bar" .. code-block:: bash $ cylc validate SUITE 'ERROR: No cycling sequences defined for foo' To fix this, use another line in the graph to tell Cylc to define ``foo`` at each cycle point: .. code-block:: cylc [scheduling] initial cycle point = 2020 [[dependencies]] [[[P1Y]]] graph = """ foo foo[-P1Y] => bar""" But validation does not catch this kind of error if the offset task is defined only on a different cycling sequence: .. code-block:: cylc [scheduling] initial cycle point = 2020 [[dependencies]] [[[P2Y]]] graph = """foo # ERROR foo[-P1Y] => bar""" This suite will validate OK, but it will stall at runtime with ``bar`` waiting on ``foo[-P1Y]`` at the intermediate years where it does not exist. 
The offset ``[-P1Y]`` is presumably an error (it should be ``[-P2Y]``), or else another graph line is needed to generate ``foo`` instances on the yearly sequence: .. code-block:: cylc [scheduling] initial cycle point = 2020 [[dependencies]] [[[P1Y]]] graph = "foo" [[[P2Y]]] graph = "foo[-P1Y] => bar" Similarly the following suite will validate OK, but it will stall at runtime with ``bar`` waiting on ``foo[-P1Y]`` in every cycle point, when only a single instance of it exists, at the initial cycle point: .. code-block:: cylc [scheduling] initial cycle point = 2020 [[dependencies]] [[[R1]]] graph = foo [[[P1Y]]] # ERROR graph = foo[-P1Y] => bar .. note:: ``cylc graph`` will display un-satisfiable inter-cycle dependencies as "ghost nodes". :numref:`ghost-node-screenshot` is a screenshot of cylc graph displaying the above example with the un-satisfiable task (foo) displayed as a "ghost node". .. _ghost-node-screenshot: .. figure:: graphics/png/orig/ghost-node-example.png :align: center Screenshot of ``cylc graph`` showing one task as a "ghost node". .. _NIORP: Runtime - Task Configuration ---------------------------- The ``[runtime]`` section of a suite configuration configures what to execute (and where and how to execute it) when each task is ready to run, in a *multiple inheritance hierarchy* of *namespaces* culminating in individual tasks. This allows all common configuration detail to be factored out and defined in one place. Any namespace can configure any or all of the items defined in :ref:`SuiteRCReference`. Namespaces that do not explicitly inherit from others automatically inherit from the *root* namespace (below). Nested namespaces define *task families* that can be used in the graph as convenient shorthand for triggering all member tasks at once, or for triggering other tasks off all members at once - see :ref:`FamilyTriggers`. 
Nested namespaces can be progressively expanded and collapsed in the dependency graph viewer, and in the gcylc graph and text views. Only the first parent of each namespace (as for single-inheritance) is used for suite visualization purposes. .. _TaskNames: Task and Namespace Names ^^^^^^^^^^^^^^^^^^^^^^^^ There are restrictions on names that can be used for tasks and namespaces to ensure all tasks are processed and implemented correctly. Valid names must: 1. *begin with* (or consist only of, as single-character task names are allowed) either: - an alphanumeric character, i.e. a letter in either upper or lower case (``a-z`` or ``A-Z``), or a digit (``0-9``); - an underscore (``_``). 2. *otherwise* contain *only* characters from the following options: - alphanumeric characters or underscores (as above); - any of these additional character symbols: - hyphens (``-``); - plus characters (``+``); - percent signs (``%``); - "at" signs (``@``). 3. not be so long, typically over ``255`` characters, as to raise errors from exceeding maximum filename length on the operating system for generated outputs e.g. directories named (in part) after the task they concern. .. warning:: Task and namespace names may not contain colons (``:``), which would preclude use of directory paths involving the registration name in ``$PATH`` variables. They also may not contain the dot (``.``) character, as it will be interpreted as the delimiter separating the task name from an appended cycle point (see :ref:`TaskIdentifiers`). Invalid names for tasks or namespaces will be raised as errors by ``cylc validate``. .. note:: *Task names need not be hardwired into task implementations* because task and suite identity can be extracted portably from the task execution environment supplied by the suite server program (:ref:`TaskExecutionEnvironment`) - then to rename a task you can just change its name in the suite configuration.
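
The naming rules above can be illustrated with a short ``[runtime]`` fragment. All of the names here are hypothetical, chosen only to exercise each rule; the rejected names are left commented out because, per the rules above, ``cylc validate`` should report them as errors:

.. code-block:: cylc

   [runtime]
       # Names that satisfy the rules:
       [[a]]               # a single character is allowed
       [[_prep]]           # may begin with an underscore
       [[2nd_pass]]        # may begin with a digit
       [[obs-proc+qc]]     # hyphens and plus characters after the first character
       [[model@hi_res]]    # "at" sign after the first character
       # Names that break the rules:
       # [[-prep]]         # begins with a hyphen
       # [[get.data]]      # contains a dot
       # [[post:proc]]     # contains a colon
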
Root - Runtime Defaults ^^^^^^^^^^^^^^^^^^^^^^^ The root namespace, at the base of the inheritance hierarchy, provides default configuration for all tasks in the suite. Most root items are unset by default, but some have default values sufficient to allow test suites to be defined by dependency graph alone. The *script* item, for example, defaults to code that prints a message then sleeps for between 1 and 15 seconds and exits. Default values are documented with each item in :ref:`SuiteRCReference`. You can override the defaults or provide your own defaults by explicitly configuring the root namespace. .. _MultiTaskDef: Defining Multiple Namespaces At Once ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If a namespace section heading is a comma-separated list of names then the subsequent configuration applies to each list member. Particular tasks can be singled out at run time using the ``$CYLC_TASK_NAME`` variable. As an example, consider a suite containing an ensemble of closely related tasks that each invokes the same script but with a unique argument that identifies the calling task name: .. code-block:: cylc [runtime] [[ENSEMBLE]] script = "run-model.sh $CYLC_TASK_NAME" [[m1, m2, m3]] inherit = ENSEMBLE For large ensembles template processing can be used to automatically generate the member names and associated dependencies (see :ref:`Jinja` and :ref:`EmPylabel`). Runtime Inheritance - Single ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The following listing of the *inherit.single.one* example suite illustrates basic runtime inheritance with single parents. .. 
literalinclude:: ../../etc/examples/inherit/single/one/suite.rc :language: cylc Runtime Inheritance - Multiple ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If a namespace inherits from multiple parents the linear order of precedence (which namespace overrides which) is determined by the so-called *C3 algorithm* used to find the linear *method resolution order* for class hierarchies in Python and several other object oriented programming languages. The result of this should be fairly obvious for typical use of multiple inheritance in cylc suites, but for detailed documentation of how the algorithm works refer to the `official Python documentation `_. The *inherit.multi.one* example suite, listed here, makes use of multiple inheritance: .. literalinclude:: ../../etc/examples/inherit/multi/one/suite.rc :language: cylc ``cylc get-suite-config`` provides an easy way to check the result of inheritance in a suite. You can extract specific items, e.g.: .. code-block:: bash $ cylc get-suite-config --item '[runtime][var_p2]script' \ inherit.multi.one echo ``RUN: run-var.sh'' or use the ``--sparse`` option to print entire namespaces without obscuring the result with the dense runtime structure obtained from the root namespace: .. code-block:: bash $ cylc get-suite-config --sparse --item '[runtime]ops_s1' inherit.multi.one script = echo ``RUN: run-ops.sh'' inherit = ['OPS', 'SERIAL'] [directives] job_type = serial Suite Visualization And Multiple Inheritance """""""""""""""""""""""""""""""""""""""""""" The first parent inherited by a namespace is also used as the collapsible family group when visualizing the suite. If this is not what you want, you can demote the first parent for visualization purposes, without affecting the order of inheritance of runtime properties: .. code-block:: cylc [runtime] [[BAR]] # ... 
[[foo]] # inherit properties from BAR, but stay under root for visualization: inherit = None, BAR How Runtime Inheritance Works ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The linear precedence order of ancestors is computed for each namespace using the C3 algorithm. Then any runtime items that are explicitly configured in the suite configuration are "inherited" up the linearized hierarchy for each task, starting at the root namespace: if a particular item is defined at multiple levels in the hierarchy, the level nearest the final task namespace takes precedence. Finally, root namespace defaults are applied for every item that has not been configured in the inheritance process (this is more efficient than carrying the full dense namespace structure through from root from the beginning). .. _TaskExecutionEnvironment: Task Execution Environment ^^^^^^^^^^^^^^^^^^^^^^^^^^ The task execution environment contains suite and task identity variables provided by the suite server program, and user-defined environment variables. The environment is explicitly exported (by the task job script) prior to executing the task ``script`` (see :ref:`TaskJobSubmission`). Suite and task identity are exported first, so that user-defined variables can refer to them. Order of definition is preserved throughout so that variable assignment expressions can safely refer to previously defined variables. Additionally, access to cylc itself is configured prior to the user-defined environment, so that variable assignment expressions can make use of cylc utility commands: .. code-block:: cylc [runtime] [[foo]] [[[environment]]] REFERENCE_TIME = $( cylc util cycletime --offset-hours=6 ) User Environment Variables """""""""""""""""""""""""" A task's user-defined environment results from its inherited ``[[[environment]]]`` sections: .. 
code-block:: cylc [runtime] [[root]] [[[environment]]] COLOR = red SHAPE = circle [[foo]] [[[environment]]] COLOR = blue # root override TEXTURE = rough # new variable This results in a task *foo* with ``SHAPE=circle``, ``COLOR=blue``, and ``TEXTURE=rough`` in its environment. Overriding Environment Variables """""""""""""""""""""""""""""""" When you override inherited namespace items the original parent item definition is *replaced* by the new definition. This applies to all items including those in the environment sub-sections which, strictly speaking, are not "environment variables" until they are written, post inheritance processing, to the task job script that executes the associated task. Consequently, if you override an environment variable you cannot also access the original parent value: .. code-block:: cylc [runtime] [[FOO]] [[[environment]]] COLOR = red [[bar]] inherit = FOO [[[environment]]] tmp = $COLOR # !! ERROR: $COLOR is undefined here COLOR = dark-$tmp # !! as this overrides COLOR in FOO. The compressed variant of this, ``COLOR = dark-$COLOR``, is also in error for the same reason. To achieve the desired result you must use a different name for the parent variable: .. code-block:: cylc [runtime] [[FOO]] [[[environment]]] FOO_COLOR = red [[bar]] inherit = FOO [[[environment]]] COLOR = dark-$FOO_COLOR # OK .. _Task Job Script Variables: Task Job Script Variables """"""""""""""""""""""""" These are variables that can be referenced (but should not be modified) in a task job script. The task job script may export the following environment variables: .. code-block:: bash CYLC_DEBUG # Debug mode, true or not defined CYLC_DIR # Location of cylc installation used CYLC_VERSION # Version of cylc installation used CYLC_CYCLING_MODE # Cycling mode, e.g. 
gregorian CYLC_SUITE_FINAL_CYCLE_POINT # Final cycle point CYLC_SUITE_INITIAL_CYCLE_POINT # Initial cycle point CYLC_SUITE_NAME # Suite name CYLC_UTC # UTC mode, True or False CYLC_VERBOSE # Verbose mode, True or False TZ # Set to "UTC" in UTC mode or not defined CYLC_SUITE_RUN_DIR # Location of the suite run directory in # job host, e.g. ~/cylc-run/foo CYLC_SUITE_DEF_PATH # Location of the suite configuration directory in # job host, e.g. ~/cylc-run/foo CYLC_SUITE_HOST # Host running the suite process CYLC_SUITE_OWNER # User ID running the suite process CYLC_SUITE_DEF_PATH_ON_SUITE_HOST # Location of the suite configuration directory in # suite host, e.g. ~/cylc-run/foo CYLC_SUITE_SHARE_DIR # Suite (or task!) shared directory (see below) CYLC_SUITE_UUID # Suite UUID string CYLC_SUITE_WORK_DIR # Suite work directory (see below) CYLC_TASK_JOB # Task job identifier expressed as # CYCLE-POINT/TASK-NAME/SUBMIT-NUM # e.g. 20110511T1800Z/t1/01 CYLC_TASK_CYCLE_POINT # Cycle point, e.g. 20110511T1800Z CYLC_TASK_NAME # Job's task name, e.g. t1 CYLC_TASK_SUBMIT_NUMBER # Job's submit number, e.g. 1, # increments with every submit CYLC_TASK_TRY_NUMBER # Number of execution tries, e.g. 1 # increments with automatic retry-on-fail CYLC_TASK_ID # Task instance identifier expressed as # TASK-NAME.CYCLE-POINT # e.g. t1.20110511T1800Z CYLC_TASK_LOG_DIR # Location of the job log directory # e.g. ~/cylc-run/foo/log/job/20110511T1800Z/t1/01/ CYLC_TASK_LOG_ROOT # The task job file path # e.g. ~/cylc-run/foo/log/job/20110511T1800Z/t1/01/job CYLC_TASK_WORK_DIR # Location of task work directory (see below) # e.g. ~/cylc-run/foo/work/20110511T1800Z/t1 CYLC_TASK_NAMESPACE_HIERARCHY # Linearised family namespace of the task, # e.g. root postproc t1 CYLC_TASK_DEPENDENCIES # List of met dependencies that triggered the task # e.g. 
foo.1 bar.1 CYLC_TASK_COMMS_METHOD # Set to "ssh" if communication method is "ssh" CYLC_TASK_SSH_LOGIN_SHELL # With "ssh" communication, if set to "True", # use login shell on suite host There are also some global shell variables that may be defined in the task job script (but not exported to the environment). These include: .. code-block:: bash CYLC_FAIL_SIGNALS # List of signals trapped by the error trap CYLC_VACATION_SIGNALS # List of signals trapped by the vacation trap CYLC_SUITE_WORK_DIR_ROOT # Root directory above the suite work directory # in the job host CYLC_TASK_MESSAGE_STARTED_PID # PID of "cylc message" job started" command CYLC_TASK_WORK_DIR_BASE # Alternate task work directory, # relative to the suite work directory Suite Share Directories """"""""""""""""""""""" A *suite share directory* is created automatically under the suite run directory as a share space for tasks. The location is available to tasks as ``$CYLC_SUITE_SHARE_DIR``. In a cycling suite, output files are typically held in cycle point sub-directories of the suite share directory. The top level share and work directory (below) location can be changed (e.g. to a large data area) by a global config setting (see :ref:`workdirectory`). Task Work Directories """"""""""""""""""""" Task job scripts are executed from within *work directories* created automatically under the suite run directory. A task can get its own work directory from ``$CYLC_TASK_WORK_DIR`` (or simply ``$PWD`` if it does not ``cd`` elsewhere at runtime). By default the location contains task name and cycle point, to provide a unique workspace for every instance of every task. This can be overridden in the suite configuration, however, to get several tasks to share the same work directory (see :ref:`worksubdirectory`). The top level work and share directory (above) location can be changed (e.g. to a large data area) by a global config setting (see :ref:`workdirectory`). 
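As a minimal sketch of how a task script might use these locations, the Python snippet below builds the cycle point sub-directory of the suite share directory and resolves the task work directory. The function names ``share_cycle_dir`` and ``task_work_dir`` and the example paths are hypothetical, not part of cylc; only the ``CYLC_*`` variable names come from the list above. The environment is passed in as a mapping so that the logic can be tried outside a running suite.

```python
import os
from pathlib import PurePosixPath


def share_cycle_dir(env=None):
    """Cycle point sub-directory of the suite share directory."""
    env = os.environ if env is None else env
    return PurePosixPath(env["CYLC_SUITE_SHARE_DIR"]) / env["CYLC_TASK_CYCLE_POINT"]


def task_work_dir(env=None):
    """Task work directory, falling back to the current directory."""
    env = os.environ if env is None else env
    return PurePosixPath(env.get("CYLC_TASK_WORK_DIR", os.getcwd()))


# Illustrative values only; in a real job these are set by the task job script.
example_env = {
    "CYLC_SUITE_SHARE_DIR": "/home/bob/cylc-run/foo/share",
    "CYLC_TASK_CYCLE_POINT": "20110511T1800Z",
    "CYLC_TASK_WORK_DIR": "/home/bob/cylc-run/foo/work/20110511T1800Z/t1",
}
print(share_cycle_dir(example_env))
print(task_work_dir(example_env))
```

Writing shared output under a per-cycle sub-directory like this keeps files from different cycle points apart, while scratch files stay in the task's own work directory.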
Environment Variable Evaluation
"""""""""""""""""""""""""""""""

Variables in the task execution environment are not evaluated in the
shell in which the suite is running prior to submitting the task. They
are written in unevaluated form to the job script that is submitted by
cylc to run the task (:ref:`JobScripts`) and are therefore evaluated when
the task begins executing under the task owner account on the task host.
Thus ``$HOME``, for instance, evaluates at run time to the home directory
of the task owner on the task host.

How Tasks Get Access To The Suite Directory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Tasks can use ``$CYLC_SUITE_DEF_PATH`` to access suite files on the task
host, and the suite bin directory is automatically added to ``$PATH``. If
a remote suite configuration directory is not specified, the local (suite
host) path will be assumed, with the local home directory, if present,
swapped for literal ``$HOME`` for evaluation on the task host.

.. _RunningTasksOnARemoteHost:

Remote Task Hosting
^^^^^^^^^^^^^^^^^^^

If a task declares an owner other than the suite owner and/or a host
other than the suite host, cylc will use non-interactive ssh to execute
the task on the ``owner@host`` account using the configured batch system:

.. code-block:: cylc

   [runtime]
       [[foo]]
           [[[remote]]]
               host = orca.niwa.co.nz
               owner = bob
           [[[job]]]
               batch system = pbs

For this to work:

- non-interactive ssh is required from the suite host to the remote
  task accounts.
- cylc must be installed on task hosts.

  - Optional software dependencies such as graphviz and Jinja2 are not
    needed on task hosts.
  - If polling task communication is used, there is no other requirement.
  - If SSH task communication is configured, non-interactive ssh is
    required from the task host to the suite host.
  - If (default) task communication is configured, the task host should
    have access to the port on the suite host.

- the suite configuration directory, or some fraction of its content,
  can be installed on the task host, if needed.

To learn how to give remote tasks access to cylc, see
:ref:`HowTasksGetAccessToCylc`.

Tasks running on the suite host under another user account are treated
as remote tasks.

Remote hosting, like all namespace settings, can be declared globally in
the root namespace, or per family, or for individual tasks.

Dynamic Host Selection
""""""""""""""""""""""

Instead of hardwiring host names into the suite configuration you can
specify a shell command that prints a hostname, or an environment
variable that holds a hostname, as the value of the host config item. See
:ref:`DynamicHostSelection`.

Remote Task Log Directories
"""""""""""""""""""""""""""

Task stdout and stderr streams are written to log files in a
suite-specific sub-directory of the *suite run directory*, as explained
in :ref:`WhitherStdoutAndStderr`. For remote tasks the same directory is
used, but *on the task host*. Remote task log directories, like local
ones, are created on the fly, if necessary, during job submission.

.. _viso:

Visualization
-------------

The visualization section of a suite configuration is used to configure
suite graphing, principally graph node (task) and edge (dependency arrow)
style attributes. Tasks can be grouped for the purpose of applying common
style attributes. See :ref:`SuiteRCReference` for details.

Collapsible Families In Suite Graphs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: cylc

   [visualization]
       collapsed families = family1, family2

Nested families from the runtime inheritance hierarchy can be expanded
and collapsed in suite graphs and the gcylc graph view. All families are
displayed in the collapsed state at first, unless
``[visualization]collapsed families`` is used to single out specific
families for initial collapsing.
In the gcylc graph view, nodes outside of the main graph (such as the
members of collapsed families) are plotted as rectangular nodes to the
right if they are doing anything interesting (submitted, running, failed).

:numref:`fig-namespaces` illustrates successive expansion of nested task
families in the *namespaces* example suite.

.. todo::

   Create sub-figures if possible: for now hacked as separate figures with
   link and caption on final displayed figure.

.. figure:: graphics/png/orig/inherit-2.png
   :align: center

.. figure:: graphics/png/orig/inherit-3.png
   :align: center

.. figure:: graphics/png/orig/inherit-4.png
   :align: center

.. figure:: graphics/png/orig/inherit-5.png
   :align: center

.. figure:: graphics/png/orig/inherit-6.png
   :align: center

.. _fig-namespaces:

.. figure:: graphics/png/orig/inherit-7.png
   :align: center

   Graphs of the *namespaces* example suite showing various states of
   expansion of the nested namespace family hierarchy, from all families
   collapsed (top left) through to all expanded (bottom right). This can
   also be done by right-clicking on tasks in the gcylc graph view.

.. _Parameterized Tasks Label:

Parameterized Tasks
-------------------

Cylc can automatically generate tasks and dependencies by expanding
parameterized task names over lists of parameter values. Uses for this
include:

- generating an ensemble of similar model runs
- generating chains of tasks to process similar datasets
- replicating an entire workflow, or part thereof, over several runs
- splitting a long model run into smaller steps or ``chunks``
  (parameterized cycling)

.. note::

   This can be done with Jinja2 loops too (:ref:`Jinja`) but
   parameterization is much cleaner (nested loops can seriously reduce
   the clarity of a suite configuration).

Parameter Expansion
^^^^^^^^^^^^^^^^^^^

Parameter values can be lists of strings, or lists of integers and
integer ranges (with inclusive bounds). Numeric values in a list of
strings are considered strings.
It is not possible to mix strings with integer ranges.

For example:

.. code-block:: cylc

   [cylc]
       [[parameters]]
           # parameters: "ship", "buoy", "plane"
           # default task suffixes: _ship, _buoy, _plane
           obs = ship, buoy, plane

           # parameters: 1, 2, 3, 4, 5
           # default task suffixes: _run1, _run2, _run3, _run4, _run5
           run = 1..5

           # parameters: 1, 3, 5, 7, 9
           # default task suffixes: _idx1, _idx3, _idx5, _idx7, _idx9
           idx = 1..9..2

           # parameters: -11, -1, 9
           # default task suffixes: _idx-11, _idx-01, _idx+09
           idx = -11..9..10

           # parameters: 1, 3, 5, 10, 11, 12, 13
           # default task suffixes: _i01, _i03, _i05, _i10, _i11, _i12, _i13
           i = 1..5..2, 10, 11..13

           # parameters: "0", "1", "e", "pi", "i"
           # default task suffixes: _0, _1, _e, _pi, _i
           item = 0, 1, e, pi, i

           # ERROR: mix strings with int range
           p = one, two, 3..5

Then angle brackets denote use of these parameters throughout the suite
configuration. For the values above, this parameterized name:

.. code-block:: none

   model<run>  # for run = 1..5

expands to these concrete task names:

.. code-block:: none

   model_run1, model_run2, model_run3, model_run4, model_run5

and this parameterized name:

.. code-block:: none

   proc<obs>  # for obs = ship, buoy, plane

expands to these concrete task names:

.. code-block:: none

   proc_ship, proc_buoy, proc_plane

By default, to avoid any ambiguity, the parameter name appears in the
expanded task names for integer values, but not for string values. For
example, ``model_run1`` for ``run = 1``, but ``proc_ship`` for
``obs = ship``. However, the default expansion templates can be
overridden if need be:

.. code-block:: cylc

   [cylc]
       [[parameters]]
           obs = ship, buoy, plane
           run = 1..5
       [[parameter templates]]
           run = -R%(run)s  # Make foo<run> expand to foo-R1 etc.

(See :ref:`RefParameterTemplates` for more on the string template syntax.)

Any number of parameters can be used at once. This parameterization:

.. code-block:: none

   model<run,obs>  # for run = 1..2 and obs = ship, buoy, plane

expands to these task names:

.. code-block:: none

   model_run1_ship, model_run1_buoy, model_run1_plane,
   model_run2_ship, model_run2_buoy, model_run2_plane

Here's a simple but complete example suite:

.. code-block:: cylc

   [cylc]
       [[parameters]]
           run = 1..2
   [scheduling]
       [[dependencies]]
           graph = "prep => model<run>"
   [runtime]
       [[model<run>]]
           # ...

The result, post parameter expansion, is this:

.. code-block:: cylc

   [scheduling]
       [[dependencies]]
           graph = "prep => model_run1 & model_run2"
   [runtime]
       [[model_run1]]
           # ...
       [[model_run2]]
           # ...

Here's a more complex graph using two parameters (``[runtime]`` omitted):

.. code-block:: cylc

   [cylc]
       [[parameters]]
           run = 1..2
           mem = cat, dog
   [scheduling]
       [[dependencies]]
           graph = """prep => init<run> => model<run,mem> =>
                         post<run,mem> => wrap<run> => done"""

... which expands to:

.. code-block:: cylc

   [scheduling]
       [[dependencies]]
           graph = """
               prep => init_run1 => model_run1_cat => post_run1_cat => wrap_run1 => done
                       init_run1 => model_run1_dog => post_run1_dog => wrap_run1
               prep => init_run2 => model_run2_cat => post_run2_cat => wrap_run2 => done
                       init_run2 => model_run2_dog => post_run2_dog => wrap_run2"""

:numref:`fig-params-1` shows the result as visualized by ``cylc graph``.

.. _fig-params-1:

.. figure:: graphics/png/orig/params1.png
   :align: center

   Parameter expansion example.

Zero-Padded Integer Values
""""""""""""""""""""""""""

Integer parameter values are given a default template for generating task
suffixes that are zero-padded according to the longest size of their
values. For example, the default template for ``p = 9..10`` would be
``_p%(p)02d``, so that ``foo<p>`` would become ``foo_p09, foo_p10``. If
negative values are present in the parameter list, the default template
will include the sign. For example, the default template for
``p = -1..1`` would be ``_p%(p)+02d``, so that ``foo<p>`` would become
``foo_p-1, foo_p+0, foo_p+1``.

To get wider padding and/or alternate suffixes, use a template. E.g.:

.. code-block:: cylc

   [cylc]
       [[parameters]]
           i = 1..9
           p = 3..14
       [[parameter templates]]
           i = _i%(i)02d  # suffixes = _i01, _i02, ..., _i09
           # A double-percent gives a literal percent character
           p = %%p%(p)03d  # suffixes = %p003, %p004, ..., %p013, %p014

Parameters as Full Task Names
"""""""""""""""""""""""""""""

Parameter values can be used as full task names, but the default template
should be overridden to remove the initial underscore. For example:

.. code-block:: cylc

   [cylc]
       [[parameters]]
           i = 1..4
           obs = ship, buoy, plane
       [[parameter templates]]
           i = i%(i)d  # task name must begin with an alphabetic character
           obs = %(obs)s
   [scheduling]
       [[dependencies]]
           graph = """
   foo => <i>  # foo => i1 & i2 & i3 & i4
   <obs> => bar  # ship & buoy & plane => bar
   """

Passing Parameter Values To Tasks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Parameter values are passed as environment variables to tasks generated
by parameter expansion. For example, if we have:

.. code-block:: cylc

   [cylc]
       [[parameters]]
           obs = ship, buoy, plane
           run = 1..5
   [scheduling]
       [[dependencies]]
           graph = model<run,obs>

Then task ``model_run2_ship`` would get the following standard
environment variables:

.. code-block:: bash

   # In a job script of an instance of the "model_run2_ship" task:
   export CYLC_TASK_PARAM_run="2"
   export CYLC_TASK_PARAM_obs="ship"

These variables allow tasks to determine which member of a parameterized
group they are, and so to vary their behaviour accordingly.

You can also define custom variables and string templates for parameter
value substitution. For example, if we add this to the above
configuration:

.. code-block:: cylc

   [runtime]
       [[model<run,obs>]]
           [[[environment]]]
               MYNAME = %(obs)sy-mc%(obs)sface
               MYFILE = /path/to/run%(run)03d/%(obs)s

Then task ``model_run2_ship`` would get the following custom environment
variables:

.. code-block:: bash

   # In a job script of an instance of the "model_run2_ship" task:
   export MYNAME MYFILE
   MYNAME=shipy-mcshipface
   MYFILE=/path/to/run002/ship

Selecting Specific Parameter Values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Specific parameter values can be singled out in the graph and under
``[runtime]`` with the notation ``<p=5>`` (for example). Here's how to
make a special task trigger off just the first of a set of model runs:

.. code-block:: cylc

   [cylc]
       [[parameters]]
           run = 1..5
   [scheduling]
       [[dependencies]]
           graph = """ model<run> => post_proc<run>  # general case
                       model<run=1> => check_first_run """  # special case
   [runtime]
       [[model<run>]]
           # config for all "model" runs...
       [[model<run=1>]]
           # special config (if any) for the first model run...
       #...

Selecting Partial Parameter Ranges
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The parameter notation does not currently support partial range selection
such as ``foo<p=5..10>``, but you can achieve the same result by defining
a second parameter that covers the partial range and giving it the same
expansion template as the full-range parameter. For example:

.. code-block:: cylc

   [cylc]
       [[parameters]]
           run = 1..10  # 1, 2, ..., 10
           runx = 1..3  # 1, 2, 3
       [[parameter templates]]
           run = _R%(run)02d   # _R01, _R02, ..., _R10
           runx = _R%(runx)02d  # _R01, _R02, _R03
   [scheduling]
       [[dependencies]]
           graph = """model<run> => post<run>
                      model<runx> => checkx<runx>"""
   [runtime]
       [[model<run>]]
           # ...
       #...

Parameter Offsets In The Graph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A negative offset notation ``<NAME-1>`` is interpreted as the previous
value in the ordered list of parameter values, while a positive offset is
interpreted as the next value. For example, to split a model run into
multiple steps with each step depending on the previous one, either of
these graphs:

.. code-block:: cylc

   graph = "model<run-1> => model<run>"  # for run = 1, 2, 3
   graph = "model<run> => model<run+1>"  # for run = 1, 2, 3

expands to:

.. code-block:: cylc

   graph = """model_run1 => model_run2
              model_run2 => model_run3"""

   # or equivalently:

   graph = "model_run1 => model_run2 => model_run3"

And this graph:

.. code-block:: cylc

   graph = "proc<size-1> => proc<size>"  # for size = small, big, huge

expands to:

.. code-block:: cylc

   graph = """proc_small => proc_big
              proc_big => proc_huge"""

   # or equivalently:

   graph = "proc_small => proc_big => proc_huge"

However, a quirk in the current system means that you should avoid mixing
conditional logic in these statements. For example, the following will do
the unexpected:

.. code-block:: cylc

   graph = foo<m-1> & baz => foo<m>  # for m = cat, dog

currently expands to:

.. code-block:: cylc

   graph = foo_cat & baz => foo_dog

   # when users may expect it to be:
   #    graph = foo_cat => foo_dog
   #    graph = baz => foo_cat & foo_dog

For the time being, writing out the logic explicitly will give you the
correct graph.

.. code-block:: cylc

   graph = """foo<m-1> => foo<m>  # for m = cat, dog
              baz => foo<m>"""

Task Families And Parameterization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Task family members can be generated by parameter expansion:

.. code-block:: cylc

   [runtime]
       [[FAM]]
       [[member<r>]]
           inherit = FAM
   # Result: family FAM contains member_r1, member_r2, etc.

Family names can be parameterized too, just like task names:

.. code-block:: cylc

   [runtime]
       [[RUN<r>]]
       [[model<r>]]
           inherit = RUN<r>
       [[post_proc<r>]]
           inherit = RUN<r>
   # Result: family RUN_r1 contains model_r1 and post_proc_r1,
   #         family RUN_r2 contains model_r2 and post_proc_r2, etc.

As described in :ref:`FamilyTriggers` family names can be used to trigger
all members at once:

.. code-block:: cylc

   graph = "foo => FAMILY"

or to trigger off all members:

.. code-block:: cylc

   graph = "FAMILY:succeed-all => bar"

or to trigger off any members:

.. code-block:: cylc

   graph = "FAMILY:succeed-any => bar"

If the members of ``FAMILY`` were generated with parameters, you can also
trigger them all at once with parameter notation:

.. code-block:: cylc

   graph = "foo => member<m>"

Similarly, to trigger off all members:

.. code-block:: cylc

   graph = "member<m> => bar"
   # (member<m>:fail etc., for other trigger types)

Family names are still needed in the graph, however, to succinctly
express "succeed-any" triggering semantics, and all-to-all or any-to-all
triggering:

.. code-block:: cylc

   graph = "FAM1:succeed-any => FAM2"

(Direct all-to-all and any-to-all family triggering is not recommended
for efficiency reasons though - see
:ref:`EfficientInterFamilyTriggering`).

For family *member-to-member* triggering use parameterized members. For
example, if family ``OBS_GET`` has members ``get<obs>`` and family
``OBS_PROC`` has members ``proc<obs>`` then this graph:

.. code-block:: cylc

   graph = "get<obs> => proc<obs>"  # for obs = ship, buoy, plane

expands to:

.. code-block:: none

   get_ship => proc_ship
   get_buoy => proc_buoy
   get_plane => proc_plane

.. _Parameterized Cycling:

Parameterized Cycling
^^^^^^^^^^^^^^^^^^^^^

Two ways of constructing cycling systems are described and contrasted in
:ref:`Workflows For Cycling Systems`. For most purposes use of a proper
*cycling workflow* is recommended, wherein cylc incrementally generates
the date-time sequence and extends the workflow, potentially
indefinitely, at run time. For smaller systems of finite duration,
however, parameter expansion can be used to generate a sequence of
pre-defined tasks as a proxy for cycling.

Here's a cycling workflow of two-monthly model runs for one year, with
previous-instance model dependence (e.g. for model restart files):

.. code-block:: cylc

   [scheduling]
       initial cycle point = 2020-01
       final cycle point = 2020-12
       [[dependencies]]
           [[[R1]]]  # Run once, at the initial point.
               graph = "prep => model"
           [[[P2M]]]  # Run at 2-month intervals between the initial and final points.
               graph = "model[-P2M] => model => post_proc & archive"
   [runtime]
       [[model]]
           script = "run-model $CYLC_TASK_CYCLE_POINT"

And here's how to do the same thing with parameterized tasks:

.. code-block:: cylc

   [cylc]
       [[parameters]]
           chunk = 1..6
   [scheduling]
       [[dependencies]]
           graph = """prep => model<chunk=1>
                      model<chunk-1> => model<chunk> =>
                          post_proc<chunk> & archive<chunk>"""
   [runtime]
       [[model<chunk>]]
           script = """
   # Compute start date from chunk index and interval, then run the model.
   INITIAL_POINT=2020-01
   INTERVAL_MONTHS=2
   OFFSET_MONTHS=$(( (CYLC_TASK_PARAM_chunk - 1)*INTERVAL_MONTHS ))
   OFFSET=P${OFFSET_MONTHS}M  # e.g. P4M for chunk=3
   run-model $(cylc cyclepoint --offset=$OFFSET $INITIAL_POINT)"""

The two workflows are shown together in :numref:`fig-eg2`. They both
achieve the same result, and both can include special tasks at the start,
end, or anywhere in between. But as noted earlier the parameterized
version has several disadvantages: it must be finite in extent and not
too large; the date-time arithmetic has to be done by the user; and the
full extent of the workflow will be visible at all times as the suite
runs.

.. todo::

   Create sub-figures if possible: for now hacked as separate figures with
   link and caption on final displayed figure.

.. figure:: graphics/png/orig/eg2-static.png
   :align: center

.. _fig-eg2:

.. figure:: graphics/png/orig/eg2-dynamic.png
   :align: center

   Parameterized (top) and cycling (bottom) versions of the same
   workflow. The first three cycle points are shown in the
   cycling case. The parameterized case does not have "cycle points".

Here's a yearly-cycling suite with four parameterized chunks in each
cycle point:

.. code-block:: cylc

   [cylc]
       [[parameters]]
           chunk = 1..4
   [scheduling]
       initial cycle point = 2020-01
       [[dependencies]]
           [[[P1Y]]]
               graph = """model<chunk-1> => model<chunk>
                          model<chunk=4>[-P1Y] => model<chunk=1>"""

.. note::

   The inter-cycle trigger connects the first chunk in each cycle point
   to the last chunk in the previous cycle point. Of course it would be
   simpler to just use 3-monthly cycling:

.. code-block:: cylc

   [scheduling]
       initial cycle point = 2020-01
       [[dependencies]]
           [[[P3M]]]
               graph = "model[-P3M] => model"

Here's a possible valid use-case for mixed cycling: consider a portable
date-time cycling workflow of model jobs that can each take too long to
run on some supported platforms. This could be handled without changing
the cycling structure of the suite by splitting the run (at each cycle
point) into a variable number of shorter steps, using more steps on less
powerful hosts.

Cycle Point And Parameter Offsets At Start-Up
"""""""""""""""""""""""""""""""""""""""""""""

In cycling workflows cylc ignores anything earlier than the suite initial
cycle point. So this graph:

.. code-block:: cylc

   graph = "model[-P1D] => model"

simplifies at the initial cycle point to this:

.. code-block:: cylc

   graph = "model"

Similarly, parameter offsets are ignored if they extend beyond the start
of the parameter value list. So this graph:

.. code-block:: cylc

   graph = "model<chunk-1> => model<chunk>"

simplifies for ``chunk=1`` to this:

.. code-block:: cylc

   graph = "model_chunk1"

.. note::

   The initial cut-off applies to every parameter list, but only to
   cycle point sequences that start at the suite initial cycle point.
   Therefore it may be somewhat easier to use parameterized cycling if you
   need multiple date-time sequences *with different start points* in the
   same suite. We plan to allow this sequence-start simplification for any
   date-time sequence in the future, not just at the suite initial point,
   but it needs to be optional because delayed-start cycling tasks
   sometimes need to trigger off earlier cycling tasks.

.. _Jinja:

Jinja2
------

.. note::

   This section needs to be revised - the Parameterized Task feature
   introduced in cylc-6.11.0 (see :ref:`Parameterized Tasks Label`) provides
   a cleaner way to auto-generate tasks without coding messy Jinja2 loops.

Cylc has built-in support for the Jinja2 template processor in suite
configurations.
Jinja2 variables, mathematical expressions, loop control structures,
conditional logic, etc., are automatically processed to generate the
final suite configuration seen by cylc.

The need for Jinja2 processing must be declared with a hash-bang comment
as the first line of the suite.rc file:

.. code-block:: cylc

   #!jinja2
   # ...

Potential uses for this include automatic generation of repeated groups
of similar tasks and dependencies, and inclusion or exclusion of entire
suite sections according to the value of a single flag. Consider a large
complicated operational suite and several related parallel test suites
with slightly different task content and structure (the parallel suites,
for instance, might take certain large input files from the operation or
the archive rather than downloading them again) - these can now be
maintained as a single master suite configuration that reconfigures
itself according to the value of a flag variable indicating the intended
use.

Template processing is the first thing done on parsing a suite
configuration so Jinja2 expressions can appear anywhere in the file
(inside strings and namespace headings, for example).

Jinja2 is `well documented `_, so here we just provide an example suite
that uses it. The meaning of the embedded Jinja2 code should be
reasonably self-evident to anyone familiar with standard programming
techniques.

.. _fig-jinja2-ensemble:

.. figure:: graphics/png/orig/jinja2-ensemble-graph.png
   :align: center

   The Jinja2 ensemble example suite graph.

The ``jinja2.ensemble`` example, graphed in
:numref:`fig-jinja2-ensemble`, shows an ensemble of similar tasks
generated using Jinja2:

.. code-block:: cylc

   #!jinja2
   {% set N_MEMBERS = 5 %}
   [scheduling]
       [[dependencies]]
           graph = """{# generate ensemble dependencies #}
               {% for I in range( 0, N_MEMBERS ) %}
                  foo => mem_{{ I }} => post_{{ I }} => bar
               {% endfor %}"""

Here is the generated suite configuration, after Jinja2 processing:

.. code-block:: cylc

   #!jinja2
   [scheduling]
       [[dependencies]]
           graph = """
             foo => mem_0 => post_0 => bar
             foo => mem_1 => post_1 => bar
             foo => mem_2 => post_2 => bar
             foo => mem_3 => post_3 => bar
             foo => mem_4 => post_4 => bar
                   """

And finally, the ``jinja2.cities`` example uses variables, includes or
excludes special cleanup tasks according to the value of a logical flag,
and it automatically generates all dependencies and family relationships
for a group of tasks that is repeated for each city in the suite. To add
a new city and associated tasks and dependencies simply add the city name
to the list at the top of the file. The suite is graphed, with the New
York City task family expanded, in :numref:`fig-jinja2-cities`.

.. literalinclude:: ../../etc/examples/jinja2/cities/suite.rc
   :language: cylc

.. _fig-jinja2-cities:

.. figure:: graphics/png/orig/jinja2-suite-graph.png
   :align: center

   The Jinja2 cities example suite graph, with the New York City task
   family expanded.

Accessing Environment Variables With Jinja2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This functionality is not provided by Jinja2 by default, but cylc
automatically imports the user environment into the template's global
namespace (see :ref:`CustomJinja2Filters`) in a dictionary structure
called *environ*. A usage example:

.. code-block:: cylc

   #!Jinja2
   #...
   [runtime]
       [[root]]
           [[[environment]]]
               SUITE_OWNER_HOME_DIR_ON_SUITE_HOST = {{environ['HOME']}}

In addition, the following variables are exported to this environment
prior to configuration parsing to provide suite context:

.. code-block:: bash

   CYLC_DEBUG                      # Debug mode, true or not defined
   CYLC_DIR                        # Location of cylc installation used
   CYLC_VERBOSE                    # Verbose mode, True or False
   CYLC_VERSION                    # Version of cylc installation used

   CYLC_SUITE_NAME                 # Suite name

   CYLC_SUITE_DEF_PATH             # Location of the suite configuration
                                   # source path on suite host,
                                   # e.g. ~/cylc-run/foo
   CYLC_SUITE_LOG_DIR              # Suite log directory.
   CYLC_SUITE_RUN_DIR              # Location of the suite run directory in
                                   # suite host, e.g. ~/cylc-run/foo
   CYLC_SUITE_SHARE_DIR            # Suite (or task post parsing!)
                                   # shared directory.
   CYLC_SUITE_WORK_DIR             # Suite work directory.

.. note::

   The example above emphasizes that *the environment - including the suite
   context variables - is read on the suite host when the suite
   configuration is parsed*, not at task run time on job hosts.

.. _CustomJinja2Filters:

Custom Jinja2 Filters, Tests and Globals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Jinja2 has three different namespaces used to separate "globals",
"filters" and "tests". Globals are template-wide accessible variables
and functions. Cylc extends this namespace with the "environ" dictionary
and the "raise" and "assert" functions for raising exceptions
(see :ref:`Jinja2RaisingExceptions`).

Filters can be used to modify variable values and are applied using pipe
notation. For example, the built-in ``trim`` filter strips leading and
trailing white space from a string:

.. code-block:: cylc

   {% set MyString = "   dog   " %}
   {{ MyString | trim() }}  # "dog"

Additionally, variable values can be tested using the "is" keyword
followed by the name of the test, e.g. ``VARIABLE is defined``. See the
official Jinja2 documentation for available built-in globals, filters
and tests.

Cylc also supports custom Jinja2 globals, filters and tests. A custom
global, filter or test is a single Python function in a source file with
the same name as the function (plus ".py" extension) and stored in one of
the following locations:

- ``/lib/Jinja2[namespace]/``
- ``[suite configuration directory]/Jinja2[namespace]/``
- ``$HOME/.cylc/Jinja2[namespace]/``

where ``[namespace]/`` is one of ``Globals/``, ``Filters/`` or
``Tests/``.

In the argument list of a filter or test function, the first argument is
the variable value to be "filtered" or "tested", respectively, and
subsequent arguments can be whatever else is needed.
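To make the layout above concrete, here is a sketch of what a custom filter could look like. The filter name ``member_name``, its extra arguments, and the file location ``Jinja2Filters/member_name.py`` under the suite configuration directory are hypothetical, chosen purely for illustration; only the convention itself (one function per file, file named after the function, value to be filtered as the first argument) comes from the description above.

```python
# Hypothetical file: Jinja2Filters/member_name.py
# (in the suite configuration directory; name and location are illustrative)


def member_name(value, prefix="mem", width=3):
    """Return a zero-padded ensemble member name such as "mem_007".

    value  -- the template value being filtered (first argument by convention)
    prefix -- text placed before the number (illustrative extra argument)
    width  -- minimum number of digits (illustrative extra argument)
    """
    # "%0*d" takes the field width from the argument tuple.
    return "%s_%0*d" % (prefix, width, int(value))


if __name__ == "__main__":
    print(member_name(7))            # mem_007
    print(member_name("12", "ens"))  # ens_012
```

Under these assumptions the filter would be applied in a suite.rc file with the usual pipe notation, e.g. ``{{ 7 | member_name('ens') }}``, with the piped value arriving as ``value`` and ``'ens'`` as ``prefix``.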
Currently there are three custom filters:

pad
"""

The "pad" filter is for padding string values to some constant length
with a fill character - useful for generating task names and related
values in ensemble suites:

.. code-block:: cylc

   {% for i in range(0,100) %}  # 0, 1, ..., 99
       {% set j = i | pad(2,'0') %}
       [[A_{{j}}]]         # [[A_00]], [[A_01]], ..., [[A_99]]
   {% endfor %}

strftime
""""""""

The "strftime" filter can be used to format ISO8601 date-time strings using
an strftime string.

.. code-block:: cylc

   {% set START_CYCLE = '10661014T08+01' %}
   {{ START_CYCLE | strftime('%H') }}  # 08

Examples:

- ``{{START_CYCLE | strftime('%Y')}}`` - 1066
- ``{{START_CYCLE | strftime('%m')}}`` - 10
- ``{{START_CYCLE | strftime('%d')}}`` - 14
- ``{{START_CYCLE | strftime('%H:%M:%S %z')}}`` - 08:00:00 +01

It is also possible to parse non-standard date-time strings by passing a
strptime string as the second argument.

Examples:

- ``{{'12,30,2000' | strftime('%m', '%m,%d,%Y')}}`` - 12
- ``{{'1066/10/14 08:00:00' | strftime('%Y%m%dT%H', '%Y/%m/%d %H:%M:%S')}}`` - 10661014T08

duration\_as
""""""""""""

The "duration\_as" filter can be used to convert ISO8601 duration strings to a
floating-point number in one of several units. Units for the conversion can be
specified in a case-insensitive short or long form:

- Seconds - "s" or "seconds"
- Minutes - "m" or "minutes"
- Hours - "h" or "hours"
- Days - "d" or "days"
- Weeks - "w" or "weeks"

Within the suite, this becomes:

.. code-block:: cylc

   {% set CYCLE_INTERVAL = 'P1D' %}
   {{ CYCLE_INTERVAL | duration_as('h') }}  # 24.0
   {% set CYCLE_SUBINTERVAL = 'PT30M' %}
   {{ CYCLE_SUBINTERVAL | duration_as('hours') }}  # 0.5
   {% set CYCLE_INTERVAL = 'P1D' %}
   {{ CYCLE_INTERVAL | duration_as('s') }}  # 86400.0
   {% set CYCLE_SUBINTERVAL = 'PT30M' %}
   {{ CYCLE_SUBINTERVAL | duration_as('seconds') }}  # 1800.0

The filtered value is a floating-point number, but some suite entities
(e.g. environment variables) often need to be supplied with an integer.
This is accomplished by chaining filters:

- ``{{CYCLE_INTERVAL | duration_as('h') | int}}`` - 24
- ``{{CYCLE_SUBINTERVAL | duration_as('h') | int}}`` - 0
- ``{{CYCLE_INTERVAL | duration_as('s') | int}}`` - 86400
- ``{{CYCLE_SUBINTERVAL | duration_as('s') | int}}`` - 1800


Associative Arrays In Jinja2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Associative arrays (*dicts* in Python) can be very useful.
Here's an example, from ``/etc/examples/jinja2/dict``:

.. code-block:: cylc

   #!Jinja2
   {% set obs_types = ['airs', 'iasi'] %}
   {% set resource = { 'airs':'ncpus=9', 'iasi':'ncpus=20' } %}

   [scheduling]
       [[dependencies]]
           graph = OBS
   [runtime]
       [[OBS]]
           [[[job]]]
               batch system = pbs
       {% for i in obs_types %}
       [[ {{i}} ]]
           inherit = OBS
           [[[directives]]]
               -I = {{ resource[i] }}
       {% endfor %}

Here's the result:

.. code-block:: bash

   $ cylc get-suite-config -i [runtime][airs]directives SUITE
   -I = ncpus=9


Jinja2 Default Values And Template Inputs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The values of Jinja2 variables can be passed in from the cylc command
line rather than hardwired in the suite configuration.
Here's an example, from ``/etc/examples/jinja2/defaults``:

.. code-block:: cylc

   #!Jinja2

   [meta]

       title = "Jinja2 example: use of defaults and external input"

       description = """
   The template variable FIRST_TASK must be given on the cylc command line
   using --set or --set-file=FILE; two other variables, LAST_TASK and
   N_MEMBERS can be set similarly, but if not they have default values."""

   {% set LAST_TASK = LAST_TASK | default( 'baz' ) %}
   {% set N_MEMBERS = N_MEMBERS | default( 3 ) | int %}

   {# input of FIRST_TASK is required - no default #}

   [scheduling]
       initial cycle point = 20100808T00
       final cycle point   = 20100816T00
       [[dependencies]]
           [[[0]]]
               graph = """{{ FIRST_TASK }} => ENS
                    ENS:succeed-all => {{ LAST_TASK }}"""
   [runtime]
       [[ENS]]
   {% for I in range( 0, N_MEMBERS ) %}
       [[ mem_{{ I }} ]]
           inherit = ENS
   {% endfor %}

Here's the result:

.. code-block:: bash

   $ cylc list SUITE
   Jinja2 Template Error
   'FIRST_TASK' is undefined
   cylc-list foo failed: 1

   $ cylc list --set FIRST_TASK=bob foo
   bob
   baz
   mem_2
   mem_1
   mem_0

   $ cylc list --set FIRST_TASK=bob --set LAST_TASK=alice foo
   bob
   alice
   mem_2
   mem_1
   mem_0

   $ cylc list --set FIRST_TASK=bob --set N_MEMBERS=10 foo
   mem_9
   mem_8
   mem_7
   mem_6
   mem_5
   mem_4
   mem_3
   mem_2
   mem_1
   mem_0
   baz
   bob

Note also that ``cylc view --set FIRST_TASK=bob --jinja2 SUITE``
will show the suite with the Jinja2 variables as set.

.. note::

   Suites started with template variables set on the command
   line will *restart* with the same settings. However, you can set
   them again on the ``cylc restart`` command line if they need to
   be overridden.


Jinja2 Variable Scope
^^^^^^^^^^^^^^^^^^^^^

Jinja2 variable scoping rules may be surprising. Variables set inside a
*for loop* block, for instance, are not accessible outside of the block,
so the following will print ``# FOO is False``, not ``# FOO is True``, even
if ``check_something()`` returns true for some item:

.. code-block:: cylc

   {% set FOO = false %}
   {% for item in items %}
       {% if item.check_something() %}
           {% set FOO = true %}
       {% endif %}
   {% endfor %}
   # FOO is {{FOO}}

Jinja2 documentation suggests using alternative constructs like the loop else
block or the special ``loop`` variable. More complex use cases can be
handled using ``namespace`` objects, which allow changes to propagate
across scopes:

.. code-block:: cylc

   {% set ns = namespace(foo=false) %}
   {% for item in items %}
       {% if item.check_something() %}
           {% set ns.foo = true %}
       {% endif %}
   {% endfor %}
   # FOO is {{ns.foo}}

For detail, see
`Jinja2 Template Designer Documentation \> Assignments `_


.. _Jinja2RaisingExceptions:

Raising Exceptions
^^^^^^^^^^^^^^^^^^

Cylc provides two functions for raising exceptions using Jinja2. These
exceptions are raised when the suite.rc file is loaded and will prevent a suite
from running.

.. note::

   These functions must be contained within ``{{`` Jinja2
   blocks as opposed to ``{%`` blocks.
Raise
"""""

The "raise" function will result in an error containing the provided text.

.. code-block:: cylc

   {% if not VARIABLE is defined %}
       {{ raise('VARIABLE must be defined for this suite.') }}
   {% endif %}

Assert
""""""

The "assert" function will raise an exception containing the text provided in
the second argument if the first argument evaluates as False. The
following example is equivalent to the "raise" example above.

.. code-block:: cylc

   {{ assert(VARIABLE is defined, 'VARIABLE must be defined for this suite.') }}


Importing additional Python modules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Jinja2 allows variable and macro definitions to be gathered in a separate
template that can be imported into (and thus shared among) other templates.

.. code-block:: cylc

   {% import "suite-utils.rc" as utils %}
   {% from "suite-utils.rc" import VARIABLE as ALIAS %}
   {{ utils.VARIABLE is equalto(ALIAS) }}

Cylc extends this functionality to allow import of arbitrary Python modules.

.. code-block:: cylc

   {% from "itertools" import product %}
   [runtime]
   {% for group, member in product(['a', 'b'], [0, 1, 2]) %}
       [[{{group}}_{{member}}]]
   {% endfor %}

For better clarity and disambiguation, Python modules can be prefixed with
``__python__``:

.. code-block:: cylc

   {% from "__python__.itertools" import product %}


.. _EmPylabel:

EmPy
----

In addition to Jinja2, Cylc supports the EmPy template processor in suite
configurations. Like Jinja2, EmPy provides variables, mathematical
expressions, loop control structures, conditional logic, etc., that are
expanded to generate the final suite configuration seen by Cylc. See the
`EmPy documentation `_ for more
details on its templating features and how to use them.

.. note::

   EmPy is not bundled with Cylc and must be installed separately. It
   should be available to Python through the standard ``import em``. Please
   also note that there is another Python package called "em" that provides
   a conflicting module of the same name.
   You can run the ``cylc check-software`` command to check your installation.

The need for EmPy processing must be declared with a hash-bang
comment as the first line of the suite.rc file:

.. code-block:: cylc

   #!empy
   # ...

An example suite ``empy.cities`` demonstrating its use is shown below.
It is a translation of the ``jinja2.cities`` example from :ref:`Jinja` and can
be directly compared against it.

.. literalinclude:: ../../etc/examples/empy/cities/suite.rc
   :language: cylc

For basic usage the difference between Jinja2 and EmPy amounts to a different
markup syntax with little else to distinguish them. EmPy might be preferable,
however, in cases where more complicated processing logic has to be
implemented.

EmPy is a system for embedding Python expressions and statements in template
text. It makes the full power of the Python language and its ecosystem easily
accessible from within the template. This might be desirable for several
reasons:

- no need to learn a different language and its idiosyncrasies just for
  writing template logic
- availability of lambda functions, list and dictionary comprehensions
  can make template code smaller and more readable compared to Jinja2
- natural and straightforward integration with the Python package ecosystem
- no two-language barrier between writing template logic and processing
  extensions makes it easier to refactor and maintain the template code
  as its complexity grows - inline pieces of Python code can be
  gathered into subroutines and eventually into separate modules and
  packages in a seamless manner.


Omitting Tasks At Runtime
-------------------------

It is sometimes convenient to omit certain tasks from the suite at
runtime without actually deleting their definitions from the suite.

Defining ``[runtime]`` properties for tasks that do not appear in the suite
graph results in verbose-mode validation warnings that the tasks are
disabled. They cannot be used because the suite graph is what defines their
dependencies and valid cycle points.
Nevertheless, it is legal to leave these orphaned
runtime sections in the suite configuration because it allows you to
temporarily remove tasks from the suite by simply commenting them out of
the graph.

To omit a task from the suite at runtime but still leave it fully defined and
available for use (by insertion or ``cylc submit``) use one or both of the
``[scheduling][[special tasks]]`` lists, *include at start-up* or
*exclude at start-up* (documented in :ref:`IASU` and :ref:`EASU`).
The graph then still defines the validity of the tasks and their
dependencies, but they are not actually loaded into the suite at start-up.
Other tasks that depend on the omitted ones, if any, will have to wait for
them to be inserted later, or else be triggered manually.

Finally, with Jinja2 (:ref:`Jinja`) you can radically alter
suite structure by including or excluding tasks from the ``[scheduling]``
and ``[runtime]`` sections according to the value of a single logical flag
defined at the top of the suite.


Naked Dummy Tasks And Strict Validation
---------------------------------------

A *naked dummy task* appears in the suite graph but has no
explicit runtime configuration section. Such tasks automatically
inherit the default "dummy task" configuration from the root
namespace. This is very useful because it allows functional suites to
be mocked up quickly for test and demonstration purposes by simply
defining the graph. It is somewhat dangerous, however, because there
is no way to distinguish an intentional naked dummy task from one
generated by typographic error: misspelling a task name in the graph
results in a new naked dummy task replacing the intended task in the
affected trigger expression; and misspelling a task name in a runtime
section heading results in the intended task becoming a dummy task
itself (by divorcing it from its intended runtime config section).

To avoid this problem any dummy task used in a real suite should not be
naked - i.e.
it should have an explicit entry under the runtime section of
the suite configuration, even if the section is empty. This results in
exactly the same dummy task behaviour, via implicit inheritance from
root, but it allows use of ``cylc validate --strict``
to catch errors in task names by failing the suite if any naked dummy
tasks are detected.


.. [1] An OR operator on the right doesn't make much sense: if "B or C"
       triggers off A, what exactly should cylc do when A finishes?
.. [2] In NWP forecast analysis suites parts of the observation
       processing and data assimilation subsystem will typically also
       depend on model background fields generated by the previous forecast.