Concurrent Flows
New in version 8.0.0.
In Cylc, a flow is a self-propagating run through the workflow graph from some initial task(s).
At start-up the scheduler automatically launches a flow from the start
of the graph. But you can use cylc trigger --flow=new ID
to start additional
flows anywhere in the graph, while the original (and any other) flow still
runs. New flows continue on from triggered tasks as dictated by the graph.
When a flow advances to a new task in the graph, the task will only run if it did not already run in the same flow.
See below for suggested use cases, and an example, of this capability.
Note
Flows merge where (and if) tasks collide in the n=0
active window. Downstream of a merge, tasks are considered to belong
to all constituent flows.
Flow Numbers
Flows are identified by numbers passed down from parent task to child task in the graph.
Flow number 1
is triggered automatically by cylc play
at scheduler
start-up. The next flow started by manual triggering
gets the number 2
, then 3
, and so on.
Tasks can carry multiple flow numbers as a result of flow merging.
Note
Flow numbers are not yet exposed in the UI, but they are logged with task events in the scheduler log.
Triggering & Flows
By default, manual triggering (with cylc trigger
or the UI) starts a new
front of activity in current flows.
But it can also start new flows and trigger flow-independent single tasks.
In the diagrams below, the grey tasks run in the original flow (1
), and the
blue ones run as a result of a manual triggering event. They may be triggered
as part of flow 1
, or as a new flow 2
, or with no flow number.
- Triggering in Current Flows
cylc trigger [--wait] ID
This is the default action. The triggered task gets all current active flow numbers. Subsequently, each of those flows will consider the task to have run already.
Ahead of active flows this starts a new front of activity for the existing flows, which by default can continue on without waiting for them to catch up:
With
--wait
, action downstream of the triggered task is delayed until the first flow catches up:Behind active flows the triggered task will run, but nothing more will happen if any of the original flows already passed by there:
- Triggering in Specific Flows
cylc trigger --flow=1,2 ID
This triggers the task with flow numbers
1
and2
.The result is like the default above, except that tasks in the new front belong only to the specified flow(s), regardless of which flows are active at triggering time.
- Triggering a New Flow
cylc trigger --flow=new ID
This triggers the task with a new, incremented flow number.
The new flow will re-run tasks that already ran in previous flows:
- Triggering a Flow-Independent Single Task
cylc trigger --flow=none ID
This triggers a task with no flow numbers.
It will not spawn children, and other flows that come by will re-run it.
- Special Case: Triggering
n=0
Tasks Tasks in the
n=0
window are active, active-waiting, or incomplete. Their flow membership is already determined - that of the parent tasks that spawned them.Triggering an active task has no effect (it is already triggered).
Triggering an active-waiting task queues it to run in the same flow.
Triggering an incomplete task queues it to re-run in the same flow.
Flow Merging in n=0
If a task spawning into the n=0
window finds another instance
of itself already there (i.e., same name and cycle point, different flow
number) a single instance will carry both (sets of) flow numbers forward from
that point. Downstream tasks belong to both flows.
Flow merging in n=0
means flows are not completely independent. One flow
might not be able to entirely overtake another because one or more of its tasks
might merge in n=0
. Merging is necessary while task IDs - and associated
log directory paths etc. - do not incorporate flow numbers, because task IDs
must be unique in the active task pool.
Merging with Incomplete tasks
Incomplete tasks are retained in the active window in expectation of retriggering to complete required outputs and continue their flow.
If another flow encounters an incomplete task (i.e. if another instance of the
same task collides with it in the n=0
active window) the task will
run again and carry both flow numbers forward if it successfully completes its
required outputs.
Stopping Flows
By default, cylc stop
halts the workflow and shuts the scheduler down.
It can also stop specific flows: cylc stop --flow=N
removes the flow number
N
from tasks in the active task pool. Tasks that have no flow
numbers left as a result do not spawn children at all. If there are no active
flows left, the scheduler shuts down.
Some Use Cases
- Running Tasks Ahead of Time
To run a task within the existing flow(s) even though its prerequisites are not yet satisfied, just trigger it. Use
--wait
if you don’t want the new flow front to continue immediately. Triggered task(s) will not re-run when the main front catches up.- Regenerating Outputs Behind a Flow
To re-run a sub-graph (e.g. because the original run was affected by a corrupt file), just trigger the task(s) at the top of the sub-graph with
--flow=new
.You may need to manually stop the new flow if it leads into the main trunk of the graph, and you do not want it to carry on indefinitely.
- Rewinding a Workflow
To rewind the workflow to an earlier point, perhaps to regenerate data and/or allow the workflow to evolve a new path into the future, trigger a new flow at the right place and then stop the original flow.
- Test-running Tasks in a Live Workflow
You can trigger individual tasks as many times as you like with
--flow=none
, without affecting the workflow. The task submit number will increment each time.- Processing Flow-Specific Data?
Flow numbers are passed to job environments, so it is possible for tasks to process flow-specific data. Every task would have to be capable of processing multiple datasets at once, however, in case of flow merging. Generally, you should use cycling for this kind of use case.
Example: Rerun a Sub-Graph
The following cycling workflow runs a task called model
in
every cycle, followed by a postprocessing task, two product-generating tasks,
and finally a task that publishes results for the cycle point:
[scheduling]
cycling mode = integer
initial cycle point = 1
[[graph]]
P1 = model[-P1] => model => post => prod1 & prod2 => publish
Let’s say the workflow has run to cycle 8, but we have just noticed that a corrupted ancillary file resulted in bad products at cycle 5.
To rectify this we could fix the corrupted file and trigger a new flow at
5/post
:
cylc trigger --flow=new <workflow_id>//5/post
The new flow will regenerate and republish cycle 5 products before naturally coming to a halt, because the triggered tasks do not feed into the next cycle.
Meanwhile, the original flow will carry on unaffected, from cycle point 8.