(IN DISCUSSION: NOT ACCEPTED YET)
Users need to be able to intervene in and affect the progression of workflows.
Currently we have only
`cylc trigger` and
`cylc set-outputs`, which don't
cover all requirements. (Note that the latter command does not actually set task
outputs: it just spawns downstream children with the corresponding
prerequisites satisfied, as if the outputs had been completed.)
I am proposing that we do NOT support Cylc 7 style task status reset, e.g.
resetting a failed task to succeeded.
Primarily this is for provenance reasons: the run history will be clearer if a task’s status reflects what its last job did, followed by evidence that the user told the scheduler to carry on as if something else had happened.
NOTE we could choose to change the task status as well as carry on as if the forced status had been achieved naturally. However, that’s not needed to make the workflow continue in Cylc 8, so we can choose based on clarity of run history. Do we want (e.g.):
This may be a matter of opinion, but I prefer the latter.
- if the `failed` output is set, disable automatic retries
- DEFAULT: set all required outputs
This makes `cylc remove` obsolete (currently, incomplete tasks have to be "removed" if not re-run to completion).
Expiration is a bigger topic in its own right, due to its connection to task outputs and automatic triggering. See Task Expire Proposal
```
$ cylc set [OPTIONS] [TASK(S)]
```

Command for intervening in task prerequisite, output, and completion status.

- Setting a prerequisite contributes to the task's readiness to run.
- Setting an output contributes to the task's completion, and sets the corresponding prerequisites of child tasks.
- Setting a future waiting task to expire allows the scheduler to forget it without running it. WARNING: nothing downstream of the task will run automatically after this.
- Setting a finished-but-incomplete task to expire allows the scheduler to forget it without completing its required outputs. WARNING: nothing downstream of the incomplete outputs will run automatically after this.

Setting an output also results in any implied outputs being set:
- `started` implies `submitted`
- `succeeded`, `failed`, and custom outputs imply `started`
- `succeeded` implies all required custom outputs
- `failed` does not imply custom outputs

Options:
- `--flow=INT`: flow(s) to attribute spawned tasks to. Default: all active flows. If the task was already spawned, merge flows.
- `--pre=PRE`: set a prerequisite to satisfied (multiple use allowed).
- `--pre=all`: set all prerequisites to satisfied. Equivalent to trigger.
- `--out=OUT`: set an output (and implied outputs) to completed (multiple use allowed).
- `--out=all` (or no options): set all required outputs.
- `--expire`: allow the scheduler to forget a task without running it, or without rerunning it if incomplete.
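The implied-output rules above can be sketched as follows. This is a hypothetical helper for illustration, not Cylc's actual implementation; `required_customs` is an assumed parameter listing the task's required custom outputs.

```python
# Sketch of the implied-output expansion rules from the proposal.
# Hypothetical helper, not Cylc internals.

STANDARD = {"submitted", "started", "succeeded", "failed"}

def expand_outputs(output, required_customs=()):
    """Return the set of outputs implied by setting `output`.

    - started implies submitted
    - succeeded, failed, and custom outputs imply started
    - succeeded implies all required custom outputs
    - failed does not imply custom outputs
    """
    outputs = {output}
    if output == "started":
        outputs.add("submitted")
    elif output in ("succeeded", "failed") or output not in STANDARD:
        # succeeded, failed, and custom outputs imply started...
        outputs |= expand_outputs("started")
        if output == "succeeded":
            # ...and succeeded implies all required custom outputs.
            outputs |= set(required_customs)
    return outputs

print(sorted(expand_outputs("succeeded", required_customs=["x"])))
# ['started', 'submitted', 'succeeded', 'x']
```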
If a task has already run, its prerequisites have already been “used”. There’s no point in unsatisfying them.
If a task has not been spawned yet, none of its prerequisites are satisfied anyway.
Is there a case for unsatisfying the prerequisites of a waiting, partially satisfied task that has not run yet?
Use cases aren’t obvious, but we should probably support this for completeness.
There’s no point in unsetting a completed output because any downstream action will have already been spawned (on demand).
I think the only exception is in a flow-wait scenario, which is pretty niche.
No need to support this?
`foo` failed and is incomplete, but I fixed some files externally
so the workflow can carry on as if it had succeeded.

Set the task's
`succeeded` output.

The database will record that the task
failed, and that
`foo:succeeded` was set by manual forcing.
`foo => post<m>`

If we don't want to run
`foo` again, setting
`foo:succeeded` (for a new flow) is
more convenient than triggering all the downstream children individually.
This will spawn all
`post<m>` at once, with the prerequisite
`foo:succeeded` satisfied.
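A minimal sketch of this spawning behaviour, under assumed toy data structures (the real scheduler spawns children on demand from the graph; `children` and `pool` here are hypothetical):

```python
# Sketch: setting foo:succeeded spawns each downstream child with the
# corresponding prerequisite already satisfied. Toy structures, not
# Cylc internals.

# Graph edges: foo => post<m> for m in 0..2
children = {("foo", "succeeded"): [f"post{m}" for m in range(3)]}

def set_output(task, output, pool):
    """Spawn children of task:output with that prerequisite satisfied."""
    for child in children.get((task, output), []):
        prereqs = pool.setdefault(child, {})
        prereqs[(task, output)] = True  # prerequisite marked satisfied

pool = {}
set_output("foo", "succeeded", pool)
print(sorted(pool))  # ['post0', 'post1', 'post2']
```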
If a flow is not entirely self-contained, you can prime the appropriate tasks before triggering the flow, by either (whichever is more convenient):
(Note this is more than a convenient alternative to
`cylc trigger`, because the
flow needs to progress according to the graph edges, and we don't want to
trigger the off-flow parent tasks just to provide the off-flow prerequisites.)
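The distinction can be sketched with a toy readiness model (hypothetical names; in this model `cylc trigger` would run the task immediately, while pre-setting an off-flow prerequisite with `--pre` leaves it waiting until the advancing flow supplies the rest):

```python
# Sketch: a task runs only when ALL of its prerequisites are satisfied.
# Priming the off-flow prerequisite does not run the task early; it runs
# when the flow satisfies the in-flow prerequisite. Toy model, not
# Cylc internals.

prereqs = {
    "off_flow_parent:succeeded": False,
    "in_flow_parent:succeeded": False,
}

def is_ready(prereqs):
    return all(prereqs.values())

# Prime the off-flow prerequisite (as `cylc set --pre` would):
prereqs["off_flow_parent:succeeded"] = True
assert not is_ready(prereqs)  # still waiting for the flow

# Later, the flow reaches the in-flow parent:
prereqs["in_flow_parent:succeeded"] = True
assert is_ready(prereqs)  # now the task can run
```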
Many tasks failed due to (say) external disk error. I want to start over at the next cycle point rather than re-run the failed tasks.
I don’t think this case is valid. (Unless I’ve misunderstood the requirement?).
If jobs were submitted already they will have already failed naturally. Otherwise, you can prevent job submission by holding the tasks, or expiring them if necessary.
I’m not sure this is valid either. Why would we need to do this?
```
foo:x? => bar  # take this branch regardless of x or y at run time?
foo:y? => baz

# or:

foo:fail? => bar  # take this branch regardless of succeed or fail at run time?
foo? => baz
```
Do we want
`foo` to still run when the flow reaches it, but only trigger the chosen
branch regardless of what outputs it completes? That amounts to "the graph is
...".

Or do we want to pretend that
`foo` has run already and generated the chosen
output, but not flow on from there until the flow catches up?
In that case: use
`cylc broadcast` to change the task's
`script` to simply send the chosen
output and exit.