(IN DISCUSSION: NOT ACCEPTED YET)
Users need to be able to intervene and affect the progression of workflows.
Currently we have only `cylc trigger` and `cylc set-outputs`, which don't
cover all requirements. (And note the latter command does not actually set task
outputs: it just spawns downstream children with the corresponding
prerequisites set, as if the outputs had been completed.)
I am proposing that we do NOT support Cylc 7 style task status reset, e.g.
resetting a failed task to succeeded.
Primarily this is for provenance reasons: the run history will be clearer if a task’s status reflects what its last job did, followed by evidence that the user told the scheduler to carry on as if something else had happened.
NOTE we could choose to change the task status as well as carry on as if the
forced status had been achieved naturally. However, that's not needed to make
the workflow continue in Cylc 8, so we can choose based on clarity of run
history. Do we want (e.g.):
- the task status changed to succeeded, as if it had succeeded naturally; or
- the task status left as failed, with the succeeded output recorded as
  force-set by the user?

This may be a matter of opinion, but I prefer the latter.

If `succeeded` or `failed` outputs are set, disable automatic retries.
- DEFAULT: set all required outputs.

This makes `cylc remove` obsolete (currently, incomplete tasks have to be
"removed" if not re-run to completion).

Expiration is a bigger topic in its own right, due to its connection to task
outputs and automatic triggering. See the Task Expire Proposal (Proposed).
```
$ cylc set [OPTIONS] [TASK(S)]
```
Command for intervening in task prerequisite, output, and completion status.
Setting a prerequisite contributes to the task's readiness to run.
Setting an output contributes to the task's completion, and sets the
corresponding prerequisites of child tasks.
Setting a future waiting task to expire allows the scheduler to forget it
without running it. WARNING: nothing downstream of the task will run
automatically after this.
Setting a finished-but-incomplete task to expire allows the scheduler to forget
it without completing its required outputs. WARNING: nothing downstream of
the incomplete outputs will run automatically after this.
Setting an output also results in any implied outputs being set:
- started implies submitted
- succeeded, failed, and custom outputs, imply started
- succeeded implies all required custom outputs
- failed does not imply custom outputs
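As an illustration of the implication rules above (a sketch of the proposed usage; the workflow and task IDs are hypothetical):

```console
# Force-setting only succeeded would also imply submitted, started,
# and all required custom outputs, under the rules above.
$ cylc set --out=succeeded myworkflow//8/foo
```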
[OPTIONS]
- `--flow=INT`: flow(s) to attribute spawned tasks to. Default: all active
  flows. If a task has already spawned, merge flows.
- `--pre=PRE`: set a prerequisite to satisfied (multiple use allowed).
- `--pre=all`: set all prerequisites to satisfied. Equivalent to trigger.
- `--out=OUT`: set an output (and implied outputs) to completed (multiple use
  allowed).
- `--out=all` (or no options): set all required outputs.
- `--expire`: allow the scheduler to forget a task without running it, or
  without rerunning it if incomplete.
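A sketch of the proposed options in use (workflow and task IDs are hypothetical, and the syntax is as proposed above, not an existing interface):

```console
# Satisfy one prerequisite of 8/bar (repeatable):
$ cylc set --pre=8/foo:succeeded myworkflow//8/bar

# Satisfy all prerequisites (equivalent to trigger):
$ cylc set --pre=all myworkflow//8/bar

# Default (no options): set all required outputs of 8/foo:
$ cylc set myworkflow//8/foo

# Let the scheduler forget 8/foo without (re)running it:
$ cylc set --expire myworkflow//8/foo
```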
If a task has already run, its prerequisites have already been “used”. There’s no point in unsatisfying them.
If a task has not been spawned yet, none of its prerequisites are satisfied anyway.
Is there a case for unsatisfying the prerequisites of a waiting partially satisfied task that has not run yet?
Use cases aren’t obvious, but we should probably support this for completeness.
There’s no point in unsetting a completed output because any downstream action will have already been spawned (on demand).
I think the only exception is in a flow-wait scenario, which is pretty niche.
No need to support this?
Expire it.
E.g. `foo` failed and is incomplete, but I fixed some files externally so the
workflow can carry on as if it had succeeded. Set the task's `succeeded`
output.
The database will record:
- `foo` state `failed`, with the `failed` output
- `foo:succeeded` set by manual forcing

```
foo => post<m>
```
If we don't want to run `foo` again, setting `foo:succeed` (for a new flow) is
more convenient than triggering all the downstream children individually. This
will spawn all `post<m>` at once, with prerequisite `foo:succeed` satisfied.
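A sketch of that intervention using the proposed options (the flow number and IDs are hypothetical):

```console
# Spawn all post<m> in a new flow with foo:succeed satisfied,
# without rerunning foo itself:
$ cylc set --out=succeeded --flow=2 myworkflow//8/foo
```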
If a flow is not entirely self-contained, you can prime the appropriate tasks before triggering the flow, by either (whichever is more convenient):
(Note this is more than a convenient alternative to `cylc trigger` because the
flow needs to progress according to the graph edges, and we don't want to
trigger the off-flow parent tasks just to provide the off-flow prerequisites.)
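For example, priming could be sketched with either of the proposed options (IDs and flow number are hypothetical):

```console
# Set the off-flow parent's output without running it:
$ cylc set --out=succeeded --flow=2 myworkflow//8/foo
# Or satisfy the in-flow child's off-flow prerequisite directly:
$ cylc set --pre=8/foo:succeeded --flow=2 myworkflow//8/bar
```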
Many tasks failed due to (say) an external disk error. I want to start over at
the next cycle point rather than re-run the failed tasks (`cylc trigger` or
`cylc set`).

I don't think this case is valid (unless I've misunderstood the requirement?).
If jobs were submitted already they will have already failed naturally.
Otherwise, you can prevent job submission by holding the tasks, or expiring
them if necessary.
I’m not sure this is valid either. Why would we need to do this?
```
foo:x? => bar     # take this branch regardless of x or y at run time?
foo:y? => baz
# or:
foo:fail? => bar  # take this branch regardless of succeed or fail at run time?
foo? => baz
```
Should `foo` still run when the flow reaches it, but only trigger the right
branch regardless of what outputs it completes? That amounts to "the graph is
wrong"!
Or do we want to pretend that `foo` has run already and generated the chosen
output, but not flow on from there until the flow catches up? In that case:
use `cylc broadcast` to change the script to simply send the chosen output and
exit.
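A hedged sketch of that broadcast (workflow, task, and output names are hypothetical; `cylc message` is the standard way for a job to report an output):

```console
# Make foo's job at point 8 simply report the chosen output "x" and exit:
$ cylc broadcast -n foo -p 8 -s 'script = cylc message -- x' myworkflow
```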