cylc.flow.main_loop.auto_restart
Automatically restart workflows if they are running on bad servers.
Loads the global configuration to check whether the server a workflow is running on is listed in global.cylc[scheduler][run hosts]condemned.
This is useful if a host needs to be taken offline, e.g. for scheduled maintenance.
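This functionality is configured via the following site configuration settings:
- [scheduler]auto restart delay
- [scheduler][run hosts]condemned
- [scheduler][run hosts]available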
The auto stop-restart feature has two modes:
- Normal Mode
  When a host is added to the [scheduler][run hosts]condemned list, any workflows running on that host will automatically shut down, then restart on a new host selected from [scheduler][run hosts]available.
  For safety, before attempting to stop the workflow, Cylc will first wait for any jobs running locally (under background or at) to complete.
  In order for Cylc to be able to restart workflows, the [scheduler][run hosts]available hosts must all be on a shared filesystem.
- Force Mode
  If a host is suffixed with an exclamation mark, Cylc will not attempt to automatically restart the workflow, and any local jobs (running under background or at) will be left running.
For example, in the following configuration any workflows running on foo will attempt to restart on pub, whereas any workflows running on bar will stop immediately, making no attempt to restart.
[scheduler]
    [[run hosts]]
        available = pub
        condemned = foo, bar!
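The two modes can be illustrated with a short Python sketch. This is not Cylc's implementation: the helper names `parse_condemned` and `action_for` are hypothetical, and the sketch only assumes what is described above, namely that a trailing exclamation mark selects force mode.

```python
from typing import Dict, List


def parse_condemned(value: str) -> Dict[str, str]:
    """Map each condemned host to 'normal' or 'force' (hypothetical helper)."""
    modes: Dict[str, str] = {}
    for item in value.split(','):
        host = item.strip()
        if not host:
            continue  # ignore empty entries such as a trailing comma
        if host.endswith('!'):
            modes[host[:-1]] = 'force'  # trailing "!" selects force mode
        else:
            modes[host] = 'normal'
    return modes


def action_for(current_host: str, condemned: str, available: List[str]) -> str:
    """Describe what a workflow on current_host would do (illustration only)."""
    mode = parse_condemned(condemned).get(current_host)
    if mode is None:
        return 'keep running'
    if mode == 'force':
        return 'stop immediately, no restart'
    return 'stop, then restart on one of: ' + ', '.join(available)


if __name__ == '__main__':
    # Mirrors the example configuration: available = pub, condemned = foo, bar!
    for host in ('foo', 'bar', 'pub'):
        print(host, '->', action_for(host, 'foo, bar!', ['pub']))
```

With the example configuration, `foo` maps to normal mode, `bar` to force mode, and `pub` is unaffected because it is not condemned.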
Warning
Cylc will reject hosts with ambiguous names such as localhost or 127.0.0.1 for this configuration, because the entries in [scheduler][run hosts]condemned are evaluated on the workflow host server.
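To see why such names are ambiguous, consider a check along the following lines. It is a minimal sketch, not Cylc's own validation: the function `is_ambiguous` is hypothetical, and treating every loopback address as ambiguous is an assumption made here for illustration.

```python
import ipaddress


def is_ambiguous(host: str) -> bool:
    """Return True for names that mean "this machine" wherever they are evaluated.

    Hypothetical helper: Cylc's real validation may differ.
    """
    name = host.strip().rstrip('!').lower()  # ignore a force-mode suffix
    if name == 'localhost':
        return True
    try:
        # Loopback addresses (127.0.0.0/8, ::1) always point at the local host.
        return ipaddress.ip_address(name).is_loopback
    except ValueError:
        # Not an IP address literal, so treat it as an ordinary hostname.
        return False


if __name__ == '__main__':
    for candidate in ('localhost', '127.0.0.1', 'foo', 'bar!'):
        print(candidate, is_ambiguous(candidate))
```

A name like `localhost` would match whichever server happens to evaluate the list, so it cannot identify one specific host to condemn.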
To prevent large numbers of workflows from attempting to restart simultaneously, the [scheduler]auto restart delay setting defines a period of time in seconds.
Workflows will wait for a random period of time between zero and [scheduler]auto restart delay seconds before attempting to stop and restart.
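The effect of the delay can be sketched as follows. This is an illustration under stated assumptions rather than Cylc's code: the function `restart_delay` is hypothetical, and a uniform distribution is assumed because the text only says the wait is random between zero and the configured value.

```python
import random
from typing import Optional


def restart_delay(max_delay: float, rng: Optional[random.Random] = None) -> float:
    """Pick a wait in seconds between 0 and max_delay (hypothetical helper)."""
    if max_delay < 0:
        raise ValueError('max_delay must not be negative')
    rng = rng or random.Random()
    # Assumption: the delay is drawn uniformly from the interval.
    return rng.uniform(0, max_delay)


if __name__ == '__main__':
    # Ten workflows condemned at once spread their restarts over 60 seconds.
    rng = random.Random(1)
    delays = sorted(restart_delay(60, rng) for _ in range(10))
    print([round(delay, 1) for delay in delays])
```

Because each workflow draws its own delay, the stop and restart attempts are spread across the interval instead of all arriving at the same moment.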
Workflows that are started in no-detach mode cannot auto stop-restart on a different host, because they would still end up attached to the condemned host. Therefore, a workflow in no-detach mode running on a condemned host will abort with a non-zero return code. The parent process should manually handle the restart of the workflow if desired.
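A parent process could handle such an abort roughly as sketched below. The function `run_with_restarts` and its `max_attempts` parameter are hypothetical, and the command to run is left to the caller. Since a non-zero return code can also signal a genuine failure, the sketch caps the number of attempts rather than retrying forever.

```python
import subprocess
import sys
from typing import Sequence


def run_with_restarts(command: Sequence[str], max_attempts: int = 3) -> int:
    """Run command, re-running it after a non-zero exit (hypothetical helper).

    Returns the number of attempts used, or raises RuntimeError if every
    attempt exits with a non-zero return code.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(list(command))
        if result.returncode == 0:
            return attempt
        print(f'attempt {attempt} exited with {result.returncode}', file=sys.stderr)
    raise RuntimeError(f'command failed after {max_attempts} attempts')


if __name__ == '__main__':
    # Stand-in command for illustration; a real parent process would launch
    # the workflow in no-detach mode here, ideally on a non-condemned host.
    print(run_with_restarts([sys.executable, '-c', 'raise SystemExit(0)']))
```

In practice the parent would also need to choose where the re-run takes place, because re-running on the same condemned host would abort again.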
Python API
Coroutines
Automatically restart the workflow if configured to do so.