cylc.flow.main_loop.auto_restart

Automatically restart workflows if they are running on bad servers.

This plugin loads the global configuration and checks whether the server a workflow is running on is listed in global.cylc[scheduler][run hosts]condemned.

This is useful if a host needs to be taken offline, e.g. for scheduled maintenance.

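This functionality is configured via the following site configuration settings: [scheduler][run hosts]condemned, [scheduler][run hosts]available and [scheduler]auto restart delay.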

The auto stop-restart feature has two modes:

Normal Mode

When a host is added to the [scheduler][run hosts]condemned list, any workflows running on that host will automatically shut down and then restart, selecting a new host from [scheduler][run hosts]available.

For safety, before attempting to stop the workflow, Cylc will first wait for any jobs running locally (under the background or at job runners) to complete.

For Cylc to be able to restart workflows, the [scheduler][run hosts]available hosts must all be on a shared filesystem.

Force Mode

If a host is suffixed with an exclamation mark, Cylc will stop the workflow without attempting to restart it automatically, and any local jobs (running under background or at) will be left running.

For example, in the following configuration, any workflows running on foo will attempt to restart on pub, whereas any workflows running on bar will stop immediately, making no attempt to restart.

[scheduler]
     [[run hosts]]
         available = pub
         condemned = foo, bar!
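
The way the two modes follow from the condemned list can be sketched in Python. This is an illustration only, not Cylc's implementation: the function names parse_condemned and action_for are hypothetical, and the comma-separated string format is assumed from the example above.

```python
def parse_condemned(value):
    """Split a condemned-hosts string into a {host: mode} mapping.

    A trailing "!" marks force mode; anything else is normal mode.
    (Hypothetical helper, not part of the Cylc API.)
    """
    modes = {}
    for item in value.split(","):
        host = item.strip()
        if not host:
            continue
        if host.endswith("!"):
            modes[host[:-1]] = "force"
        else:
            modes[host] = "normal"
    return modes


def action_for(host, condemned, available):
    """Describe what a workflow running on `host` would do."""
    mode = parse_condemned(condemned).get(host)
    if mode is None:
        return "keep running"
    if mode == "force":
        return "stop immediately, no restart"
    targets = [h.strip() for h in available.split(",") if h.strip()]
    return "restart on one of: " + ", ".join(targets)


print(action_for("foo", "foo, bar!", "pub"))
print(action_for("bar", "foo, bar!", "pub"))
```

With the configuration above, a workflow on foo is treated in normal mode and a workflow on bar in force mode; a host that is not listed is left alone.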

Warning

Cylc will reject hosts with ambiguous names such as localhost or 127.0.0.1 in this configuration, because [scheduler][run hosts]condemned is evaluated on the server hosting the workflow.
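
A minimal sketch of such a check, assuming a plain list of condemned entries. The AMBIGUOUS_NAMES set and the check_condemned_hosts function are hypothetical and only cover the two names mentioned above; how Cylc actually detects ambiguous names is not described here.

```python
# Names that mean "this machine" on whichever host evaluates them.
# Assumption: only the two examples named in the text are covered.
AMBIGUOUS_NAMES = {"localhost", "127.0.0.1"}


def check_condemned_hosts(hosts):
    """Return the entries unchanged, or raise ValueError on an ambiguous name.

    A trailing "!" (force mode) is ignored when checking the name.
    (Hypothetical helper, not part of the Cylc API.)
    """
    checked = []
    for entry in hosts:
        name = entry.strip()
        bare = name[:-1] if name.endswith("!") else name
        if bare.lower() in AMBIGUOUS_NAMES:
            raise ValueError(f"ambiguous host name in condemned list: {bare}")
        checked.append(name)
    return checked


print(check_condemned_hosts(["foo", "bar!"]))
```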

To prevent large numbers of workflows from attempting to restart simultaneously, the [scheduler]auto restart delay setting defines a period of time in seconds. Each workflow will wait for a random period of between zero and [scheduler]auto restart delay seconds before attempting to stop and restart.
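
The staggering can be illustrated with a short sketch. It assumes a uniform distribution over the interval, which the text does not specify, and pick_restart_delay is a hypothetical name rather than a Cylc function.

```python
import random


def pick_restart_delay(max_delay, rng=random):
    """Pick a wait time in seconds between 0 and max_delay.

    Assumption: the delay is drawn uniformly. A non-positive
    max_delay means no waiting at all.
    """
    if max_delay <= 0:
        return 0.0
    return rng.uniform(0, max_delay)


# Ten workflows sharing a 60 second maximum spread out over the minute.
rng = random.Random(0)
delays = [pick_restart_delay(60, rng) for _ in range(10)]
print(sorted(round(d, 1) for d in delays))
```

Because each workflow draws its own delay, the shutdowns and restarts are spread across the window instead of all hitting the new hosts at once.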

Workflows started in no-detach mode cannot auto stop-restart on a different host, because the workflow would still be attached to the process on the condemned host. Therefore, a workflow in no-detach mode running on a condemned host will abort with a non-zero return code. The parent process should handle the restart of the workflow itself if desired.
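
A parent process could handle this along the following lines. This is a sketch under assumptions: supervise is a hypothetical helper, the command shown in the comment is only an example, and a non-zero return code can have causes other than a condemned host, so a real parent process would need to inspect the failure and choose where to relaunch.

```python
import subprocess


def supervise(command, max_restarts=3, run=subprocess.call):
    """Run `command`, rerunning it while it exits non-zero.

    Gives up after `max_restarts` reruns and returns the last
    return code. `run` is injectable so the loop can be tested
    without launching a real process.
    """
    attempts = 0
    while True:
        returncode = run(command)
        if returncode == 0 or attempts >= max_restarts:
            return returncode
        attempts += 1


# Example (assumed command line, not executed here):
# supervise(["cylc", "play", "--no-detach", "my_workflow"])
```

Relaunching on a different host is left to the caller, since the sketch simply reruns the same command where the parent process lives.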

Python API

Coroutines

async cylc.flow.main_loop.auto_restart.auto_restart(scheduler, _)

Automatically restart the workflow if configured to do so.