Remote Job Management

Managing tasks in a workflow requires more than just job execution. Cylc also uses rsync for file transfer, and runs cylc sub-commands directly on job hosts over non-interactive SSH.
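
As a rough illustration of these two kinds of remote action, the sketch below builds (but does not run) the command lines involved. It is not Cylc's actual implementation: the helper names, the host name hpc1 and the paths are hypothetical, and the SSH options are an assumption. BatchMode and ConnectTimeout are standard OpenSSH options; BatchMode=yes is what makes the connection non-interactive, because it disables password and passphrase prompts.

```python
import shlex

# Assumed SSH settings for this sketch: BatchMode=yes makes OpenSSH fail
# rather than prompt, which is what "non-interactive" requires.
SSH_BASE = ["ssh", "-oBatchMode=yes", "-oConnectTimeout=10"]


def remote_command(host, args):
    """Return an argv list that runs `args` on `host` over SSH.

    The remote command is passed as one shell-quoted string, because the
    remote shell re-parses whatever ssh sends it.
    """
    return SSH_BASE + [host, shlex.join(args)]


def rsync_command(src, host, dest):
    """Return an argv list that copies `src` to `dest` on `host`.

    rsync's --rsh option tells it which remote shell command to use,
    so the transfer uses the same non-interactive SSH settings.
    """
    return ["rsync", "-a", "--rsh=" + shlex.join(SSH_BASE), src, f"{host}:{dest}"]


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    print(remote_command("hpc1", ["cylc", "version"]))
    print(rsync_command("share/", "hpc1", "cylc-run/demo/share/"))
```

Either list could be passed to subprocess.run. Building the commands as lists keeps local quoting problems out of the picture, so only the remote side needs shell quoting.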

SSH-free Job Management?

Some sites may want to restrict access to job hosts by whitelisting SSH connections so that only rsync is allowed for file transfer, with job execution allowed only via a local job runner that sees the job hosts [1]. We are investigating the feasibility of SSH-free job management when a local job runner is available, but this is not yet possible unless your workflow and job hosts also share a filesystem, which allows Cylc to treat jobs as entirely local [2].

SSH-based Job Management

Cylc does not have persistent agent processes running on job hosts to act on instructions received over the network [3], so instead we execute job management commands directly on job hosts over SSH. Reasons for this include:

  • It works equally well for job runner jobs and background jobs.

  • SSH is required for background jobs, and for jobs in other job runners if the job runner is not available on the workflow host.

  • Querying the job runner alone is not sufficient for full job polling functionality.

    • This is because jobs can complete (and then be forgotten by the job runner) while the network, workflow host, or scheduler is down (e.g. between workflow shutdown and restart).

    • To handle this we get the automatic job wrapper code to write job messages and exit status to job status files that are interrogated by schedulers during job polling operations.

    • Job status files reside on the job host, so the interrogation is done over SSH.

  • Job status files also hold the job runner name and job ID; these are written by the job submit command, and read by the job poll and kill commands.
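
The role of the job status file in polling can be sketched as follows. This is a minimal illustration, not Cylc's polling code: it assumes the status file is a series of KEY=VALUE lines, and the key names used here (CYLC_JOB_RUNNER_NAME, CYLC_JOB_ID, CYLC_JOB_INIT_TIME, CYLC_JOB_EXIT) are assumptions modelled on Cylc's naming rather than a specification. The function names are hypothetical.

```python
def parse_job_status(text):
    """Parse KEY=VALUE lines from a job status file into a dict.

    Lines without "=" are ignored, so a partially written file
    does not raise an error.
    """
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            status[key.strip()] = value.strip()
    return status


def poll_result(status):
    """Infer a coarse job state from the parsed status file alone.

    Assumed convention for this sketch: an exit entry means the job has
    finished, an init-time entry means it started, and otherwise it has
    only been submitted.
    """
    if "CYLC_JOB_EXIT" in status:
        return "succeeded" if status["CYLC_JOB_EXIT"] == "SUCCEEDED" else "failed"
    if "CYLC_JOB_INIT_TIME" in status:
        # The file cannot show whether a started job is still alive; a full
        # poll would also query the job runner using the recorded job ID.
        return "started"
    return "submitted"


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    sample = (
        "CYLC_JOB_RUNNER_NAME=background\n"
        "CYLC_JOB_ID=12345\n"
        "CYLC_JOB_INIT_TIME=2024-01-01T00:00:00Z\n"
        "CYLC_JOB_EXIT=SUCCEEDED\n"
    )
    parsed = parse_job_status(sample)
    print(parsed["CYLC_JOB_RUNNER_NAME"], parsed["CYLC_JOB_ID"], poll_result(parsed))
```

Because the exit status is recorded in the file, a job that finished while the scheduler was down can still be reported correctly on restart, even if the job runner no longer remembers it.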

Other Cases Where Cylc Uses SSH Directly

  • To see if a workflow is running on another host with a shared filesystem - see cylc/flow/workflow_files:detect_old_contact_file.