cylc-admin

Proposal for Authentication between the CLI & the Workflow Service

Acronyms used:

The Problem

Introduction

We need authentication between the CLI & the WFS in Cylc 8.

(NB: we also need security for network communications to & from the WFS. Though that is not the domain of this proposal, the approaches, libraries, protocols, etc. used for it may also be relevant here, as outlined below.)

Client cases

There are various client cases to cover, ultimately, some of which we will treat in the same way, & some differently (see the next section for requirements):

  1. User (interactive) CLI => WFS, for user commands:
    1. what I will call “the direct, or full-privilege full-access (FPFA), case”:
      • user is logged in (authenticated);
      • user is the workflow owner;
      • user has full access to the file system where the WFS is running;
      • the WFS is running as this user.
    2. the user CLI where any of the conditions in (1i) are not satisfied, which I will call the “non-direct, or non-FPFA, case”.
  2. Task-job CLI <=> WFS, for job status messaging etc.:
    1. remote task-jobs that are on a non-shared file system;
    2. any task-jobs not covered by (2i), i.e. local ones, or ones running remotely on a host where the file system is shared.
  3. Involving the UIS:
    1. UIS <=> WFS:
      • This should probably be in scope, since this case should be completely equivalent to the ‘CLI <=> WFS’ case, i.e. to the cases in (1), though this remains to be confirmed (see Q2 under ‘Open questions’).
    2. CLI => UIS:
      • This is out-of-scope.

Token-based approaches

Discussions on the topic have converged (see e.g. here & here) towards using some form of non-permanent (transient) token to achieve this. These forms have been distinguished as options:

The old (Cylc 7) approach

The approach for authentication between the CLI & the suite server program (the Cylc 7 equivalent of the WFS) is described here. It is ultimately a token written to disk in plain text, which relies on Linux file permissions to be secure & usable only by the suite owner.

Note that the term “passphrase” is used historically, but it is not actually a user-specified mnemonic passphrase (users can override it with one, but we believe this is rarely done). It is a random token that is generated on the first run of a suite. Since the token (“passphrase”) is valid for the lifetime of that suite, it is effectively a one-time suite-level token, where the creation event is the first suite run & the deletion event is the deletion of the suite (or its .service directory).
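As an illustration of this style of token (not the actual Cylc 7 code), the following minimal sketch generates a random token & writes it to disk with owner-only permissions. The function names, the default file name & the token length are assumptions made for the example.

```python
import os
import secrets
import stat
from pathlib import Path


def write_suite_token(service_dir: Path, filename: str = "passphrase") -> str:
    """Generate a random token & store it in plain text, owner-only.

    The function name & default file name are hypothetical.
    """
    service_dir.mkdir(parents=True, exist_ok=True)
    token = secrets.token_urlsafe(32)
    path = service_dir / filename
    # O_EXCL: fail rather than overwrite a token from an earlier run.
    # Mode 0o600: read/write for the owner, nothing for group or others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    return token


def read_suite_token(service_dir: Path, filename: str = "passphrase") -> str:
    """Read the token back, refusing files that group or others can access."""
    path = service_dir / filename
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} is accessible by group or others")
    return path.read_text()
```

The security of such a scheme rests entirely on the file system: anyone who can read the file can act as the suite owner.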

Relevant differences from Cylc 7 to 8

There are aspects of the Cylc 7 authentication we have decided, at the least, to change, or which otherwise necessitate changes, as follows:

Aims: what do we want?

We want to provide a solution that will, by means of, & on top of, being functional:

Open questions

On client cases (cf. ‘Client cases’ section above)

  1. What cases shall we manage through the same approach?
  2. What is in scope for authentication involving the UIS (see cases under 3)?

On token choice (cf. ‘Token-based approaches’ section above)

  1. What token-based approach should be used in each client case group as in Q1?

On token management

  1. For timed tokens: how do we deal with the changeover of tokens? E.g.:
    • will both tokens be valid over the changeover period?
  2. For one-time tokens: what events do we set as those for which one-time tokens are created & then deleted?
    • We should consider the granularity: not so fine-grained that there is a performance overhead from too much interaction with the filesystem etc., but not so coarse-grained as to compromise security. How long-lived should the events be, along the scale of covering the duration of a whole:
      • workflow (as in Cylc 7)?
      • cycle-point?
      • task (i.e. for all of its task-jobs)?
      • job?
      • set of defined (workflow-based) commands?
      • (single) command?

On the nature & location of the tokens

  1. What algorithms &/or standards to use (see also the question on module choice under ‘On implementation’)? Notably for:
    • tokenisation (signing, verification, etc.);
    • encryption, if used;
  2. How shall we provide any related information required to identify jobs:
    • incorporate it into the token?
    • just have it alongside the token (exposing it, but does that matter)?
  3. Which location shall we use to store the token(s)?
    • Somewhere under each workflow’s .service as before?

On implementation

  1. What module(s) to use (see also the question on algorithms & standards under ‘On the nature & location of the tokens’)? Should it/they be:
    • Python built-in? Notably, e.g.:
      • hashlib: “a common interface to many different secure hash and message digest algorithms”;
      • secrets: “used for generating cryptographically strong random numbers”.
    • Third-party? Notably, e.g.:
      • python-jose: “A JOSE [JavaScript Object Signing and Encryption technologies: JSON Web Signature (JWS), JSON Web Encryption (JWE), JSON Web Key (JWK), and JSON Web Algorithms (JWA)] implementation in Python”
      • pyjwt: “JSON Web Token implementation in Python”;
      • PyOTP: “a Python library for generating and verifying one-time passwords”.
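To give a feel for what the two built-in modules offer, this short sketch uses secrets to generate a token & hashlib to compute a digest of it, so that a verifier could keep only the digest. Storing a digest rather than the token itself is an assumption made for the example, not part of the proposal, & the function names are hypothetical.

```python
import hashlib
import secrets


def new_token() -> str:
    """Generate a cryptographically strong random token (64 hex characters)."""
    return secrets.token_hex(32)


def digest(token: str) -> str:
    """Return the SHA-256 digest of a token as a hex string."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def matches(presented: str, stored_digest: str) -> bool:
    """Check a presented token against a stored digest in constant time."""
    return secrets.compare_digest(digest(presented), stored_digest)
```

The third-party libraries listed above would add standardised token formats on top of primitives like these, at the cost of an extra dependency.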

Working Proposal for a Solution [on hold, see note]

This is the current status of the plan to address the defined aspects of the problem outlined in ‘The Problem’ section above. Aspects not addressed here are stated as ‘out of scope’, meaning that they are to be addressed in the future, after the rest.

Timeline of major updates to the working proposal

Case-by-case outline

For command-to-workflow-service (CLI-to-WFS) authentication, use:

(The numbers below refer to the cases outlined in the ‘Client cases’ section above, so please cross-reference with that.)

Questions addressed & remaining

Overall, from the above case outline, for the questions in ‘Open Questions’, we have:

Plus a new question of:

Case-by-case UML Sequence Diagrams

Key for interaction arrow colours:

“One-command” (one-time for a single command) tokens, the “plan A” for cases (1i), (2ii) & (3i)

This diagram outlines the interactions for the creation & deletion, & for the usage, of valid one-command tokens:

[Sequence diagram: Authentication: CLI => WFS]
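The create, use & delete lifecycle of a one-command token can be sketched as follows. This is a hypothetical in-memory illustration of the idea only: the class & method names are invented, & it does not reproduce the interactions shown in the diagram (for instance, where the token is stored or how it reaches the CLI).

```python
import secrets


class OneCommandTokens:
    """Issue tokens that are each valid for exactly one command."""

    def __init__(self):
        self._live = set()

    def create(self) -> str:
        """Create a fresh token when a command is about to be issued."""
        token = secrets.token_urlsafe(32)
        self._live.add(token)
        return token

    def redeem(self, token: str) -> bool:
        """Return True once for a live token, deleting it on use.

        A real implementation would also need to consider expiry of
        tokens that are created but never used.
        """
        if token in self._live:
            self._live.discard(token)
            return True
        return False
```

Because the token is deleted on first use, replaying an intercepted command with the same token fails.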

“One-job” (one-time over the lifetime of a single job) tokens for case (2i)

This diagram outlines the interactions for the creation & deletion, & for the usage, of valid one-job tokens:

[Sequence diagram: Authentication: WFS <=> Remote Job]
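By contrast with a one-command token, a one-job token stays valid for every message a job sends until the job ends. The sketch below is a hypothetical in-memory illustration of that lifetime, keyed on a job identifier; the class, the method names & the identifier format are assumptions, & it does not reproduce the interactions shown in the diagram.

```python
import secrets


class OneJobTokens:
    """Keep one token per job, valid from job submission to job completion."""

    def __init__(self):
        self._by_job = {}

    def create(self, job_id: str) -> str:
        """Create the token when the job is submitted."""
        token = secrets.token_urlsafe(32)
        self._by_job[job_id] = token
        return token

    def check(self, job_id: str, token: str) -> bool:
        """Validate a message from a job; the token is reusable while it lives."""
        expected = self._by_job.get(job_id)
        if expected is None:
            return False
        return secrets.compare_digest(
            token.encode("utf-8"), expected.encode("utf-8")
        )

    def delete(self, job_id: str) -> None:
        """Delete the token when the job finishes."""
        self._by_job.pop(job_id, None)
```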

Note on Status of the Working Proposal

Note that this proposal represents the state of plans preceding discussions on an approach based on public & private keys (asymmetric cryptography) for WFS network communications security, which could also be a viable solution here: see the corresponding Issue for details (& perhaps the comment here).

Consequently, we have agreed to postpone this work until we have undertaken some investigations there & know more about the potential of CurveZMQ as an approach for (some) CLI <=> WFS authentication.

Therefore this document represents a plan that will be reconsidered & perhaps taken forward, depending on the above, but is currently “on hold”.