cylc-admin

Cylc Web GUI & Architecture: Tentative Roadmap

(Last update: 1 Nov 2018)

DOCUMENT OUTDATED: this pre-dates the Dec 2018 workshop; see:

Target completion date: late 2019 (Python 2 end of life)

Architecture

We need something very much like the JupyterHub architecture: a privileged hub that spawns a reverse proxy server and sub-services for suite discovery etc., and spawns UI servers (cf. Jupyter notebooks) and potentially suite daemons, as the user.

Note: below, the term “reverse proxy server” may encompass full “hub” (and sub-service) functionality as well as a web proxy (but these should be separate components), and “suite daemon” means the same as “suite server program”.

Core Principles & Motivation For The New Architecture

Enable users to run suites via the browser from any platform without requiring a Cylc installation on that platform.
Enable users to view and interact with suites on remote platforms without requiring shared filesystem or SSH access.
Remove the requirement for port scanning.
Retire the self-signed SSL certificate.
Provide a single point of access to Cylc suites running on a server, group of servers or trust zone.
Replace the on-disk plain-text passphrase authentication between the client and the web UI.

Reverse Proxy and Hub

Single point of access, to:
1. Discover and present suites, and route client requests to them
2. (this could include task job clients, for status messaging etc., especially for jobs running on remote platforms with locked-down ports)
Uses several sub-services:
1. Suite discovery (running and stopped suites) (not needed?)
2. Suite start-up (stopped suites)
3. User authentication (below)
4. Static services: “cylc graph”, “cylc review” (formerly Rose Bush), …
5. GUI server for suite status data (see H)?
Extend to provide access to other services: “rose config-edit”, rosie suite repository
Implementation:
1. Python web framework: Tornado, Flask?
2. Ad-hoc server or WSGI service? (under Apache, NGINX, gevent?)
3. JupyterHub
  1. similar architecture
  2. very good technical documentation
  3. solves web proxy, spawn-as-user, authentication (and discovery?) out of the box?!
  4. configurable-http-proxy supports websocket
  5. (running the hub as root, or as a less-privileged user with sudo: here)
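The "discover and route" role above can be sketched as a small routing table kept by the hub. This is only an illustration, not a proposed implementation: the `RouteTable` class, the `/suites/<owner>/<suite>/` URL layout, and the host and port values are all hypothetical, and it assumes suite names contain no "/".

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Backend(NamedTuple):
    """Where a running suite server program listens (hypothetical)."""
    host: str
    port: int


class RouteTable:
    """Maps (owner, suite) to the backend of a running suite."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[str, str], Backend] = {}

    def register(self, owner: str, suite: str, host: str, port: int) -> None:
        # Called when a suite starts up (or is discovered) by the hub.
        self._routes[(owner, suite)] = Backend(host, port)

    def unregister(self, owner: str, suite: str) -> None:
        # Called when a suite shuts down; unknown suites are ignored.
        self._routes.pop((owner, suite), None)

    def resolve(self, path: str) -> Optional[Tuple[Backend, str]]:
        """Resolve /suites/<owner>/<suite>/<rest> to (backend, "/<rest>").

        Returns None if the path has another shape or the suite is unknown.
        """
        parts = path.strip("/").split("/", 3)
        if len(parts) < 3 or parts[0] != "suites":
            return None
        backend = self._routes.get((parts[1], parts[2]))
        if backend is None:
            return None
        rest = "/" + (parts[3] if len(parts) > 3 else "")
        return backend, rest
```

With a table like this, clients never need to know suite ports, which is what removes the port scanning requirement. JupyterHub's proxy keeps a comparable routing table, so this may come for free if we adopt it.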

Suite Server API

(Server “endpoint” functions called by clients)
Currently a REST API: multiple fixed and inflexible endpoints
GraphQL instead? one simple, flexible endpoint?
Do we expose the API to clients, or translate at the Reverse Proxy (or GUI Server?)?
Implementation: Python web framework:
1. Ad-hoc server or WSGI service? (under Apache or NGINX or gevent?)
2. Flask (+gevent?)
3. Tornado (async even at Python 2; “ideal for websocket or long polling”)
4. (popularity)
5. Other? Django; Sanic (flask-like async); AIOHTTP
Distinguish between the “suite status API” (for the GUI - WebSocket and GraphQL an advantage here?) and the “suite control API” (for commands like cylc stop and cylc trigger - WebSocket and GraphQL less of a win, but could use for overall consistency, and get real response back from commands without polling)
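To make the "one flexible endpoint" idea concrete, here is a toy sketch of client-driven field selection in plain Python. It is not GraphQL and uses no GraphQL library; the `select` function, the selection format, and the field names in the usage comment are hypothetical stand-ins for what a real GraphQL schema and resolver would do.

```python
def select(data, selection):
    """Return only the parts of `data` named in `selection`.

    `selection` maps a field name to True (take the whole value) or to a
    nested selection dict (descend into a dict, or into each item of a list).
    """
    result = {}
    for field, sub in selection.items():
        value = data[field]
        if sub is True:
            result[field] = value
        elif isinstance(value, list):
            # Apply the nested selection to every item of the list.
            result[field] = [select(item, sub) for item in value]
        else:
            result[field] = select(value, sub)
    return result


# Hypothetical usage: a summary view asks only for task ids and states,
# so other per-task fields never leave the server.
#   select(status, {"state": True, "tasks": {"id": True, "state": True}})
```

With fixed REST endpoints, each new view tends to need either a new endpoint or an over-sized response; with a selection-based endpoint the client states what it needs in the request.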

GUI Server?

(collates status data from multiple suites and serves the GUI) - do we need this?
run as the user
spawned by the “hub”?
Pros:
1. present information about stopped suites in uniform way
2. no HTML templating in the suite server program
3. take much of the comms load off suite server programs?
  1. server program push changes?
  2. or scrape suite databases? (removes all comms load from the suites, but what about disk latency, e.g. if on NFS?)
Cons:
1. Complexity? - yet another component
2. suite status information less “distributed” - memory hog, for many large suites?

Communication Protocols

The WebSocket protocol may be ideal:
1. Very efficient, persistent full-duplex (and therefore server push)
2. No need for polling by the GUI?!
3. Proper quick feedback in response to client commands?
4. all the way from GUI to suite daemons? (If not, what are the implications?)
The reverse proxy means suite servers don’t necessarily have to talk HTTPS (or WebSocket), e.g.
1. Protocol Buffers and gRPC
2. ZeroMQ
3. can we ignore these options initially, and refactor later within the new architecture, only if necessary?
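The difference between server push and GUI polling can be sketched in-process with asyncio. This is not WebSocket code: each `asyncio.Queue` below merely stands in for one open client connection, and the `StatusPublisher` class and the update dicts are hypothetical.

```python
import asyncio


class StatusPublisher:
    """Pushes each status update to every subscribed client."""

    def __init__(self):
        self._subscribers = set()

    def subscribe(self):
        # One queue per client; a real server would hold a connection here.
        queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self._subscribers.discard(queue)

    def publish(self, update):
        # The server decides when to send; clients never ask "anything new?".
        for queue in self._subscribers:
            queue.put_nowait(update)


async def demo():
    publisher = StatusPublisher()
    gui = publisher.subscribe()
    publisher.publish({"task": "foo.1", "state": "running"})
    publisher.publish({"task": "foo.1", "state": "succeeded"})
    # The client simply waits for whatever arrives, in order.
    return [await gui.get(), await gui.get()]
```

Running `asyncio.run(demo())` returns the two updates in the order they were published. The same publish/subscribe shape should carry over whichever transport sits between the proxy and the suites.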

Authentication

For the GUI, and for client CLI commands executed by users
Current method (automatic owner passphrase on disk) won’t work for web GUI
How to plug in to site identity management? (LDAP, AD, RedHat IPA)
Session management for GUI (session tokens stored as secure cookies?)
Session management for command line clients??
Authenticate at the Reverse Proxy, not at suites?
1. Require suites to automatically trust the reverse proxy – SSL Client Certs?

Authorization

(which users can do what?)
Authenticated user information passed to suites from reverse proxy?
Simplest: text file read by suite and/or reverse proxy, mapping users to privileges
At suite server program: is user authorized for the requested action?
Reverse proxy: can user spin up a suite or service, or connect to an existing one?
(authentication by job clients - see next section)
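The "simplest" option above, a text file mapping users to privileges, might look like the following. The file format (`user: privilege, privilege`, with `#` comments) and the privilege names are invented for this sketch; nothing here reflects an agreed design.

```python
def parse_privileges(text):
    """Parse lines of the form "user: priv1, priv2" into {user: {privs}}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        user, _, privs = line.partition(":")
        table[user.strip()] = {p.strip() for p in privs.split(",") if p.strip()}
    return table


def is_authorized(table, user, action):
    """True if `user` has been granted `action`; unknown users get nothing."""
    return action in table.get(user, set())


# Hypothetical file content:
#   alice: read, stop, trigger
#   bob: read
```

The same check could run at the suite server program (is this action allowed?) or at the reverse proxy (may this user start or connect to a suite?), reading the same file.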

Job-to-Suite Communications

(“job authentication” required for status messaging and other client commands)
Currently jobs authenticate automatically as the user - won’t work for web GUI
Secure one-time token provided by the suite program?
Jobs connect directly to suites, OR go via the Reverse Proxy like user clients?
1. Jobs don’t need “suite discovery” (the suite can tell its jobs its location)
2. But going via reverse proxy may still be convenient for other reasons
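A rough sketch of suite-issued job tokens follows. It reads "one-time" as "issued once per job submission and valid until revoked", which is an assumption (a job sends several status messages, so a strictly single-use token would not be enough). The `JobTokenRegistry` class and the job id format in the comments are hypothetical.

```python
import hashlib
import hmac
import secrets


class JobTokenRegistry:
    """Issues and checks per-job tokens on the suite side."""

    def __init__(self):
        # job id -> SHA-256 digest of the token; the token itself is not kept.
        self._digests = {}

    def issue(self, job_id):
        """Create a fresh random token to hand to the job at submission."""
        token = secrets.token_urlsafe(32)
        self._digests[job_id] = hashlib.sha256(token.encode()).digest()
        return token

    def check(self, job_id, token):
        """True if `token` is the one issued for `job_id`."""
        expected = self._digests.get(job_id)
        if expected is None:
            return False
        actual = hashlib.sha256(token.encode()).digest()
        return hmac.compare_digest(actual, expected)

    def revoke(self, job_id):
        """Invalidate the token, e.g. once the job has finished."""
        self._digests.pop(job_id, None)
```

A token like this authenticates the job rather than the user, so it would work the same whether jobs connect directly to the suite or via the reverse proxy.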

Suite Status Data

Underlies the following displays:
1. Detailed suite status views: dependency graph, text tree, “dot”
2. Multi-suite summary status
3. Task inheritance tree (to expand/collapse task families in all views)
The new GUI needs to be more integrated and dynamic than the old:
1. Multi-suite summary (and stopped suites) and suite views, all at once
2. Show only selected parts of a large suite (esp. for the graph view)
For efficiency, we need to rethink how suites present this data
1. Avoid sending redundant or unneeded information to the GUI
2. Clients to select just what they need: suggests GraphQL endpoint?
3. Incremental updates – can we send only what’s changed?
Lists of nodes and edges (or just edges, with full nodes at each end?)?
1. (probably can’t use a nested tree structure based on runtime inheritance - it would be useful for collapsible family views, but the dependency graph can’t be encoded in this form).
display “ghost nodes” in all views - i.e. hide the “task pool” from users
Oliver’s thoughts on data structure and GraphQL schema.
Relay: pagination with GraphQL.
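The "lists of nodes and edges" and "send only what's changed" ideas above can be combined in a small sketch. The status layout (a dict of task id to state, plus a set of edge pairs), the delta field names, and both functions are assumptions made for illustration, not a proposed schema.

```python
def diff_status(old, new):
    """Return the delta that turns `old` into `new`.

    Both arguments are dicts with "nodes" (task id -> state string) and
    "edges" (a set of (upstream, downstream) task id pairs).
    """
    return {
        # New nodes and nodes whose state changed.
        "nodes_changed": {
            k: v for k, v in new["nodes"].items() if old["nodes"].get(k) != v
        },
        "nodes_removed": sorted(set(old["nodes"]) - set(new["nodes"])),
        "edges_added": sorted(new["edges"] - old["edges"]),
        "edges_removed": sorted(old["edges"] - new["edges"]),
    }


def apply_delta(status, delta):
    """Client side: rebuild the current status from the last one plus a delta."""
    nodes = {
        k: v for k, v in status["nodes"].items()
        if k not in delta["nodes_removed"]
    }
    nodes.update(delta["nodes_changed"])
    edges = (status["edges"] - set(delta["edges_removed"])) | set(delta["edges_added"])
    return {"nodes": nodes, "edges": edges}
```

For a large suite where only a few tasks change state between updates, the delta stays small while the full node and edge lists do not, which is the efficiency argument made above.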

GUI

Which front-end JavaScript framework? (Angular, React, Vue, D3, …?)
1. Must look good, and be efficient and responsive for very large suites
2. May/June demos suggest development won’t be as difficult as the old GUIs (but instead we have many architectural issues to contend with).
3. Should pages be served by the suites, or a “GUI server” under the reverse proxy?
4. From suite server endpoints, or suite DBs? Is latency a problem in the DB case?
5. Recent comparison
6. Typescript? Maybe not
7. Automated testing?
UI Design
1. Oliver’s UI design ideas
  - one
  - two
  - three
  - four
  - graph view
2. Sadie’s mind maps!
“Serve Everything” (GUI has no access to the filesystem)
1. Current GUI gets some information from the filesystem:
  1. Static suite visualization and validation (parses suite config file) (low priority?)
  2. Live access to job logs (uses “cylc cat-log” which reads files off disk) (low priority?)
  3. Edit run (generate new job file and present it in the user’s editor) (higher priority)

Transition Period

Users must continue to use the old GUI while we are developing the new one
Evolve the old GUI along with the new suite API?
Or support the old API as-is concurrently with the new?
Or develop cylc-8 separately on a new branch?

DECISION: develop cylc-8 separately on a long-lived dev branch
- the new system is too different to develop “in place”
- early migration to Python 3 also demands this

Packaging

proper setup.py-based installation for pip and conda
ditch bundled libraries (Jinja2 etc.)
allow different installation groups, e.g. “single user” and “full” (the works)
Bruno has made a good start on this already
DECISION: definitely needed

cylc-7 architecture

Figure 1: Current (cylc-7) architecture: simple: all clients are equal (CLI, GUI, Jobs), automatic owner authentication; but suite discovery and authentication require clients to see the file system (also port scanning for discovery) – an in-browser GUI cannot do this.

Possible new architecture

Figure 2: Proposed new architecture

Possible client-suite interaction

Figure 3: possible client-suite interaction, in the new architecture

Possible job-suite interaction

Figure 4: possible job-suite interaction, in the new architecture

Prioritization

Many of the following activities can be done concurrently if we have a big enough team. But we should prioritize the highest risks where possible.

Identify the front-end framework for the UI - not a big risk (many frameworks would do) but it is a big unknown, and we may have to spend considerable time learning the technologies
- Angular.js? (been around a while, but relatively heavy-weight)
- React.js? (by Facebook, most popular)
- Vue.js? (increasingly popular, lighter than React, more “template”-like)
- D3.js? (focused on “data driven” display of charts etc.)
- DECISION: Vue.js
  - we’ll start with Vue, unless we run into problems that suggest we should convert to React
  - we may be able to use some of D3’s libraries for graph display etc.
Python 3 Migration
- Port to Python 3 after everything is developed?
  - can keep the old GUI in the interim
- Do the Python 3 migration first
  - can develop all the new stuff in Python 3
- What new techniques can we exploit after the initial style change?
  - Asyncio for the main loop?
- DECISION: Migrate to Python 3 first
  - otherwise we may be stuck with outdated versions of some critical libraries
  - Bruno has had an initial crack at this and got a simple suite to run
Test Coverage Reports
- and then more tests, for better coverage where possible
- need this for confidence in the initial Python 3 port
- Bruno has implemented this, with GitHub integration
- (DECISION: Done: PR in review)
Make long-lived cylc-8 development branch
- the new system is far too different to develop incrementally (and compatibly) on master
- then:
  - delete the old GUI
  - packaging
  - test coverage
  - Python 3 migration
  - and all the new bits…
- DECISION: to be done immediately after cylc-7.8.0 release
Prototype the reverse proxy and the communication flow
- This is the foundation of the new architecture, so we should be comfortable with it sooner rather than later.
- We need to understand where the components sit and what technology to use.
- Risks and unknowns:
1. How to do initial authentication, and then management of authenticated sessions?
2. (How) Does it work with our enterprise identity management systems?
3. How to spin up new services/suites as user?
4. How to route requests and session tokens to existing services/suites?
5. How to exploit WebSocket for our use?
6. How can existing services/suites trust the reverse proxy?
7. Who/what serves open static content?
8. Do we write the new code in Python 3, given its independence from other components? (Some important modules may no longer be maintained in Python 2.)
Finalize API and authorization management for suite and other services
- There are many things to do here, but they should be relatively straightforward.
1. Determine who should serve what and how. E.g.:
  1. Uni-directional or bi-directional via WebSocket.
  2. GraphQL to replace the REST end points currently used to serve the UI views.
  3. GraphQL to replace the rest of the REST end points?
2. Migrate “cylc review” (formerly Rose Bush) to new service points.
3. Develop other essential service points and logic.
4. Authorization and session management.
5. Job to suite session management.
6. Continue to develop the UI as we evolve the API to suit its needs.
