Cylc Web GUI & Architecture: Tentative Roadmap
(Last update: 1 Nov 2018)
DOCUMENT OUTDATED
This pre-dates the Dec 2018 workshop; see:
Target completion date: late 2019 (Python 2 end of life)
Architecture
We need something very much like the JupyterHub architecture: a privileged
hub that spawns a reverse proxy server and sub-services (suite discovery etc.),
and spawns UI servers (cf. Jupyter notebooks) and potentially suite daemons,
as the user.
Note: below, the term “reverse proxy server” may encompass full “hub” (and
sub-service) functionality as well as a web proxy (although these should be
separate components); “suite daemon” and “suite server program” are synonyms.
Core Principles & Motivation For The New Architecture
- Enable users to run suites via the browser from any platform without
requiring a Cylc installation on that platform.
- Enable users to view and interact with suites on remote platforms without
requiring shared filesystem or SSH access.
- Remove the requirement for port scanning.
- Retire the self-signed SSL certificate.
- Provide a single point of access to Cylc suites running on a server,
group of servers or trust zone.
- Replace the on-disk plain-text passphrase authentication between the client
and the web UI.
Reverse Proxy and Hub
- Single point of access, to:
- Discover and present suites, and route client requests to them
- (this could include task job clients, e.g. for status messaging,
especially for jobs running on remote platforms with locked-down
ports)
- Uses several sub-services:
- Suite discovery (running and stopped suites) (not needed?)
- Suite start-up (stopped suites)
- User authentication (below)
- Static services: “cylc graph”, “cylc review” (formerly Rose Bush), …
- GUI server for suite status data (see H)?
- Extend to access to other services: “rose config-edit”, rosie suite repository
- Implementation:
- Python web framework: Tornado, Flask?
- Ad-hoc server or WSGI service? (under Apache, NGINX, gevent?)
- JupyterHub
- similar architecture
- very good technical documentation
- solves web proxy, spawn-as-user, authentication (and discovery?) out of the box?!
- configurable-http-proxy supports WebSocket
- (running the hub as root, or as a less-privileged user with sudo: here)
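As a rough illustration of the "single point of access" idea, the sketch below resolves an incoming URL path to a back-end target by longest-prefix match. The route table, host names, ports and the `resolve` helper are all hypothetical, invented for this example; a real deployment would rely on the proxy component's own routing rather than this code.

```python
from typing import Dict, Optional, Tuple

Target = Tuple[str, int]

# Hypothetical routing table: URL path prefix -> (host, port) of the target.
ROUTES: Dict[str, Target] = {
    "/": ("localhost", 8000),                        # the hub itself
    "/user/alice/": ("localhost", 43001),            # alice's UI server
    "/user/alice/suites/nightly/": ("hpc1", 43027),  # one of alice's suites
}


def resolve(path: str, routes: Optional[Dict[str, Target]] = None) -> Optional[Target]:
    """Return the target registered for the longest matching prefix, or None."""
    table = ROUTES if routes is None else routes
    matches = [prefix for prefix in table if path.startswith(prefix)]
    if not matches:
        return None
    return table[max(matches, key=len)]
```

With this table, a request for `/user/alice/suites/nightly/graph` would go to the suite on `hpc1`, while any unmatched path under `/` falls back to the hub, so clients never need to know suite hosts or ports.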
Suite Server API
- (Server “endpoint” functions called by clients)
- Currently a REST API: multiple fixed and inflexible endpoints
- GraphQL instead? one simple, flexible endpoint?
- Do we expose the API to clients, or translate at the Reverse Proxy (or GUI Server?)?
- Implementation: Python web framework:
a. Ad-hoc server or WSGI service? (under Apache or NGINX or gevent?)
b. Flask (+gevent?)
c. Tornado (async even on Python 2; “ideal for websocket or long polling”)
d. (popularity)
e. Other? Django; Sanic (flask-like async); AIOHTTP
- Distinguish between the “suite status API” (for the GUI - WebSocket and
GraphQL an advantage here?) and the “suite control API” (for commands
like cylc stop and cylc trigger - WebSocket and GraphQL less of a
win, but could be used for overall consistency, and to get a real response
back from commands without polling)
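To make the "one simple, flexible endpoint" idea concrete, here is a minimal sketch of client-driven field selection in plain Python. It is not GraphQL and uses no GraphQL library; the `SUITE_STATE` structure, its field names and the `select` helper are assumptions made up for illustration only.

```python
from typing import Any, Dict, Optional

# Hypothetical suite state held by a suite server program.
SUITE_STATE: Dict[str, Any] = {
    "name": "nightly",
    "owner": "alice",
    "tasks": [
        {"id": "foo.1", "state": "running", "host": "hpc1"},
        {"id": "bar.1", "state": "waiting", "host": "hpc1"},
    ],
}

# A selection maps each wanted field to a nested selection, or None for a leaf.
Selection = Dict[str, Optional[dict]]


def select(data: Any, selection: Selection) -> Any:
    """Return only the requested fields, recursing into dicts and lists."""
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for key, sub in selection.items():
        value = data[key]
        result[key] = select(value, sub) if sub else value
    return result


# The client asks for the suite name plus each task's id and state, nothing else.
summary = select(SUITE_STATE, {"name": None, "tasks": {"id": None, "state": None}})
```

A fixed REST endpoint would return the whole structure (or need a new endpoint per view); here a single entry point serves any view, and the client decides how much data crosses the wire.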
GUI Server?
- (collates status data from multiple suites and serves the GUI) - do we need this?
- run as the user
- spawned by the “hub”?
- Pros:
- present information about stopped suites in uniform way
- no HTML templating in the suite server program
- take much of the comms load off suite server programs?
- server program push changes?
- or scrape suite databases? (removes all comms load from the suites,
but what about disk latency, e.g. if on NFS?)
- Cons:
- Complexity? - yet another component
- suite status information less “distributed” - memory hog, for many large suites?
Communication Protocols
- The WebSocket protocol may be ideal:
- Very efficient, persistent full-duplex (and therefore server push)
- No need for polling by the GUI?!
- Proper quick feedback in response to client commands?
- all the way from GUI to suite daemons? (If not, what are the implications?)
- The reverse proxy means suite servers don’t necessarily have to talk
HTTPS (or WebSocket); alternatives include:
- Protocol Buffers and gRPC
- ZeroMQ
- can we ignore these options initially, and refactor later within the
new architecture, only if necessary?
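The difference between polling and server push can be sketched without any network code. In the example below an `asyncio.Queue` stands in for a persistent full-duplex connection such as a WebSocket; the `StatusFeed` class and the update payloads are hypothetical, and a real implementation would use an actual WebSocket library instead.

```python
import asyncio
from typing import Any, List, Set


class StatusFeed:
    """In-process stand-in for a server that pushes updates to subscribers."""

    def __init__(self) -> None:
        self._subscribers: Set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        """Register a client; the queue plays the role of its open connection."""
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def publish(self, update: Any) -> None:
        """Push an update to every subscriber as soon as it happens."""
        for queue in self._subscribers:
            queue.put_nowait(update)


async def demo() -> List[Any]:
    feed = StatusFeed()
    inbox = feed.subscribe()
    # The "suite" reports two state changes; the client never polls.
    feed.publish({"task": "foo.1", "state": "running"})
    feed.publish({"task": "foo.1", "state": "succeeded"})
    return [await inbox.get(), await inbox.get()]


received = asyncio.run(demo())
```

The client simply awaits the next message, which is the behaviour a GUI would get from a WebSocket connection: updates arrive in order when they occur, with no polling interval to tune.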
Authentication
- For the GUI, and for client CLI commands executed by users
- Current method (automatic owner passphrase on disk) won’t work for web GUI
- How to plug in to site identity management? (LDAP, AD, RedHat IPA)
- Session management for GUI (session tokens stored as secure cookies?)
- Session management for command line clients??
- Authenticate at the Reverse Proxy, not at suites?
- Require suites to automatically trust the reverse proxy – SSL Client Certs?
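One possible shape for session management at the reverse proxy is a signed, expiring token that the browser holds in a secure cookie. The sketch below uses only the Python standard library; the token layout, `SECRET_KEY`, `issue_token` and `verify_token` are assumptions for illustration, not a proposed design, and the initial login against site identity management is out of scope here.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-random-secret"  # hypothetical; known only to the proxy


def _sign(body: bytes) -> str:
    return hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()


def issue_token(user: str, ttl: float = 3600, now: Optional[float] = None) -> str:
    """Return '<base64 claims>.<signature>' for an already authenticated user."""
    now = time.time() if now is None else now
    claims = json.dumps({"user": user, "exp": now + ttl}).encode()
    body = base64.urlsafe_b64encode(claims)
    return body.decode() + "." + _sign(body)


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user name if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        body, signature = token.rsplit(".", 1)
    except ValueError:
        return None  # no separator: not one of our tokens
    expected = _sign(body.encode())
    # Constant-time comparison, on bytes so arbitrary input cannot raise.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < now:
        return None
    return claims["user"]
```

Because only the proxy holds the key, suites behind it never need to verify passwords themselves; they only need a way to trust requests forwarded by the proxy.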
Authorization
- (which users can do what?)
- Authenticated user information passed to suites from reverse proxy?
- Simplest: text file read by suite and/or reverse proxy, mapping users to privileges
- At suite server program: is user authorized for the requested action?
- Reverse proxy: can user spin up a suite or service, or connect to an existing one?
- (authentication by job clients - see next section)
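The "simplest" option above, a text file mapping users to privileges, could look something like the following sketch. The file format (`user: privilege privilege ...`), the privilege names and both helper functions are hypothetical, chosen only to show how small such a check can be.

```python
from typing import Dict, Set

# Hypothetical file contents: one user per line, '#' starts a comment.
EXAMPLE = """\
# user: privileges
alice: read control shutdown
bob: read
"""


def parse_privileges(text: str) -> Dict[str, Set[str]]:
    """Parse 'user: priv priv ...' lines into a user -> privileges mapping."""
    table: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        user, _, privileges = line.partition(":")
        table[user.strip()] = set(privileges.split())
    return table


def is_authorized(table: Dict[str, Set[str]], user: str, privilege: str) -> bool:
    """Unknown users have no privileges at all."""
    return privilege in table.get(user, set())


PRIVILEGES = parse_privileges(EXAMPLE)
```

The same table could be consulted by the reverse proxy (may this user connect at all?) and by the suite server program (may this user run the requested command?), provided the authenticated user name is passed along with each request.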
Job-to-Suite Communications
- (“job authentication” required for status messaging and other client commands)
- Currently jobs authenticate automatically as the user - won’t work for web GUI
- Secure one-time token provided by the suite program?
- Jobs connect directly to suites, OR go via the Reverse Proxy like user clients?
- Jobs don’t need “suite discovery” (the suite can tell its jobs its location)
- But going via reverse proxy may still be convenient for other reasons
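A secure one-time token for jobs might work roughly as sketched below: the suite generates a random token per job submission, hands it to the job, and accepts it exactly once. The `JobTokenRegistry` class and the job identifier format are hypothetical; since a job typically sends more than one message, a real scheme would probably exchange the one-time token for a short-lived session, which is not shown.

```python
import secrets
from typing import Dict, Optional


class JobTokenRegistry:
    """Issues random tokens to jobs and accepts each token only once."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}  # token -> job identifier

    def issue(self, job_id: str) -> str:
        """Create an unguessable token to pass to the job at submission."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = job_id
        return token

    def redeem(self, token: str) -> Optional[str]:
        """Return the job the token belongs to and invalidate the token."""
        return self._tokens.pop(token, None)
```

Usage would be along the lines of `token = registry.issue("foo.1/01")` at job submission and `registry.redeem(token)` when the job first connects; a second `redeem` of the same token returns `None`, so a leaked token cannot be replayed.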
Suite Status Data
- Underlies the following displays:
- Detailed suite status views: dependency graph, text tree, “dot”
- Multi-suite summary status
- Task inheritance tree (to expand/collapse task families in all views)
- The new GUI needs to be more integrated and dynamic than the old:
- Multi-suite summary (and stopped suites) and suite views, all at once
- Show only selected parts of a large suite (esp. for the graph view)
- For efficiency, we need to rethink how suites present this data
- Avoid sending redundant or unneeded information to the GUI
- Clients to select just what they need: suggests GraphQL endpoint?
- Incremental updates – can we send only what’s changed?
- Lists of nodes and edges (or just edges, with full nodes at each end?)?
- (probably can’t use a nested tree structure based on runtime
inheritance - it would be useful for collapsible family views,
but the dependency graph can’t be encoded in this form).
- display “ghost nodes” in all views - i.e. hide the “task pool” from users
- Oliver’s thoughts on data structure and GraphQL schema.
- Relay: pagination with GraphQL.
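The "send only what's changed" idea can be illustrated with a simple delta between two snapshots of task states. This is a sketch under the assumption that suite status is a flat mapping of task ID to state string; the real data structure (nodes, edges, families) is still an open question above, and `diff_states` and `apply_delta` are hypothetical helpers.

```python
from typing import Any, Dict

State = Dict[str, str]  # task id -> task state (assumed flat for this sketch)


def diff_states(old: State, new: State) -> Dict[str, Any]:
    """Return added/changed entries and the ids of removed entries."""
    updated = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = sorted(k for k in old if k not in new)
    return {"updated": updated, "removed": removed}


def apply_delta(state: State, delta: Dict[str, Any]) -> State:
    """Rebuild the new snapshot on the client from the old one plus a delta."""
    result = dict(state)
    result.update(delta["updated"])
    for key in delta["removed"]:
        result.pop(key, None)
    return result


old = {"foo.1": "running", "bar.1": "waiting", "baz.1": "succeeded"}
new = {"foo.1": "succeeded", "bar.1": "waiting", "qux.1": "waiting"}
delta = diff_states(old, new)
```

Here the unchanged `bar.1` is never resent: the delta carries only the two updated tasks and the one removed task, and the GUI can reconstruct the full snapshot by applying it to what it already holds.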
GUI
- Which front-end JavaScript framework? (Angular, React, Vue, D3, …?)
- Must look good, and be efficient and responsive for very large suites
- May/June demos suggest development won’t be as difficult as it was for the
old GUIs (but instead we have many architectural issues to contend with).
- Should pages be served by the suites, or a “GUI server” under the reverse proxy?
- From suite server endpoints, or suite DBs? Is latency a problem in the DB case?
- Recent comparison
- TypeScript? Maybe not
- Automated testing?
- UI Design
- Oliver’s UI design ideas
- Sadie’s mind maps!
- “Serve Everything” (GUI has no access to the filesystem)
- Current GUI gets some information from the filesystem:
- Static suite visualization and validation (parses suite config file)
(low priority?)
- Live access to job logs (uses “cylc cat-log” which reads files off disk)
(low priority?)
- Edit run (generate new job file and present it in the user’s editor)
(higher priority)
Transition Period
- Users must continue to use the old GUI while we are developing the new one
- Evolve the old GUI along with the new suite API?
- Or support the old API as-is concurrently with the new?
- Or develop cylc-8 separately on a new branch?
- DECISION: develop cylc-8 separately on a long-lived dev branch
- the new system is too different to develop “in place”
- early migration to Python 3 also demands this
Packaging
Figure 1: Current (cylc-7) architecture: simple: all clients are equal (CLI,
GUI, Jobs), automatic owner authentication; but suite discovery and
authentication require clients to see the file system (also port scanning for
discovery) – an in-browser GUI cannot do this.
Figure 2: Proposed new architecture
Figure 3: possible client-suite interaction, in the new architecture
Figure 4: possible job-suite interaction, in the new architecture
Prioritization
Many of the following activities can be done concurrently if we have a big
enough team. But we should prioritize the highest risks where possible.
- Identify the front-end framework for the UI - not a big risk (many
frameworks would do) but it is a big unknown, and we may have to spend
considerable time learning the technologies
- Python 3 Migration
- Port to Python 3 after everything is developed?
- can keep the old GUI in the interim
- Do the Python 3 migration first
- can develop all the new stuff in Python 3
- What new techniques can we exploit after the initial style change?
- Asyncio for the main loop?
- DECISION: Migrate to Python 3 first
- otherwise we may be stuck with outdated versions of some critical
libraries
- Bruno has had an initial crack at this and got a simple suite to run
- Test Coverage Reports
- Make long-lived cylc-8 development branch
- the new system is far too different to develop incrementally (and
compatibly) on master
- then:
- delete the old GUI
- packaging
- test coverage
- Python 3 migration
- and all the new bits…
- DECISION: to be done immediately after cylc-7.8.0 release
- Prototype the reverse proxy and the communication flow. This is the
foundation of the new architecture, so we should be comfortable with it
sooner rather than later. We need to understand where the components sit and
what technology to use. Risks and unknowns:
- How to do initial authentication, and then management of authenticated sessions?
- (How) Does it work with our enterprise identity management systems?
- How to spin up new services/suites as user?
- How to route requests and session tokens to existing services/suites?
- How to exploit WebSocket for our use?
- How can existing services/suites trust the reverse proxy?
- Who/what serves open static content?
- Do we write the new code in Python 3, given its independence from other
components? (Some important modules may no longer be maintained in Python 2.)
- Finalize API and authorization management for suite and other services.
There are many things to do here, but they should be relatively
straightforward.
- Determine who should serve what and how. E.g.:
- Uni-directional or bi-directional via WebSocket.
- GraphQL to replace the REST end points currently used to serve the UI views.
- GraphQL to replace the rest of the REST end points?
- Migrate “cylc review” (formerly Rose Bush) to new service points.
- Develop other essential service points and logic.
- Authorization and session management.
- Job to suite session management.
- Continue to develop the UI as we evolve the API to suit its needs.