Recognising the benefits your hub can provide for your user community#
🔍 OVERVIEW
Questions
What is interactive computing in the cloud?
How can interactive computing in the cloud serve hub communities in research and/or education?
What is the role of a Hub Champion?
Objectives
Identify the benefits provided by a hub for interactive cloud computing
Recognise use cases for interactive cloud computing in a Hub Champion’s user community
Describe the role and capacity of a Hub Champion to administer and support their hub
Introduction#
This episode introduces the concept of a hub for interactive cloud computing and explains how a hub in the cloud can benefit the user communities that a Hub Champion supports.
How can interactive computing in the cloud benefit my community?#
Interactive computing is a fundamental way in which researchers and educators use a computer program to perform tasks that include (but are not limited to)
generating data from experiments or simulations, or gathering data from secondary sources
processing and analysing data using statistical techniques and algorithms
developing data-driven and/or mechanistic predictive models
visualising results to graphically reveal insights.
Real-time interactivity helps information flow back and forth between the user and their work, creating a dynamic and productive environment for research and educational activity. Project Jupyter is an ecosystem of open-source tools that provides a web-based interactive computational environment, in the form of “notebooks”, for many programming languages. The name Jupyter refers to three of them: Julia, Python and R.
With the advent of “big data”, many researchers and educators run into the limits of handling data on a local machine or laptop. The size and number of datasets in disciplines such as genomics, meteorology, healthcare and environmental sciences, to name a few, are growing rapidly, and so is the need for large-scale infrastructure to support computationally intensive workflows.
Cloud computing is one way of provisioning the system resources needed to meet this demand. It relies on shared, on-demand services, usually provided by commercial entities such as AWS, Google Cloud and Microsoft Azure. Academics in research and education are often limited in their capacity to sustainably administer cloud infrastructure at scale, and may also wish to support open-source tools and vendor-agnostic infrastructure to protect scientific reproducibility and encourage effective collaboration.
This is where a non-profit initiative such as 2i2c can help. 2i2c specialises in managing open Jupyter architecture in the cloud, designed for communities of practice in research and education. It is built on open-source tools that allow communities to reproduce environments, code and data on different machines. Users can access large-scale compute and storage as needed, and workflows stay reproducible because they rely on community-driven, open-source technologies.
JupyterHubs in the Cloud#
In this lesson, our definition of the word hub refers to a JupyterHub that is hosted on cloud infrastructure and managed by 2i2c.
Community Hub#
A hub provides an access point to interactive computing in the cloud for a user community. Access to the hub is via a URL of the following form
<hub-name>.<community-name>.2i2c.cloud
and the landing page contains a Log in to continue button (see example screenshot)
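The URL pattern above can be sketched in a few lines of Python. This is only an illustration: `hub_url` is a hypothetical helper, not part of any 2i2c tooling, the `https://` scheme is an assumption, and the names `workshop` and `example-community` are made up.

```python
import re

# Lowercase letters, digits and inner hyphens: a conservative DNS-label check.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def hub_url(hub_name: str, community_name: str) -> str:
    """Build a URL of the form <hub-name>.<community-name>.2i2c.cloud.

    Hypothetical helper for illustration; the https:// scheme is assumed.
    """
    for label in (hub_name, community_name):
        if not _LABEL.match(label):
            raise ValueError(f"not a valid DNS label: {label!r}")
    return f"https://{hub_name}.{community_name}.2i2c.cloud"


# Made-up hub and community names.
print(hub_url("workshop", "example-community"))
```

A helper like this can be handy when preparing onboarding emails or documentation for several hubs at once, since it rejects names that could not appear in a hostname.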
Authentication#
Access to the hub is controlled by Hub Champions and granted by adding a user’s GitHub account to a special GitHub Team associated with the hub. A permitted user enters their GitHub credentials to log into the hub (see example screenshot)
Custom environments#
Users can choose from several machine types with varying numbers of CPU cores and amounts of RAM, and select their desired software environment using images.
Images available by default include
Handbook Authoring (JupyterLab user interface) – installed Python packages include
jupyter-book, jupyterlab_myst, ghp-import, numpy, matplotlib
Jupyter DataScience (JupyterLab user interface) – installed Python packages include
dask, h5py, pandas, scikit-learn, scipy, sympy
Rocker Geospatial (RStudio user interface) – installed R packages include
ncdf4, proj4, raster, rgdal, rgeos, sf, sp
Other… – specify a custom JupyterLab/RStudio image
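When guiding users through the machine type choice, a useful rule of thumb is to pick the smallest option that still fits the workload. The sketch below illustrates that rule only. The machine names and sizes are hypothetical, and the real options are whatever the hub's server selection page offers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineType:
    name: str
    cpu_cores: int
    ram_gb: int


# Hypothetical options; check the hub's server menu for the real ones.
OPTIONS = [
    MachineType("small", 2, 8),
    MachineType("medium", 4, 16),
    MachineType("large", 8, 32),
]


def smallest_fit(options, cpu_needed, ram_needed_gb):
    """Return the smallest machine meeting both requirements, or None."""
    candidates = [
        m for m in options
        if m.cpu_cores >= cpu_needed and m.ram_gb >= ram_needed_gb
    ]
    if not candidates:
        return None
    # Prefer less RAM first, then fewer cores, to avoid over-provisioning.
    return min(candidates, key=lambda m: (m.ram_gb, m.cpu_cores))


choice = smallest_fit(OPTIONS, cpu_needed=2, ram_needed_gb=12)
print(choice.name if choice else "no option is large enough")
```

Here a job needing 2 cores and 12 GB of RAM does not fit the hypothetical `small` machine, so the next size up is chosen rather than the largest one available.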
Online Content#
There are many ways to manage online content on the hub. For example
users have access to their own filesystem and a home directory of up to 10 GB
Hub Champions can distribute data to all hub users in a shared directory
users can securely pull and push code to and from the hub using GitHub.
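Because each home directory has a size limit, users may want to check how much space they are using. The following is a minimal sketch using only the Python standard library. It assumes the 10 GB limit means 10 × 1024³ bytes and sums apparent file sizes, which may differ from how the hub itself accounts for storage.

```python
import os
from pathlib import Path

# Assumption: "10 GB" is treated as 10 * 1024**3 bytes.
QUOTA_BYTES = 10 * 1024**3


def directory_size(root) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; ignore it.
                continue
    return total


def usage_report(root, quota_bytes=QUOTA_BYTES) -> str:
    """Describe how much of the assumed quota the directory uses."""
    used = directory_size(root)
    percent = 100 * used / quota_bytes
    return f"{used / 1024**3:.2f} GiB used ({percent:.1f}% of quota)"


if __name__ == "__main__":
    print(usage_report(Path.home()))
```

Running this in a notebook cell or terminal on the hub gives a rough figure that users can quote when asking a Hub Champion for help with storage.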
Cloud infrastructure#
Hubs are deployed on a commercial cloud provider: AWS, Google Cloud or Microsoft Azure. The code for 2i2c hub configuration and deployment follows best practices and is open and transparent to all. 2i2c hubs are designed around the Right to Replicate: anybody can replicate the hub on their own cloud infrastructure.
Support and Services#
2i2c provides dedicated operations support for the hub, such as
continuously monitoring the infrastructure
responding to incidents
deploying hub environments
upstreaming open-source developments
operating Kubernetes clusters.
Hub Champions are the first point of contact for their user community to provide support and guide users to make the best use of the hub. Hub Champions may then represent and escalate support requests to 2i2c for more technical issues.
Which of the following statements are True/False?
a. 2i2c-managed hubs are only available via a single cloud provider.
b. A hub can be accessed and used by anyone who knows its URL.
c. Software environments are inflexible and irreproducible.
d. 2i2c provides operations support for your hub.
Solution
a. False – Hubs can be deployed on AWS, Google Cloud or Microsoft Azure. They are built entirely with open-source and community-driven tooling, and the Right to Replicate gives communities the right to replicate their infrastructure in its entirety elsewhere.
b. False – Hub Champions can control who can access the hub through GitHub or other authentication methods.
c. False – Software environments are managed by Hub Champions and automatically deployed with containerisation, allowing for scalability across the cloud and reproducible user environments.
d. True – 2i2c engineers maintain service availability, uptime and operational upgrades. Hub Champions are responsible for hub configuration and management to support their user community.
Define success for your user community#
As a Hub Champion, your main goal is to empower your community to make the best use of the hub service. You represent the interests of your community and are familiar with their computational workflows and data needs.
To ensure that the hub serves the interests of your community, Hub Champions may perform common administrative tasks such as
controlling user access policy to the hub
stopping and restarting servers for users
guiding users in selecting the appropriate server options and images for their use case
transferring and distributing data on the hub
representing and escalating support requests to 2i2c for technical issues
facilitating knowledge transfer within the community, e.g. through training events, documentation, and communication channels.
Exercise: What are the needs of your user community?
Let us take some time to reflect and assess the needs of your user community.
In pairs, discuss and share the following points:
What domain of expertise does your community have?
How large is your community? Do its members work alone or do they need to collaborate?
Does your community work with large datasets?
Does your community need access to intensive computational power?
Where applicable, are the needs for collaboration, large datasets, and/or computational power consistent across your whole community?
What are the main software applications your community uses to conduct their work?
How familiar is your user community with version control using git/GitHub?
How can a hub address the challenges your community faces?
Prepare to summarise and share with the rest of the workshop.
Exercise: In which of the following ways should a Hub Champion support their users?
a. Providing community guidance on best research software practices for users of your hub.
b. Troubleshooting and supporting common user issues.
c. Communicating infrastructure level requests and incidents to 2i2c.
d. Overseeing user access policy to the hub.
Solution
All of the above are things a Hub Champion should do to enable their community to make the best use of a hub.
Next Steps#
The next part of the training is a self-guided study of the following episodes
Selecting the optimal server resources for your computational work responsibly
Navigating the filesystem and transferring data to and from the hub
and concludes with a synchronous session covering the episode Troubleshooting and providing user support.
To complete the self-study portion of the workshop, you will need to be able to access the 2i2c Community Showcase Hub. When working through the self-study sections of the workshop, write your answers to the exercises in our collaborative note-taking document.
If you cannot access the hub, please contact the training instructor.
🔑 KEY POINTS
A hub is a 2i2c-managed JupyterHub in the cloud that provides an interactive computing service to users
Hub Champions empower hub users to make the best use of the service and are the first point of contact for user support
Hub Champions may perform common administrative tasks and configure the hub to set their community up for success