Managing Software Images
🔍 OVERVIEW
Questions
What software environments are available on the hub?
How is containerization helpful for my work?
Is there a way to customise my software environment?
Objectives
Explain the value of reproducibility with containerized software environments
Select pre-configured JupyterLab or RStudio containers for their hub
Pull a custom container for their hub
Software for research and education
Hub users usually require specific software packages to carry out their work. Installing software on a laptop differs significantly from installing software on a shared resource such as a hub in the cloud, largely because of differences in
operating systems, e.g. Linux, macOS, Windows
system libraries
filesystems
hardware.
Due to these factors, a software installation on a user’s local machine is difficult to reproduce elsewhere, and the installation process is often poorly documented or out of date.
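To make these factors concrete, the following minimal sketch records the operating system, hardware architecture, Python version and installed Python package versions of whichever machine it runs on. The function name `environment_snapshot` is hypothetical and not part of the hub. It uses only the Python standard library, and it captures Python packages only, not system libraries or filesystems.

```python
import platform
from importlib import metadata


def environment_snapshot():
    """Collect basic facts about the current software environment.

    Hypothetical helper for illustration; it only sees Python packages,
    not system libraries or other non-Python software.
    """
    # Gather (name, version) pairs for every installed Python distribution.
    packages = sorted(
        {
            (dist.metadata["Name"], dist.version)
            for dist in metadata.distributions()
            if dist.metadata["Name"]  # skip entries with broken metadata
        },
        key=lambda item: item[0].lower(),
    )
    return {
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "packages": packages,
    }


snapshot = environment_snapshot()
print(snapshot["operating_system"], snapshot["machine"], snapshot["python_version"])
print(f"{len(snapshot['packages'])} Python packages installed")
```

Running this on a laptop and on a hub server will usually give different results, which illustrates why an installation made on one machine is hard to reproduce on another.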
Exercise: Software Challenges
Think about the challenges that you may have faced with using software for your work.
Is there a tool that you wanted to use that was not compatible with your operating system?
Have you ever struggled to install a piece of software?
Did you and your collaborators fail to reproduce results using the same software application?
Write up your thoughts in our shared collaborative document.
Reproduce software environments with containers
Containers are a useful technology for overcoming software challenges in reproducibility, portability and scalability. A container bundles up the software application and its dependencies into a lightweight and standalone package that can be run on any infrastructure.
- Reproducibility
An image is a reusable and shareable file that outlines the “recipe” needed to create a container. This is useful for reproducing a software environment since you can tag and keep track of the different versions of the image.
- Portability
A container can run on different target systems, such as a laptop, supercomputer or cloud server.
- Scalability
Containers can be scaled up or down to take advantage of the system resources available (i.e. CPUs and RAM).
Exercise: Reasons to use containers
Which of the following statements are True or False?
Containers are lightweight, portable and isolated units of software that can be used on any computer and operating system.
An image is a runnable, self-contained software environment created from a container. A container is a shareable “recipe” used to create images.
A container enables reproducible modelling or analyses to be carried out on your laptop or in the cloud.
Containers are easily scalable and can be deployed on many machines to distribute work.
Solution
True
False – A container is a runnable, self-contained software environment or service created from an image. An image is a shareable “recipe” used to create containers.
True
True
Tip
For further resources on containerisation, please see Further Resources.
Selecting pre-configured containers on the hub
As seen in the previous episode, Selecting the optimal server resources for your computational work responsibly, a user is presented with a list of server options once logged into a hub and each option has an Image dropdown box.
On the Community Showcase Hub, the default list of available images includes
Jupyter DataScience - launches a JupyterLab interface with various Python[1], R[2] and Julia[3] packages installed
Rocker Geospatial – launches an RStudio interface with R packages[2] and geospatial toolkits installed
Handbook Authoring – a 2i2c-maintained image containing Python packages[4] for authoring documentation in Jupyter Book and MystMD
Other… – allows a user to self-serve a JupyterHub-compatible custom image (see section below).
Note
The list of image options presented can vary for different hubs. Hub Champions can build their own non-default environment and open a support ticket with 2i2c to modify this list to include non-default options – this is beyond the scope of this training although further details can be found in the Hub Service Guide.
Choosing a custom image with the “Other…” option
The Other… option allows a user to specify a custom container image to pull into the hub. There are many container registries online that host containerized applications, such as Docker Hub, GitHub, Azure, Amazon and Google Container Registries, Red Hat Quay, etc.
Only containers that are compatible with JupyterHub can be pulled into a hub. To find JupyterHub-compatible containers you can, for example,
browse the list of 2i2c-maintained hub images on Red Hat Quay
take a look at Jupyter Docker Stacks
take a look at the Rocker Project for R Docker containers (note only the binder image is JupyterHub-compatible)
search Docker Hub for the term “jupyter”.
Caution
Anyone can create a container and publicly share it online, so it is important to be cautious about downloading this software onto your machine. A few good indicators to look for are
the image is updated regularly
the image is authored and maintained by a well-known company or community
the container is used by many people
there is an image file provided or metadata listing the exact contents of the container
documentation is provided on how to use the image.
An image listed on a container registry may have many different versions associated with it. A TAG is used to distinguish these different versions. The name of the container image can also include the OWNER. The general format for specifying an image is
OWNER/IMAGE_NAME:TAG
For example, if a user wanted to pull the Jupyter PyTorch notebook container, then they would enter quay.io/jupyter/pytorch-notebook:x86_64-pytorch-2.2.0 into the Custom image field.
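As a rough illustration of how such a reference breaks down, the sketch below splits an image string into its parts. The names `ImageReference` and `parse_image_reference` are hypothetical, and the parser is deliberately simplified: it treats a leading component containing a dot or a port (or equal to `localhost`) as the registry host, and it does not handle digest references of the form `name@sha256:…`.

```python
from typing import NamedTuple, Optional


class ImageReference(NamedTuple):
    """Parts of an image reference (hypothetical helper type)."""
    registry: Optional[str]
    owner: Optional[str]
    name: str
    tag: Optional[str]


def parse_image_reference(reference: str) -> ImageReference:
    """Split a reference such as OWNER/IMAGE_NAME:TAG into its components.

    Simplified sketch: digest references (name@sha256:...) are not handled.
    """
    remainder = reference.strip()

    # The tag follows the first ":" that appears after the last "/",
    # so a registry port such as localhost:5000 is not mistaken for a tag.
    last_slash = remainder.rfind("/")
    colon = remainder.find(":", last_slash + 1)
    tag = None
    if colon != -1:
        tag = remainder[colon + 1:]
        remainder = remainder[:colon]

    parts = remainder.split("/")

    # Assumption: a first component with a dot or port is a registry host.
    registry = None
    if len(parts) > 1 and ("." in parts[0] or ":" in parts[0] or parts[0] == "localhost"):
        registry = parts.pop(0)

    name = parts.pop()
    owner = "/".join(parts) or None
    return ImageReference(registry, owner, name, tag)


ref = parse_image_reference("quay.io/jupyter/pytorch-notebook:x86_64-pytorch-2.2.0")
print(ref.registry, ref.owner, ref.name, ref.tag)
```

In the example from the text, the leading `quay.io` is the registry host, `jupyter` is the OWNER, `pytorch-notebook` is the IMAGE_NAME and `x86_64-pytorch-2.2.0` is the TAG.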
Tip
We recommend always explicitly specifying a version number in the TAG field rather than using the generic latest tag. Providing the version number in the tag is useful for producing informative server logs for debugging purposes and allows you to check whether the correct version of the image is loaded into the hub by running the command
jovyan@user:~$ echo $JUPYTER_IMAGE
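The same check can be made from a notebook or Python session. The sketch below reads the `JUPYTER_IMAGE` environment variable mentioned above and reports whether the tag is pinned to something other than `latest`. The function name `tag_is_pinned` is hypothetical, and the check is a simple string test, not a full parser of image references.

```python
import os


def tag_is_pinned(image: str) -> bool:
    """Return True when the reference has an explicit tag other than 'latest'.

    Hypothetical helper; a simple string check, not a full reference parser.
    """
    # Only look after the last "/" so a registry port is not read as a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        # No tag given; container tools usually fall back to "latest".
        return False
    tag = last_segment.split(":", 1)[1]
    return tag not in ("", "latest")


image = os.environ.get("JUPYTER_IMAGE")
if image is None:
    print("JUPYTER_IMAGE is not set; this session may not be running on a hub.")
elif tag_is_pinned(image):
    print(f"Running a pinned image: {image}")
else:
    print(f"The image tag is not pinned: {image}")
```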
Exercise: Specifying a custom image tag
Which of the following would you paste into the Custom Image field to pull the latest version of the handbook-authoring-image from the list of 2i2c-maintained hub images on Red Hat Quay to your hub?
Solution
Notes on the other answers:
This is using an older version tag and not the latest version available.
This is missing the version tag.
Here you would need to remove the ‘docker pull’ part.
🔑 KEY POINTS
Images are useful for reproducing software environments across platforms
Default containers are available on the hub
Additional custom containers can be pulled to the hub if required