Managing Software Images

🔍 OVERVIEW

Questions

What software environments are available on the hub?
How is containerization helpful for my work?
Is there a way to customise my software environment?

Objectives

Explain the value of reproducibility with containerized software environments
Select pre-configured JupyterLab or RStudio containers for their hub
Pull a custom container for their hub

Software for research and education

Hub users usually require specific software packages to carry out their work. Installing software on a laptop differs significantly from installing it on a shared resource such as a hub in the cloud, largely because of differences in

operating systems, e.g. Linux, macOS, Windows
system libraries
filesystems
hardware.

Due to these factors, a software installation on a user’s local machine is difficult to reproduce elsewhere, and the installation process is often poorly documented or out of date.
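To make these differences concrete, the short sketch below records a few facts about the machine it runs on, using only the Python standard library. The function name `describe_environment` is a hypothetical helper chosen for illustration. Running it on a laptop and then on a hub will typically give different answers, which is one reason an installation that works in one place may not work in the other.

```python
import platform
import sys


def describe_environment() -> dict[str, str]:
    """Collect basic facts about the machine this code is running on.

    Hypothetical helper for illustration; it only calls the standard library.
    """
    return {
        "operating_system": platform.system(),   # e.g. "Linux", "Darwin", "Windows"
        "os_release": platform.release(),        # kernel or OS release string
        "hardware": platform.machine(),          # e.g. "x86_64", "arm64"
        "python_version": platform.python_version(),
        "python_executable": sys.executable,     # which Python installation is in use
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Saving this output alongside your results is a lightweight way to document where an analysis was run, although it does not capture the installed packages or system libraries.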

Exercise: Software Challenges

Think about the challenges that you may have faced with using software for your work.

Is there a tool that you wanted to use that was not compatible with your operating system?
Have you ever struggled to install a piece of software?
Have you and your collaborators ever failed to reproduce results using the same software application?

Write up your thoughts in our shared collaborative document.

Reproduce software environments with containers

Containers are a useful technology for overcoming software challenges in reproducibility, portability and scalability. A container bundles up the software application and its dependencies into a lightweight and standalone package that can be run on any infrastructure.

[Diagram: containerized software applications (App A to App F) run on a container host, which runs on the operating system, which in turn runs on the infrastructure.]

Fig. 7 Diagram of how a layer of many user software container environments shares the hub’s underlying operating system and server infrastructure.

Reproducibility: An image is a reusable and shareable file that outlines the “recipe” needed to create a container. This is useful for reproducing a software environment since you can tag and keep track of the different versions of the image.
Portability: A container can run on different target systems, such as a laptop, supercomputer or cloud server.
Scalability: Containers can be scaled up or down to take advantage of the system resources available (i.e. CPUs and RAM).
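The relationship between an image and a container is easy to mix up, so here is a loose analogy in Python. It is not how any container runtime is implemented, and the names `Image`, `Container` and `example-notebook` are made up for illustration. The image plays the role of a fixed, versioned recipe, and each call to `run()` produces a separate container created from it.

```python
from dataclasses import dataclass, field
from itertools import count

# Simple counter used to give each illustrative container its own ID.
_container_ids = count(1)


@dataclass(frozen=True)
class Image:
    """A versioned, shareable recipe. It cannot be modified once built."""
    name: str
    tag: str

    def run(self) -> "Container":
        """Create a new, independent container from this recipe."""
        return Container(image=self, container_id=next(_container_ids))


@dataclass
class Container:
    """A running environment created from an image."""
    image: Image
    container_id: int
    files_written: list[str] = field(default_factory=list)


# One image can be used to start many containers.
image = Image(name="example-notebook", tag="1.0.0")
first = image.run()
second = image.run()

# Changes made inside one container do not affect the image
# or any other container started from it.
first.files_written.append("results.csv")
print(first.image == second.image)   # both came from the same recipe
print(second.files_written)          # the second container is untouched
```

In this analogy, sharing the image (name and tag) is what allows a collaborator to start an equivalent environment of their own.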

Exercise: Reasons to use containers

Which of the following statements are True or False?

Containers are lightweight, portable and isolated units of software that can be used on any computer and operating system.
An image is a runnable, self-contained software environment created from a container. A container is a shareable “recipe” used to create images.
A container enables reproducible modelling or analyses to be carried out on your laptop or in the cloud.
Containers are easily scalable and can be deployed on many machines to distribute work.

Solution

True
False – A container is a runnable, self-contained software environment or service created from an image. An image is a shareable “recipe” used to create containers.
True
True

Tip

For further resources on containerisation, please see Further Resources.

Selecting pre-configured containers on the hub

As seen in the previous episode, Selecting the optimal server resources for your computational work responsibly, a user is presented with a list of server options once logged into a hub and each option has an Image dropdown box.

Screenshot showing a list of server options available on the Community Showcase Hub.

On the Community Showcase Hub, the default list of available images includes

Jupyter DataScience – launches a JupyterLab interface with various Python[1], R[2] and Julia[3] packages installed
Rocker Geospatial – launches an RStudio interface with R packages[2] and geospatial toolkits installed
Handbook Authoring – a 2i2c-maintained image containing Python packages[4] for authoring documentation in Jupyter Book and MystMD
Other… – allows a user to self-serve a JupyterHub-compatible custom image (see section below).

Note

The list of image options presented can vary between hubs. Hub Champions can build their own environment and open a support ticket with 2i2c to add it to this list. This is beyond the scope of this training, although further details can be found in the Hub Service Guide.

Choosing a custom image with the “Other…” option

The Other… option allows a user to specify a custom container image to pull into the hub. There are many container registries online that host containerized applications, such as Docker Hub, GitHub, Azure, Amazon and Google Container Registries, Red Hat Quay, etc.

Only containers that are compatible with JupyterHub can be pulled into a hub. To find JupyterHub-compatible containers you can, for example,

browse the list of 2i2c-maintained hub images on Red Hat Quay
take a look at Jupyter Docker Stacks
take a look at the Rocker Project for R Docker containers (note only the binder image is JupyterHub-compatible)
search Docker Hub for the term “jupyter”.

Caution

Anyone can create a container and share it publicly online, so it is important to be cautious about downloading this software onto your machine. A few good indicators to look for are

the image is updated regularly
the image is authored and maintained by a well-known company or community
the container is used by many people
there is an image file provided or metadata listing the exact contents of the container
documentation is provided on how to use the image.

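An image listed on a container registry may have many different versions associated with it. A TAG is used to distinguish these different versions. The name of the container image can also include the OWNER and the address of the REGISTRY that hosts it (e.g. quay.io). The general format for specifying an image is

REGISTRY/OWNER/IMAGE_NAME:TAG

where the REGISTRY prefix can usually be omitted for images hosted on Docker Hub.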

For example, if a user wanted to pull the Jupyter PyTorch notebook container, then they would enter quay.io/jupyter/pytorch-notebook:x86_64-pytorch-2.2.0 into the Custom image field.

Screenshot showing where to specify a custom image quay.io/jupyter/pytorch-notebook:x86_64-pytorch-2.2.0 in the server options page on a hub.
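The image reference format described above can be sketched as a small parser. This is a simplified illustration rather than the parsing logic used by any container tool: `ImageReference` and `parse_image_reference` are hypothetical names, digest references (those containing `@sha256:`) are not handled, and the first component is assumed to be a registry address only if it contains a dot or a colon, or is `localhost`.

```python
from typing import NamedTuple, Optional


class ImageReference(NamedTuple):
    """The parts of an image reference (hypothetical structure for illustration)."""
    registry: Optional[str]
    owner: Optional[str]
    name: str
    tag: Optional[str]


def parse_image_reference(reference: str) -> ImageReference:
    """Split a reference such as REGISTRY/OWNER/IMAGE_NAME:TAG into its parts."""
    parts = reference.strip().split("/")

    # Assumption: a leading component that looks like a hostname is the registry.
    registry = None
    first = parts[0]
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        registry = first
        parts = parts[1:]

    # The tag, if present, follows the colon in the final component.
    name, _, tag = parts[-1].partition(":")

    # Anything between the registry and the image name is treated as the owner.
    owner = "/".join(parts[:-1]) or None

    return ImageReference(registry, owner, name, tag or None)


ref = parse_image_reference("quay.io/jupyter/pytorch-notebook:x86_64-pytorch-2.2.0")
print(ref.registry, ref.owner, ref.name, ref.tag)
```

A reference with no colon in its final component yields `tag=None`, which is a useful signal that no explicit version was requested.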

Tip

We recommend always explicitly specifying a version number in the TAG field rather than using the generic latest tag. Providing the version number in the tag is useful for producing informative server logs for debugging purposes and allows you to check whether the correct version of the image is loaded into the hub by running the command

jovyan@user:~$ echo $JUPYTER_IMAGE
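The same check can be made from a notebook cell. The sketch below reads the `JUPYTER_IMAGE` environment variable mentioned above and reports whether the image was specified with an explicit version tag. The helper name `pinned_tag` is hypothetical, and the logic assumes a reference of the form shown earlier, without a digest.

```python
import os
from typing import Optional


def pinned_tag(image: str) -> Optional[str]:
    """Return the tag if it is an explicit version, otherwise None.

    Hypothetical helper: a missing tag or the generic "latest" tag
    both count as "not pinned".
    """
    # Only the final path component can carry a tag; this avoids
    # mistaking a registry port (e.g. localhost:5000) for one.
    last_component = image.rsplit("/", 1)[-1]
    _, _, tag = last_component.partition(":")
    if not tag or tag == "latest":
        return None
    return tag


image = os.environ.get("JUPYTER_IMAGE")
if image is None:
    print("JUPYTER_IMAGE is not set; this session may not be running on a hub.")
elif pinned_tag(image) is None:
    print(f"{image} has no explicit version tag, so the exact version is unclear.")
else:
    print(f"Running {image} (tag: {pinned_tag(image)})")
```

Printing this at the top of a notebook leaves a record of the image version alongside the analysis.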

Exercise: Specifying a custom image tag

Which of the following would you paste into the Custom Image field to pull the latest version of the handbook-authoring-image from the list of 2i2c-maintained hub images on Red Hat Quay to your hub?

Solution

quay.io/2i2c/handbook-authoring-image:7c62c6e63869

Notes on the other answers:

This is using an older version tag and not the latest version available.
This is missing the version tag.
Here you would need to remove the ‘docker pull’ part.

🔑 KEY POINTS

Images are useful for reproducing software environments across platforms
Default containers are available on the hub
Additional custom containers can be pulled to the hub if required