How to create a Docker container and use with Singularity/Apptainer in HPC environments

A brief description of my pipeline
hpc
howto
singularity
docker
Author

Onurcan Bektas

Published

December 1, 2024

Abstract

The demand for computational power in scientific research has increased dramatically in the last decade. To meet this demand, high-performance computing (HPC) clusters have been established as collaborations between multiple research institutions and universities, providing thousands of researchers with a large pool of shared computing resources. Simply put, HPC clusters are a bunch of interconnected computers on which individual users can carry out large-scale computations. For security reasons, only a few administrators can install software on these computers. As a result, individual users are constrained to the software provided by the administrators, which limits the use cases of HPC clusters. Singularity and Docker solve this issue by allowing individual users to create and run custom virtual software environments in which they can install any software they like. In this blog post, I’ll show you how to create a Docker container and use it with Singularity (a.k.a. Apptainer).

Step 1: Create a dockerfile

To create a docker container, we first need to create what is called a dockerfile.

touch my.dockerfile
vi my.dockerfile

Inside my.dockerfile, we need to list the instructions that Docker will follow to build the container. A typical dockerfile that I use looks as follows:

# Dockerfile for Seurat 4.3.0
FROM rocker/r-ver:4.2.0
# Install Seurat's system dependencies
RUN apt-get update && apt-get install -y \
    libhdf5-dev \
    libcurl4-openssl-dev \
    libssl-dev \
    libpng-dev \
    libboost-all-dev \
    libxml2-dev \
    openjdk-8-jdk \
    python3-dev \
    python3-pip \
    wget \
    git \
    libfftw3-dev \
    libgsl-dev \
    pkg-config
RUN apt-get install -y llvm-10
# Install system library for rgeos
RUN apt-get install -y libgeos-dev
# Install UMAP
RUN LLVM_CONFIG=/usr/lib/llvm-10/bin/llvm-config pip3 install llvmlite
RUN pip3 install numpy
RUN pip3 install umap-learn
RUN git clone --branch v1.2.1 https://github.com/KlugerLab/FIt-SNE.git
RUN g++ -std=c++11 -O3 FIt-SNE/src/sptree.cpp FIt-SNE/src/tsne.cpp FIt-SNE/src/nbodyfft.cpp  -o bin/fast_tsne -pthread -lfftw3 -lm
# Install bioconductor dependencies & suggests
RUN R --no-echo --no-restore --no-save -e "install.packages('BiocManager')"

In each line, the capitalised word at the start is an instruction that tells Docker what to do with the arguments that follow it. For example, FROM tells Docker to use the image located at hub.docker.com/layers/rocker/r-ver/4.2.0 as a base image and to build the container on top of it. This is typically how you set the operating system of the container, e.g. Ubuntu, Fedora, etc. In this particular case, I’m using the official Docker image of the R programming language, version 4.2.0, as the base so that I don’t have to install R myself.

The lines that follow specify the commands that set up the container. For example, the first RUN instructions update the package index of the container's virtual operating system and install the system libraries that will be needed by the R packages I use. The details of how to set up a docker container are beyond the scope of this blog post, so I’ll skip them for the moment.
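For readers new to dockerfiles, here is a minimal sketch of the FROM/RUN structure, stripped of everything specific to Seurat. The base image and package choice are arbitrary placeholders; writing the file via a heredoc just saves retyping it in an editor.

```shell
# Write a minimal, illustrative dockerfile: one base image, one RUN step.
# ubuntu:22.04 and curl are placeholder choices, not part of my pipeline.
cat > minimal.dockerfile <<'EOF'
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends curl
EOF
cat minimal.dockerfile
```

The same pattern scales up: a single FROM at the top, followed by as many RUN steps as your software stack needs.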

Once you have finished setting up the dockerfile, you are ready to build the container and upload it somewhere so that you can access it whenever you want.

Step 2: Create the docker container and push it to dockerhub

Assuming that you are creating the docker container in a non-Linux environment (e.g. on a macOS laptop), we need to tell Docker at build time that the container will be used in a Linux environment.

docker build -t mycontainer:latest . -f my.dockerfile  --platform linux/x86_64 2>&1 | tee build.log

Here, mycontainer:latest gives the container a name and a tag so you can easily identify it; the tag acts like a version of that container. The dot . tells Docker that the build context is the current directory, and -f my.dockerfile points it to the dockerfile we just wrote. The parameter --platform linux/x86_64 tells Docker that the container should be compatible with a Linux x86_64 host. Once you execute the above command, the build can take anywhere from minutes to hours, depending on what you asked Docker to install in your container.
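A quick sanity check after the build is to confirm that the image shows up in your local image list. The snippet below is guarded with `command -v` so it does nothing harmful on a machine where docker is not installed:

```shell
# After a successful build, `docker image ls mycontainer` should list the
# image (name and tag from the build command above).
if command -v docker >/dev/null 2>&1; then
    BUILD_CHECK=$(docker image ls mycontainer)
else
    BUILD_CHECK="docker is not installed on this machine"
fi
echo "$BUILD_CHECK"
```

If the image is missing from the list, check build.log for the step that failed.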

Once the build is complete, we need to push the container to a repository on Docker Hub. First, create an account at Docker Hub, then create a repository on the website. You will push the container you created on your computer to this repo. Then, from your terminal, log into your account:

docker login --username <dockerhub username>

Now you first need to tag the docker container on your computer so that it points to the repo on Docker Hub. List all docker images and copy the IMAGE ID of your container. It is a 12-character alphanumeric string, something like 76ad0cae35c3.

docker image ls
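If you prefer to grab the IMAGE ID programmatically rather than copying it by eye, the ID is the third column of the listing. The demonstration below runs `awk` on a captured sample line (the names and ID are made up) so it works even without docker installed; on a real machine you would pipe `docker image ls` instead:

```shell
# Extract the IMAGE ID (third whitespace-separated column) from a sample
# line of `docker image ls` output; the values here are hypothetical.
sample='mycontainer   latest    76ad0cae35c3   2 hours ago   1.2GB'
IMAGE_ID=$(echo "$sample" | awk '{print $3}')
echo "$IMAGE_ID"
```

Docker can also do this for you directly: `docker image ls -q mycontainer` prints just the IDs.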

Once you have found the IMAGE ID of your container, tag it to the repo as follows:

docker tag <IMAGE ID> <dockerhub username>/<repo name>

Now we are ready to push the container, which is still stored only on your computer, to the repo at <dockerhub username>/<repo name>.

docker push <dockerhub username>/<repo name>

It might take some time for the push to finish if this is the first time you are pushing the container and it is large.
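Putting the tagging and pushing steps together, the whole of Step 2 after the build can be sketched as a short script. The username alice, the repo seurat-env, and the IMAGE ID are all hypothetical placeholders; the actual docker calls are left commented out so the sketch is safe to read and adapt:

```shell
# Tag-and-push sketch; substitute your own Docker Hub username, repo name,
# and the IMAGE ID reported by `docker image ls`.
set -eu
DOCKERHUB_USER="alice"        # hypothetical username
REPO_NAME="seurat-env"        # hypothetical repository
IMAGE_ID="76ad0cae35c3"       # hypothetical IMAGE ID
REMOTE="$DOCKERHUB_USER/$REPO_NAME"
echo "$REMOTE"
# docker tag "$IMAGE_ID" "$REMOTE"
# docker push "$REMOTE"
```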

Step 3: Run the docker container with Singularity/Apptainer

Having created the docker container and stored it on Docker Hub, we are ready to use it with Singularity. Log into the HPC cluster and make sure that singularity/apptainer is already installed. We first make sure that the cache directories exist - otherwise, singularity will complain. (The SINGULARITY_TMPDIR and APPTAINER_CACHEDIR environment variables should point to directories you can write to; many clusters set them for you, but if not, export them yourself first.) Then we can run the docker container directly with singularity in the following way:

mkdir -p "$SINGULARITY_TMPDIR"
mkdir -p "$APPTAINER_CACHEDIR"
singularity run  docker://<dockerhub username>/<repo name>:latest

Here, the docker:// prefix before <dockerhub username>/<repo name>:latest tells singularity to look for the repo on Docker Hub. With this, singularity downloads the docker container, converts it to its own image format, and runs it. Once that is done, you will be inside the container and can start using the software you installed in it.
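If you run the container repeatedly, it can be convenient to pull it once into a local .sif image file with `singularity pull` and then run that file directly, instead of resolving the registry every time. The repo name below is the same hypothetical placeholder as before, and the snippet is guarded so it is a no-op on machines without singularity:

```shell
# Pull the docker container once into a local .sif file; singularity names
# the file <repo>_<tag>.sif. alice/seurat-env is a hypothetical repo.
if command -v singularity >/dev/null 2>&1; then
    singularity pull docker://alice/seurat-env:latest
    PULL_MSG="pulled image to a local .sif file"
else
    PULL_MSG="singularity not available on this machine"
fi
echo "$PULL_MSG"
```

You can then run the resulting file with `singularity run <file>.sif`, which starts noticeably faster than pulling from the registry.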

However, when running the docker container, we typically also want to access data that lives in the host environment. To do so, we need to bind some of the directories on the host to directories in the container's virtual operating system. Binding simply means making a directory of the host operating system accessible from within the container. To accomplish this, we pass the --bind parameter. The path before the : indicates the directory on the host, and the path after the : indicates the path inside the container.

singularity run --bind <host path>:<container path> docker://<dockerhub username>/<repo name>:latest
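As a concrete illustration of the bind syntax, the snippet below builds the `--bind` argument from a hypothetical host path, mapping a data directory on the cluster to /data inside the container (the actual singularity call is left commented out, since it needs the cluster environment):

```shell
# host-path:container-path; $HOME/projects/data is a hypothetical location
# for your input data on the cluster.
HOST_DIR="$HOME/projects/data"
BIND_SPEC="$HOST_DIR:/data"
echo "$BIND_SPEC"
# singularity run --bind "$BIND_SPEC" docker://alice/seurat-env:latest
```

Inside the container, your files then appear under /data, regardless of where they live on the host.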

Conclusion

In conclusion, I have shown you how to set up a Docker container, push it to Docker Hub so that it can be used anywhere, and run it using Singularity. Thanks to such containers, you can install and use any software or package on any computer you have access to, as long as Docker or Singularity is installed there.