Containers with PyTorch

Docker is a platform for developing, shipping, and running applications inside containers.

A software container is a lightweight, portable package that includes everything an application needs to run: code, libraries, dependencies, operating system files, and system settings.

It ensures your code can be deployed to different machines without worrying about installing dependencies. Although containers are often compared to virtual machines because of their structure, they share the host's kernel instead of booting a full guest operating system, so they generally start faster and use fewer resources.

This guide focuses specifically on AI/ML applications of containerization with Docker. For a more in-depth guide, see the official TACC containers guide.

What is Docker? What are Images?

Docker is a platform that allows developers to package applications into containers and share them through the cloud.

A Docker image is a pre-configured package that contains everything needed to run an application, including the code, runtime, libraries, and dependencies. Once an image is instantiated, it becomes a container. The distinction matters because multiple containers can be instantiated from the same base image.

Apptainer vs Docker

Apptainer (formerly Singularity) is a containerization platform designed specifically for high-performance computing (HPC) environments, offering a solution optimized for scientific research and large-scale data processing.

Unlike general-purpose container platforms such as Docker, which typically require root privileges and are commonly used for development and cloud-based applications, Apptainer is built to run efficiently and without root access on shared systems such as TACC’s supercomputers.

It provides portability, reproducibility, and seamless integration with HPC job schedulers, making it ideal for researchers who need to run complex applications in secure, isolated environments without compromising performance or requiring administrative access.

In this tutorial, we follow the workflow highlighted in TACC’s container tutorial.

We will:

1. Use Docker to develop containers locally

2. Push (upload) our container to Docker Hub

3. Use Apptainer to run the container on a TACC HPC system

Note: we can skip steps 1 and 2 above if a base container exists with all dependencies for our application, as you will see highlighted in the demo below.
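
The three-step workflow above can be sketched as the commands below. This is only a dry run: the helper workflow_commands and the image name your-dockerhub-user/my-app:0.1 are hypothetical, and the function prints the commands rather than running them, since Docker is used on your local machine while Apptainer is used on a TACC compute node.

```shell
# Hypothetical image reference; replace with your own Docker Hub user and tag.
IMAGE="your-dockerhub-user/my-app:0.1"

# Print (do not run) the command for each step of the workflow.
workflow_commands() {
  image="$1"
  echo "docker build -t $image ."        # 1. build locally from a Dockerfile
  echo "docker push $image"              # 2. upload the image to Docker Hub
  echo "apptainer pull docker://$image"  # 3. pull it on a TACC compute node
}

workflow_commands "$IMAGE"
```

When a suitable prebuilt image already exists, only the third command is needed.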

Running GPU-enabled PyTorch Containers at TACC

Below, we will walk you through the steps for setting up a GPU-enabled PyTorch container at TACC to run the multigpu_torchrun.py testing script from the How to Create a Virtual Environment and How to Install Conda tutorials.

For the purposes of this tutorial, we will be using the Frontera system.

Step 1: Log in to Frontera

ssh <TACC username>@frontera.tacc.utexas.edu

Step 2: Move to your scratch directory

cd $SCRATCH

You can check your location by typing

pwd

The scratch directory will look different for everyone, but generally, the working directory for your scratch partition will look as follows:

/scratch1/<group number>/<username>
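
If you put these steps in a script, a guard like the following can stop it early when it runs somewhere that $SCRATCH is not defined. This is a minimal sketch; the function name scratch_dir_ok is hypothetical, and the only assumption is that the TACC login environment sets $SCRATCH.

```shell
# Succeed only if $SCRATCH is set and points at an existing directory.
scratch_dir_ok() {
  [ -n "${SCRATCH:-}" ] && [ -d "$SCRATCH" ]
}

if scratch_dir_ok; then
  cd "$SCRATCH" && pwd
else
  echo "SCRATCH is not set or does not exist; are you logged in to a TACC system?"
fi
```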

Note

We have used the $WORK directory thus far to run tasks. Here, we use $SCRATCH, since setting up PyTorch and CUDA is I/O-intensive.

Step 3: Request a Node

Apptainer is only available on compute nodes on TACC systems. To test containers on our systems, we suggest launching an interactive session with idev. Below, we request an interactive session on a GPU development node (-p rtx-dev) for a total time of 2 hours (-t 02:00:00).

idev -p rtx-dev -t 02:00:00

There may be a wait while your request sits in the queue. Once the command runs successfully, you will have a shell on a dedicated compute node, and your shell prompt will look similar to the following:

c196-011[rtx](458)$

Step 3 (continued): Load Apptainer

Once you have a shell running on a compute node, you will need to load Apptainer using module.

To learn more about the default modules on Frontera, see the tutorials here.

To list the modules that are currently loaded, run the command:

module list

You should see:

Currently Loaded Modules:
1) intel/19.1.1   4) autotools/1.2   7) hwloc/1.11.12
2) impi/19.0.9    5) python3/3.7.0   8) xalt/2.10.34
3) git/2.24.1     6) cmake/3.24.2    9) TACC

These are the modules loaded by default on Frontera. You can unload any that you do not need.

To load the apptainer module, run:

module load tacc-apptainer

Now the apptainer command should be available. You can check by typing:

type apptainer

Which should return:

apptainer is /opt/apps/tacc-apptainer/1.3.3/bin/apptainer
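
The same check can be scripted so that later steps fail with a clear message instead of a "command not found" error. The sketch below uses the standard command -v builtin; the helper name require_cmd is hypothetical.

```shell
# Report whether a command is on PATH; return non-zero if it is missing.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found" >&2
    return 1
  fi
}

require_cmd apptainer || echo "hint: run 'module load tacc-apptainer' on a compute node"
```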

When you run module list, you should now see:

Currently Loaded Modules:
1) intel/19.1.1   4) autotools/1.2   7) hwloc/1.11.12  **10) tacc-apptainer/1.3.3**
2) impi/19.0.9    5) python3/3.7.0   8) xalt/2.10.34
3) git/2.24.1     6) cmake/3.24.2    9) TACC

Step 4: Download test data

First, we will download some test data to run a simple ML task on. Clone the examples library from the official PyTorch GitHub repository by running:

git clone https://github.com/pytorch/examples.git

Step 5: Pull a Prebuilt PyTorch Docker Image

Instead of writing our own GPU-enabled Dockerfile, we can use an official PyTorch image from Docker Hub, which simplifies setting up a container for GPU use. For more detailed instructions on how to build and upload your own Docker image from scratch, see TACC’s official Docker tutorial.

Note

Docker Hub is the official cloud-based repository where developers store, share, and distribute Docker images, much as GitHub does for source code.

Run the following command to pull a CUDA-enabled PyTorch image (PyTorch 2.5.1 with CUDA 12.4) from Docker Hub:

apptainer pull output.sif docker://pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel

This will download the image and convert it into the Apptainer image format (.sif). You can replace “output.sif” with whatever you would like to name the file. If you omit the file name, Apptainer derives one from the image name and tag on Docker Hub.
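
To illustrate the default naming, Apptainer typically names the file after the last component of the image name, followed by an underscore and the tag. The helper below is a hypothetical sketch that mimics that convention; it does not call Apptainer, so check its result against the file that apptainer pull actually creates.

```shell
# Hypothetical helper mimicking Apptainer's usual default file name:
# <last component of image name>_<tag>.sif, with "latest" when no tag is given.
default_sif_name() {
  ref="${1#docker://}"   # drop the docker:// prefix
  base="${ref##*/}"      # keep the part after the last slash
  case "$base" in
    *:*) name="${base%%:*}"; tag="${base#*:}" ;;
    *)   name="$base";       tag="latest" ;;
  esac
  echo "${name}_${tag}.sif"
}

# prints pytorch_2.5.1-cuda12.4-cudnn9-devel.sif
default_sif_name docker://pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```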

Note

CUDA is an API that allows software to utilize NVIDIA GPUs for accelerated computing. This is essential for deep learning because GPUs process certain calculations much faster than CPUs. Since TACC machines have NVIDIA GPUs, we must use a CUDA-enabled PyTorch image to fully leverage GPU acceleration.

Step 6: Run code on GPU

Finally, we can execute the multi-GPU training script within our PyTorch container. Note that Apptainer only exposes the host's NVIDIA GPUs to the container when you pass the --nv flag, as in the command below.

apptainer exec --nv output.sif torchrun --nproc_per_node=4 examples/distributed/ddp-tutorial-series/multigpu_torchrun.py 50 10
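
Before launching a long training run, it can help to confirm that PyTorch inside the container actually sees the GPUs. The sketch below assumes the image file is named output.sif as above and that python is on the container's PATH, which we believe holds for the official PyTorch images. The variable names SIF and GPU_CHECK are illustrative.

```shell
# Image file to test; override by setting SIF before running.
SIF="${SIF:-output.sif}"

# One-line Python check: is CUDA usable, and how many GPUs are visible?
GPU_CHECK='import torch; print(torch.cuda.is_available(), torch.cuda.device_count())'

if command -v apptainer >/dev/null 2>&1 && [ -f "$SIF" ]; then
  apptainer exec --nv "$SIF" python -c "$GPU_CHECK"
else
  echo "skipping GPU check: apptainer or $SIF not available"
fi
```

The reported device count should be at least the value passed to --nproc_per_node. If the first value is False, check that --nv was included and that you are on a GPU node.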

Step 7: Verifying the Script Execution

Once you’ve executed the script, you can check the output directly in your terminal. If there are any issues or errors, they will be displayed in the terminal.
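
If you would also like a saved copy of the output and a definite success or failure signal, one option is to capture the exit status and write everything to a log file. The helper run_and_log and the log name train.log below are hypothetical. The sketch prints the log only after the command finishes, so it trades live output for simplicity.

```shell
# Hypothetical helper: run a command, save stdout and stderr to a log file,
# print the log afterwards, and return the command's exit status.
run_and_log() {
  log="$1"; shift
  cmd_status=0
  "$@" >"$log" 2>&1 || cmd_status=$?
  cat "$log"
  return "$cmd_status"
}

# Example usage on a compute node (left commented out so nothing runs here):
# run_and_log train.log apptainer exec --nv output.sif torchrun --nproc_per_node=4 \
#   examples/distributed/ddp-tutorial-series/multigpu_torchrun.py 50 10 \
#   && echo "training finished" || echo "training failed; see train.log"
```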

Conclusion

You have now successfully pulled a PyTorch image from Docker Hub and run a Python script within an Apptainer container.

For a more detailed introduction to containers please see the Containers at TACC tutorial.

You have also now completed the first section of this tutorial. In the next section, we will detail how to build your own container on top of a CUDA base container, install PyTorch and other dependencies, and upload it to TACC systems.