To run this bootcamp you will need a machine with NVIDIA GPUs. The profiling tools require:
- GPU: An NVIDIA GPU with the Ampere architecture or newer (SM 80+) for Nsight Systems and Nsight Compute.
- Container runtime: Install Docker or Singularity.
- NVIDIA toolkit: Install the NVIDIA toolkit and Nsight Systems.
- NGC account: Building the base container image requires an NGC account and an NGC API key.
- Linux machine: Ubuntu operating system.
We tested and ran all labs on a DGX machine equipped with A100 and H100 GPUs.
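You can check the requirements above from a shell before building anything. This is a minimal sketch, not part of the bootcamp. The function names are hypothetical, and the script assumes a driver whose nvidia-smi supports the compute_cap query field (older drivers may not).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the requirements listed above.
# Assumption: the installed driver's nvidia-smi supports the compute_cap field.

# Succeeds if a "major.minor" compute capability is 8.0 (SM 80, Ampere) or newer.
is_sm80_or_newer() {
  local major="${1%%.*}"
  [[ "$major" =~ ^[0-9]+$ ]] && (( 10#$major >= 8 ))
}

# Lists every GPU and succeeds if at least one meets the SM 80+ requirement.
check_gpus() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "MISSING nvidia-smi (is the NVIDIA driver installed?)"
    return 1
  fi
  local name cap status=1
  while IFS=',' read -r name cap; do
    cap="${cap// /}"   # strip the space that follows the CSV comma
    if is_sm80_or_newer "$cap"; then
      echo "OK      $name (compute capability $cap)"
      status=0
    else
      echo "TOO OLD $name (compute capability $cap)"
    fi
  done < <(nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader)
  return "$status"
}

# Succeeds if either supported container runtime is on PATH.
check_container_runtime() {
  if command -v docker >/dev/null 2>&1 || command -v singularity >/dev/null 2>&1; then
    echo "OK      container runtime found"
  else
    echo "MISSING Docker or Singularity"
    return 1
  fi
}
```

For example, save it as check_prereqs.sh (a hypothetical name), then run: source check_prereqs.sh && check_gpus && check_container_runtime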
You can deploy this material using Docker containers. Please refer to the respective sections for instructions.
To run the labs, you will need access to a single GPU. Build a Docker container by following these steps:
- Open a terminal window and navigate to the directory where the Dockerfile is located (e.g., cd ~/Profiling-AI-Software-Bootcamp).
- To build the Docker container, run:

    sudo docker build -t aiprofiler-jupyter:latest .

- To run the built container:

    docker run -it --gpus "all" \
    -p 8888:8888 --rm \
    -v /path/to/Profiling-AI-Software-Bootcamp:/workspace-aiprofiler \
    aiprofiler-jupyter:latest

  Flag descriptions:
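  - --rm: automatically removes the container when it exits.
  - -it: runs the container interactively with a terminal attached, so you can stop the Jupyter server with Ctrl+C.
  - --gpus all: makes all NVIDIA GPUs available inside the container.
  - -v: mounts a local directory into the container filesystem (here, the repository at /workspace-aiprofiler).
  - -p 8888:8888: maps port 8888 on the host to port 8888 in the container.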
  Once the container is running, you can reach the labs by browsing to port 8888 on the serving machine from any web browser. For example, if you are running on your local machine, point the browser to http://localhost:8888.
- Once the container is running, open JupyterLab in your browser at http://localhost:8888 and start the lab by clicking the start_here.ipynb notebook.
- When you are done with the labs, shut down JupyterLab by selecting File > Shut Down, then exit the container by typing exit or pressing Ctrl+D in the terminal window.
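Before opening the browser, you may want to confirm that the Jupyter server is accepting connections on the mapped port. This is a minimal sketch that assumes curl is installed on the host. wait_for_jupyter is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the Jupyter URL until the server accepts connections.
# Assumptions: curl is installed on the host, and the URL matches the -p 8888:8888 mapping.

wait_for_jupyter() {
  local url="${1:-http://localhost:8888}"
  local attempts="${2:-30}"
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # Without --fail, curl exits 0 for any HTTP response (including a redirect
    # to the login page) and non-zero on transport errors such as a refused connection.
    if curl -s -o /dev/null --max-time 2 "$url"; then
      echo "Jupyter is reachable at $url"
      return 0
    fi
    sleep 1
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}
```

For example, run wait_for_jupyter in a second terminal on the serving machine while the container starts. Once it reports the server as reachable, open http://localhost:8888.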
To troubleshoot, check the container logs (docker ps lists the IDs of running containers):

    docker logs <container_id>

Also make sure the workspace path in the -v flag points to the correct local directory.
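You can fold the -v path check into a small wrapper around the docker run command. This is a minimal sketch that assumes the directory to mount is the repository root (the one containing the Dockerfile). run_bootcamp is a hypothetical helper, not something the repository provides.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the docker run command from the steps above.
# Assumption: the directory to mount is the repository root, i.e. the
# directory that contains the Dockerfile.

run_bootcamp() {
  if [[ $# -ne 1 ]]; then
    echo "usage: run_bootcamp /path/to/Profiling-AI-Software-Bootcamp" >&2
    return 2
  fi
  local repo_dir="$1"
  if [[ ! -f "$repo_dir/Dockerfile" ]]; then
    echo "error: no Dockerfile in '$repo_dir'; check the path passed to -v" >&2
    return 1
  fi
  # Resolve to an absolute path so the -v source is always a host directory.
  repo_dir="$(cd "$repo_dir" && pwd)"
  docker run -it --gpus all \
    -p 8888:8888 --rm \
    -v "${repo_dir}:/workspace-aiprofiler" \
    aiprofiler-jupyter:latest
}
```

For example: run_bootcamp ~/Profiling-AI-Software-Bootcamp. Because the path is resolved with cd and pwd, a relative argument such as . still mounts the intended directory.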
The profiling tools should be pre-installed in the container. To verify the installation, run:

    nsys --version

For additional support, please refer to the NVIDIA Developer Forums or open an issue in the repository.
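Inside the container, you can check both profilers in one pass. This sketch assumes the Nsight Compute CLI (ncu) is installed alongside nsys. check_tool and verify_profilers are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper to confirm the profilers are on PATH inside the container.
# Assumption: Nsight Compute's CLI (ncu) is installed alongside nsys.

# Prints a tool's version output, or reports it missing.
check_tool() {
  local tool="$1"
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "MISSING $tool"
    return 1
  fi
  echo "OK      $tool"
  "$tool" --version 2>&1 | sed 's/^/        /'
}

# Checks every profiler and reports all results before returning.
verify_profilers() {
  local tool status=0
  for tool in nsys ncu; do
    check_tool "$tool" || status=1
  done
  return "$status"
}
```

Running verify_profilers prints each tool's version output. It returns non-zero if either tool is missing, so it can also gate a script.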
If you encounter an ERR_NVGPUCTRPERM error when profiling, make sure the container is started with --cap-add=SYS_ADMIN. For a permanent solution, allow non-admin users to access the GPU performance counters on the host, then reboot:

    sudo sh -c 'echo "options nvidia NVreg_RestrictProfilingToAdminUsers=0" > /etc/modprobe.d/nvidia-profiling.conf'
See NVIDIA's solutions guide for details.
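After rebooting, you can confirm the host setting took effect by reading the driver's module parameters. This sketch assumes the driver exposes them in /proc/driver/nvidia/params with a line such as "RmProfilingAdminOnly: 1", where 0 means non-admin users may read the performance counters. profiling_counters_open is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical check that GPU performance counters are open to non-admin users.
# Assumption: the driver exposes its module parameters in
# /proc/driver/nvidia/params as lines such as "RmProfilingAdminOnly: 1".

# Returns 0 if open to all users, 1 if admin-only, 2 if the value is unknown.
profiling_counters_open() {
  local params="${1:-/proc/driver/nvidia/params}"
  if [[ ! -r "$params" ]]; then
    echo "Cannot read $params (is the NVIDIA driver loaded?)"
    return 2
  fi
  local value
  value="$(awk -F': *' '$1 == "RmProfilingAdminOnly" { print $2 }' "$params")"
  case "$value" in
    0) echo "GPU performance counters are available to all users."; return 0 ;;
    1) echo "GPU performance counters are restricted to admin users."; return 1 ;;
    *) echo "RmProfilingAdminOnly not found in $params."; return 2 ;;
  esac
}
```

If it still reports that the counters are restricted, use the per-container workaround: add --cap-add=SYS_ADMIN to the docker run command. For example: docker run -it --gpus all --cap-add=SYS_ADMIN -p 8888:8888 --rm -v /path/to/Profiling-AI-Software-Bootcamp:/workspace-aiprofiler aiprofiler-jupyter:latest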