@shivanshuraj1333
Last active August 26, 2024 22:53
DRA example demo

-> Machine configuration

2x NVIDIA A100 GPUs with 40 GB of memory each, which support MIG (Multi-Instance GPU)

-> Show the current GPU configuration

nvidia-smi -L

-> Docker is configured to use the NVIDIA container runtime, meaning Docker can access GPUs and MIG slices through that runtime

cat .config/docker/daemon.json
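If the NVIDIA runtime has been registered with Docker (e.g., via nvidia-ctk runtime configure), the file typically contains an entry like the following (a sketch of the usual layout; the exact path and any extra keys depend on your setup):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```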

-> Check that the NVIDIA DRA driver is running

kubectl get pods -n nvidia-dra-driver

^this takes care of allocating GPUs to resource claims

-> MIG mode can be enabled or disabled per GPU (1 = enable, 0 = disable)

nvidia-smi -i 0 -mig 1

-> Check that MIG mode is enabled

nvidia-smi -i 0 -q 
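The full -q output is long; to see just the MIG mode section you can filter it (a convenience sketch, assuming grep is available — "Current" should read "Enabled"):

```shell
# Show only the MIG Mode block of the query output
nvidia-smi -i 0 -q | grep -A 2 "MIG Mode"
```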

-> Check available MIG profiles

sudo nvidia-smi mig -lgip
  • lists all available Multi-Instance GPU (MIG) profiles; choose profiles based on the resource claims your pods will make

-> Create MIG instances

sudo nvidia-smi mig -cgi 19,19,20,5,0 -C

these are profile IDs from the -lgip listing (on an A100 40GB: 19 = 1g.5gb, 20 = 1g.5gb+me, 5 = 4g.20gb, 0 = 7g.40gb); -C also creates the corresponding compute instances
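Without -C, GPU instances are created but have no compute instances, so workloads cannot run on them yet. A sketch of the equivalent two-step form (assuming the same profile IDs; per the MIG user guide, -cci with no arguments creates the default compute instance on each GPU instance):

```shell
# Create the GPU instances only...
sudo nvidia-smi mig -cgi 19,19,20,5,0
# ...then create a default compute instance inside each GPU instance
sudo nvidia-smi mig -cci
```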

-> List the MIG devices again

nvidia-smi

List the MIG devices and their sizes

nvidia-smi -L

-> Run a sample workload with Docker that lists the MIG devices

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
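To run against a single MIG slice instead of all GPUs, a specific device can be passed in <gpu-index>:<mig-index> form (a sketch; the indices come from the nvidia-smi -L listing and will differ per machine — MIG UUIDs from that listing also work in place of the index pair):

```shell
# Target only the first MIG device on GPU 0
sudo docker run --rm --runtime=nvidia --gpus '"device=0:0"' ubuntu nvidia-smi
```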

-> List the created GPU instances

sudo nvidia-smi mig -lgi

Main repo: https://github.com/NVIDIA/k8s-dra-driver

Sample spec showing how to define resource claims: https://github.com/NVIDIA/k8s-dra-driver/blob/main/demo/specs/quickstart/gpu-test5.yaml
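For reference, the general shape of a claim in that repo's demos looks roughly like this (a hedged sketch only — the DRA API group was alpha at the time and the API version, kind, and class name have changed across Kubernetes releases, so check the linked spec for the current form):

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: mig-gpu-template        # hypothetical name for illustration
spec:
  spec:
    resourceClassName: gpu.nvidia.com
```

A pod then references the claim under spec.resourceClaims and lists it in the container's resources.claims.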
