Skip to content

Instantly share code, notes, and snippets.

@yejingxin
yejingxin / nsight.sh
Created September 24, 2024 18:34 — forked from mcarilli/nsight.sh
Favorite nsight systems profiling commands for Pytorch scripts
# This isn't supposed to run as a bash script, i named it with ".sh" for syntax highlighting.
# https://developer.nvidia.com/nsight-systems
# https://docs.nvidia.com/nsight-systems/profiling/index.html
# My preferred nsys (command line executable used to create profiles) commands
#
# In your script, write
# torch.cuda.nvtx.range_push("region name")
# ...
@skye
skye / tpu_topology_env_vars.py
Last active April 13, 2025 18:33
You can use these environment variables to run a Python process on a subset of the TPU cores on a Cloud TPU VM. This allows running multiple TPU processes at the same time, since only one process can access a given TPU chip at a time. Note that on TPU v2 and v3, 1 TPU chip = 2 TpuDevice as reported by `jax.devices()` (8 devices total). On v4, 1 …
# ==== Non-communicating processes
# 4x 1 chip per process:
os.environ["TPU_CHIPS_PER_PROCESS_BOUNDS"] = "1,1,1"
os.environ["TPU_PROCESS_BOUNDS"] = "1,1,1"
# Different per process:
os.environ["TPU_VISIBLE_DEVICES"] = "0" # "1", "2", "3"
# 1-liner for bash: TPU_CHIPS_PER_PROCESS_BOUNDS=1,1,1 TPU_PROCESS_BOUNDS=1,1,1 TPU_VISIBLE_DEVICES=0 TPU_MESH_CONTROLLER_ADDRESS=localhost:8476 TPU_MESH_CONTROLLER_PORT=8476
@mcarilli
mcarilli / nsight.sh
Last active April 23, 2025 01:26
Favorite nsight systems profiling commands for Pytorch scripts
# This isn't supposed to run as a bash script, i named it with ".sh" for syntax highlighting.
# https://developer.nvidia.com/nsight-systems
# https://docs.nvidia.com/nsight-systems/profiling/index.html
# My preferred nsys (command line executable used to create profiles) commands
#
# In your script, write
# torch.cuda.nvtx.range_push("region name")
# ...