pip install 'ipydrawio[all]' jupyterlab_widgets jupyterlab_image_editor jlab-enhanced-cell-toolbar \
    jlab-enhanced-launcher jupyterlab_latex jupyter-server-proxy jupyter-vscode-proxy
sudo systemctl restart jupyterhub.service
# Show route
netstat -nr
# Show DNS
cat /etc/resolv.conf
# Default connection
sudo /sbin/route change default 10.6.0.1 # 10.0.0.1
# Forward server address via VPN
#define BLOCK_SIZE 1024
#define STRIDE 16
__global__ void kernel(float *A, float *B) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Strided read: consecutive threads load floats STRIDE * 4 bytes apart
    // (STRIDE elements of 4-byte float each).
    if (idx * STRIDE < BLOCK_SIZE)
        B[idx] = A[idx * STRIDE];
}
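To see why the strided read in the kernel above is costly, the following host-side C++ sketch counts how many distinct memory sectors one warp touches for a given stride. It is only an illustration: the helper name `sectors_touched` is hypothetical, and the 32-byte sector size and 4-byte float are assumptions stated in the code, not values taken from the notes.

```cpp
#include <cstddef>
#include <set>

// Hypothetical helper (not part of the original notes): counts the distinct
// memory sectors touched when `threads` consecutive threads each read one
// float at element index tid * stride, as in B[idx] = A[idx * STRIDE].
std::size_t sectors_touched(std::size_t threads, std::size_t stride) {
    constexpr std::size_t kSectorBytes = 32;  // assumed sector size
    constexpr std::size_t kFloatBytes = 4;    // assumed size of float
    std::set<std::size_t> sectors;
    for (std::size_t tid = 0; tid < threads; ++tid) {
        // Byte offset of the element this thread reads, relative to A.
        std::size_t byte_offset = tid * stride * kFloatBytes;
        sectors.insert(byte_offset / kSectorBytes);
    }
    return sectors.size();
}
```

Under these assumptions, `sectors_touched(32, 1)` gives 4 (a fully coalesced warp), while `sectors_touched(32, 16)` gives 32, since every thread lands in its own sector when the stride is 64 bytes.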
Ampere (GA10x GPU): 6144 KB of L2 cache. There are 12 32-bit memory controllers (384-bit total), and 512 KB of L2 cache is paired with each controller.
Each SM: 128 CUDA Cores, four 3rd-generation Tensor Cores, a 256 KB Register File, and 128 KB of L1/Shared Memory.
Each SM has 4 partitions, each with a 64 KB Register File, one 3rd-generation Tensor Core, an L0 instruction cache, one warp scheduler, one dispatch unit, and sets of math and other units. The four partitions share a combined 128 KB L1 data cache/shared memory subsystem.
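The GA10x totals quoted above follow from the per-unit figures. A small C++ sketch of that arithmetic, with compile-time checks, might look like this; the constant and function names are illustrative and not from any NVIDIA header, and the numbers are copied from the notes.

```cpp
// Per-unit figures copied from the notes; all names here are illustrative.
constexpr int kMemoryControllers = 12;
constexpr int kControllerWidthBits = 32;
constexpr int kL2PerControllerKB = 512;
constexpr int kPartitionsPerSM = 4;
constexpr int kRegFilePerPartitionKB = 64;

// Total L2 cache: one 512 KB slice per memory controller.
constexpr int total_l2_kb() { return kMemoryControllers * kL2PerControllerKB; }

// Total memory bus width across all controllers.
constexpr int bus_width_bits() { return kMemoryControllers * kControllerWidthBits; }

// Register file per SM: one 64 KB file per partition.
constexpr int reg_file_per_sm_kb() { return kPartitionsPerSM * kRegFilePerPartitionKB; }

// The derived totals match the figures quoted in the notes.
static_assert(total_l2_kb() == 6144, "12 x 512 KB = 6144 KB of L2");
static_assert(bus_width_bits() == 384, "12 x 32-bit = 384-bit bus");
static_assert(reg_file_per_sm_kb() == 256, "4 x 64 KB = 256 KB register file");
```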
Turing and Volta SMs support concurrent execution of FP32 and INT32 operations
ncu --list-sets # List the available sets. A set is a group of sections.
ncu --list-sections # List the available sections. A section is a group of metrics.
ncu --query-metrics # List all individual metrics.
ncu --query-metrics-mode suffix --metrics <metrics list> # List the available suffixes for each base metric name.
### AMDGPU
cmake -G 'Ninja' \
    -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt;libcxx;libcxxabi;" \
    -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86;NVPTX" \
    -DLLVM_ENABLE_ASSERTIONS=On \
    ../llvm