Fixes for build errors that occur when the installed PyTorch version does not match the system's CUDA version.
Example error:
ld: cannot find -lcurand: No such file or directory
collect2: error: ld returned 1 exit status
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
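
Before changing anything, it can help to confirm that the two versions really differ. The sketch below assumes `python` and `nvcc` are on PATH and that PyTorch is importable. `parse_nvcc_release` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: extract "11.8" from a line such as
# "Cuda compilation tools, release 11.8, V11.8.89" (read from stdin).
parse_nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# CUDA version PyTorch was built against (prints "None" for CPU-only builds).
if command -v python >/dev/null 2>&1; then
    torch_cuda="$(python -c 'import torch; print(torch.version.cuda)' 2>/dev/null || echo 'torch not importable')"
else
    torch_cuda="python not found"
fi

# CUDA version of the compiler that would be used for building extensions.
if command -v nvcc >/dev/null 2>&1; then
    system_cuda="$(nvcc --version | parse_nvcc_release)"
else
    system_cuda="nvcc not found"
fi

echo "PyTorch built for CUDA: $torch_cuda"
echo "nvcc on PATH reports:   $system_cuda"
```

If the two lines disagree, or nvcc is missing, the steps below are likely relevant.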
- Go to https://anaconda.org/nvidia/cuda-toolkit, select the version that matches your PyTorch build, and download it.
- Install cudnn.
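By default, libcurand.so is installed in the lib directory of the base environment. Check that it exists:
ls -l xxx/miniconda/lib/libcurand.so
or, if you installed into a named environment:
ls -l xxx/miniconda/envs/xxx/lib/libcurand.so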
Then add that directory to the link-time and run-time library search paths:
export LIBRARY_PATH=xxx/miniconda/envs/xxx/lib/
export LD_LIBRARY_PATH=xxx/miniconda/envs/xxx/lib/
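
The check and the two exports above can be combined into one step. This is a minimal sketch, assuming the environment has been activated so that `CONDA_PREFIX` is set. `find_curand_dir` is a hypothetical helper name. Unlike the plain exports, it prepends to any existing value instead of overwriting it.

```shell
# Hypothetical helper: print PREFIX/lib if it contains libcurand.so (any version suffix).
find_curand_dir() {
    prefix="$1"
    [ -n "$prefix" ] || return 1
    # libcurand.so is often a symlink to a versioned file, so match any suffix.
    if ls "$prefix"/lib/libcurand.so* >/dev/null 2>&1; then
        printf '%s\n' "$prefix/lib"
        return 0
    fi
    return 1
}

# CONDA_PREFIX is set by `conda activate`; nothing is exported if it is unset.
if libdir="$(find_curand_dir "${CONDA_PREFIX:-}")"; then
    export LIBRARY_PATH="$libdir${LIBRARY_PATH:+:$LIBRARY_PATH}"
    export LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    echo "using CUDA libraries from $libdir"
else
    echo "libcurand.so not found under '${CONDA_PREFIX:-}/lib'" >&2
fi
```

Run it with `source` (or paste it into the shell) so that the exported variables persist in the current session.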
If installing deepspeed fails, try: CUDA_HOME=xxx/miniconda3 pip install deepspeed
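
A quick sanity check before retrying is sketched below. It assumes the extension build looks for the compiler at `$CUDA_HOME/bin/nvcc`, which is how PyTorch's extension builder locates it as far as I know. `has_nvcc` is a hypothetical helper name, and the fallback to `CONDA_PREFIX` is also an assumption. The script only prints the suggested command and does not run pip.

```shell
# Hypothetical helper: succeed only if DIR/bin/nvcc exists and is executable.
has_nvcc() {
    [ -n "$1" ] && [ -x "$1/bin/nvcc" ]
}

# Assumption: fall back to the active conda environment when CUDA_HOME is unset.
candidate="${CUDA_HOME:-${CONDA_PREFIX:-}}"

if has_nvcc "$candidate"; then
    echo "nvcc found; try: CUDA_HOME=$candidate pip install deepspeed"
else
    echo "no nvcc under '${candidate}/bin'; pick a different CUDA_HOME" >&2
fi
```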