PyTorch related
Training related
Beyond an optimal number (experiment!), throwing more worker processes at the IOPS barrier WILL NOT HELP; it will make things worse. Run htop or top to check CPU utilization before increasing num_workers.
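As a rough sketch of where this tuning happens (the toy TensorDataset and the num_workers=4 starting point below are illustrative assumptions, not recommendations):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset; substitute your own.
dataset = TensorDataset(torch.randn(1000, 3, 32, 32),
                        torch.randint(0, 10, (1000,)))

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,    # tune experimentally; raise only while throughput improves
    pin_memory=True,  # speeds up host-to-GPU copies
)

for images, labels in loader:
    pass  # training step goes here
```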
DistributedDataParallel (DDP): suitable for single-machine multi-GPU and multi-machine multi-GPU training; because it parallelizes with multiple processes, it is not limited by the GIL. When using DDP, the model is replicated in each process and each replica is fed a different set of inputs. DDP keeps the model replicas synchronized through gradient communication (see the PyTorch DDP "Internal Design" notes).
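To make the replication and gradient-sync story concrete, here is a minimal single-machine DDP sketch; the Linear model, batch shapes, and the MASTER_ADDR/MASTER_PORT defaults are illustrative assumptions, and launching one process per GPU via torchrun works equally well:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    # One process per GPU; NCCL backend for CUDA tensors.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(10, 10).to(rank)   # toy model; replace with yours
    ddp_model = DDP(model, device_ids=[rank])  # this process holds one replica
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # Each replica gets a different shard of inputs; gradients are
    # all-reduced across processes during backward(), keeping copies in sync.
    inputs = torch.randn(32, 10, device=rank)
    loss = ddp_model(inputs).sum()
    loss.backward()
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```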
- Switch CUDA version
Modify the PATH-related variables in ~/.bashrc and ~/.profile to switch the CUDA version. For example, to switch from CUDA 10.2 to CUDA 11.1, add the following lines to ~/.bashrc and ~/.profile:
```bash
export CUDA_HOME=/usr/local/cuda
export CUDA_PATH=/usr/local/cuda
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export PATH=${CUDA_HOME}/bin:${PATH}
```
Then use a symlink to switch between CUDA versions by repointing /usr/local/cuda (which the exports above reference) at the desired installation:

```bash
# -sfn replaces the existing /usr/local/cuda link instead of failing on it
sudo ln -sfn /usr/local/cuda-{version} /usr/local/cuda
```
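After switching, it is worth confirming which toolkit PyTorch actually sees. A minimal check, assuming PyTorch is installed (note that torch.version.cuda reports the version PyTorch was built against, which can differ from the system toolkit shown by nvcc --version):

```python
import torch

# CUDA version this PyTorch build was compiled against
print(torch.version.cuda)

# Whether a CUDA device is usable at runtime
print(torch.cuda.is_available())
```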