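CUDA
Versions
Make sure the driver, CUDA, cuDNN, TensorRT and framework versions you install are compatible with one another. The commands below mix releases (CUDA 11.8 in the CUDA section; CUDA 10.1, cuDNN 7.6 and TensorRT 6 elsewhere), so substitute the versions that match your setup.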
Installing Graphics Card Driver
To install the recommended driver from the command line:
# Check recommendation
ubuntu-drivers devices
# Install recommendation
sudo ubuntu-drivers autoinstall
# Reboot
sudo shutdown -r now #One way to reboot
# Check installation
nvidia-smi
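To script the final check, nvidia-smi can report only the driver version and GPU name through its --query-gpu and --format=csv,noheader options. The sketch below assumes nvidia-smi is on PATH. The helper names (parse_gpu_query, query_gpus) are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_query(csv_text):
    """Parse `nvidia-smi --query-gpu=driver_version,name --format=csv,noheader`.

    Each line looks like "435.21, GeForce RTX 2080"; returns a list of
    (driver_version, gpu_name) tuples, one per GPU.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        driver, _, name = line.partition(",")
        gpus.append((driver.strip(), name.strip()))
    return gpus


def query_gpus():
    """Run nvidia-smi if it exists; return parsed rows, or None on failure."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # e.g. driver not loaded, or a driver/library mismatch
    return parse_gpu_query(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus is None:
        print("nvidia-smi not available or failed; check the driver installation")
    else:
        for driver, name in gpus:
            print(f"{name}: driver {driver}")
```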
To install newer (beta) NVIDIA drivers from the graphics-drivers PPA:
# Add Repository
sudo add-apt-repository ppa:graphics-drivers/ppa
# Check recommendation
ubuntu-drivers devices
# Install the recommended driver, OR
sudo ubuntu-drivers autoinstall
# install a specific driver version
sudo apt install nvidia-driver-435
# Reboot
sudo reboot #Another way to reboot
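To choose a specific version from the PPA, you can read the "recommended" marker from the output of ubuntu-drivers devices. The sketch below assumes driver lines have the form "driver : <package> - <flags>". SAMPLE_OUTPUT is illustrative and was not captured from a real machine. parse_ubuntu_drivers is a hypothetical helper.

```python
import shutil
import subprocess

# Illustrative `ubuntu-drivers devices` output (assumed format)
SAMPLE_OUTPUT = """== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
vendor   : NVIDIA Corporation
driver   : nvidia-driver-435 - third-party free recommended
driver   : nvidia-driver-430 - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
"""


def parse_ubuntu_drivers(output):
    """Return (recommended_package, all_packages) from the command's output."""
    packages = []
    recommended = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "driver":
            continue  # skip device headers, vendor/model lines, etc.
        package, _, flags = value.strip().partition(" - ")
        packages.append(package)
        if "recommended" in flags.split():
            recommended = package
    return recommended, packages


if __name__ == "__main__":
    if shutil.which("ubuntu-drivers"):
        text = subprocess.run(["ubuntu-drivers", "devices"],
                              capture_output=True, text=True).stdout
    else:
        text = SAMPLE_OUTPUT
    recommended, packages = parse_ubuntu_drivers(text)
    print("available:", packages)
    if recommended:
        print(f"sudo apt install {recommended}")
```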
Installing CUDA
The following installs CUDA 11.8 on Ubuntu 18.04 (x86_64) from NVIDIA's local .deb repository:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-ubuntu1804-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu1804-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
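sudo apt-get update
sudo apt-get -y install cuda
# Post-installation: make nvcc available (e.g. add this line to ~/.bashrc)
export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}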
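To confirm which toolkit release is installed, you can parse the "release X.Y" line printed by nvcc --version. The sketch assumes that nvcc is either on PATH or at /usr/local/cuda/bin/nvcc (the default install location). The helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Assumption: nvcc is on PATH, or at the default toolkit location
NVCC_CANDIDATES = ["nvcc", "/usr/local/cuda/bin/nvcc"]


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output such as
    'Cuda compilation tools, release 11.8, V11.8.89'."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_release():
    """Return the (major, minor) of the first nvcc found, or None."""
    for candidate in NVCC_CANDIDATES:
        path = shutil.which(candidate)
        if path:
            out = subprocess.run([path, "--version"],
                                 capture_output=True, text=True).stdout
            return parse_nvcc_release(out)
    return None


if __name__ == "__main__":
    print("CUDA toolkit release:", installed_cuda_release())
```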
To run GPU workloads in Docker containers, add the NVIDIA Container Toolkit repository and install the toolkit:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
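The distribution variable above is built by sourcing /etc/os-release and concatenating $ID and $VERSION_ID, so on Ubuntu 18.04 it becomes "ubuntu18.04". The Python sketch below reproduces that string, which can help you check which repository path the curl command will request. The function names are hypothetical.

```python
import os
import shlex


def parse_os_release(text):
    """Parse KEY=VALUE lines of /etc/os-release, removing shell quoting."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)  # handles "quoted values"
        fields[key] = parts[0] if parts else ""
    return fields


def distribution_id(text):
    """Equivalent of `. /etc/os-release; echo $ID$VERSION_ID`."""
    fields = parse_os_release(text)
    return fields.get("ID", "") + fields.get("VERSION_ID", "")


if __name__ == "__main__":
    if os.path.exists("/etc/os-release"):
        with open("/etc/os-release") as f:
            print(distribution_id(f.read()))
```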
Test on PyTorch
#!/bin/bash
# Enable `conda activate` inside a non-interactive script
eval "$(conda shell.bash hook)"
conda create --name cuda_env
conda activate cuda_env
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
#!/usr/bin/env python
# Validate PyTorch
import torch

x = torch.rand(5, 3)
print(x)                          # should print a 5x3 tensor
print(torch.cuda.is_available())  # should print True
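A slightly more thorough check also reports the CUDA version PyTorch was built against and the detected GPU, then runs a small matrix multiply on it. This is a sketch that uses only public torch calls (torch.version.cuda, torch.cuda.get_device_name). It returns a dict instead of failing when torch is not installed. torch_cuda_report is a hypothetical name.

```python
import importlib.util


def torch_cuda_report():
    """Summarize the PyTorch/CUDA setup as a dict."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
        x = torch.rand(5, 3, device="cuda")
        report["matmul_ok"] = (x @ x.T).shape == (5, 5)  # runs on the GPU
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```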
Test on TensorFlow
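#!/bin/bash
eval "$(conda shell.bash hook)"  # enable `conda activate` in a script
conda activate cuda_env
# Since TensorFlow 2.1 the `tensorflow` pip package includes GPU support; `tensorflow-gpu` is deprecated
pip install --upgrade tensorflow  # or pin a release, e.g. tensorflow==2.1.0 for CUDA 10.1 / cuDNN 7.6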
#!/usr/bin/env python
from tensorflow.python.client import device_lib

# Prints all visible devices; a working GPU setup includes a "/device:GPU:0" entry
print(device_lib.list_local_devices())
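tensorflow.python.client.device_lib is an internal module. From TensorFlow 2.1 onward, the public way to list GPUs is tf.config.list_physical_devices("GPU"). Older 2.0 builds expose it as tf.config.experimental.list_physical_devices. The sketch below assumes TF >= 2.1 and returns a summary dict. tensorflow_gpu_report is a hypothetical name.

```python
import importlib.util


def tensorflow_gpu_report():
    """Summarize TensorFlow's GPU visibility (assumes TensorFlow >= 2.1)."""
    if importlib.util.find_spec("tensorflow") is None:
        return {"tensorflow_installed": False}
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    return {
        "tensorflow_installed": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],  # empty list: no usable GPU
    }


if __name__ == "__main__":
    print(tensorflow_gpu_report())
```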
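Installing cuDNN
Download the following packages from NVIDIA (cuDNN downloads require an NVIDIA Developer account), then install them with apt:
- cuDNN Runtime Library for Ubuntu18.04 (Deb)
- cuDNN Developer Library for Ubuntu18.04 (Deb)
- cuDNN Code Samples and User Guide for Ubuntu18.04 (Deb)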
sudo apt install ./libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
sudo apt install ./libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
sudo apt install ./libcudnn7-doc_7.6.5.32-1+cuda10.1_amd64.deb
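To verify the installed cuDNN version, you can read the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines from the development header. The header locations listed below are assumptions: cuDNN 7 keeps these defines in cudnn.h, while cuDNN 8 and later move them to cudnn_version.h. The helper names are hypothetical.

```python
import os
import re

# Assumed header locations (cuDNN 8+ first, then cuDNN 7)
HEADER_CANDIDATES = [
    "/usr/include/cudnn_version.h",
    "/usr/include/cudnn.h",
    "/usr/local/cuda/include/cudnn.h",
]


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines, or None."""
    fields = dict(re.findall(
        r"#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)", header_text))
    try:
        return tuple(int(fields[k]) for k in ("MAJOR", "MINOR", "PATCHLEVEL"))
    except KeyError:
        return None


def installed_cudnn_version(candidates=HEADER_CANDIDATES):
    """Return the version from the first header that defines it, or None."""
    for path in candidates:
        if os.path.exists(path):
            with open(path, errors="replace") as header:
                version = parse_cudnn_version(header.read())
            if version:
                return version
    return None


if __name__ == "__main__":
    print("cuDNN version:", installed_cudnn_version())
```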
Installing TensorFlow
A native TensorFlow install is tricky because it depends on matching versions of CUDA, cuDNN, TensorRT and other libraries. Using a container (e.g. one of the official tensorflow/tensorflow GPU images) is therefore recommended. For a native install, the following NVIDIA® software must be present on your system:
- NVIDIA® GPU drivers: CUDA 10.1 requires 418.x or higher.
- CUDA® Toolkit: TensorFlow supports CUDA 10.1 (TensorFlow >= 2.1.0).
- CUPTI, which ships with the CUDA Toolkit.
- cuDNN SDK (>= 7.6).
- (Optional) TensorRT 6.0, to improve latency and throughput for inference on some models.
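You can check the driver requirement above by comparing versions numerically, field by field. Plain string comparison is not reliable for this. The sketch below encodes only the requirement stated in the list (CUDA 10.1 needs driver 418.x or higher). Add entries for other CUDA versions from NVIDIA's release notes yourself. The names are hypothetical.

```python
# Only the requirement stated above is encoded here
MIN_DRIVER_FOR_CUDA = {"10.1": "418"}


def version_tuple(version):
    """'435.21' -> (435, 21), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports_cuda(driver_version, cuda_version):
    """True if the driver meets the minimum listed for this CUDA version."""
    minimum = MIN_DRIVER_FOR_CUDA[cuda_version]
    return version_tuple(driver_version) >= version_tuple(minimum)


if __name__ == "__main__":
    for driver in ("410.48", "418.39", "435.21"):
        print(driver, driver_supports_cuda(driver, "10.1"))
```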
CUPTI
sudo apt install libcupti-dev
[ref]
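TensorRT
- Install TensorRT 6 (the TensorFlow build used here looks for libnvinfer.so.6; see Troubleshoot).
- To upgrade TensorRT or its related packages, refer to NVIDIA's TensorRT documentation.
- Check which TensorRT version your TensorFlow build expects before installing.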
- NVIDIA TensorRT 6.x Download
# To install
sudo apt install ./nv-tensorrt-repo-ubuntu1804-cuda10.2-trt6.0.1.8-ga-20191108_1-1_amd64.deb
sudo apt-get update
sudo apt-get install tensorrt
sudo apt-get install python3-libnvinfer-dev
sudo apt-get install uff-converter-tf
# Verify Installation
dpkg -l | grep TensorRT
# To uninstall TensorRT-related packages (awk keeps only the package-name column)
sudo apt remove $(dpkg -l | grep TensorRT | awk '{print $2}')
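The uninstall command relies on extracting only the package names from dpkg -l. The Python sketch below does the same thing as grep TensorRT | awk '{print $2}', which makes it easier to review the list before removing anything. The descriptions in SAMPLE are illustrative, not copied from a real system. packages_matching is a hypothetical helper.

```python
import re
import shutil
import subprocess

# dpkg -l data rows start with a two-letter status (e.g. "ii", "rc")
DPKG_ROW = re.compile(r"^([a-z][a-zA-Z])\s+(\S+)")

# Illustrative dpkg -l rows (descriptions are assumptions)
SAMPLE = """\
ii  libnvinfer6    6.0.1-1+cuda10.2    amd64  TensorRT runtime libraries
ii  tensorrt       6.0.1.8-1+cuda10.2  amd64  Meta package of TensorRT
ii  nv-tensorrt-repo-ubuntu1804-cuda10.2-trt6.0.1.8-ga-20191108 1-1 amd64 nv-tensorrt repository configuration files
"""


def packages_matching(dpkg_output, needle="TensorRT"):
    """Python equivalent of: dpkg -l | grep TensorRT | awk '{print $2}'"""
    names = []
    for line in dpkg_output.splitlines():
        row = DPKG_ROW.match(line)
        if row and needle in line:
            names.append(row.group(2))
    return names


if __name__ == "__main__":
    text = (subprocess.run(["dpkg", "-l"], capture_output=True, text=True).stdout
            if shutil.which("dpkg") else SAMPLE)
    print(packages_matching(text))
```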
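Troubleshoot
If TensorFlow prints the following warnings, TensorRT (or the version it expects) is missing. TensorFlow still runs without it, but TensorRT integration is disabled. To enable it, install TensorRT and its associated libraries in the version named in the message (here libnvinfer.so.6, i.e. TensorRT 6).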
>>> from tensorflow.python.client import device_lib
2020-03-29 20:22:13.480239: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:
2020-03-29 20:22:13.480305: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:
2020-03-29 20:22:13.480313: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
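TensorFlow loads these libraries with dlopen. On Linux, ctypes.CDLL goes through the same dynamic loader, so you can reproduce the lookup without starting TensorFlow. Run it in the same environment as TensorFlow, because LD_LIBRARY_PATH is read when the process starts. The library names come from the warnings above. can_dlopen is a hypothetical helper.

```python
import ctypes

# Library names taken from the TensorFlow warnings above
TENSORRT_LIBS = ["libnvinfer.so.6", "libnvinfer_plugin.so.6"]


def can_dlopen(names):
    """Try to load each shared library by name; report which ones fail."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError as err:
            results[name] = False
            print(f"missing: {name} ({err})")
        else:
            results[name] = True
    return results


if __name__ == "__main__":
    print(can_dlopen(TENSORRT_LIBS))
```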
When downgrading TensorRT (here from 7 to 6), apt may fail with the following error:
Reading package lists... Done
E: The repository 'file:/var/nv-tensorrt-repo-cuda10.2-trt7.0.0.11-ga-20191216 Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Run dpkg -l | grep tensor to see which repository packages are present:
(cuda_env) jon@Jon-AI-Rig:~/Downloads$ dpkg -l | grep tensor
ii nv-tensorrt-repo-ubuntu1804-cuda10.2-trt6.0.1.8-ga-20191108 1-1 amd64 nv-tensorrt repository configuration files
rc nv-tensorrt-repo-ubuntu1804-cuda10.2-trt7.0.0.11-ga-20191216 1-1 amd64 nv-tensorrt repository configuration files
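The rc status means the TensorRT 7 repository package was removed, but its configuration files were left behind. These presumably include its apt source entry, which would explain the error above. Purge it using sudo dpkg -P nv-tensorrt-repo-ubuntu1804-cuda10.2-trt7.0.0.11-ga-20191216. Afterwards, only the TensorRT 6 repository remains: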
(cuda_env) jon@Jon-AI-Rig:~/Downloads$ dpkg -l | grep tensor
ii nv-tensorrt-repo-ubuntu1804-cuda10.2-trt6.0.1.8-ga-20191108 1-1 amd64 nv-tensorrt repository configuration files
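If several leftover repository packages pile up, you can select the ones in rc state from dpkg -l output and print the matching purge commands. This is a sketch. The "nv-tensorrt-repo" prefix matches the package names shown above, and stale_repo_packages is a hypothetical helper.

```python
import re

# dpkg -l data rows start with a two-letter status (e.g. "ii", "rc")
DPKG_ROW = re.compile(r"^([a-z][a-zA-Z])\s+(\S+)")

# Rows from the dpkg -l output shown above
SAMPLE = """\
ii nv-tensorrt-repo-ubuntu1804-cuda10.2-trt6.0.1.8-ga-20191108 1-1 amd64 nv-tensorrt repository configuration files
rc nv-tensorrt-repo-ubuntu1804-cuda10.2-trt7.0.0.11-ga-20191216 1-1 amd64 nv-tensorrt repository configuration files
"""


def stale_repo_packages(dpkg_output, prefix="nv-tensorrt-repo"):
    """Packages in 'rc' state (removed, config files left) with the given prefix."""
    stale = []
    for line in dpkg_output.splitlines():
        row = DPKG_ROW.match(line.strip())
        if row and row.group(1) == "rc" and row.group(2).startswith(prefix):
            stale.append(row.group(2))
    return stale


if __name__ == "__main__":
    for name in stale_repo_packages(SAMPLE):
        print(f"sudo dpkg -P {name}")
```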
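If nvidia-smi fails with "Failed to initialize NVML: Driver/library version mismatch", the loaded kernel module and the user-space driver libraries come from different driver versions. If you have just upgraded the driver, a reboot is often enough. Otherwise:
dmesg | grep NVRM   # look for the version mismatch message
nvidia-smi
sudo ubuntu-drivers autoinstall   # Try this first. If it doesn't work, proceed
sudo apt-add-repository -r ppa:graphics-drivers/ppa
sudo apt update
sudo apt remove 'nvidia*'   # quoted so the shell does not expand the glob
sudo apt autoremove
sudo ubuntu-drivers autoinstall
# If apt reports dependency conflicts, aptitude can propose alternative resolutions
sudo apt install aptitude
sudo aptitude install <name_of_package_with_conflicts>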
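Before reinstalling, you can confirm the mismatch by comparing the loaded kernel module's version (from /proc/driver/nvidia/version) with the version suffix of the installed NVML library. The file locations and the libnvidia-ml.so.<version> naming are assumptions based on Ubuntu's x86_64 driver packages. The helper names are hypothetical.

```python
import glob
import os
import re

PROC_VERSION = "/proc/driver/nvidia/version"
NVML_GLOB = "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*"  # assumed Ubuntu layout


def kernel_module_version(proc_text):
    """Extract e.g. '435.21' from the 'NVRM version: ... Kernel Module 435.21 ...' line."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def nvml_library_versions(paths):
    """Versions encoded in file names like libnvidia-ml.so.440.64 (skips .so.1)."""
    versions = set()
    for path in paths:
        match = re.fullmatch(r"libnvidia-ml\.so\.(\d+(?:\.\d+)+)", os.path.basename(path))
        if match:
            versions.add(match.group(1))
    return sorted(versions)


def check_mismatch():
    """Return (kernel_version, library_versions, versions_match) or None."""
    if not os.path.exists(PROC_VERSION):
        return None  # kernel module not loaded (or not Linux)
    with open(PROC_VERSION) as f:
        kernel = kernel_module_version(f.read())
    libs = nvml_library_versions(glob.glob(NVML_GLOB))
    return kernel, libs, kernel in libs


if __name__ == "__main__":
    print(check_mismatch())
```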