CUDA#

Versions#

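Make sure the GPU driver, CUDA, cuDNN, and TensorRT versions are compatible with each other and with the framework (PyTorch or TensorFlow) you plan to use. The commands on this page target different CUDA releases (11.8, 10.1, and 10.2), so adjust the package names to match your setup.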

Installing Graphics Card Driver#

Guide

To install via CLI,

# Check recommendation
ubuntu-drivers devices
# Install recommendation
sudo ubuntu-drivers autoinstall
# Reboot
sudo shutdown -r now #One way to reboot
# Check installation
nvidia-smi
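
The check can also be scripted. The sketch below is a minimal example, assuming `nvidia-smi` is on the PATH after the reboot; the helper names `parse_gpu_rows` and `query_gpus` are made up for illustration. It asks `nvidia-smi` for CSV output and returns an empty list when the driver is missing or the call fails.

```python
import shutil
import subprocess


def parse_gpu_rows(csv_text):
    """Parse `name, driver_version` rows from nvidia-smi's CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name does not break parsing.
        name, _, driver = line.rpartition(",")
        rows.append({"name": name.strip(), "driver_version": driver.strip()})
    return rows


def query_gpus():
    """Return parsed GPU rows, or an empty list if nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No working NVIDIA driver detected; check the installation and reboot.")
    for gpu in gpus:
        print(f"{gpu['name']}: driver {gpu['driver_version']}")
```

An empty result usually means the driver is not installed or the machine has not been rebooted since the install.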

To install Nvidia Beta Drivers,

# Add Repository
sudo add-apt-repository ppa:graphics-drivers/ppa
# Check recommendation
ubuntu-drivers devices
# Either install the recommended driver
sudo ubuntu-drivers autoinstall
# or install a specific version
sudo apt install nvidia-driver-435
# Reboot
sudo reboot #Another way to reboot

Installing CUDA#

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-ubuntu1804-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu1804-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
# Add the NVIDIA Container Toolkit repository (used to run GPU containers)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Test on PyTorch

#!/bin/bash
conda create --name cuda_env
conda activate cuda_env
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
#!/bin/python
# validate PyTorch:
from __future__ import print_function
import torch
x = torch.rand(5, 3)
print(x) # Check for a tensor output
print(torch.cuda.is_available())  # Should print True when the GPU is usable
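
A slightly more thorough check is sketched below. It is an illustration rather than an official test: the function name `pytorch_gpu_report` is hypothetical, and the small matrix multiply is only a quick way to confirm that a kernel actually runs on the GPU. The function returns a dict instead of raising when PyTorch is missing.

```python
def pytorch_gpu_report():
    """Summarise what PyTorch can see; returns a dict instead of raising."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "cuda_available": False}

    report = {
        "installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # Run a small matrix multiply on the GPU to confirm kernels launch.
        a = torch.rand(64, 64, device="cuda")
        report["matmul_ok"] = bool(torch.isfinite(a @ a).all())
    return report


if __name__ == "__main__":
    for key, value in pytorch_gpu_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is set but `cuda_available` is False, the driver or the CUDA runtime version is the likely culprit rather than the PyTorch install itself.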

Test on TensorFlow

#!/bin/bash
conda activate cuda_env
# Since TensorFlow 2.1 the plain "tensorflow" package also includes GPU support
pip install --upgrade tensorflow-gpu
#!/bin/python
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())  # A GPU entry should appear in the list
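
For TensorFlow 2.1 and later, the public `tf.config` API gives a shorter answer than the full device list. The sketch below assumes TensorFlow 2.1 or newer; the function name `tensorflow_gpu_report` is hypothetical.

```python
def tensorflow_gpu_report():
    """Summarise the GPUs TensorFlow can see; returns a dict instead of raising."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"installed": False, "gpus": []}

    gpus = tf.config.list_physical_devices("GPU")
    return {
        "installed": True,
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    for key, value in tensorflow_gpu_report().items():
        print(f"{key}: {value}")
```

An empty `gpus` list with `built_with_cuda` set to True usually points at a missing or mismatched CUDA or cuDNN library; the warnings TensorFlow prints on import name the library it could not load.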

Installing cuDNN#

Download and install the following packages:

  • cuDNN Runtime Library for Ubuntu18.04 (Deb)
  • cuDNN Developer Library for Ubuntu18.04 (Deb)
  • cuDNN Code Samples and User Guide for Ubuntu18.04 (Deb)

sudo apt install ./libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
sudo apt install ./libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
sudo apt install ./libcudnn7-doc_7.6.5.32-1+cuda10.1_amd64.deb

Installing TensorFlow#

Guide

Installing TensorFlow is tricky because it depends on specific versions of CUDA, TensorRT, and other libraries, so using a container is recommended. The following NVIDIA® software must be installed on your system:

  • NVIDIA® GPU drivers: CUDA 10.1 requires 418.x or higher.
  • CUDA® Toolkit: TensorFlow supports CUDA 10.1 (TensorFlow >= 2.1.0).
  • CUPTI ships with the CUDA Toolkit.
  • cuDNN SDK (>= 7.6).
  • (Optional) TensorRT 6.0, to improve latency and throughput for inference on some models.
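
The requirements above can be turned into a quick check. This is a minimal sketch with hypothetical helper names (`parse_version`, `check_requirements`); the thresholds are copied from the list, and reading "supports CUDA 10.1" as an exact major.minor match is an assumption.

```python
def parse_version(text):
    """Turn '418.87.01' into (418, 87, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Rules taken from the requirements list; the exact-match rule for CUDA is an assumption.
CHECKS = {
    "driver": lambda v: parse_version(v) >= (418,),
    "cuda": lambda v: parse_version(v)[:2] == (10, 1),
    "cudnn": lambda v: parse_version(v) >= (7, 6),
}


def check_requirements(installed):
    """Return {component: True/False/None}; None means no version was supplied."""
    results = {}
    for component, is_ok in CHECKS.items():
        found = installed.get(component)
        results[component] = None if found is None else is_ok(found)
    return results


if __name__ == "__main__":
    # Example versions; replace them with the ones reported on your machine.
    print(check_requirements({"driver": "440.64", "cuda": "10.1.243", "cudnn": "7.6.5"}))
```

Feeding in a CUDA 11.8 install, for example, flags the `cuda` entry as False, which is a reminder to pick a TensorFlow release built for that toolkit.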

CUPTI#

sudo apt install libcupti-dev

TensorRT#

# To install
sudo apt install ./nv-tensorrt-repo-ubuntu1804-cuda10.2-trt6.0.1.8-ga-20191108_1-1_amd64.deb
sudo apt-get update
sudo apt-get install tensorrt
sudo apt-get install python3-libnvinfer-dev
sudo apt-get install uff-converter-tf
# Verify Installation
dpkg -l | grep TensorRT
# To uninstall TensorRT-related packages (column 2 of dpkg -l holds the package name)
sudo apt remove $(dpkg -l | grep TensorRT | awk '{print $2}')

Troubleshoot#

If you see the following warnings, install TensorRT and its associated packages. Make sure the installed version matches the one TensorFlow looks for (libnvinfer.so.6 here, i.e. TensorRT 6).

>>> from tensorflow.python.client import device_lib
2020-03-29 20:22:13.480239: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:
2020-03-29 20:22:13.480305: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:
2020-03-29 20:22:13.480313: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
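
To confirm whether the dynamic loader can find these libraries without starting TensorFlow, a small `ctypes` probe is enough. This is a sketch, with the library names taken from the log above; `can_load` is a hypothetical helper.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for name in ("libnvinfer.so.6", "libnvinfer_plugin.so.6"):
        print(f"{name}: {'found' if can_load(name) else 'MISSING'}")
```

If a library is reported missing even though TensorRT is installed, check that its directory is listed in LD_LIBRARY_PATH or known to ldconfig.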

When downgrading TensorRT, you may see the following error:

Reading package lists... Done
E: The repository 'file:/var/nv-tensorrt-repo-cuda10.2-trt7.0.0.11-ga-20191216  Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

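Run dpkg -l | grep tensor to list the installed TensorRT repository packages. A status of rc marks a package that was removed but whose configuration files remain: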

(cuda_env) jon@Jon-AI-Rig:~/Downloads$ dpkg -l | grep tensor
ii  nv-tensorrt-repo-ubuntu1804-cuda10.2-trt6.0.1.8-ga-20191108  1-1                                              amd64        nv-tensorrt repository configuration files
rc  nv-tensorrt-repo-ubuntu1804-cuda10.2-trt7.0.0.11-ga-20191216 1-1                                              amd64        nv-tensorrt repository configuration files

Purge the leftover package with sudo dpkg -P nv-tensorrt-repo-ubuntu1804-cuda10.2-trt7.0.0.11-ga-20191216, then check again:

(cuda_env) jon@Jon-AI-Rig:~/Downloads$ dpkg -l | grep tensor
ii  nv-tensorrt-repo-ubuntu1804-cuda10.2-trt6.0.1.8-ga-20191108 1-1                                              amd64        nv-tensorrt repository configuration files
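
Spotting the leftover entries can be automated. The sketch below parses `dpkg -l` output and returns the packages in the `rc` state (removed, configuration files remaining); `leftover_packages` is a hypothetical helper, and the sample rows are shortened versions of the output above.

```python
def leftover_packages(dpkg_output):
    """Return names of packages in the `rc` state (removed, config files remain)."""
    names = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # dpkg -l rows look like: <status> <name> <version> <arch> <description...>
        if len(fields) >= 2 and fields[0] == "rc":
            names.append(fields[1])
    return names


if __name__ == "__main__":
    sample = (
        "ii  nv-tensorrt-repo-ubuntu1804-cuda10.2-trt6.0.1.8-ga-20191108  1-1  amd64  repo files\n"
        "rc  nv-tensorrt-repo-ubuntu1804-cuda10.2-trt7.0.0.11-ga-20191216 1-1  amd64  repo files\n"
    )
    for name in leftover_packages(sample):
        print(f"sudo dpkg -P {name}")
```

The script only prints the purge commands, so they can be reviewed before anything is removed.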

For a Driver/library version mismatch error, check the kernel log for the driver version the kernel module reports, then run nvidia-smi again:

dmesg | grep NVRM
nvidia-smi
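
This mismatch typically appears when the loaded kernel module is older than the user-space libraries, for example after a driver update without a reboot. The sketch below reads the loaded module's version; it assumes the driver exposes /proc/driver/nvidia/version with an "NVRM version:" line, and the helper names are hypothetical.

```python
import re
from pathlib import Path


def kernel_module_version(proc_text):
    """Extract the driver version from the 'NVRM version:' line, or None."""
    for line in proc_text.splitlines():
        if line.startswith("NVRM version:"):
            # The first dotted number on the line is the driver version.
            match = re.search(r"\d+\.\d+(?:\.\d+)?", line)
            return match.group(0) if match else None
    return None


def loaded_driver_version(path="/proc/driver/nvidia/version"):
    """Return the loaded kernel module version, or None if the module is not loaded."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    return kernel_module_version(proc_file.read_text())


if __name__ == "__main__":
    version = loaded_driver_version()
    print(f"Loaded NVIDIA kernel module: {version or 'not loaded'}")
```

If the printed version differs from the driver package you just installed, a reboot (or reloading the kernel module) normally clears the error.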

For unmet dependencies,

sudo ubuntu-drivers autoinstall # Try this first. If it doesn't work, continue with the steps below
sudo apt-add-repository -r ppa:graphics-drivers/ppa
sudo apt update
sudo apt remove 'nvidia*' # Quoted so the shell does not expand the pattern
sudo apt autoremove
sudo ubuntu-drivers autoinstall
sudo apt install aptitude
sudo aptitude install <name_of_package_with_conflicts>