Install drivers on a cloud server with a GPU
This guide shows how to install drivers, using a cloud server created from the ready-made Ubuntu 24.04 LTS 64-bit image as an example.
For NVIDIA® GPUs to operate stably on a cloud server with a GPU, you need to install the drivers.
If you created the cloud server from a ready-made GPU-optimized image, the drivers are already installed and no additional installation is required. Ready-made GPU-optimized images:
- Ubuntu 24.04 LTS 64-bit GPU driver;
- Ubuntu 24.04 LTS 64-bit CUDA 11.8 Docker;
- Ubuntu 24.04 LTS 64-bit CUDA 12.8 Docker;
- Ubuntu 22.04 LTS 64-bit GPU driver;
- Ubuntu 22.04 LTS 64-bit CUDA 11.8 Docker;
- Ubuntu 22.04 LTS 64-bit CUDA 12.8 Docker;
- Data Science VM (Ubuntu 22.04 LTS 64-bit);
- Data Analytics VM (Ubuntu 22.04 LTS 64-bit).
Install drivers
-
Install the package ubuntu-drivers-common:
sudo apt install -y ubuntu-drivers-common alsa-utils
-
Check the recommended driver version:
sudo ubuntu-drivers devices
A list of versions will appear in the response. The recommended version is marked as recommended. Copy the recommended version.
Example for an NVIDIA® Tesla T4 GPU with the recommended version nvidia-driver-550:
== /sys/devices/pci0000:00/0000:00:06.0 ==
modalias : pci:v000010DEd00001EB8sv000010DEsd000012A2bc03sc02i00
vendor : NVIDIA Corporation
model : TU104GL [Tesla T4]
manual_install: True
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-550 - third-party non-free recommended
driver : nvidia-driver-418-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
-
Optional: verify that the selected driver version is not lower than the minimum version compatible with the GPU architecture of the cloud server:
sudo apt-cache search 'nvidia-driver-'
A list of available driver versions will appear in the response. To find out the GPU architecture, see the instructions Create a cloud server with GPU. The correspondence between driver versions and architectures is described in the CUDA Compatibility manual in the NVIDIA® documentation.
-
If your GPU architecture is Pascal (such as the NVIDIA® GTX 1080), add the NVIDIA® Personal Package Archive repository to the cloud server:
sudo add-apt-repository ppa:graphics-drivers/ppa -y
-
Install the kernel headers:
sudo apt update
for kernel in $(linux-version list); do sudo apt install -y "linux-headers-${kernel}"; done
The loop installs the headers for every kernel version that linux-version list returns. To view the list of kernel versions, run apt-cache search linux-image.
-
Install the driver:
sudo apt install -y <driver_version>
Specify <driver_version>: the driver version that you copied when checking the recommended version.
Example of installing the recommended version nvidia-driver-550 for an NVIDIA® Tesla T4 GPU:
sudo apt install -y nvidia-driver-550
-
Check that the driver is installed and working:
nvidia-smi
The response shows the NVIDIA-SMI version, the driver version, and the CUDA version that is compatible with the current driver but is not installed on the system. The CUDA Runtime API and CUDA Toolkit are installed separately and are not included in the nvidia-driver package.
Example response:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:06.0 Off |                    0 |
| N/A   41C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
-
Open the configuration file of the unattended-upgrades package, which handles security updates:
sudo nano /etc/apt/apt.conf.d/50unattended-upgrades
-
Disable automatic updates of the NVIDIA® and kernel packages. To do this, add the following block to the file:
Unattended-Upgrade::Package-Blacklist {
"linux-";
"nvidia-";
};
-
Exit the nano text editor and save the changes: press Ctrl+X, then Y and Enter.
-
Optional: lock the kernel version to disable kernel updates. Updating the kernel version may cause errors in GPU drivers.
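The steps above that find the recommended driver version and then install it could be scripted roughly as follows. This is only a sketch: recommended_driver is a hypothetical helper name, and it assumes the output of ubuntu-drivers devices keeps the format shown in the example, with the word recommended at the end of the line.

```shell
#!/usr/bin/env bash
# Sketch only: recommended_driver is a hypothetical helper,
# not a command provided by ubuntu-drivers-common.

# Read `ubuntu-drivers devices` output on stdin and print the package
# name from the first "driver" line that ends with "recommended".
recommended_driver() {
  awk '$1 == "driver" && $NF == "recommended" { print $3; exit }'
}

# Example usage on the server (commented out because it changes the system):
#   driver="$(sudo ubuntu-drivers devices | recommended_driver)"
#   [ -n "$driver" ] && sudo apt install -y "$driver"
```

If no line is marked as recommended, the helper prints nothing, so the usage example checks for an empty result before installing.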
Lock the kernel version
In the ready-made images with pre-installed drivers, except for Data Analytics VM (Ubuntu 22.04 LTS 64-bit) and Data Science VM (Ubuntu 22.04 LTS 64-bit), the kernel version is already locked.
Drivers are compiled against the headers of the current kernel version during installation. Changing the kernel version causes the GPU driver to fail. In this case, the nvidia-smi command may return the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
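A quick way to tell a missing driver from a driver that has stopped responding is to look at how nvidia-smi exits. The sketch below assumes the driver_status function name and the NVIDIA_SMI override variable, both of which are hypothetical and exist only to make the check easy to try out.

```shell
#!/usr/bin/env bash
# Sketch only: driver_status and the NVIDIA_SMI override are hypothetical names.

# Return 0 if the driver responds, 1 if nvidia-smi is missing,
# 2 if nvidia-smi exists but cannot talk to the driver.
driver_status() {
  local smi="${NVIDIA_SMI:-nvidia-smi}"
  local version
  if ! command -v "$smi" >/dev/null 2>&1; then
    echo "nvidia-smi not found: the driver is probably not installed"
    return 1
  fi
  # --query-gpu=driver_version prints one line per GPU.
  if version="$("$smi" --query-gpu=driver_version --format=csv,noheader 2>/dev/null)"; then
    echo "driver OK: ${version}"
  else
    echo "driver not responding on kernel $(uname -r): check whether the kernel changed"
    return 2
  fi
}

# Example usage:
#   driver_status || echo "see the kernel version sections of this guide"
```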
To disable kernel updates, lock the kernel version in the settings of the apt package manager. After locking, you can still update the kernel by removing the lock first.
-
Open the CLI.
-
Create a file pin-linux-kernel-nvidia-dkms in the directory /etc/apt/preferences.d to pin the versions of the linux-headers and linux-image packages:
cat <<EOF > /etc/apt/preferences.d/pin-linux-kernel-nvidia-dkms
Package: linux-image-*
Pin: version *
Pin-Priority: -1

Package: linux-headers-*
Pin: version *
Pin-Priority: -1
EOF
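Records in an apt preferences file are separated by blank lines, so it can be worth checking the file after creating it. The following is a minimal sketch of such a check. The check_pin_file name is hypothetical, and the function only verifies that each blank-line-separated record has exactly one Package, Pin, and Pin-Priority field; it does not replace apt's own parsing.

```shell
#!/usr/bin/env bash
# Sketch only: check_pin_file is a hypothetical helper, not part of apt.

# Succeed if every blank-line-separated record in file $1 has exactly
# one Package, one Pin, and one Pin-Priority field (case-insensitive).
check_pin_file() {
  awk '
    BEGIN { RS = ""; FS = "\n"; bad = 0 }
    {
      pkg = pin = prio = 0
      for (i = 1; i <= NF; i++) {
        line = tolower($i)
        if (line ~ /^package:/) pkg++
        else if (line ~ /^pin:/) pin++
        else if (line ~ /^pin-priority:/) prio++
      }
      if (pkg != 1 || pin != 1 || prio != 1) {
        printf "record %d is malformed\n", NR
        bad = 1
      }
    }
    # An empty file counts as a failure too.
    END { exit (bad || NR == 0) }
  ' "$1"
}

# Example usage:
#   check_pin_file /etc/apt/preferences.d/pin-linux-kernel-nvidia-dkms && echo "pin file looks OK"
```

Two records written without a blank line between them are reported as one malformed record, which is the most likely mistake when copying the block by hand.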
Update the kernel version after locking
Once you lock the kernel version, you cannot update it. To get security updates, performance improvements, and new features, delete the kernel version lock file and upgrade the kernel.
-
Open the CLI.
-
Delete the file you created for kernel version lock:
rm /etc/apt/preferences.d/pin-linux-kernel-nvidia-dkms
-
Update the kernel version:
apt install linux-image-<kernel-version>
Specify <kernel-version>: the kernel version. To view the list of kernel versions, run apt-cache search linux-image.
-
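Install the kernel headers:
apt install linux-headers-$(uname -r)
Note that uname -r returns the version of the running kernel, so reboot into the new kernel before running this command. Once the kernel headers are installed, the dkms utility runs and automatically rebuilds the NVIDIA modules for the new kernel version.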
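To confirm that the modules were rebuilt, you could inspect the output of dkms status for the new kernel. The sketch below is an assumption-laden illustration: nvidia_built_for is a hypothetical helper, and it assumes that dkms status prints lines that start with nvidia, contain the kernel version between commas, and end with installed. The exact format may differ between DKMS versions, and some driver packages ship prebuilt modules that do not appear in dkms status at all.

```shell
#!/usr/bin/env bash
# Sketch only: nvidia_built_for is a hypothetical helper, and the
# `dkms status` line format it expects is an assumption.

# Read `dkms status` output on stdin and succeed if an nvidia module
# is reported as installed for the kernel version given in $1.
nvidia_built_for() {
  awk -v k="$1" '
    BEGIN { found = 0 }
    tolower($0) ~ /^nvidia/ && index($0, ", " k ",") && /installed/ { found = 1 }
    END { exit !found }
  '
}

# Example usage:
#   dkms status | nvidia_built_for "$(uname -r)" && echo "NVIDIA modules are built for this kernel"
```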