Install drivers on a cloud server with GPU

For your information

This guide shows an example of installing drivers on a cloud server created from the pre-built Ubuntu 24.04 LTS 64-bit image.

To ensure stable operation of the NVIDIA® GPU on a cloud server with GPU, you need to install drivers.

If you created a cloud server from a pre-built image that is GPU-optimized, the drivers are already installed, and no additional installation is required. GPU-optimized pre-built images include:

  • Ubuntu 24.04 LTS 64-bit GPU Driver 535;
  • Ubuntu 24.04 LTS 64-bit GPU Driver 535 Docker;
  • Ubuntu 24.04 LTS 64-bit GPU Driver 580;
  • Ubuntu 24.04 LTS 64-bit GPU Driver 580 Docker;
  • Ubuntu 22.04 LTS 64-bit GPU Driver 535;
  • Ubuntu 22.04 LTS 64-bit GPU Driver 535 Docker;
  • Ubuntu 22.04 LTS 64-bit GPU Driver 580;
  • Ubuntu 22.04 LTS 64-bit GPU Driver 580 Docker;
  • Data Science VM (Ubuntu 22.04 LTS 64-bit);
  • Data Analytics VM (Ubuntu 22.04 LTS 64-bit).

Install drivers

  1. Connect to the cloud server with GPU.

  2. Install the ubuntu-drivers-common and alsa-utils packages:

    sudo apt install -y ubuntu-drivers-common alsa-utils
  3. View the recommended driver version:

    sudo ubuntu-drivers devices

    The response lists the available driver versions, and the suggested one is marked recommended. Copy the recommended version.

    Example for an NVIDIA® Tesla T4 GPU with the recommended version nvidia-driver-550:

    == /sys/devices/pci0000:00/0000:00:06.0 ==
    modalias : pci:v000010DEd00001EB8sv000010DEsd000012A2bc03sc02i00
    vendor : NVIDIA Corporation
    model : TU104GL [Tesla T4]
    manual_install: True
    driver : nvidia-driver-450-server - distro non-free
    driver : nvidia-driver-535-server - distro non-free
    driver : nvidia-driver-470-server - distro non-free
    driver : nvidia-driver-470 - distro non-free
    driver : nvidia-driver-550 - third-party non-free recommended
    driver : nvidia-driver-418-server - distro non-free
    driver : xserver-xorg-video-nouveau - distro free builtin
  4. Optional: verify that the selected driver version is not lower than the minimum version compatible with the GPU architecture of the cloud server:

    apt-cache search 'nvidia-driver-'

    The response lists the available driver packages. You can look up the GPU architecture in the Graphics Processors (GPU) guide, and the correspondence between driver versions and architectures in the CUDA Compatibility documentation by NVIDIA®.

  5. If the GPU architecture is Pascal (for example, for an NVIDIA® GTX 1080), add the graphics-drivers Personal Package Archive (PPA), which provides NVIDIA® driver packages, to the cloud server:

    sudo add-apt-repository ppa:graphics-drivers/ppa -y
  6. Install the kernel headers:

    sudo apt update
    for kernel in $(linux-version list); do sudo apt install -y "linux-headers-${kernel}"; done

    The loop installs the headers for every kernel version installed on the cloud server. To view these versions, run the linux-version list command.

  7. Install the driver:

    sudo apt install -y <driver_version>

    Replace <driver_version> with the driver version you copied in step 3.

    Example of installing the recommended version nvidia-driver-550 for an NVIDIA® Tesla T4 GPU:

    sudo apt install -y nvidia-driver-550
  8. Verify that the driver is installed and working:

    nvidia-smi

    The NVIDIA-SMI and driver versions will appear in the response. Example output:

    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
    |-----------------------------------------+------------------------+----------------------+
    | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
    | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
    | | | MIG M. |
    |=========================================+========================+======================|
    | 0 Tesla T4 Off | 00000000:00:06.0 Off | 0 |
    | N/A 41C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |
    | | | N/A |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes: |
    | GPU GI CI PID Type Process name GPU Memory |
    | ID ID Usage |
    |=========================================================================================|
    | No running processes found |
    +-----------------------------------------------------------------------------------------+
  9. Open the configuration file for the unattended-upgrades package, which handles security updates:

    sudo nano /etc/apt/apt.conf.d/50unattended-upgrades
  10. Disable automatic updates of the kernel and NVIDIA® packages. To do this, add the following block to the file:

    Unattended-Upgrade::Package-Blacklist {
    "linux-";
    "nvidia-";
    };
  11. Save your changes and exit the nano text editor: press Ctrl+X, then Y, then Enter.

  12. Optional: pin the kernel version to disable kernel updates. Updating the kernel version may cause errors in GPU driver operation.
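
Steps 3 and 7 can also be combined in a script. The following is a minimal sketch, not part of the procedure above: recommended_driver is a hypothetical helper, and it assumes that ubuntu-drivers devices prints its driver lines in the format shown in the step 3 example.

```shell
#!/bin/sh
# Hypothetical helper (not part of ubuntu-drivers): reads the output of
# "ubuntu-drivers devices" on stdin and prints the package name from the
# first "driver" line that ends with the word "recommended".
recommended_driver() {
    awk '$1 == "driver" && $NF == "recommended" { print $3; exit }'
}

# Possible usage on the cloud server (left commented out because it
# installs packages):
#   driver="$(sudo ubuntu-drivers devices | recommended_driver)"
#   [ -n "$driver" ] && sudo apt install -y "$driver"
```

If no line is marked recommended, the helper prints nothing, so check that the result is not empty before passing it to apt install.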

Pin the kernel version

For your information

In pre-built images with pre-installed drivers, except for Data Analytics VM (Ubuntu 22.04 LTS 64-bit) and Data Science VM (Ubuntu 22.04 LTS 64-bit), the kernel version is already pinned.

During installation, the driver modules are compiled against the headers of the current kernel version. If the kernel version changes, the GPU driver may stop working. In that case, the nvidia-smi command may return the following error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

To disable kernel updates, pin the kernel version in the settings of the apt package manager. After pinning, you can still update the kernel version if you first remove the pin.

  1. Open the CLI.

  2. Create the pin-linux-kernel-nvidia-dkms file in the /etc/apt/preferences.d directory to pin the versions of the linux-headers and linux-image packages:

    sudo tee /etc/apt/preferences.d/pin-linux-kernel-nvidia-dkms > /dev/null <<EOF
    Package: linux-image-*
    Pin: version *
    Pin-Priority: -1

    Package: linux-headers-*
    Pin: version *
    Pin-Priority: -1
    EOF
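
To check that the file contains the expected entries, you can read the priority back from it. The following is a minimal sketch, assuming the file layout shown in step 2 (entries separated by an empty line); pin_priority is a hypothetical helper, not an apt command.

```shell
#!/bin/sh
# Hypothetical helper: prints the Pin-Priority recorded for a package
# pattern in an apt preferences file, or nothing if the pattern is absent.
#   $1 - path to the preferences file
#   $2 - package pattern exactly as written after "Package:"
pin_priority() {
    awk -v pkg="$2" '
        BEGIN { RS = ""; FS = "\n" }   # one record per blank-line-separated entry
        {
            name = ""; prio = ""
            for (i = 1; i <= NF; i++) {
                line = $i
                key = tolower(line)
                if (key ~ /^package:/) {
                    sub(/^[^:]*:[ \t]*/, "", line); name = line
                } else if (key ~ /^pin-priority:/) {
                    sub(/^[^:]*:[ \t]*/, "", line); prio = line
                }
            }
            if (name == pkg) { print prio; exit }
        }' "$1"
}

# Possible usage on the cloud server:
#   pin_priority /etc/apt/preferences.d/pin-linux-kernel-nvidia-dkms 'linux-image-*'
```

For the file from step 2, the helper should print the priority value for both package patterns. The helper only reads the file; to see which candidate version apt would actually install for a package, you can use the apt-cache policy command.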

Update the kernel version after pinning

Once the kernel version is pinned, you cannot update it. To download security updates, performance improvements, and new features, remove the kernel version pinning file and update the version.

  1. Open the CLI.

  2. Delete the file you created to pin the kernel version:

    sudo rm /etc/apt/preferences.d/pin-linux-kernel-nvidia-dkms
  3. Update the kernel version:

    sudo apt install linux-image-<kernel-version>

    Replace <kernel-version> with the kernel version. To view the list of kernel versions, run the apt-cache search linux-image command.

  4. Reboot the cloud server.

  5. Install the kernel headers:

    sudo apt install linux-headers-$(uname -r)

    After the kernel headers are installed, the dkms utility starts and automatically recompiles the NVIDIA modules for the new kernel version.
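
After the update, you may want to confirm that the module was rebuilt before running GPU workloads. The following is a minimal sketch under two assumptions: the driver was installed as a DKMS module, and dkms status prints lines similar to "nvidia/550.54.15, 6.8.0-31-generic, x86_64: installed" (the exact format differs between dkms versions). nvidia_module_ready is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: reads "dkms status" output on stdin and succeeds
# (exit code 0) if an nvidia module is reported as installed for the
# given kernel version.
#   $1 - kernel version; defaults to the running kernel (uname -r)
nvidia_module_ready() {
    kernel="${1:-$(uname -r)}"
    awk -v k="$kernel" '
        index($0, "nvidia") == 1 && index($0, ", " k ",") > 0 && /: installed/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Possible usage on the cloud server after the reboot:
#   dkms status | nvidia_module_ready && nvidia-smi
```

If the helper fails, the module has probably not been built for the running kernel yet, and nvidia-smi is likely to return the communication error described in the Pin the kernel version section.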