How to enable GPU for AI in Linux

From vCloud.ai Documentation
What you require

1. OS: Ubuntu 22.04

2. NVIDIA GPU (confirm the system detects it with: lspci | grep -i nvidia)

3. NVIDIA driver version 550 (once installed, confirm it loads with: nvidia-smi)

4. An updated docker-compose configuration (see "Prepare the VMS")
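
The first two requirements can be checked before installing anything. The following is a minimal sketch, assuming a standard /etc/os-release file and that lspci (from the pciutils package) is available; the helper names is_ubuntu_2204 and has_nvidia_gpu are illustrative and not part of any vCloud.ai tooling.

```shell
#!/usr/bin/env bash
# Sketch only: is_ubuntu_2204 and has_nvidia_gpu are illustrative helper names.

# Succeeds if the given os-release file (default: /etc/os-release)
# describes Ubuntu 22.04.
is_ubuntu_2204() {
    local file="${1:-/etc/os-release}"
    grep -qx 'ID=ubuntu' "$file" 2>/dev/null &&
        grep -qx 'VERSION_ID="22.04"' "$file" 2>/dev/null
}

# Succeeds if lspci lists at least one NVIDIA device.
has_nvidia_gpu() {
    command -v lspci >/dev/null 2>&1 || return 1
    lspci 2>/dev/null | grep -i nvidia >/dev/null
}

if is_ubuntu_2204; then
    echo "OS: Ubuntu 22.04"
else
    echo "OS: not Ubuntu 22.04"
fi

if has_nvidia_gpu; then
    echo "GPU: NVIDIA device detected"
else
    echo "GPU: no NVIDIA device detected (or lspci is missing)"
fi
```

A missing GPU line usually points to a hardware or passthrough problem rather than a driver problem, since lspci does not depend on the NVIDIA driver.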

NVIDIA driver installation

Install CUDA Keyring and update the system

sudo apt-get update && sudo apt-get upgrade -y
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

* If Secure Boot is enabled, use the signed driver package (nvidia-driver-550-open).

Install the CUDA Toolkit and required drivers

sudo apt-get -y install cuda-toolkit-12-4
sudo apt-get install -y nvidia-driver-550-open
sudo apt-get install -y cuda-drivers-550
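
After the packages are installed (a reboot is usually needed before the new kernel module loads), the driver branch can be confirmed with nvidia-smi. This is a sketch under the assumption that nvidia-smi is on the PATH; driver_major is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: driver_major is an illustrative helper, not an NVIDIA tool.

# Print the major part of a driver version string, e.g. 550.54.15 -> 550.
driver_major() {
    printf '%s\n' "${1%%.*}"
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask the driver for its version; take the first GPU's answer.
    version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)" || version=""
    if [ "$(driver_major "$version")" = "550" ]; then
        echo "Driver $version is from the 550 branch"
    else
        echo "Driver '${version:-unknown}' is not from the 550 branch"
    fi
else
    echo "nvidia-smi not found; the driver may not be installed, or a reboot may be pending"
fi
```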

Install NVIDIA Docker

sudo apt-get install -y nvidia-docker2

Edit Docker daemon.json to use NVIDIA runtime

sudo nano /etc/docker/daemon.json

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  },
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
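
Docker will not start if daemon.json contains a syntax error, so it can help to validate the file before restarting the service. The sketch below assumes python3 is installed (it ships with Ubuntu 22.04 by default); validate_json is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: validate_json is an illustrative helper that relies on python3.

# Succeeds if the given file parses as JSON.
validate_json() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}

config=/etc/docker/daemon.json
if [ -r "$config" ]; then
    if validate_json "$config"; then
        echo "$config: valid JSON"
    else
        echo "$config: invalid JSON, fix it before restarting Docker"
    fi
else
    echo "$config: not found or not readable"
fi
```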

Restart Docker and add the CUDA tools to the shell environment

sudo systemctl restart docker
export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH
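
The export commands above only affect the current shell session. One possible way to make them permanent and to confirm that Docker picked up the new default runtime is sketched below. The helper append_once is hypothetical, and writing to ~/.bashrc is an assumption about your shell setup; the usage lines are left commented out so nothing is changed until you opt in.

```shell
#!/usr/bin/env bash
# Sketch only: append_once is an illustrative helper name.

# append_once LINE FILE: append LINE to FILE unless an identical line
# is already present, so repeated runs do not duplicate entries.
append_once() {
    local line="$1" file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (assumes bash and ~/.bashrc; uncomment to apply):
# append_once 'export PATH=/usr/local/cuda-12.4/bin:$PATH' ~/.bashrc
# append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' ~/.bashrc

# Report the runtime Docker uses by default; "nvidia" is expected
# once the daemon.json change is active.
if command -v docker >/dev/null 2>&1; then
    docker info --format 'Default runtime: {{.DefaultRuntime}}' 2>/dev/null ||
        echo "Could not query Docker (is the daemon running, and do you have permission?)"
else
    echo "docker not found"
fi
```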

Prepare the VMS

Amend the ai service definition in docker-compose.yml

nano docker-compose.yml

ai:
   image: vcloudaiorg/vcloudai-vms-ai:latest
   container_name: ai
   restart: always
   privileged: true
   volumes:
     - ./static/customAIModels:/app/Models/customAIModels
     - ./static/customAIstatic:/app/static
     - ./static/customAIStorage:/app/Storage
     - /etc/timezone:/etc/timezone
     - /etc/localtime:/etc/localtime
   network_mode: host
   depends_on:
     - appback
     - appfont
   environment:
     MYSQL_HOST: ${MACHINE_HOST}
     MYSQL_USER: ${MYSQL_USER}
     MYSQL_PASSWORD: ${MYSQL_PASSWORD}
     MYSQL_DB: ${MYSQL_DB}
     EXTERNAL_MYSQL_PORT: ${EXTERNAL_MYSQL_PORT}
     MACHINE_HOST: ${MACHINE_HOST}
     APP_HOST: ${APP_HOST}
     SERVER_PORT: ${LOCAL_PORT}
     SECRET_WORD: ${SECRET_WORD}
     AI_PORT: ${AI_PORT}
     NVIDIA_VISIBLE_DEVICES: all
     NVIDIA_DRIVER_CAPABILITIES: all
   runtime: nvidia
   deploy:
     resources:
       reservations:
         devices:
           - driver: nvidia
             count: all
             capabilities: [gpu]

Save the file and restart the ai container

docker-compose rm -sf ai
docker-compose up -d ai
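
Once the container is running again, you can check whether it actually sees the GPU. The sketch below assumes the container is named ai (as set by container_name above) and that the NVIDIA runtime exposes nvidia-smi inside it, which is the usual behaviour when NVIDIA_DRIVER_CAPABILITIES includes the utility capability; container_gpu_count is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: container_gpu_count is an illustrative helper name.

# container_gpu_count NAME: print how many GPUs "nvidia-smi -L" lists
# inside the named container (0 if the command fails or lists none).
container_gpu_count() {
    docker exec "$1" nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
}

count="$(container_gpu_count ai)"
if [ "${count:-0}" -gt 0 ]; then
    echo "ai container sees $count GPU(s)"
else
    echo "ai container sees no GPU; recheck daemon.json and docker-compose.yml"
fi
```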