Preface

Deploying the KYC face-comparison service requires GPU support, and everything in production runs in containers, so I had to work out how to get Docker to use the GPU.

Preparation

Instance type: AWS g4
GPU model: T4 (requires a GPU driver version greater than 520)
OS version: Ubuntu 22.04
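Before touching drivers it is worth confirming that the instance actually exposes the GPU; a minimal check (no NVIDIA tooling needed yet):

# Confirm the T4 shows up on the PCI bus before installing anything
lspci | grep -i nvidia
# Optional: double-check the instance type via EC2 instance metadata
# (IMDSv1 call shown; add a session token header if the instance enforces IMDSv2)
curl -s http://169.254.169.254/latest/meta-data/instance-type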

Deploying the GPU driver

Download the driver that matches the GPU of the instance you created from the NVIDIA website, or copy the corresponding driver from S3 as described in the AWS documentation; here I download it directly from NVIDIA. [ Note: the KYC face-recognition service (face api) needs a GPU driver newer than 520, and the K520 card in g2-type instances has no driver newer than 520 on the NVIDIA site. ]
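For reference, the S3 route mentioned above looks roughly like the sketch below. The bucket name is the one I recall from AWS's NVIDIA-driver documentation and it hosts the GRID driver family rather than the public Tesla driver used in this post, so treat it as an assumption and verify against the current AWS docs:

# Sketch of the S3 alternative (bucket and prefix per my reading of the AWS GPU-driver guide; verify before use)
aws s3 ls s3://ec2-linux-nvidia-drivers/
aws s3 cp --recursive s3://ec2-linux-nvidia-drivers/latest/ .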

sudo apt-get upgrade -y linux-aws
sudo reboot
sudo apt-get install -y gcc make linux-headers-$(uname -r)
sudo mkdir -p /data/software
cd /data/software/
wget https://us.download.nvidia.com/tesla/535.104.05/nvidia-driver-local-repo-ubuntu2204-535.104.05_1.0-1_amd64.deb
sudo dpkg -i nvidia-driver-local-repo-ubuntu2204-535.104.05_1.0-1_amd64.deb
sudo cp /var/nvidia-driver-local-repo-ubuntu2204-535.104.05/nvidia-driver-local-62140ACB-keyring.gpg /usr/share/keyrings/
cat << EOF | sudo tee --append /etc/modprobe.d/blacklist.conf
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
EOF
Add the following line to /etc/default/grub so that nouveau also stays disabled at boot, then rebuild the GRUB configuration (the change takes effect on the next reboot):

GRUB_CMDLINE_LINUX="rdblacklist=nouveau"

sudo update-grub
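Once the machine has been rebooted, a quick way to confirm nouveau is really out of the picture:

# After the next reboot, nouveau should no longer be loaded; no output here means the blacklist worked
lsmod | grep nouveau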

Deploying CUDA

distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo apt-key del 7fa2af80   # remove NVIDIA's outdated repository signing key, if it was ever added
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-drivers
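Once cuda-drivers has pulled in the driver packages, a quick check that the installed version satisfies the "greater than 520" requirement (package name patterns below are just what I'd grep for; adjust to what apt actually installed):

# List the installed driver packages and their versions
dpkg -l | grep -E 'nvidia-driver|cuda-drivers'
# Once the kernel module is loaded (a reboot may be needed), this reports the running driver version
cat /proc/driver/nvidia/version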


The next two commands are the prerequisite steps from Docker's apt-repository install guide (CA certificates, curl, gnupg, and the /etc/apt/keyrings directory), needed before adding the Docker and NVIDIA apt repositories used below:

sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings

After a reboot, nvidia-smi should report the newly installed driver:

nvidia-smi
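Assuming Docker Engine itself still needs to be installed on the host, the remaining steps of Docker's apt setup look roughly like this sketch (taken from my reading of Docker's Ubuntu instructions; verify against the current docs, and skip it entirely if Docker is already present):

# Sketch: finish installing Docker Engine from Docker's apt repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io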

Deploying the container runtime

sudo apt-get install nvidia-container-runtime
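The nvidia-container-runtime package comes from NVIDIA's libnvidia-container apt repository, which is not configured by default, so the install above only works once that repository has been added. Below is a sketch of the repository setup and of wiring the runtime into Docker, based on my reading of NVIDIA's container-toolkit install guide; the URLs and package names should be checked against the current guide (nvidia-container-toolkit is the package NVIDIA now recommends in place of nvidia-container-runtime):

# Sketch: add NVIDIA's container toolkit repository and register the runtime with Docker
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
    sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Make Docker use the NVIDIA runtime and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Once Docker has been restarted, containers started with --gpus all can access the T4.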

Testing

Running the nvidia-smi command below on the host shows that the KYC containers have started using the GPU.

root@ip-192-115-111-202:~# nvidia-smi 
Mon Sep 11 04:17:51 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
| N/A   42C    P0              29W /  70W |  13828MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      4178      C   /app/.venv/bin/python                      2812MiB |
|    0   N/A  N/A      4219      C   /app/.venv/bin/python                      2800MiB |
|    0   N/A  N/A      4525      C   /app/.venv/bin/python                      1822MiB |
|    0   N/A  N/A      4707      C   /app/.venv/bin/python                      2704MiB |
|    0   N/A  N/A      4877      C   /app/.venv/bin/python                      1832MiB |
|    0   N/A  N/A      4917      C   /app/.venv/bin/python                      1854MiB |
+---------------------------------------------------------------------------------------+
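A quick container-level smoke test, independent of the KYC service, is to run nvidia-smi inside a CUDA base image with --gpus all (the image tag below is only an example; pick any CUDA base tag that matches your driver's CUDA version):

# Sketch: verify GPU passthrough with a throwaway container
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi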
