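Preface
While tinkering with Waydroid, I reinstalled the OS on the host machine that runs ComfyUI. After thinking it over, I turned this computer into a virtualization platform too, to make future tinkering easier! If you follow along and run into problems, you'll have to fill in the gaps yourself. What I've posted here leaves out my earlier failed attempts, so some steps that look wrong may still have contributed to this install succeeding.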
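Uninstalling the NVIDIA driver
You may end up breaking the driver repeatedly or switching versions, so here are some uninstall commands; combine them as needed. For example, a driver installed from the .run file should be removed with nvidia-uninstall, not apt. One more point: I install from the .run file because installing through apt triggers PVE's hook scripts, and the install then fails.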
nvidia-uninstall
rm -rf /usr/local/cuda*
rm -rf /etc/modprobe.d/nvidia*
rm -rf /var/lib/nvidia
rm -rf /usr/share/nvidia
rm -rf /usr/bin/nvidia*
rm -rf /usr/lib/x86_64-linux-gnu/nvidia

sudo nvidia-uninstall
sudo apt purge 'nvidia-*' -y
sudo rm -rf /usr/local/cuda*
sudo rm -f /usr/lib/x86_64-linux-gnu/libcuda*
sudo rm -f /usr/lib/x86_64-linux-gnu/libnvidia*
sudo rm -rf /etc/modprobe.d/nvidia*
sudo rm -rf /lib/modules/$(uname -r)/kernel/drivers/video/nvidia*
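Since this cleanup tends to get repeated, the commands above can be wrapped into a small script. This is only a sketch: `run`, `nvidia_cleanup` and the `DRY_RUN` switch are names I made up, dry-run is the default so nothing is deleted until you set `DRY_RUN=0`, and the paths are a subset of the ones listed above. Rather than handing `'nvidia-*'` straight to apt, it asks dpkg-query which installed (or config-only) packages match, so the exact package names show up in the dry-run output.

```shell
#!/bin/sh
# Sketch: remove leftovers of a previous NVIDIA driver install.
# DRY_RUN=1 (default) only prints the commands; run as root with DRY_RUN=0 to execute.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

nvidia_cleanup() {
    # A .run install ships its own uninstaller; use it first if present.
    if command -v nvidia-uninstall >/dev/null 2>&1; then
        run nvidia-uninstall
    fi
    # Packages matching the glob that are installed (ii) or left as config files (rc).
    pkgs=$(dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' 'nvidia-*' 2>/dev/null \
        | awk '$1 ~ /^(ii|rc)/ { print $2 }')
    if [ -n "$pkgs" ]; then
        run apt-get purge -y $pkgs   # unquoted on purpose: one argument per package
    fi
    # Leftover files (a subset of the manual commands above).
    run rm -rf /usr/local/cuda* /var/lib/nvidia /usr/share/nvidia
    run rm -f /usr/lib/x86_64-linux-gnu/libcuda* /usr/lib/x86_64-linux-gnu/libnvidia*
    run rm -rf /etc/modprobe.d/nvidia*
    run rm -rf "/lib/modules/$(uname -r)/kernel/drivers/video/nvidia"*
}

# Usage: nvidia_cleanup            (preview only)
#        DRY_RUN=0 nvidia_cleanup  (really delete, as root)
```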
Automatic configuration
apt install pve-nvidia-vgpu-helper
pve-nvidia-vgpu-helper setup
The pve-nvidia-vgpu-helper tool sets up some basics, such as blacklisting the nouveau driver and installing the kernel header packages and DKMS.
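To confirm the helper actually did its job, a sketch like the following looks for a `blacklist nouveau` line anywhere under /etc/modprobe.d and makes sure nouveau is not listed in /proc/modules. The function names are hypothetical, and I'm not assuming which file the helper writes; if nouveau was already loaded, the blacklist usually only takes effect after a reboot.

```shell
#!/bin/sh
# Sketch: check that nouveau is blacklisted and not loaded.

nouveau_blacklisted() {
    local dir="${1:-/etc/modprobe.d}"
    # -s hides errors when the directory has no *.conf files
    grep -qsE '^[[:space:]]*blacklist[[:space:]]+nouveau([[:space:]]|$)' "$dir"/*.conf
}

nouveau_loaded() {
    # /proc/modules has one loaded module per line, name first
    grep -q '^nouveau ' "${1:-/proc/modules}" 2>/dev/null
}

check_nouveau() {
    if nouveau_blacklisted && ! nouveau_loaded; then
        echo "ok: nouveau is blacklisted and not loaded"
    else
        echo "warning: nouveau not blacklisted or still loaded (reboot needed?)" >&2
        return 1
    fi
}

# Usage (on the PVE host, after pve-nvidia-vgpu-helper setup): check_nouveau
```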
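Confirming some information
Before the driver is installed, /dev/dri contains only one card node and one render node (the listing below was taken after the install, so it shows two of each). Remember which ones are there at that point: the nodes that appear after the driver is installed belong to the NVIDIA card, and those are what we will bind into the LXC container later. (If you forget, you can still track them down afterwards, it's just more hassle; this is the easiest way.) At first I didn't bind the card and render nodes, and computation didn't work.
One more note: if, after a host reboot, starting the CT complains that /dev/nvidia-uvm is not mounted, run nvidia-smi on the host first and then start the CT.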
root@mini:~# ls -l /dev/dri
total 0
drwxr-xr-x 2 root root        120 Jul 31 14:16 by-path
crw-rw---- 1 root video  226,   0 Jul 31 14:16 card0
crw-rw---- 1 root video  226,   1 Jul 31 14:16 card1
crw-rw---- 1 root render 226, 128 Jul 31 14:16 renderD128
crw-rw---- 1 root render 226, 129 Jul 31 14:16 renderD129
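The "remember what was there before" trick can be done mechanically: save the /dev/dri listing before installing the driver, save it again afterwards, and print only the new card*/renderD* entries. A minimal sketch; `new_dri_nodes` and the file names are illustrative.

```shell
#!/bin/sh
# Sketch: list the DRI nodes that appeared after the driver install.
#   before: ls /dev/dri > /root/dri-before.txt
#   after:  ls /dev/dri > /root/dri-after.txt

new_dri_nodes() {
    local before after
    before=$(mktemp)
    after=$(mktemp)
    sort "$1" > "$before"
    sort "$2" > "$after"
    # comm -13 keeps lines found only in the second (after) file
    comm -13 "$before" "$after" | grep -E '^(card|renderD)[0-9]+$'
    rm -f "$before" "$after"
}

# Usage: new_dri_nodes /root/dri-before.txt /root/dri-after.txt
```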
Driver download
Overview page: https://www.nvidia.com/en-us/drivers/unix/
All drivers: https://download.nvidia.com/XFree86/Linux-x86_64/
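The download directory follows a fixed `<version>/NVIDIA-Linux-x86_64-<version>.run` layout (the 535.261.03 URL used below is one instance), so the URL can be built from the version string. A sketch, assuming x86_64 only; `nvidia_run_url` and `fetch_nvidia_run` are hypothetical names. Since the container later installs the userspace part from the same file, keeping the version in one place helps avoid mismatches.

```shell
#!/bin/sh
# Sketch: build and fetch the .run URL for a driver version (x86_64 only).

NVIDIA_BASE_URL="https://download.nvidia.com/XFree86/Linux-x86_64"

nvidia_run_url() {
    echo "$NVIDIA_BASE_URL/$1/NVIDIA-Linux-x86_64-$1.run"
}

fetch_nvidia_run() {
    # -c continues a partially downloaded file instead of starting over
    wget -c "$(nvidia_run_url "$1")"
}

# Usage: fetch_nvidia_run 535.261.03
```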
Installing the driver on the host
Installation
Since the card is an old-timer, I chose the 535 branch.
wget https://download.nvidia.com/XFree86/Linux-x86_64/535.261.03/NVIDIA-Linux-x86_64-535.261.03.run
sh NVIDIA-Linux-x86_64-535.261.03.run
Verification 1: confirm the driver is recognized
root@mini:~# nvidia-smi
Thu Jul 31 11:02:15 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA P104-100                Off | 00000000:01:00.0 Off |                  N/A |
| 63%   42C    P0              42W / 180W |      0MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
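Instead of reading the table, the driver version can be checked non-interactively with nvidia-smi's query mode (`--query-gpu=driver_version --format=csv,noheader`). A sketch; `NVIDIA_SMI` is a hypothetical override used only so the functions can be tested without a GPU. The same check is useful later inside the CT, because the container's userspace libraries must match the host's kernel module version exactly.

```shell
#!/bin/sh
# Sketch: compare the loaded driver version with the expected one.
# NVIDIA_SMI is a hypothetical override (defaults to nvidia-smi), handy for testing.

driver_version() {
    # First GPU only; csv,noheader prints just the value
    "${NVIDIA_SMI:-nvidia-smi}" --query-gpu=driver_version --format=csv,noheader \
        | head -n 1 | tr -d ' '
}

check_driver_version() {
    local expected="$1" actual
    actual=$(driver_version)
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "ok: driver $actual"
    else
        echo "mismatch: expected $expected, got '${actual:-nothing}'" >&2
        return 1
    fi
}

# Usage (on the host, and again inside the CT): check_driver_version 535.261.03
```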
Verification 2: find the devices that need to be bound
root@mini:~# ls -al /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Jul 31 10:24 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jul 31 10:24 /dev/nvidiactl
crw-rw-rw- 1 root root 510,   0 Jul 31 10:57 /dev/nvidia-uvm
crw-rw-rw- 1 root root 510,   1 Jul 31 10:57 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
drwxr-xr-x  2 root root     80 Jul 31 10:57 .
drwxr-xr-x 19 root root   4280 Jul 31 11:02 ..
cr--------  1 root root 235, 1 Jul 31 10:57 nvidia-cap1
cr--r--r--  1 root root 235, 2 Jul 31 10:57 nvidia-cap2
# On a desktop environment, /dev/nvidia-modeset should also appear in the list above
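The earlier note about /dev/nvidia-uvm missing after a host reboot (run nvidia-smi on the host, then start the CT) can be automated with a Proxmox VE hookscript, which PVE calls with `<vmid> <phase>`. A sketch under these assumptions: the script is saved as /var/lib/vz/snippets/nvidia-uvm-hook.sh and made executable, the `local` storage has the Snippets content type enabled, and it is attached with `pct set 100 --hookscript local:snippets/nvidia-uvm-hook.sh`. It only acts in the `pre-start` phase, where a non-zero exit aborts the start; the file name, `UVM_NODE`/`NVIDIA_SMI` overrides and function names are mine.

```shell
#!/bin/sh
# Sketch of a PVE hookscript: make sure /dev/nvidia-uvm exists before the CT starts,
# using the workaround from this post (run nvidia-smi on the host once).

UVM_NODE="${UVM_NODE:-/dev/nvidia-uvm}"

ensure_uvm() {
    [ -e "$UVM_NODE" ] && return 0
    "${NVIDIA_SMI:-nvidia-smi}" >/dev/null 2>&1
    [ -e "$UVM_NODE" ]
}

hook_main() {
    local vmid="$1" phase="$2"
    if [ "$phase" = "pre-start" ]; then
        ensure_uvm || { echo "CT $vmid: $UVM_NODE still missing" >&2; return 1; }
    fi
    return 0
}

# Only dispatch when PVE invokes the script with <vmid> <phase>.
if [ $# -ge 2 ]; then
    hook_main "$1" "$2"
    exit $?
fi
```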
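Device binding
Add the following to the container's config file (/etc/pve/lxc/<CTID>.conf on the host), or bind the devices in the web UI: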
dev0: /dev/nvidia0
dev1: /dev/nvidiactl
dev2: /dev/nvidia-uvm
dev3: /dev/nvidia-uvm-tools
dev4: /dev/nvidia-caps/nvidia-cap1
dev5: /dev/nvidia-caps/nvidia-cap2
dev6: /dev/dri/card0,gid=44
dev7: /dev/dri/renderD129,gid=104
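When the node numbers change (for example after swapping cards), the `devN:` lines can be generated from a list of paths. A sketch: `dev_lines` is a hypothetical helper, and the gid values 44 and 104 are simply the ones used above for the video and render groups inside this container; check yours with `getent group video render` in the CT. Paste the output into the top section of the config, before any `[snapshot]` sections.

```shell
#!/bin/sh
# Sketch: print devN: lines for /etc/pve/lxc/<CTID>.conf from device paths.
# VIDEO_GID/RENDER_GID default to the 44/104 used above; verify them in your CT.

VIDEO_GID="${VIDEO_GID:-44}"
RENDER_GID="${RENDER_GID:-104}"

dev_lines() {
    local n=0 path
    for path in "$@"; do
        case "$path" in
            /dev/dri/card*)    echo "dev$n: $path,gid=$VIDEO_GID" ;;
            /dev/dri/renderD*) echo "dev$n: $path,gid=$RENDER_GID" ;;
            *)                 echo "dev$n: $path" ;;
        esac
        n=$((n + 1))
    done
}

# Usage:
#   dev_lines /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools \
#     /dev/nvidia-caps/nvidia-cap1 /dev/nvidia-caps/nvidia-cap2 \
#     /dev/dri/card0 /dev/dri/renderD129
```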
Pushing the installer into the CT and installing it
pct push 100 NVIDIA-Linux-x86_64-535.261.03.run /root/NVIDIA-Linux-x86_64-535.261.03.run
# then, inside the container:
sh NVIDIA-Linux-x86_64-535.261.03.run --no-kernel-modules
# An LXC container shares the host's kernel, so the installer needs --no-kernel-modules to skip installing kernel modules
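The same two steps can be driven entirely from the host with `pct exec`. A sketch; `install_in_ct` is a hypothetical name, `DRY_RUN=1` (the default) only prints the commands, and `--silent` (not used in the steps above) is the installer's non-interactive mode, which assumes its default answers are acceptable.

```shell
#!/bin/sh
# Sketch: push the driver into a CT and install only the userspace part, from the host.
# DRY_RUN=1 (default) prints the pct commands instead of running them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

install_in_ct() {
    local ctid="$1" version="$2" file
    file="NVIDIA-Linux-x86_64-$version.run"
    run pct push "$ctid" "$file" "/root/$file"
    # Same flag as above: the CT uses the host kernel, so no kernel modules.
    # --silent answers the installer's prompts with defaults (assumption: acceptable here).
    run pct exec "$ctid" -- sh "/root/$file" --no-kernel-modules --silent
}

# Usage (in the directory holding the .run file): DRY_RUN=0 install_in_ct 100 535.261.03
```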
Container verification 1: confirm the driver is recognized
root@Comfy:~# nvidia-smi
Thu Jul 31 11:13:16 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA P104-100                Off | 00000000:01:00.0 Off |                  N/A |
| 64%   42C    P0              39W / 180W |      0MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
Container verification 2: check that the bindings match the host
root@Comfy:~# ls -al /dev/nvidia*
crw-rw---- 1 root root 509,   0 Jul 31 10:36 /dev/nvidia-uvm
crw-rw---- 1 root root 509,   1 Jul 31 10:36 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Jul 31 10:36 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jul 31 10:36 /dev/nvidiactl

/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root    80 Jul 31 10:36 .
drwxr-xr-x 7 root root   580 Jul 31 10:36 ..
crw-rw---- 1 root root 234, 1 Jul 31 10:36 nvidia-cap1
crw-rw---- 1 root root 234, 2 Jul 31 10:36 nvidia-cap2
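Rather than comparing two `ls` listings by eye, the major:minor numbers can be collected on both sides and diffed. In the listings above, nvidia-uvm and nvidia-caps have different majors on the host (510, 235) and in the CT (509, 234), with different timestamps; those majors are assigned dynamically, so it's safer to compare snapshots taken at the same moment. A sketch using GNU `stat -c '%n %t:%T'` (major and minor in hex); the function names are hypothetical and CT 100 is the one from this post.

```shell
#!/bin/sh
# Sketch: compare device numbers of the NVIDIA nodes on the host and in the CT.

dev_ids() {
    # "path major:minor" with major/minor in hex (GNU stat %t/%T)
    stat -c '%n %t:%T' "$@"
}

compare_dev_ids() {
    # $1: host snapshot, $2: container snapshot
    if diff "$1" "$2" >/dev/null; then
        echo "ok: host and container device numbers match"
    else
        diff "$1" "$2" >&2
        return 1
    fi
}

# Usage (on the host, with CT 100 running):
#   dev_ids /dev/nvidia* /dev/nvidia-caps/* > /tmp/host.txt
#   pct exec 100 -- sh -c 'stat -c "%n %t:%T" /dev/nvidia* /dev/nvidia-caps/*' > /tmp/ct.txt
#   compare_dev_ids /tmp/host.txt /tmp/ct.txt
```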
Routine setup steps
apt install sudo git
sudo usermod -aG sudo l
apt install python3-venv python3-full -y
# Check the Python version; the wheel files used next must match it. See my earlier article for details.
l@Comfy:~$ python3 --version
Python 3.11.2
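The torch wheels used next are tagged `cp311-cp311` for Python 3.11, so the tag can be derived from the interpreter and checked against each file name before installing. A sketch for CPython-specific wheels like these; `py_tag` and `wheel_matches` are hypothetical helpers, and `PYTHON` defaults to python3.

```shell
#!/bin/sh
# Sketch: check that wheel files match the interpreter's CPython tag.

py_tag() {
    # Prints e.g. cp311 for Python 3.11
    "${PYTHON:-python3}" -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}

wheel_matches() {
    # $1: wheel path, $2: tag; names look like name-version-pytag-abitag-platform.whl
    case "$(basename "$1")" in
        *-"$2"-"$2"-*.whl) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (inside the venv):
#   tag=$(py_tag)
#   for w in /home/l/*.whl; do wheel_matches "$w" "$tag" || echo "wrong tag: $w"; done
```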
Preparing the virtual environment
python3 -m venv ~/comfyui-env
source ~/comfyui-env/bin/activate
pip install --upgrade pip
pip config set global.index-url https://mirrors.cernet.edu.cn/pypi/web/simple
Manually installing the prepared wheel files
pip install /home/l/torch-2.6.0+cu118-cp311-cp311-linux_x86_64.whl
pip install /home/l/torchaudio-2.6.0+cu118-cp311-cp311-linux_x86_64.whl
pip install /home/l/torchvision-0.21.0+cu118-cp311-cp311-linux_x86_64.whl
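Before cloning ComfyUI, it's worth checking that torch in the venv actually sees the GPU. The driver above reports CUDA 12.2, newer than the 11.8 the cu118 wheels were built for; newer drivers can run older CUDA runtimes. A sketch; `torch_cuda_ok` is a hypothetical name and the default interpreter path assumes the ~/comfyui-env venv from this post.

```shell
#!/bin/sh
# Sketch: ask torch in the venv whether CUDA is usable; exit status 0 only if it is.
# PYTHON defaults to the venv interpreter assumed in this post (~/comfyui-env).

torch_cuda_ok() {
    "${PYTHON:-$HOME/comfyui-env/bin/python}" - <<'EOF'
import sys
import torch

ok = torch.cuda.is_available()
print("torch", torch.__version__, "built for CUDA", torch.version.cuda, "available:", ok)
if ok:
    print("device:", torch.cuda.get_device_name(0))
sys.exit(0 if ok else 1)
EOF
}

# Usage: torch_cuda_ok || echo "torch cannot see the GPU"
```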
Downloading the source
git clone --depth=1 https://github.com/comfyanonymous/ComfyUI
cd /home/l/ComfyUI/
pip install -r requirements.txt
# Done! Start it:
python main.py --listen 0.0.0.0
# One-line command to start it later:
source ~/comfyui-env/bin/activate && cd ComfyUI && python main.py --listen 0.0.0.0
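To avoid typing the one-line command after every restart, ComfyUI can run as a systemd service in the CT. A sketch that prints a unit file: the user `l`, ~/comfyui-env and ~/ComfyUI come from this post, while `comfyui_unit`, the unit name comfyui.service and `Restart=on-failure` are my choices. Calling the venv's python directly makes sourcing `activate` unnecessary.

```shell
#!/bin/sh
# Sketch: print a systemd unit that starts ComfyUI with the venv's python.
# Assumptions: user "l", venv ~/comfyui-env, checkout ~/ComfyUI (as in this post).

comfyui_unit() {
    local user="$1" home="$2"
    cat <<EOF
[Unit]
Description=ComfyUI
After=network-online.target
Wants=network-online.target

[Service]
User=$user
WorkingDirectory=$home/ComfyUI
ExecStart=$home/comfyui-env/bin/python main.py --listen 0.0.0.0
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root in the CT):
#   comfyui_unit l /home/l > /etc/systemd/system/comfyui.service
#   systemctl daemon-reload && systemctl enable --now comfyui
```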
NVIDIA Container Toolkit
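I thought I was done, but it crashed as soon as I generated an image. After more digging, the main fix turned out to be adding the NVIDIA Container Toolkit. The official guide first: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html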
apt install gpg curl
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
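Since the four packages are pinned to one version, a quick check can confirm they all landed at it. A sketch; `check_pinned` is a hypothetical helper that reads `package version` lines from dpkg-query (which itself reports an error for any package that is not installed).

```shell
#!/bin/sh
# Sketch: verify the pinned toolkit packages are all at the expected version.
# Input: "package version" lines, e.g. from
#   dpkg-query -W -f='${Package} ${Version}\n' nvidia-container-toolkit \
#     nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1

check_pinned() {
    awk -v want="$1" '
        $2 != want { print "mismatch: " $1 " is " $2; bad = 1 }
        END { exit bad }
    '
}

# Usage: <dpkg-query command above> | check_pinned "$NVIDIA_CONTAINER_TOOLKIT_VERSION"
```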
Ran it again, and it works now!