# The host can see which PIDs are occupying the GPUs, but not which container each one runs in
$ nvidia-smi
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1794854 C ./bin/llama_example 31534MiB |
| 1 N/A N/A 1916178 C ...onserver/bin/tritonserver 15454MiB |
| 2 N/A N/A 1809897 C /usr/bin/python3 16738MiB |
| 7 N/A N/A 3306896 C /bin/python 39964MiB |
+-----------------------------------------------------------------------------+
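For scripting, the PID table can also be obtained in machine-readable form via `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader`. A minimal sketch of pulling out just the PIDs, using a captured sample of that CSV output as stand-in data (no GPU required to run the parsing itself):

```shell
# Sample of what --query-compute-apps=pid,used_memory --format=csv,noheader
# prints; on a real host you would pipe nvidia-smi directly into awk.
sample='1794854, 31534 MiB
1809897, 16738 MiB'

# First comma-separated field is the PID.
printf '%s\n' "$sample" | awk -F', ' '{print $1}'
```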
# The key command
# For example, to look up PID 1809897:
# The cgroup column below embeds the Docker container ID, and the container name is lmdeploy
$ ps -e -o pid,cmd,comm,cgroup | grep 1809897
1809897 /usr/bin/python3 /usr/local lmdeploy 12:perf_event:/system.slice/docker-d33d781cbad158192f3819f1118f46ccb1979298a563f6d19f627d8602e6edfb.scope,11:freezer:/system.slice
3758737 grep --color=auto 1809897 grep 8:devices:/system.slice/ssh.service,7:pids:/system.slice/ssh.service,6:blkio:/system.slice/ssh.service,4:cpu,cpuacct:/system.slice/ssh.service,3:memory:/system.slice/ssh.service,1:name=systemd:/system.slice/ssh.service,0::/system.slice/ssh.service
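The long cgroup string above embeds the container's full 64-character ID. A minimal sketch of cutting it down to the 12-character short ID that `docker ps` displays, using the cgroup line from the output above as sample data (the same extraction works on `/proc/<pid>/cgroup` directly):

```shell
# One cgroup entry from the ps output above, used here as sample input.
cgroup_line='12:perf_event:/system.slice/docker-d33d781cbad158192f3819f1118f46ccb1979298a563f6d19f627d8602e6edfb.scope'

# Match the 64-hex-digit container ID, strip the "docker-" prefix,
# and keep the first 12 characters (the short ID shown by docker ps).
container_id=$(printf '%s\n' "$cgroup_line" \
  | grep -oE 'docker-[0-9a-f]{64}' \
  | sed 's/^docker-//' \
  | cut -c1-12)
echo "$container_id"   # d33d781cbad1
```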
# Confirm again using the Docker container ID
$ docker ps | grep d33d781cbad1
d33d781cbad1 nvcr.io/nvidia/tritonserver:22.12-py3 ... lmdeploy
# Now you can exec into the container and kill the process
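One caveat before killing anything: the PID seen on the host (1809897) is generally not the PID the process has inside the container, because containers run in their own PID namespace. The `NSpid` field in `/proc/<pid>/status` lists the process's PID in each nested namespace, host-side first and container-side last. A sketch, demonstrated on the current shell since it works for any process (for the case above you would read `/proc/1809897/status` on the host):

```shell
# NSpid lists this process's PID in every PID namespace it belongs to;
# for a containerized process the last value is what `kill` inside the
# container expects. Here we inspect our own shell as a runnable example.
grep NSpid "/proc/$$/status"
```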