- nvidia
- grub
- cuda
- nvcc -V version mismatch
- cudnn
- wsl ubuntu
- error
- docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. ERRO[0014] error waiting for container: context canceled
- docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
- error
- test
- Setting up CUDA Toolkit
- error
- Pytorch
nvidia
error
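sudo apt-get purge 'nvidia-*'   # remove any previously installed drivers
sudo apt-get autoremove
# sudo add-apt-repository ppa:graphics-drivers/ppa   # this method is unstable
sudo apt-get update
# or:
# list the GPU devices and the recommended drivers
ubuntu-drivers devices
# automatically install a suitable driver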
sudo ubuntu-drivers autoinstall
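Problem encountered: installing the driver with sudo apt-get install nvidia-* caused a login loop (after entering the password, the login screen comes back).
Reinstall with ubuntu-drivers instead:
sudo ubuntu-drivers autoinstall
and disable nouveau by adding this line to the blacklist file:
sudo gedit /etc/modprobe.d/blacklist.conf
blacklist nouveau
Disable driver updates (pin the installed driver package):
sudo apt-mark hold nvidia-driver-440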
Disable nouveau
lsmod | grep nouveau   # check whether nouveau is loaded; no output means it is not
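`grep nouveau` also matches lines where nouveau only appears in another module's "Used by" column. A minimal sh sketch that checks the module-name column exactly, assuming the usual `lsmod` layout (one header line, then one module per line with the name first); `module_loaded` is a hypothetical helper name:

```shell
# module_loaded NAME: read lsmod-style text on stdin and succeed (exit 0)
# only if some line's first column is exactly NAME.
# NR > 1 skips the "Module Size Used by" header line.
module_loaded() {
  awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}
```

Usage: `lsmod | module_loaded nouveau && echo "nouveau is still loaded"`.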
Method 1
sudo gedit /etc/modprobe.d/blacklist.conf
Add at the end of the file:
blacklist nouveau
options nouveau modeset=0
Then rebuild the initramfs and reboot:
sudo update-initramfs -u
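Method 2
Append the entries with tee: a plain sudo echo "..." >> file fails with "Permission denied", because the >> redirection is performed by the unprivileged shell, not by sudo.
DEST_FILE=/etc/modprobe.d/blacklist.conf
echo "" | sudo tee -a "$DEST_FILE" > /dev/null
echo "blacklist vga16fb" | sudo tee -a "$DEST_FILE" > /dev/null
echo "blacklist nouveau" | sudo tee -a "$DEST_FILE" > /dev/null
echo "blacklist rivafb" | sudo tee -a "$DEST_FILE" > /dev/null
echo "blacklist nvidiafb" | sudo tee -a "$DEST_FILE" > /dev/null
echo "blacklist rivatv" | sudo tee -a "$DEST_FILE" > /dev/null
echo "" | sudo tee -a "$DEST_FILE" > /dev/null
The file then ends with:
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv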
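Running Method 2 twice appends every line twice. A minimal sketch of an idempotent variant, assuming POSIX grep (`-q`, `-x`, `-F`) and sudo; `add_blacklist` is a hypothetical helper, not part of any package:

```shell
# add_blacklist FILE MODULE...: append "blacklist MODULE" to FILE for each
# MODULE that does not already have that exact line, so re-running is safe.
add_blacklist() {
  local file mod
  file=$1
  shift
  for mod in "$@"; do
    # skip modules that are already blacklisted (whole-line, fixed-string match)
    if grep -qxF "blacklist $mod" "$file" 2>/dev/null; then
      continue
    fi
    if [ -w "$file" ] || { [ ! -e "$file" ] && [ -w "$(dirname "$file")" ]; }; then
      printf 'blacklist %s\n' "$mod" >> "$file"
    else
      # root-owned file: the write itself must run under sudo, hence tee -a
      printf 'blacklist %s\n' "$mod" | sudo tee -a "$file" > /dev/null
    fi
  done
}
```

Usage: `add_blacklist /etc/modprobe.d/blacklist.conf vga16fb nouveau rivafb nvidiafb rivatv`, then `sudo update-initramfs -u` and reboot.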
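Method 3
sudo find /lib/modules/$(uname -r) -name 'nouveau*' | grep ko
Find the .ko file, then rename it so the current kernel can no longer load the driver:
sudo mv /lib/modules/3.0.0-12-generic/kernel/drivers/gpu/drm/nouveau/nouveau.ko /lib/modules/3.0.0-12-generic/kernel/drivers/gpu/drm/nouveau/nouveau.ko.org
(3.0.0-12-generic is the kernel version from the original example; substitute the output of uname -r.)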
grub
In /etc/default/grub, set:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
then regenerate the GRUB configuration with sudo update-grub.
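Editing the line by hand is fine. As a sketch, the same change can be scripted so that nomodeset is added only once. This assumes the value is a double-quoted string that ends the line, as in the stock Ubuntu file; `with_nomodeset` is a hypothetical filter name:

```shell
# with_nomodeset: read a GRUB defaults file on stdin and print it with
# " nomodeset" inserted before the closing quote of GRUB_CMDLINE_LINUX_DEFAULT,
# unless nomodeset is already one of the words in the value.
with_nomodeset() {
  awk '
    /^GRUB_CMDLINE_LINUX_DEFAULT=/ && $0 !~ /[" ]nomodeset[" ]/ {
      sub(/"$/, " nomodeset\"")
    }
    { print }
  '
}
```

Usage, reading from a backup so the file is not truncated while being read: `sudo cp /etc/default/grub /etc/default/grub.bak && with_nomodeset < /etc/default/grub.bak | sudo tee /etc/default/grub > /dev/null && sudo update-grub`.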
cuda
nvcc -V
sudo apt install nvidia-cuda-toolkit
Add environment variables
# cuda11
# sudo gedit ~/.bashrc
# LD_LIBRARY_PATH takes directories, not .so files; libcuda.so and libcudnn.so live in /usr/lib/x86_64-linux-gnu
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
nvcc -V   # after source ~/.bashrc, nvcc should be found on PATH
# cuda /gpu
# LD_LIBRARY_PATH entries must be directories, not .so files;
# /usr/lib/x86_64-linux-gnu (which holds libcuda.so and libcudnn.so) is added below
export CUDNN_HOME='/usr/local/cuda'
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH"
export CPLUS_INCLUDE_PATH=${CUDNN_HOME}/include:$CPLUS_INCLUDE_PATH
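The exports above add directories by hand, so sourcing ~/.bashrc again adds them again and the lists keep growing. A minimal guard, assuming bash or dash (it uses `local` and `eval`); `path_prepend` is a hypothetical helper:

```shell
# path_prepend VAR DIR: put DIR at the front of the colon-separated list held
# in the variable named VAR (e.g. PATH, LD_LIBRARY_PATH) and export it.
# DIR should be a directory: the loader searches LD_LIBRARY_PATH entries as
# directories, so an entry ending in .so is never used.
# If DIR is already in the list, the list is left unchanged. An empty VAR gets
# no trailing ":" (an empty entry would mean "the current directory").
path_prepend() {
  local var dir cur
  var=$1
  dir=$2
  eval "cur=\${$var-}"
  case ":$cur:" in
    *":$dir:"*) ;;  # already present
    *) if [ -n "$cur" ]; then cur="$dir:$cur"; else cur=$dir; fi ;;
  esac
  export "$var=$cur"
}
```

Usage in ~/.bashrc: `path_prepend PATH /usr/local/cuda/bin` and `path_prepend LD_LIBRARY_PATH /usr/local/cuda/lib64`.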
CUDA Toolkit archive (older releases): https://developer.nvidia.com/cuda-toolkit-archive
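nvcc -V version mismatch
If nvcc -V reports a different CUDA version than the toolkit installed under /usr/local/cuda, the nvcc being run is probably the one from the apt package nvidia-cuda-toolkit (/usr/bin/nvcc). Remove that package:
sudo apt autoremove nvidia-cuda-toolkit
Make sure nvcc exists under /usr/local/cuda/bin, then add to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH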
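To see which nvcc comes first on PATH and which release it reports, a small check can help. A sketch only: it assumes the `nvcc -V` output contains a line such as `Cuda compilation tools, release 11.0, V11.0.221`; `nvcc_release` and `check_nvcc` are hypothetical helper names:

```shell
# nvcc_release: read `nvcc -V` output on stdin and print the release number,
# e.g. "11.0" from "Cuda compilation tools, release 11.0, V11.0.221".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# check_nvcc EXPECTED: print which nvcc is first on PATH and its release,
# and succeed only if that release equals EXPECTED.
check_nvcc() {
  local nvcc rel
  nvcc=$(command -v nvcc) || { echo "nvcc not found on PATH" >&2; return 1; }
  rel=$(nvcc -V | nvcc_release)
  echo "$nvcc -> release $rel"
  [ "$rel" = "$1" ]
}
```

Usage: `check_nvcc 11.0`; if it prints /usr/bin/nvcc, the apt copy is still shadowing /usr/local/cuda/bin.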
cudnn
wsl ubuntu
Installing Ubuntu on WSL (Windows):
https://docs.microsoft.com/zh-cn/windows/wsl/install-win10
Installing NVIDIA Docker:
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-nvidia-docker
Installing CUDA on WSL Ubuntu
Installation walkthrough:
https://blog.csdn.net/weixin_36815313/article/details/107641262
error
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0014] error waiting for container: context canceled
Fix: start the Docker service:
sudo service docker start
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
https://github.com/NVIDIA/nvidia-docker/issues/1409
test
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Setting up CUDA Toolkit
It is recommended to use the Linux package manager to install CUDA for the Linux distributions supported under WSL 2. Follow these instructions to install the CUDA Toolkit. First, set up the CUDA network repository. The instructions shown here are for Ubuntu 18.04. See the CUDA Linux Installation Guide for more information on other distributions. Now install CUDA. Note that for WSL 2, you should use the cuda-toolkit-
error
Fixing "Failed to initialize NVML: Driver/library version mismatch" on Ubuntu
Problem
nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
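Cause
The version of the loaded NVIDIA kernel module does not match the NVIDIA driver libraries installed on the system.
Solution
My fix was to uninstall the installed driver packages and reinstall the driver version that matches the loaded NVIDIA kernel module.
Run the following command to see the version of the kernel module your GPU driver is using: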
cat /proc/driver/nvidia/version
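The Kernel Module line of the output shows its version number.
Run the following command to see which NVIDIA packages dpkg has installed: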
cat /var/log/dpkg.log | grep nvidia
Run the following command to list all installed NVIDIA packages:
dpkg --list | grep nvidia
Check the kernel module version:
cat /proc/driver/nvidia/version
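It turned out to be 440.33.01. Remove the installed driver packages:
sudo apt remove --purge 'libnvidia-compute-*'
sudo apt remove --purge 'nvidia*'
Then install the 440 driver to match the kernel module: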
sudo apt-get install nvidia-driver-440 nvidia-settings nvidia-prime
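The version lookup and package choice above can be scripted. This sketch assumes the 440-era format of /proc/driver/nvidia/version (a line like `NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.33.01  ...`) and the `nvidia-driver-<branch>` package naming used in the command above; both helper names are hypothetical:

```shell
# kernel_module_version: read /proc/driver/nvidia/version on stdin and print
# the loaded kernel module version, e.g. "440.33.01".
kernel_module_version() {
  sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p'
}

# driver_package_for VERSION: map 440.33.01 to nvidia-driver-440
# (the branch is the part before the first dot).
driver_package_for() {
  echo "nvidia-driver-${1%%.*}"
}
```

Usage: `pkg=$(driver_package_for "$(kernel_module_version < /proc/driver/nvidia/version)")`, then `sudo apt-get install "$pkg" nvidia-settings nvidia-prime`.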
Pytorch
PyTorch 1.7.1, installed with conda:
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
anaconda3/envs/th1.7/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11: undefined symbol: free_gemm_select, version libcublasLt.so.11
I checked, and the corresponding shared library is already installed on the system:
find /usr -name libcublas.so*
Copy it into the conda environment's lib directory:
cp /usr/*//libcublas.so.11 path/anaconda3/envs/th1.7/lib/
This solved the problem.
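The find-then-copy workaround above can be wrapped in one function that fails loudly when the library is not found. A sketch only: `copy_lib_into_env` is a hypothetical name, and the search root is a parameter so the elided source directory is not guessed:

```shell
# copy_lib_into_env NAME SEARCH_ROOT ENV_LIB_DIR: copy the first file or
# symlink named NAME found under SEARCH_ROOT into ENV_LIB_DIR.
copy_lib_into_env() {
  local name root dest src
  name=$1
  root=$2
  dest=$3
  src=$(find "$root" -name "$name" \( -type f -o -type l \) 2>/dev/null | head -n 1)
  if [ -z "$src" ]; then
    echo "$name not found under $root" >&2
    return 1
  fi
  # -L copies the file a symlink points to rather than the link itself
  cp -L "$src" "$dest/" && echo "copied $src -> $dest/"
}
```

Usage: `copy_lib_into_env libcublas.so.11 /usr path/anaconda3/envs/th1.7/lib`. If the environment already contains a libcublas.so.11, cp overwrites it, so back it up first.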