nvidia
- Disable nouveau
grub
cuda
nvcc -V version mismatch
cudnn
wsl ubuntu
- error
  - docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. ERRO[0014] error waiting for container: context canceled
  - docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
test
Setting up CUDA Toolkit
error
- Fixing "Failed to initialize NVML: Driver/library version mismatch" on Ubuntu
Pytorch
- Pytorch==1.7.1
  - The matching shared library was confirmed to be installed on the system.

nvidia

sudo apt-get purge 'nvidia-*'  # remove any existing driver
sudo apt-get autoremove
# sudo add-apt-repository ppa:graphics-drivers/ppa # this method is unstable
sudo apt-get update
# or
# list the GPU devices and the available drivers
ubuntu-drivers devices
# automatically install the recommended driver
sudo ubuntu-drivers autoinstall
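
Before running autoinstall, it can help to see which package would be chosen. A minimal sketch, assuming the `ubuntu-drivers devices` listing tags one line with the word `recommended` in the form `driver   : nvidia-driver-440 - distro non-free recommended`; the helper name `recommended_driver` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: read `ubuntu-drivers devices` output on stdin and
# print the package name from the line tagged "recommended".
# Assumes lines shaped like:
#   driver   : nvidia-driver-440 - distro non-free recommended
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

# Usage on a real machine (commented out so the sketch has no side effects):
# ubuntu-drivers devices | recommended_driver
```

If the output format differs on your release, the helper simply prints nothing.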

error

Installing with sudo apt-get install nvidia-* caused a login loop at the sign-in screen.
Fix: install with ubuntu-drivers autoinstall and disable nouveau:
sudo gedit /etc/modprobe.d/blacklist.conf
blacklist nouveau

Hold the driver package so it is not updated:

sudo apt-mark hold nvidia-driver-440

Disable nouveau

# check whether the nouveau module is loaded
lsmod | grep nouveau

Method 1

sudo gedit /etc/modprobe.d/blacklist.conf

blacklist nouveau

options nouveau modeset=0
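
A plain `grep nouveau` also matches other modules that list nouveau in their "Used by" column. The check can be made stricter by comparing only the module-name column. A minimal sketch; the helper name `nouveau_loaded` is hypothetical.

```shell
# Hypothetical helper: succeeds only if lsmod-style text on stdin lists
# nouveau as a loaded module (the module name is the first column).
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage after rebooting (commented out so the sketch has no side effects):
# if lsmod | nouveau_loaded; then echo "nouveau is still loaded"; fi
```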

Method 2

DEST_FILE=/etc/modprobe.d/blacklist.conf

# "sudo echo ... >> file" fails because the redirection runs as the normal user,
# so pipe into "sudo tee -a" instead
echo "" | sudo tee -a ${DEST_FILE}
echo "blacklist vga16fb" | sudo tee -a ${DEST_FILE}
echo "blacklist nouveau" | sudo tee -a ${DEST_FILE}
echo "blacklist rivafb" | sudo tee -a ${DEST_FILE}
echo "blacklist nvidiafb" | sudo tee -a ${DEST_FILE}
echo "blacklist rivatv" | sudo tee -a ${DEST_FILE}
echo "" | sudo tee -a ${DEST_FILE}

The file should then end with:

blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
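
Running the append commands twice leaves duplicate lines in the file. A minimal sketch of an idempotent variant, assuming it is run from a root shell when pointed at the real file; the helper name `add_blacklist_entries` is hypothetical.

```shell
# Hypothetical helper: append "blacklist <module>" for each module to the
# given file, skipping any line that is already present.
add_blacklist_entries() {
    dest_file="$1"
    shift
    touch "$dest_file"
    for module in "$@"; do
        line="blacklist $module"
        # -x: match the whole line, -F: treat the pattern as a fixed string
        if ! grep -qxF "$line" "$dest_file"; then
            printf '%s\n' "$line" >> "$dest_file"
        fi
    done
}

# Usage in a root shell (commented out so the sketch has no side effects):
# add_blacklist_entries /etc/modprobe.d/blacklist.conf vga16fb nouveau rivafb nvidiafb rivatv
```

On Ubuntu the initramfs is usually rebuilt afterwards with `sudo update-initramfs -u`, followed by a reboot.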

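Method 3

sudo find /lib/modules -name 'nouveau*' | grep ko

Find the .ko file, then rename it. This removes the driver from the current kernel (replace 3.0.0-12-generic with the output of uname -r):

sudo mv /lib/modules/3.0.0-12-generic/kernel/drivers/gpu/drm/nouveau/nouveau.ko /lib/modules/3.0.0-12-generic/kernel/drivers/gpu/drm/nouveau/nouveau.ko.org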

grub

In /etc/default/grub, add nomodeset to the default kernel command line, then run sudo update-grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
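
The same edit can be scripted instead of done by hand. A minimal sketch, assuming GNU sed and a `GRUB_CMDLINE_LINUX_DEFAULT="..."` line that uses double quotes; the helper name `add_nomodeset` is hypothetical, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: append nomodeset inside the quotes of the
# GRUB_CMDLINE_LINUX_DEFAULT line, unless the line already contains it.
add_nomodeset() {
    grub_file="$1"
    sed -i -E '/^GRUB_CMDLINE_LINUX_DEFAULT=/{/nomodeset/!s/"[[:space:]]*$/ nomodeset"/;}' "$grub_file"
}

# Usage (commented out so the sketch has no side effects):
# cp /etc/default/grub /tmp/grub.test && add_nomodeset /tmp/grub.test
# sudo sh -c '...'   # apply to /etc/default/grub as root, then run: sudo update-grub
```

Running it a second time leaves the file unchanged, because the inner `/nomodeset/!` guard skips lines that already carry the option.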

cuda

 nvcc -V
 sudo apt install nvidia-cuda-toolkit

Add environment variables

# cuda11
# sudo gedit ~/.bashrc
# LD_LIBRARY_PATH takes directories, not .so files, so list the directory
# that contains libcuda.so and libcudnn.so
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
nvcc -V  # check that nvcc is found

# cuda / gpu
export CUDNN_HOME='/usr/local/cuda'
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH"
export CPLUS_INCLUDE_PATH=${CUDNN_HOME}/include:$CPLUS_INCLUDE_PATH
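
Because ~/.bashrc is sourced by every new shell, including nested ones, plain `export PATH=dir:$PATH` lines keep adding the same directory. A minimal sketch of a guard against that; the helper name `prepend_dir` is hypothetical, and the two CUDA directories are the ones used in these notes.

```shell
# Hypothetical helper: print the colon-separated list "$2" with directory "$1"
# prepended, unless the directory is already one of its entries.
prepend_dir() {
    dir="$1"
    list="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        *) printf '%s\n' "${dir}${list:+:$list}" ;;
    esac
}

PATH=$(prepend_dir /usr/local/cuda/bin "$PATH")
LD_LIBRARY_PATH=$(prepend_dir /usr/local/cuda/lib64 "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```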

https://developer.nvidia.com/cuda-toolkit-archive

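nvcc -V version mismatch

The nvcc installed by the apt package nvidia-cuda-toolkit can differ in version from the toolkit under /usr/local/cuda, so remove the apt package:

sudo apt autoremove nvidia-cuda-toolkit

Make sure nvcc exists under /usr/local/cuda/bin.

Add to .bashrc:

export PATH=$PATH:/usr/local/cuda/bin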
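
When the reported version is still unexpected, listing every nvcc on PATH in lookup order shows which copy wins. A minimal sketch; the helper name `list_on_path` is hypothetical.

```shell
# Hypothetical helper: print every executable file called "$1" found in the
# PATH directories, in lookup order (the first line is the one the shell runs).
# The body runs in a subshell so the IFS and globbing changes do not leak.
list_on_path() (
    set -f
    IFS=:
    for dir in $PATH; do
        if [ -x "$dir/$1" ] && [ ! -d "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
        fi
    done
)

# Usage (commented out so the sketch prints nothing on its own):
# list_on_path nvcc
```

Since the export above appends /usr/local/cuda/bin to the end of PATH, any other nvcc that is still installed in an earlier directory takes precedence.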

cudnn

wsl ubuntu

windows wsl ubuntu install

https://docs.microsoft.com/zh-cn/windows/wsl/install-win10

installing-nvidia-docker

https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-nvidia-docker

wsl ubuntu cuda install

https://developer.nvidia.com/zh-cn/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=WSLUbuntu&target_version=20&target_type=deblocal

Installation walkthrough

https://blog.csdn.net/weixin_36815313/article/details/107641262

error

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0014] error waiting for container: context canceled

sudo service docker start

docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

https://github.com/NVIDIA/nvidia-docker/issues/1409

test

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

Setting up CUDA Toolkit

It is recommended to use the Linux package manager to install CUDA for the Linux distributions supported under WSL 2. Follow these instructions to install the CUDA Toolkit.

First, set up the CUDA network repository. The instructions shown here are for Ubuntu 18.04. See the CUDA Linux Installation Guide for more information on other distributions.

Now install CUDA. Note that for WSL 2, you should use the cuda-toolkit-<version> meta-package to avoid installing the NVIDIA driver that is typically bundled with the toolkit. You can also install other components of the toolkit by choosing the right meta-package. Do not choose the cuda, cuda-11-0, or cuda-drivers meta-packages under WSL 2, since these packages will result in an attempt to install the Linux NVIDIA driver under WSL 2.

apt-get install -y cuda-toolkit-11-0
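
The package rule above can be turned into a small guard for install scripts. A minimal sketch: both helper names are hypothetical, unknown package names are rejected as a conservative choice of this sketch, and the WSL check assumes that the kernel string in /proc/version contains "microsoft".

```shell
# Hypothetical helper: succeed only for package names that follow the
# toolkit-only naming; fail for the names that pull in the Linux driver.
wsl_safe_cuda_package() {
    case "$1" in
        cuda-toolkit-*) return 0 ;;                  # toolkit only
        cuda|cuda-drivers|cuda-[0-9]*) return 1 ;;   # would install the driver
        *) return 1 ;;                               # unknown: be conservative
    esac
}

# Hypothetical helper: detect WSL from the kernel version string (assumption).
running_under_wsl() {
    grep -qi microsoft /proc/version 2>/dev/null
}

# Usage (commented out so the sketch has no side effects):
# pkg=cuda-toolkit-11-0
# if running_under_wsl && ! wsl_safe_cuda_package "$pkg"; then
#     echo "refusing to install $pkg under WSL 2" >&2
# fi
```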

error

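Fixing "Failed to initialize NVML: Driver/library version mismatch" on Ubuntu

Problem

nvidia-smi
Failed to initialize NVML: Driver/library version mismatch

Root cause

The loaded NVIDIA kernel module version does not match the driver version installed on the system.

Solution

My fix was to uninstall the installed driver and reinstall the version that matches the loaded **NVIDIA kernel driver**.

Run the following command to see the kernel module version your GPU driver is using:

cat /proc/driver/nvidia/version

The Kernel Module version number appears in the output.

Run the following command to see the driver packages recorded in the dpkg log:

cat /var/log/dpkg.log | grep nvidia

Run the following command to list all installed driver packages:

sudo dpkg --list | grep nvidia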

Check the kernel module version on this machine:
cat /proc/driver/nvidia/version
It turned out to be 440.33.01.
sudo apt remove 'libnvidia-compute-*' --purge
sudo apt remove 'nvidia*' --purge
So I installed the 440 driver:
sudo apt-get install nvidia-driver-440 nvidia-settings nvidia-prime
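
The lookup from kernel module version to driver package can be sketched as two small helpers. This assumes the version file contains a line of the form `... Kernel Module  440.33.01  <build date>` and that the package is named `nvidia-driver-<major>`, as with nvidia-driver-440 above; both helper names are hypothetical.

```shell
# Hypothetical helper: read the text of /proc/driver/nvidia/version on stdin
# and print the kernel module version, for example 440.33.01.
nvidia_kernel_module_version() {
    sed -n 's/.*Kernel Module[[:space:]]\{1,\}\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: map a full version string to the driver package name
# by keeping only the major version number.
driver_package_for() {
    printf 'nvidia-driver-%s\n' "${1%%.*}"
}

# Usage (commented out so the sketch has no side effects):
# ver=$(nvidia_kernel_module_version < /proc/driver/nvidia/version)
# driver_package_for "$ver"
```

If the version line has a different layout on your driver release, the first helper prints nothing and the package name should be chosen by hand.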

[Figure 1: ubuntu cuda/cudnn/nvidia install]

Pytorch

Pytorch==1.7.1

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch

anaconda3/envs/th1.7/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11: undefined symbol: free_gemm_select, version libcublasLt.so.11

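The matching shared library was confirmed to be installed on the system.

find /usr -name 'libcublas.so*'

cp /usr/*//libcublas.so.11 path/anaconda3/envs/th1.7/lib/

This resolved the problem.

The same fix applies when libcublas.so is reported as not found or fails to load with an error.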