Preface
GPU environment issues
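- Follow the official guide throughout: https://www.tensorflow.org/install/docker?hl=zh-cn
- Always install with Docker on Linux; do not use Windows.
- Only the NVIDIA® GPU driver needs to be installed on the host; the NVIDIA® CUDA® Toolkit (the CUDA Toolkit and cuDNN) does not.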
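- Check the Docker version
  1. Note that with the release of Docker 19.03, the nvidia-docker2 package is deprecated, because NVIDIA GPUs are now natively supported as devices by the Docker runtime.
  2. To enable GPU support on Linux, install NVIDIA Docker support.
- Check your Docker version with `docker -v`. For versions earlier than 19.03, you need the nvidia-docker2 package and the `--runtime=nvidia` flag; **for 19.03 and later, you need the nvidia-container-toolkit package and the `--gpus all` flag.** Both options are documented on the page linked above.
- Install nvidia-docker following the official guide: https://github.com/NVIDIA/nvidia-docker
- Install via nvidia-container-toolkit: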
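The version rule above can be captured in a small Python helper. This is a minimal sketch, not part of any NVIDIA or Docker tool: the name `gpu_flags` is hypothetical, and it assumes `docker -v` prints a line such as `Docker version 19.03.5, build 633a0ea`.

```python
import re
from typing import List


def gpu_flags(docker_version_output: str) -> List[str]:
    """Hypothetical helper: pick docker-run GPU flags from `docker -v` output.

    Docker >= 19.03: native GPU support (nvidia-container-toolkit) -> --gpus all
    Docker <  19.03: nvidia-docker2 runtime                         -> --runtime=nvidia
    """
    # Take the first "major.minor" pair, e.g. "19.03" from "Docker version 19.03.5, build ..."
    match = re.search(r"(\d+)\.(\d+)", docker_version_output)
    if match is None:
        raise ValueError(f"cannot parse a Docker version from {docker_version_output!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (19, 3):
        return ["--gpus", "all"]
    # Older engines need the nvidia-docker2 package installed for this runtime to exist
    return ["--runtime=nvidia"]
```

For example, `["docker", "run", *gpu_flags(output), "nvidia/cuda:10.0-base", "nvidia-smi"]` builds an nvidia-smi test command, where `output` is the text printed by `docker -v`.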
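CentOS 7

```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# Use the $distribution computed above instead of a hard-coded centos7, and write the
# repo file through `sudo tee`: a plain `> /etc/...` redirect runs without root privileges
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum install -y nvidia-container-toolkit
# Restart Docker so it picks up the toolkit
sudo systemctl restart docker
```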
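The `distribution` variable is just `$ID$VERSION_ID` from /etc/os-release, and the repository file differs between yum (`.repo`) and apt (`.list`, used in the Ubuntu sequence). A minimal Python sketch of both, assuming plain `KEY=value` lines (the shell's `.` also handles escapes and variable expansion, which this does not); both function names are hypothetical:

```python
from typing import Dict


def distribution_from_os_release(text: str) -> str:
    """Hypothetical helper approximating `. /etc/os-release; echo $ID$VERSION_ID`."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding quotes, e.g. ID="centos" -> centos
        fields[key] = value.strip().strip('"').strip("'")
    return fields.get("ID", "") + fields.get("VERSION_ID", "")


def nvidia_docker_repo_url(distribution: str, package_manager: str) -> str:
    """Hypothetical helper: yum reads a .repo file, apt reads a .list file."""
    filenames = {"yum": "nvidia-docker.repo", "apt": "nvidia-docker.list"}
    return f"https://nvidia.github.io/nvidia-docker/{distribution}/{filenames[package_manager]}"
```

On the host, `distribution_from_os_release(open("/etc/os-release").read())` should match the shell variable for typical files.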
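Starting with docker
To start TensorFlow with GPU support in Docker, add the `--gpus all` flag. On Ubuntu/Debian (apt-based systems), the installation and test sequence from the official nvidia-docker guide is: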
```bash
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

# Test nvidia-smi with the latest official CUDA image
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

docker run --gpus all -it tensorflow/tensorflow:latest-gpu \
  python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
```
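The `nvidia-smi` command prints the GPU information.
The `tensorflow/tensorflow:latest-gpu` command prints a long stream of logs; near the end there should be a line containing the word "gpu".
Enter the container and check its environment with the following code, then inspect the output: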
```python
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.python.client import device_lib

print("TensorFlow version:", tf.__version__)
print("Keras version:", keras.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("""\n ***** Environment check log follows; the GPU is in use only if device_type: "GPU" appears ***** \n""")
print(device_lib.list_local_devices())
print("""\n ***** Environment check log finished; the GPU is in use only if device_type: "GPU" appears ***** \n""")
```
Starting with docker-compose
Refer to the official guide: https://github.com/nvidia/nvidia-container-runtime#installation
nvidia-container-runtime needs to be installed.
Ubuntu distributions
- Install the repository for your distribution by following the installation instructions linked above.
- Install the nvidia-container-runtime package: `sudo apt-get install nvidia-container-runtime`

CentOS distributions
- Install the repository for your distribution by following the installation instructions linked above.
- Install the nvidia-container-runtime package: `sudo yum install nvidia-container-runtime`
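Docker Engine setup
If you have already installed the nvidia-docker2 package, it has already registered the runtime, so skip this section. To register the nvidia runtime, use whichever of the following methods best suits your environment. You may need to merge the new arguments with your existing configuration.

Systemd drop-in file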
```bash
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```
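Whichever method you use, you can confirm that dockerd picked up the runtime. A minimal sketch, assuming the JSON printed by `docker info --format '{{json .}}'` contains a `Runtimes` object keyed by runtime name and a `DefaultRuntime` string; the helper names are hypothetical:

```python
import json
import subprocess
from typing import Dict, List, Optional


def runtime_summary(info_json: str) -> Dict[str, object]:
    """Hypothetical helper: summarise runtimes from `docker info` JSON output."""
    info = json.loads(info_json)
    runtimes: List[str] = sorted(info.get("Runtimes") or {})
    default: Optional[str] = info.get("DefaultRuntime")
    return {"runtimes": runtimes, "default": default, "nvidia_registered": "nvidia" in runtimes}


def read_docker_info() -> str:
    """Hypothetical helper: needs a running daemon and permission to use it."""
    return subprocess.run(
        ["docker", "info", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Calling `runtime_summary(read_docker_info())` after the restart should list `nvidia` among the runtimes; `default` changes to `nvidia` only once `default-runtime` is set.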
Daemon configuration file

```bash
sudo tee /etc/docker/daemon.json <<EOF
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
sudo pkill -SIGHUP dockerd
```

You can optionally reconfigure the default runtime by adding the following to /etc/docker/daemon.json:

```
"default-runtime": "nvidia"
```

My final /etc/docker/daemon.json looks like this:
```json
{
  "registry-mirrors": [
    "https://1nj0zren.mirror.aliyuncs.com",
    "https://docker.mirrors.ustc.edu.cn",
    "http://f1361db2.m.daocloud.io",
    "https://registry.docker-cn.com"
  ],
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```
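Because `sudo tee` overwrites the whole file, existing settings such as registry mirrors have to be merged in by hand. A minimal Python sketch of that merge, assuming the file (if present) holds a single JSON object; the function names are hypothetical:

```python
import json
from pathlib import Path
from typing import Any, Dict


def merge_nvidia_runtime(config: Dict[str, Any], make_default: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of a daemon.json dict with the nvidia runtime added.

    Existing keys (registry-mirrors, other runtimes, ...) are kept unchanged.
    """
    merged = dict(config)
    runtimes = dict(merged.get("runtimes", {}))
    runtimes["nvidia"] = {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
    merged["runtimes"] = runtimes
    if make_default:
        merged["default-runtime"] = "nvidia"
    return merged


def update_daemon_json(path: str = "/etc/docker/daemon.json") -> None:
    """Hypothetical helper: merge the nvidia runtime into the file in place (run as root)."""
    p = Path(path)
    text = p.read_text().strip() if p.exists() else ""
    current = json.loads(text) if text else {}
    p.write_text(json.dumps(merge_nvidia_runtime(current), indent=2) + "\n")
```

After calling `update_daemon_json()` as root, reload the daemon with `sudo pkill -SIGHUP dockerd` as shown above.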
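Version notes
Note that the TensorFlow installed via Docker already includes Keras, because its version is recent enough. Import it with `from tensorflow import keras`; after that, usage is the same as with standalone Keras.
If necessary, you can first install TensorFlow with Docker, then enter the environment and install Keras: `pip install keras==xxxx`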
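Setting up TensorFlow
Official tutorial for installing TensorFlow with Docker: https://www.tensorflow.org/install/docker?hl=zh-cn
Download a TensorFlow Docker image
The official TensorFlow Docker images are located in the tensorflow/tensorflow Docker Hub repository. Image releases are tagged using the following format: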
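Tag | Description |
---|---|
latest | The latest release of the TensorFlow CPU binary image. (default) |
nightly | Nightly builds of the TensorFlow image. (unstable) |
version | A specific version of the TensorFlow binary image, for example: 2.1.0 |
devel | Nightly builds of a TensorFlow master development environment. Includes the TensorFlow source code. |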
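Each base tag has variants that add or change functionality:
Tag variant | Description |
---|---|
tag-gpu | The specified tag release with GPU support. |
tag-py3 | The specified tag release with Python 3 support. |
tag-jupyter | The specified tag release with Jupyter (includes the TensorFlow tutorial notebooks). |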
You can use multiple variants at once. For example, the following commands download TensorFlow release images to your computer:

```bash
docker pull tensorflow/tensorflow                     # latest stable release
docker pull tensorflow/tensorflow:devel-gpu           # nightly dev release w/ GPU support
docker pull tensorflow/tensorflow:latest-gpu-jupyter  # latest release w/ GPU support and Jupyter
```
In tensorflow/tensorflow:latest-gpu-jupyter, for example, latest is a tag from the first table and gpu and jupyter are variants from the second table; all three are combined.
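The naming rule (base tag first, then variants joined with hyphens) can be sketched as a tiny Python function. It only builds the string; the name `tf_image` is hypothetical, and it does not check whether a given combination is actually published on Docker Hub.

```python
from typing import Iterable

# Variants from the table above
KNOWN_VARIANTS = ("gpu", "py3", "jupyter")


def tf_image(tag: str = "latest", variants: Iterable[str] = ()) -> str:
    """Hypothetical helper: join a base tag and variants into a tensorflow/tensorflow image name."""
    extra = list(variants)
    unknown = [v for v in extra if v not in KNOWN_VARIANTS]
    if unknown:
        raise ValueError(f"unknown variant(s): {unknown}")
    return "tensorflow/tensorflow:" + "-".join([tag, *extra])
```

For example, `tf_image("latest", ["gpu", "jupyter"])` gives the image from the last pull command above.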
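Version notes: Keras and TensorFlow version compatibility
For the correspondence between Keras and TensorFlow versions, see the following FloydHub environment table: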
Framework | Env name (--env parameter) | Description | Docker Image | Packages and Nvidia Settings |
---|---|---|---|---|
TensorFlow 1.14 | tensorflow-1.14 | TensorFlow 1.14.0 + Keras 2.2.5 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.14 |
TensorFlow 1.13 | tensorflow-1.13 | TensorFlow 1.13.0 + Keras 2.2.4 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.13 |
TensorFlow 1.12 | tensorflow-1.12 | TensorFlow 1.12.0 + Keras 2.2.4 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.12 |
TensorFlow 1.12 | tensorflow-1.12:py2 | TensorFlow 1.12.0 + Keras 2.2.4 on Python 2. | floydhub/tensorflow | |
TensorFlow 1.11 | tensorflow-1.11 | TensorFlow 1.11.0 + Keras 2.2.4 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.11 |
TensorFlow 1.11 | tensorflow-1.11:py2 | TensorFlow 1.11.0 + Keras 2.2.4 on Python 2. | floydhub/tensorflow | |
TensorFlow 1.10 | tensorflow-1.10 | TensorFlow 1.10.0 + Keras 2.2.0 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.10 |
TensorFlow 1.10 | tensorflow-1.10:py2 | TensorFlow 1.10.0 + Keras 2.2.0 on Python 2. | floydhub/tensorflow | |
TensorFlow 1.9 | tensorflow-1.9 | TensorFlow 1.9.0 + Keras 2.2.0 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.9 |
TensorFlow 1.9 | tensorflow-1.9:py2 | TensorFlow 1.9.0 + Keras 2.2.0 on Python 2. | floydhub/tensorflow | |
TensorFlow 1.8 | tensorflow-1.8 | TensorFlow 1.8.0 + Keras 2.1.6 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.8 |
TensorFlow 1.8 | tensorflow-1.8:py2 | TensorFlow 1.8.0 + Keras 2.1.6 on Python 2. | floydhub/tensorflow | |
TensorFlow 1.7 | tensorflow-1.7 | TensorFlow 1.7.0 + Keras 2.1.6 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.7 |
TensorFlow 1.7 | tensorflow-1.7:py2 | TensorFlow 1.7.0 + Keras 2.1.6 on Python 2. | floydhub/tensorflow | |
TensorFlow 1.5 | tensorflow-1.5 | TensorFlow 1.5.0 + Keras 2.1.6 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.5 |
TensorFlow 1.5 | tensorflow-1.5:py2 | TensorFlow 1.5.0 + Keras 2.1.6 on Python 2. | floydhub/tensorflow | |
TensorFlow 1.4 | tensorflow-1.4 | TensorFlow 1.4.0 + Keras 2.0.8 on Python 3.6. | floydhub/tensorflow | |
TensorFlow 1.4 | tensorflow-1.4:py2 | TensorFlow 1.4.0 + Keras 2.0.8 on Python 2. | floydhub/tensorflow | |
TensorFlow 1.3 | tensorflow-1.3 | TensorFlow 1.3.0 + Keras 2.0.6 on Python 3.6. | floydhub/tensorflow | |
TensorFlow 1.3 | tensorflow-1.3:py2 | TensorFlow 1.3.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow | |
TensorFlow 1.2 | tensorflow-1.2 | TensorFlow 1.2.0 + Keras 2.0.6 on Python 3.5. | floydhub/tensorflow | |
TensorFlow 1.2 | tensorflow-1.2:py2 | TensorFlow 1.2.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow | |
TensorFlow 1.1 | tensorflow | TensorFlow 1.1.0 + Keras 2.0.6 on Python 3.5. | floydhub/tensorflow | |
TensorFlow 1.1 | tensorflow:py2 | TensorFlow 1.1.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow | |
TensorFlow 1.0 | tensorflow-1.0 | TensorFlow 1.0.0 + Keras 2.0.6 on Python 3.5. | floydhub/tensorflow | |
TensorFlow 1.0 | tensorflow-1.0:py2 | TensorFlow 1.0.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow | |
TensorFlow 0.12 | tensorflow-0.12 | TensorFlow 0.12.1 + Keras 1.2.2 on Python 3.5. | floydhub/tensorflow | |
TensorFlow 0.12 | tensorflow-0.12:py2 | TensorFlow 0.12.1 + Keras 1.2.2 on Python 2. | floydhub/tensorflow | |
PyTorch 1.1 | pytorch-1.1 | PyTorch 1.1.0 + fastai 1.0.57 on Python 3.6. | floydhub/pytorch | PyTorch-1.1 |
PyTorch 1.0 | pytorch-1.0 | PyTorch 1.0.0 + fastai 1.0.51 on Python 3.6. | floydhub/pytorch | PyTorch-1.0 |
PyTorch 1.0 | pytorch-1.0:py2 | PyTorch 1.0.0 on Python 2. | floydhub/pytorch | |
PyTorch 0.4 | pytorch-0.4 | PyTorch 0.4.1 on Python 3.6. | floydhub/pytorch | PyTorch-0.4 |
PyTorch 0.4 | pytorch-0.4:py2 | PyTorch 0.4.1 on Python 2. | floydhub/pytorch | |
PyTorch 0.3 | pytorch-0.3 | PyTorch 0.3.1 on Python 3.6. | floydhub/pytorch | PyTorch-0.3 |
PyTorch 0.3 | pytorch-0.3:py2 | PyTorch 0.3.1 on Python 2. | floydhub/pytorch | |
PyTorch 0.2 | pytorch-0.2 | PyTorch 0.2.0 on Python 3.5 | floydhub/pytorch | |
PyTorch 0.2 | pytorch-0.2:py2 | PyTorch 0.2.0 on Python 2. | floydhub/pytorch | |
PyTorch 0.1 | pytorch-0.1 | PyTorch 0.1.12 on Python 3. | floydhub/pytorch | |
PyTorch 0.1 | pytorch-0.1:py2 | PyTorch 0.1.12 on Python 2. | floydhub/pytorch | |
Theano 0.9 | theano-0.9 | Theano rel-0.8.2 + Keras 2.0.3 on Python3.5. | floydhub/theano | |
Theano 0.9 | theano-0.9:py2 | Theano rel-0.8.2 + Keras 2.0.3 on Python2. | floydhub/theano | |
Caffe | caffe | Caffe rc4 on Python3.5. | floydhub/caffe | |
Caffe | caffe:py2 | Caffe rc4 on Python2. | floydhub/caffe | |
Torch | torch | Torch 7 with Python 3 env. | floydhub/torch | |
Torch | torch:py2 | Torch 7 with Python 2 env. | floydhub/torch | |
Chainer 1.23 | chainer-1.23 | Chainer 1.23.0 on Python 3. | floydhub/chainer | |
Chainer 1.23 | chainer-1.23:py2 | Chainer 1.23.0 on Python 2. | floydhub/chainer | |
Chainer 2.0 | chainer-2.0 | Chainer 1.23.0 on Python 3. | floydhub/chainer | |
Chainer 2.0 | chainer-2.0:py2 | Chainer 1.23.0 on Python 2. | floydhub/chainer | |
MxNet 1.0 | mxnet | MxNet 1.0.0 on Python 3.6. | floydhub/mxnet | |
MxNet 1.0 | mxnet:py2 | MxNet 1.0.0 on Python 2. | floydhub/mxnet | |
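When pinning standalone Keras with `pip install keras==...`, the TensorFlow rows of the table can be turned into a lookup. A minimal sketch: the dictionary is copied from the table (FloydHub's pairings, not an official compatibility matrix), it only covers TensorFlow 0.12 to 1.14, and the helper names are hypothetical.

```python
# TensorFlow (major.minor) -> Keras version, copied from the TensorFlow rows of the table above
TF_TO_KERAS = {
    "1.14": "2.2.5",
    "1.13": "2.2.4",
    "1.12": "2.2.4",
    "1.11": "2.2.4",
    "1.10": "2.2.0",
    "1.9": "2.2.0",
    "1.8": "2.1.6",
    "1.7": "2.1.6",
    "1.5": "2.1.6",
    "1.4": "2.0.8",
    "1.3": "2.0.6",
    "1.2": "2.0.6",
    "1.1": "2.0.6",
    "1.0": "2.0.6",
    "0.12": "1.2.2",
}


def keras_for_tensorflow(tf_version: str) -> str:
    """Hypothetical helper: Keras version paired with a TensorFlow version such as "1.13.1"."""
    major_minor = ".".join(tf_version.split(".")[:2])
    if major_minor not in TF_TO_KERAS:
        raise KeyError(f"TensorFlow {tf_version} is not covered by the table")
    return TF_TO_KERAS[major_minor]


def pip_pin_keras(tf_version: str) -> str:
    """Hypothetical helper: the pip command that installs the matching Keras."""
    return f"pip install keras=={keras_for_tensorflow(tf_version)}"
```

Inside a TensorFlow 1.x container, `pip_pin_keras(tf.__version__)` gives the command to run; for TensorFlow 2.x the lookup raises `KeyError`, and `from tensorflow import keras` is the way to go.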