Preface

GPU environment issues

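  • Follow the official guide throughout: https://www.tensorflow.org/install/docker?hl=zh-cn
  • Install with Docker on Linux throughout; Windows is not used.
  • Only the NVIDIA® GPU driver needs to be installed on the host. The NVIDIA® CUDA® Toolkit (cuDNN and the Toolkit itself) does not need to be installed there.
  • Check the Docker version:

    1. Note that since the release of Docker 19.03, the nvidia-docker2 package is deprecated, because NVIDIA GPUs are now natively supported as devices in the Docker runtime.
    2. To enable GPU support on Linux, install NVIDIA Docker support.

    • Check your Docker version with docker -v. For versions earlier than 19.03, you need nvidia-docker2 and the --runtime=nvidia flag. For 19.03 and later, you need the nvidia-container-toolkit package and the --gpus all flag. Both options are documented on the page linked above.
    • Install nvidia-docker by following the official instructions: https://github.com/NVIDIA/nvidia-docker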
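
The version rule above can be sketched in a few lines of Python. This is only an illustration: the function name gpu_flag_for is hypothetical, and it assumes the output of docker -v looks like "Docker version 19.03.8, build afacb8b".

```python
import re


def gpu_flag_for(version_output: str) -> str:
    """Return the docker run flag suggested for a given `docker -v` output."""
    # Take the first "major.minor" pair that appears in the text.
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Docker 19.03 and later: nvidia-container-toolkit with --gpus all.
    if (major, minor) >= (19, 3):
        return "--gpus all"
    # Older Docker: nvidia-docker2 with --runtime=nvidia.
    return "--runtime=nvidia"


print(gpu_flag_for("Docker version 19.03.8, build afacb8b"))  # --gpus all
print(gpu_flag_for("Docker version 18.09.7, build 2d0083d"))  # --runtime=nvidia
```

In practice the input string could come from subprocess.run(["docker", "-v"], capture_output=True, text=True).stdout.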

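Installing nvidia-container-toolkit

CentOS 7

  # Detect the distribution (gives "centos7" on CentOS 7)
  distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
  # Add the package repository; sudo tee is used because the target directory needs root
  curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
  sudo yum install -y nvidia-container-toolkit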

Starting with docker

To start TensorFlow in Docker with GPU support, add the --gpus all flag.

  # Add the package repositories (apt-based distributions)
  distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
  curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
  curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
  sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
  sudo systemctl restart docker
  # Test nvidia-smi with the latest official CUDA image
  docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
  # Run a small TensorFlow computation in a GPU-enabled container
  docker run --gpus all -it tensorflow/tensorflow:latest-gpu \
      python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

The nvidia-smi command (the first docker run) returns GPU information:
[Screenshot: nvidia-smi output]
The TensorFlow command (the second docker run) prints a large amount of log output. The screenshot shows only the last line, which contains the word "gpu":
[Screenshot: last line of the TensorFlow log]
Enter the container and run the following code to check its environment, then inspect the output:

  # TensorFlow and tf.keras
  import tensorflow as tf
  from tensorflow import keras
  from tensorflow.python.client import device_lib

  print("TensorFlow version:", tf.__version__)
  print("Keras version:", keras.__version__)
  print("Built with CUDA:", tf.test.is_built_with_cuda())
  print('\n ***** Environment check log follows; device_type: "GPU" must appear for the GPU to be in use ***** \n')
  print(device_lib.list_local_devices())
  print('\n ***** End of environment check log; device_type: "GPU" must appear for the GPU to be in use ***** \n')
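
Instead of reading the device log by eye, the check for device_type: "GPU" could be automated. The sketch below is an assumption-laden illustration: gpu_device_names is a hypothetical helper, and the sample objects are stand-ins that only mimic the name and device_type fields of the entries returned by device_lib.list_local_devices().

```python
from types import SimpleNamespace


def gpu_device_names(devices):
    """Return the names of all entries whose device_type is exactly "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


# Stand-in entries for illustration only; inside the container you would pass
# device_lib.list_local_devices() instead.
sample = [
    SimpleNamespace(name="/device:CPU:0", device_type="CPU"),
    SimpleNamespace(name="/device:GPU:0", device_type="GPU"),
]

names = gpu_device_names(sample)
print("GPU in use" if names else "No GPU found", names)
```

An empty result means TensorFlow sees no GPU, which usually points back to the driver, the toolkit installation, or a missing --gpus all flag.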

Starting with docker-compose

See the official instructions: https://github.com/nvidia/nvidia-container-runtime#installation
nvidia-container-runtime must be installed.

Ubuntu distributions

  1. Install the repository for your distribution by following the instructions on the page linked above.
  2. Install the nvidia-container-runtime package:

    sudo apt-get install nvidia-container-runtime

CentOS distributions

  1. Install the repository for your distribution by following the instructions on the page linked above.
  2. Install the nvidia-container-runtime package:

    sudo yum install nvidia-container-runtime

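Docker Engine setup

Skip this section if you have installed the nvidia-docker2 package, because it already registers the runtime. To register the nvidia runtime, use whichever of the methods below best fits your environment. You may need to merge the new arguments with your existing configuration.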

Systemd drop-in file

    sudo mkdir -p /etc/systemd/system/docker.service.d
    sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
    [Service]
    ExecStart=
    ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
    EOF
    sudo systemctl daemon-reload
    sudo systemctl restart docker

Daemon configuration file

    sudo tee /etc/docker/daemon.json <<EOF
    {
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
    EOF
    sudo pkill -SIGHUP dockerd

You can optionally reconfigure the default runtime by adding the following to /etc/docker/daemon.json:

    "default-runtime": "nvidia"
The final contents of /etc/docker/daemon.json are:

    {
        "registry-mirrors": [
            "https://1nj0zren.mirror.aliyuncs.com",
            "https://docker.mirrors.ustc.edu.cn",
            "http://f1361db2.m.daocloud.io",
            "https://registry.docker-cn.com"
        ],
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
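
Merging the runtime entry into an existing daemon.json by hand is easy to get wrong, so here is a minimal Python sketch of that merge. The function name merge_nvidia_runtime and the make_default parameter are hypothetical. The sketch works on an in-memory dict and leaves reading and writing /etc/docker/daemon.json to the caller.

```python
import copy
import json

# Runtime entry taken from the daemon configuration shown above.
NVIDIA_RUNTIME = {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}


def merge_nvidia_runtime(config, make_default=True):
    """Return a copy of a daemon.json dict with the nvidia runtime registered."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    merged.setdefault("runtimes", {})["nvidia"] = copy.deepcopy(NVIDIA_RUNTIME)
    if make_default:
        merged["default-runtime"] = "nvidia"
    return merged


# Example: an existing configuration that only lists a registry mirror.
existing = {"registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"]}
print(json.dumps(merge_nvidia_runtime(existing), indent=4))
```

Existing keys such as registry-mirrors are preserved, so the result has the same shape as the final file above.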

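Version notes

Note that TensorFlow installed through Docker already includes Keras, because the TensorFlow version is recent enough. Import it with
    from tensorflow import keras
After that, it is used the same way as standalone Keras.

If necessary, you can first install TensorFlow with Docker, then enter the environment and install Keras:
pip install keras==xxxx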

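Setting up TensorFlow

Official guide to installing TensorFlow with Docker: https://www.tensorflow.org/install/docker?hl=zh-cn

Download a TensorFlow Docker image

The official TensorFlow Docker images are located in the tensorflow/tensorflow Docker Hub repository. Image releases are tagged using the following format:

| Tag | Description |
| latest | The latest release of the TensorFlow CPU binary image. (Default) |
| nightly | Nightly builds of the TensorFlow image. (Unstable) |
| version | A specific version of the TensorFlow binary image, for example: 2.1.0 |
| devel | Nightly builds of a TensorFlow master development environment. Includes the TensorFlow source code. |

Each base tag has variants that add or change functionality:

| Tag variant | Description |
| tag-gpu | The specified tag release with GPU support. (See below.) |
| tag-py3 | The specified tag release with Python 3 support. |
| tag-jupyter | The specified tag release with Jupyter (includes TensorFlow tutorial notebooks). |

You can use multiple variants at once. For example, the following commands download TensorFlow release images to your machine:
docker pull tensorflow/tensorflow # latest stable release
docker pull tensorflow/tensorflow:devel-gpu # nightly dev release w/ GPU support
docker pull tensorflow/tensorflow:latest-gpu-jupyter # latest release w/ GPU support and Jupyter
For example, in tensorflow/tensorflow:latest-gpu-jupyter, latest is a tag from the first table, while gpu and jupyter are variants from the second table. All three are used together.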
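
To make the tag structure concrete, the following Python sketch splits an image reference into its base tag and variants. It is only an illustration: split_image_tag is a hypothetical helper, it knows only the three variants listed in the table, and it assumes the image name contains no registry port (no extra colon).

```python
# Variants from the second table above.
KNOWN_VARIANTS = ("gpu", "py3", "jupyter")


def split_image_tag(image):
    """Split e.g. "tensorflow/tensorflow:latest-gpu-jupyter" into (base, variants)."""
    _, _, tag = image.partition(":")
    if not tag:
        tag = "latest"  # docker pulls "latest" when no tag is given
    parts = tag.split("-")
    variants = [p for p in parts if p in KNOWN_VARIANTS]
    base = "-".join(p for p in parts if p not in KNOWN_VARIANTS)
    return base, variants


for image in (
    "tensorflow/tensorflow",
    "tensorflow/tensorflow:devel-gpu",
    "tensorflow/tensorflow:latest-gpu-jupyter",
):
    print(image, "->", split_image_tag(image))
```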

Version notes: matching Keras and TensorFlow versions

For the correspondence between Keras and TensorFlow versions, see the following table:

| Framework | Env name (--env parameter) | Description | Docker Image | Packages and Nvidia Settings |
| TensorFlow 1.14 | tensorflow-1.14 | TensorFlow 1.14.0 + Keras 2.2.5 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.14 |
| TensorFlow 1.13 | tensorflow-1.13 | TensorFlow 1.13.0 + Keras 2.2.4 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.13 |
| TensorFlow 1.12 | tensorflow-1.12 | TensorFlow 1.12.0 + Keras 2.2.4 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.12 |
|  | tensorflow-1.12:py2 | TensorFlow 1.12.0 + Keras 2.2.4 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 1.11 | tensorflow-1.11 | TensorFlow 1.11.0 + Keras 2.2.4 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.11 |
|  | tensorflow-1.11:py2 | TensorFlow 1.11.0 + Keras 2.2.4 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 1.10 | tensorflow-1.10 | TensorFlow 1.10.0 + Keras 2.2.0 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.10 |
|  | tensorflow-1.10:py2 | TensorFlow 1.10.0 + Keras 2.2.0 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 1.9 | tensorflow-1.9 | TensorFlow 1.9.0 + Keras 2.2.0 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.9 |
|  | tensorflow-1.9:py2 | TensorFlow 1.9.0 + Keras 2.2.0 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 1.8 | tensorflow-1.8 | TensorFlow 1.8.0 + Keras 2.1.6 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.8 |
|  | tensorflow-1.8:py2 | TensorFlow 1.8.0 + Keras 2.1.6 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 1.7 | tensorflow-1.7 | TensorFlow 1.7.0 + Keras 2.1.6 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.7 |
|  | tensorflow-1.7:py2 | TensorFlow 1.7.0 + Keras 2.1.6 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 1.5 | tensorflow-1.5 | TensorFlow 1.5.0 + Keras 2.1.6 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.5 |
|  | tensorflow-1.5:py2 | TensorFlow 1.5.0 + Keras 2.1.6 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 1.4 | tensorflow-1.4 | TensorFlow 1.4.0 + Keras 2.0.8 on Python 3.6. | floydhub/tensorflow |  |
|  | tensorflow-1.4:py2 | TensorFlow 1.4.0 + Keras 2.0.8 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 1.3 | tensorflow-1.3 | TensorFlow 1.3.0 + Keras 2.0.6 on Python 3.6. | floydhub/tensorflow |  |
|  | tensorflow-1.3:py2 | TensorFlow 1.3.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 1.2 | tensorflow-1.2 | TensorFlow 1.2.0 + Keras 2.0.6 on Python 3.5. | floydhub/tensorflow |  |
|  | tensorflow-1.2:py2 | TensorFlow 1.2.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 1.1 | tensorflow | TensorFlow 1.1.0 + Keras 2.0.6 on Python 3.5. | floydhub/tensorflow |  |
|  | tensorflow:py2 | TensorFlow 1.1.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 1.0 | tensorflow-1.0 | TensorFlow 1.0.0 + Keras 2.0.6 on Python 3.5. | floydhub/tensorflow |  |
|  | tensorflow-1.0:py2 | TensorFlow 1.0.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow |  |
| TensorFlow 0.12 | tensorflow-0.12 | TensorFlow 0.12.1 + Keras 1.2.2 on Python 3.5. | floydhub/tensorflow |  |
|  | tensorflow-0.12:py2 | TensorFlow 0.12.1 + Keras 1.2.2 on Python 2. | floydhub/tensorflow |  |
| PyTorch 1.1 | pytorch-1.1 | PyTorch 1.1.0 + fastai 1.0.57 on Python 3.6. | floydhub/pytorch | PyTorch-1.1 |
| PyTorch 1.0 | pytorch-1.0 | PyTorch 1.0.0 + fastai 1.0.51 on Python 3.6. | floydhub/pytorch | PyTorch-1.0 |
|  | pytorch-1.0:py2 | PyTorch 1.0.0 on Python 2. | floydhub/pytorch |  |
| PyTorch 0.4 | pytorch-0.4 | PyTorch 0.4.1 on Python 3.6. | floydhub/pytorch | PyTorch-0.4 |
|  | pytorch-0.4:py2 | PyTorch 0.4.1 on Python 2. | floydhub/pytorch |  |
| PyTorch 0.3 | pytorch-0.3 | PyTorch 0.3.1 on Python 3.6. | floydhub/pytorch | PyTorch-0.3 |
|  | pytorch-0.3:py2 | PyTorch 0.3.1 on Python 2. | floydhub/pytorch |  |
| PyTorch 0.2 | pytorch-0.2 | PyTorch 0.2.0 on Python 3.5. | floydhub/pytorch |  |
|  | pytorch-0.2:py2 | PyTorch 0.2.0 on Python 2. | floydhub/pytorch |  |
| PyTorch 0.1 | pytorch-0.1 | PyTorch 0.1.12 on Python 3. | floydhub/pytorch |  |
|  | pytorch-0.1:py2 | PyTorch 0.1.12 on Python 2. | floydhub/pytorch |  |
| Theano 0.9 | theano-0.9 | Theano rel-0.8.2 + Keras 2.0.3 on Python 3.5. | floydhub/theano |  |
|  | theano-0.9:py2 | Theano rel-0.8.2 + Keras 2.0.3 on Python 2. | floydhub/theano |  |
| Caffe | caffe | Caffe rc4 on Python 3.5. | floydhub/caffe |  |
|  | caffe:py2 | Caffe rc4 on Python 2. | floydhub/caffe |  |
| Torch | torch | Torch 7 with Python 3 env. | floydhub/torch |  |
|  | torch:py2 | Torch 7 with Python 2 env. | floydhub/torch |  |
| Chainer 1.23 | chainer-1.23 | Chainer 1.23.0 on Python 3. | floydhub/chainer |  |
|  | chainer-1.23:py2 | Chainer 1.23.0 on Python 2. | floydhub/chainer |  |
| Chainer 2.0 | chainer-2.0 | Chainer 1.23.0 on Python 3. | floydhub/chainer |  |
|  | chainer-2.0:py2 | Chainer 1.23.0 on Python 2. | floydhub/chainer |  |
| MxNet 1.0 | mxnet | MxNet 1.0.0 on Python 3.6. | floydhub/mxnet |  |
|  | mxnet:py2 | MxNet 1.0.0 on Python 2. | floydhub/mxnet |  |

Source: https://docs.floydhub.com/guides/environments/
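
If you need to pin a standalone Keras to match an older TensorFlow, the TensorFlow rows of the table above can be turned into a small lookup. This is a sketch only: the helper names keras_version_for and pip_command are hypothetical, and the pairs are copied from the FloydHub table rather than from an official compatibility matrix.

```python
# TensorFlow major.minor -> Keras version, copied from the table above.
KERAS_FOR_TF = {
    "1.14": "2.2.5",
    "1.13": "2.2.4", "1.12": "2.2.4", "1.11": "2.2.4",
    "1.10": "2.2.0", "1.9": "2.2.0",
    "1.8": "2.1.6", "1.7": "2.1.6", "1.5": "2.1.6",
    "1.4": "2.0.8",
    "1.3": "2.0.6", "1.2": "2.0.6", "1.1": "2.0.6", "1.0": "2.0.6",
    "0.12": "1.2.2",
}


def keras_version_for(tf_version):
    """Return the Keras version paired with a TensorFlow version, or None."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return KERAS_FOR_TF.get(major_minor)


def pip_command(tf_version):
    """Build the pip command for the matching Keras, or None if unknown."""
    keras_version = keras_version_for(tf_version)
    if keras_version is None:
        return None
    return f"pip install keras=={keras_version}"


print(pip_command("1.14.0"))  # pip install keras==2.2.5
```

Versions that are not in the table return None instead of a guess.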