1. In a shell script

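# jy: Select which GPU(s) the job may use. The index is the one shown by nvidia-smi
#     (set CUDA_DEVICE_ORDER=PCI_BUS_ID to guarantee that the two orderings match).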
export CUDA_VISIBLE_DEVICES=2
#export CUDA_VISIBLE_DEVICES=0,1
# jy: Number of OpenMP threads available for CPU-side work (allows multithreading)
export OMP_NUM_THREADS=8
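# jy: With distributed data parallel (multi-GPU training), every GPU holds a full copy of
#     the model, so per-GPU memory usage does not drop as more GPUs are added.
# jy: Single GPU only (if train.py uses the GPU, it automatically runs on the GPU selected above)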
python train.py \
    --para_name xxx \
    --para_name2 \
    "$@"
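# jy: Multiple GPUs. NUM_GPU and PORT_ID must be set beforehand, and CUDA_VISIBLE_DEVICES
#     must expose at least NUM_GPU cards (e.g. the commented-out 0,1 setting above).
#     Recent PyTorch releases deprecate torch.distributed.launch in favor of torchrun.
python -m torch.distributed.launch --nproc_per_node "$NUM_GPU" --master_port "$PORT_ID" train.py \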
    --para_name xxx \
    --para_name2 \
    "$@"

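One detail worth spelling out about the shell setup above: inside the process, the visible GPUs are renumbered from 0, so with CUDA_VISIBLE_DEVICES=2 the training code addresses that card as device 0. The sketch below only illustrates that mapping; `visible_device_map` is a hypothetical helper (not part of any library), and it assumes the variable holds a plain comma-separated list of indices.

```python
import os


def visible_device_map(env=None):
    """Return {logical_index: physical_id} implied by CUDA_VISIBLE_DEVICES.

    Hypothetical helper for illustration only. Returns None when the
    variable is unset, which means every GPU is visible.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Keep the ids as strings: the position in the list is the logical index.
    ids = [tok.strip() for tok in raw.split(",") if tok.strip()]
    return dict(enumerate(ids))


# Physical GPUs 2, 3, 4 are seen by the process as devices 0, 1, 2.
print(visible_device_map({"CUDA_VISIBLE_DEVICES": "2,3,4"}))
# → {0: '2', 1: '3', 2: '4'}
```
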
2. In a Python script

import os

# Set this before CUDA is initialized (safest: before importing torch).
os.environ['CUDA_VISIBLE_DEVICES'] = "0,1"
#os.environ['CUDA_VISIBLE_DEVICES'] = "5,6"
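
The assignment above can be wrapped in a small function so that the ordering issue is harder to miss. This is a minimal sketch, not an established API: `select_gpus` is a hypothetical name, and the warning is only a heuristic, since the variable is ignored once CUDA has actually been initialized.

```python
import os
import sys
import warnings


def select_gpus(gpu_ids):
    """Expose only the given physical GPU indices to this process.

    Hypothetical helper for illustration. Call it before importing torch.
    """
    if "torch" in sys.modules:
        # Heuristic check: if CUDA is already initialized, the change is ignored.
        warnings.warn("torch is already imported; CUDA_VISIBLE_DEVICES may be ignored")
    # Make the indices follow the PCI bus order, which is what nvidia-smi shows.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


select_gpus([0, 1])
# import torch  # import the framework only after the variable is set
```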

3. On the command line

Single GPU
- CUDA_VISIBLE_DEVICES=2 python train.py
Multiple GPUs
- CUDA_VISIBLE_DEVICES=2,3,4 python train.py
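
The `VAR=value command` prefix sets the variable for that one command only. The same effect can be sketched from a Python launcher by passing a modified environment to the child process. `run_with_gpus` is a hypothetical helper, and the probe command is a stand-in for `python train.py` so that the example runs without a training script or a GPU.

```python
import os
import subprocess
import sys


def run_with_gpus(cmd, gpu_ids):
    """Run cmd with CUDA_VISIBLE_DEVICES set only for the child process."""
    env = dict(os.environ)  # copy, so the parent environment stays untouched
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)


# Stand-in for [sys.executable, "train.py"]: the child prints what it sees.
probe = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_with_gpus(probe, [2, 3, 4])
print(result.stdout.strip())
# → 2,3,4
```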

[03] Machine Learning and Deep Learning

(00) Specifying the GPU for model training
