Official documentation: https://mmdetection.readthedocs.io/en/latest/INSTALL.html
GitHub repository: https://github.com/open-mmlab/mmdetection

Overview

mmdetection is an excellent open-source, PyTorch-based object detection toolbox, developed by the Multimedia Laboratory of the Chinese University of Hong Kong and released under the Apache-2.0 license.

Key features

  1. Modular design

The object detection framework is decomposed into independent modules that can be freely combined into a custom detection pipeline.

  2. Support for multiple mainstream detection frameworks

Including Faster R-CNN, Mask R-CNN, RetinaNet, and more.

  3. High efficiency

All basic operations, such as bbox and mask operations, run on the GPU; training speed is comparable to or faster than other open-source frameworks such as Detectron, maskrcnn-benchmark, and SimpleDet.

  4. State of the art

The team behind mmdetection won the COCO object detection challenge in 2018.

Installation

Requirements

Before installing, check that your environment meets the following requirements:

  • Linux (Windows is not officially supported)
  • Python 3.5+
  • PyTorch 1.1 or higher
  • CUDA 9.0 or higher
  • NCCL 2
  • GCC 4.9 or higher (check with gcc --version)
  • mmcv
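
The PyTorch-related requirements can be checked from Python; a minimal sketch, assuming PyTorch is already installed:

import torch

print(torch.__version__)          # should be 1.1 or higher
print(torch.version.cuda)         # CUDA version PyTorch was built against, 9.0 or higher
print(torch.cuda.is_available())  # True if the GPU and driver are usable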

Installing in a Python virtual environment (not yet verified)

# use the Tsinghua mirror
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes
# create a python virtual environment
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
# install the required packages
conda install pytorch torchvision cudatoolkit=10.0


Installing NCCL

NCCL (NVIDIA Collective Communications Library) implements collective communication for multi-GPU and multi-node setups, with performance optimizations for NVIDIA GPUs.

Download the package matching your CUDA version and OS (e.g. Ubuntu) from https://developer.nvidia.com/nccl. (Note: registration is required before downloading.)

sudo dpkg -i nccl-repo-ubuntu1604-2.5.6-ga-cuda10.0_1-1_amd64.deb  # the file downloaded from NVIDIA
sudo apt update
sudo apt install libnccl2=2.5.6-1+cuda10.0 libnccl-dev=2.5.6-1+cuda10.0
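
To confirm the NCCL version that PyTorch itself links against (note this is the version bundled with PyTorch, not necessarily the one installed via apt above), a quick check from Python:

import torch

# e.g. 2406, or a (2, 4, 6)-style tuple depending on the PyTorch version
print(torch.cuda.nccl.version())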


Installing Cython

conda install cython

Installing mmcv

mmcv is the core computer vision library that mmdetection depends on.

First, run the following commands:

git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
pip install .  # note the trailing dot (.); installation may take a while

If the installation fails with an error (e.g. about pytest-runner), run the following command first and then retry the commands above:

pip install pytest-runner pytest


Installing mmdetection

git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -v -e .  # note the trailing dot (.); or "python setup.py develop"; compilation takes a while

Installing pycocotools

pip install pycocotools

Verifying the installation

Run the following code in a Python interpreter; if no error is raised, the environment is set up correctly.

from mmdet.apis import init_detector
from mmdet.apis import inference_detector
from mmdet.apis import show_result


Installing via Dockerfile

Test code

Before testing, prepare a test image test.jpg and create a checkpoints folder, then download a pretrained model from https://github.com/open-mmlab/mmdetection/blob/master/docs/MODEL_ZOO.md into the checkpoints folder.

The model used in this test can be downloaded from:
https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth (hosted on Amazon S3, so the download may be slow)
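
The checkpoint can also be fetched from Python using only the standard library; a sketch using the URL above:

import os
import urllib.request

os.makedirs('checkpoints', exist_ok=True)
url = ('https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/'
       'models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth')
# saves to checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth
urllib.request.urlretrieve(url, os.path.join('checkpoints', url.split('/')[-1]))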

# coding=utf-8
from mmdet.apis import init_detector
from mmdet.apis import inference_detector
from mmdet.apis import show_result

# model config file
config_file = './configs/faster_rcnn_r50_fpn_1x.py'
# pretrained checkpoint file
checkpoint_file = './checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth'
# build the model from the config file and the checkpoint file
model = init_detector(config_file, checkpoint_file, device='cuda:0')
# test a single image
img = 'test.jpg'
result = inference_detector(model, img)
print(result)
# render the detections and save them to result.jpg
show_result(img, result, model.CLASSES, show=False, out_file="result.jpg")
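
For a detector without masks, result is a list with one entry per class, each an (N, 5) array whose columns are x1, y1, x2, y2, score. Continuing the script above, a sketch that prints the detections above a score threshold (the threshold value is our choice):

score_thr = 0.3
for class_id, bboxes in enumerate(result):
    # bboxes is an (N, 5) numpy array: x1, y1, x2, y2, score
    for x1, y1, x2, y2, score in bboxes[bboxes[:, 4] > score_thr]:
        print('{}: ({:.0f}, {:.0f}, {:.0f}, {:.0f}) score={:.2f}'.format(
            model.CLASSES[class_id], x1, y1, x2, y2, score))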

Testing on a dataset

# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show]
# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

CONFIG_FILE: path to the model config file
CHECKPOINT_FILE: path to the checkpoint file to test
RESULT_FILE: filename for the output results, in pickle format
EVAL_METRICS: metrics to evaluate; these depend on the dataset, e.g. "bbox", "segm", "proposal" for COCO and "mAP", "recall" for PASCAL VOC
--show: draws the results in a new window; only usable with a single GPU, and make sure the environment supports a GUI, otherwise an error is raised
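
The RESULT_FILE written by --out is an ordinary pickle file that mmcv can load directly; a sketch for inspecting it, assuming a bbox-only detector where each image entry is a per-class list of (N, 5) arrays:

import mmcv

results = mmcv.load('results.pkl')  # one entry per test image
print(len(results))
first_img = results[0]              # per-class list of (N, 5) [x1, y1, x2, y2, score] arrays
print(len(first_img), first_img[0].shape)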

Testing on COCO / VOC data

Training and testing on the COCO dataset

Training and testing on the VOC dataset

  • Dataset preparation

Create a data folder under the mmdetection directory; the expected dataset directory structure is as follows:

mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
│   │   ├── test2017
│   ├── VOCdevkit
│   │   ├── VOC2007
│   │   ├── VOC2012

cd mmdetection
mkdir data
cd data
mkdir VOCdevkit
cd VOCdevkit
ln -s /your_dataset_path/ ./  # create a symbolic link
  • Training

Learning rate settings

The learning rate follows the linear scaling rule: it is proportional to the total batch size (num_gpus × imgs_per_gpu), at 0.00125 per image:

  • 1 GPU, imgs_per_gpu = 2: lr = 0.0025
  • 2 GPUs, imgs_per_gpu = 2, or 4 GPUs, imgs_per_gpu = 1: lr = 0.005
  • 4 GPUs, imgs_per_gpu = 2: lr = 0.01
  • 8 GPUs, imgs_per_gpu = 2: lr = 0.02 (the default)
  • 16 GPUs, imgs_per_gpu = 4: lr = 0.08
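
A tiny helper makes the rule explicit (the function name is ours; base values taken from the 8-GPU default above):

def scaled_lr(num_gpus, imgs_per_gpu, base_lr=0.02, base_batch=16):
    # linear scaling rule: lr is proportional to the total batch size
    return base_lr * (num_gpus * imgs_per_gpu) / base_batch

print(scaled_lr(8, 2))   # 0.02
print(scaled_lr(4, 2))   # 0.01
print(scaled_lr(1, 2))   # 0.0025
print(scaled_lr(16, 4))  # 0.08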

Epoch settings

By default, total_epochs = 12 and step = [8, 11] in the learning policy. total_epochs can be changed; if you do, adjust step in the learning policy accordingly, e.g. for total_epochs = 50 use step = [38, 48].
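
The corresponding config fields for a 50-epoch schedule would then look like this sketch (field names as in the config files shown below):

# learning policy
lr_config = dict(policy='step', step=[38, 48])
# runtime settings
total_epochs = 50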

configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py

# model settings
model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=dict(
        type='SharedFCBBoxHead',
        num_fcs=2,
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        num_classes=21,
        target_means=[0., 0., 0., 0.],
        target_stds=[0.1, 0.1, 0.2, 0.2],
        reg_class_agnostic=False,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)))
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100)
    # soft-nms is also supported for rcnn testing
    # e.g., nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.05)
)
# dataset settings
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1000, 600),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    imgs_per_gpu=2,  # adjust according to the number of GPUs
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=3,
        dataset=dict(
            type=dataset_type,
            ann_file=[
                data_root + 'VOC2007/ImageSets/Main/trainval.txt',
                data_root + 'VOC2012/ImageSets/Main/trainval.txt'
            ],
            img_prefix=[data_root + 'VOC2007/',
                        data_root + 'VOC2012/'],
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='mAP')
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[3])  # actual epoch = 3 * 3 = 9
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 4  # actual epoch = 4 * 3 = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/faster_rcnn_r50_fpn_1x_voc0712'
load_from = None
resume_from = None
workflow = [('train', 1)]
Then start training:

python tools/train.py configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py


  • Computing mAP and recall on the VOC dataset

# generate the pkl file
python tools/test.py ./configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py ./work_dirs/faster_rcnn_r50_fpn_1x_voc0712/epoch_1.pth --out results.pkl
# compute mAP
python ./tools/voc_eval.py results.pkl ./configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py


tools/test.py

import argparse
import os
import os.path as osp
import pickle
import shutil
import tempfile

import mmcv
import torch
import torch.distributed as dist
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
from mmcv.runner import get_dist_info, init_dist, load_checkpoint

from mmdet.core import wrap_fp16_model
from mmdet.datasets import build_dataloader, build_dataset
from mmdet.models import build_detector


def single_gpu_test(model, data_loader, show=False):
    model.eval()
    results = []
    dataset = data_loader.dataset
    prog_bar = mmcv.ProgressBar(len(dataset))
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            result = model(return_loss=False, rescale=not show, **data)
        results.append(result)

        if show:
            model.module.show_result(data, result)

        batch_size = data['img'][0].size(0)
        for _ in range(batch_size):
            prog_bar.update()
    return results


def multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
    """Test model with multiple gpus.

    This method tests model with multiple gpus and collects the results
    under two different modes: gpu and cpu modes. By setting 'gpu_collect=True'
    it encodes results to gpu tensors and use gpu communication for results
    collection. On cpu mode it saves the results on different gpus to 'tmpdir'
    and collects them by the rank 0 worker.

    Args:
        model (nn.Module): Model to be tested.
        data_loader (nn.Dataloader): Pytorch data loader.
        tmpdir (str): Path of directory to save the temporary results from
            different gpus under cpu mode.
        gpu_collect (bool): Option to use either gpu or cpu to collect results.

    Returns:
        list: The prediction results.
    """
    model.eval()
    results = []
    dataset = data_loader.dataset
    rank, world_size = get_dist_info()
    if rank == 0:
        prog_bar = mmcv.ProgressBar(len(dataset))
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            result = model(return_loss=False, rescale=True, **data)
        results.append(result)

        if rank == 0:
            batch_size = data['img'][0].size(0)
            for _ in range(batch_size * world_size):
                prog_bar.update()

    # collect results from all ranks
    if gpu_collect:
        results = collect_results_gpu(results, len(dataset))
    else:
        results = collect_results_cpu(results, len(dataset), tmpdir)
    return results


def collect_results_cpu(result_part, size, tmpdir=None):
    rank, world_size = get_dist_info()
    # create a tmp dir if it is not specified
    if tmpdir is None:
        MAX_LEN = 512
        # 32 is whitespace
        dir_tensor = torch.full((MAX_LEN, ),
                                32,
                                dtype=torch.uint8,
                                device='cuda')
        if rank == 0:
            tmpdir = tempfile.mkdtemp()
            tmpdir = torch.tensor(
                bytearray(tmpdir.encode()), dtype=torch.uint8, device='cuda')
            dir_tensor[:len(tmpdir)] = tmpdir
        dist.broadcast(dir_tensor, 0)
        tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip()
    else:
        mmcv.mkdir_or_exist(tmpdir)
    # dump the part result to the dir
    mmcv.dump(result_part, osp.join(tmpdir, 'part_{}.pkl'.format(rank)))
    dist.barrier()
    # collect all parts
    if rank != 0:
        return None
    else:
        # load results of all parts from tmp dir
        part_list = []
        for i in range(world_size):
            part_file = osp.join(tmpdir, 'part_{}.pkl'.format(i))
            part_list.append(mmcv.load(part_file))
        # sort the results
        ordered_results = []
        for res in zip(*part_list):
            ordered_results.extend(list(res))
        # the dataloader may pad some samples
        ordered_results = ordered_results[:size]
        # remove tmp dir
        shutil.rmtree(tmpdir)
        return ordered_results


def collect_results_gpu(result_part, size):
    rank, world_size = get_dist_info()
    # dump result part to tensor with pickle
    part_tensor = torch.tensor(
        bytearray(pickle.dumps(result_part)), dtype=torch.uint8, device='cuda')
    # gather all result part tensor shape
    shape_tensor = torch.tensor(part_tensor.shape, device='cuda')
    shape_list = [shape_tensor.clone() for _ in range(world_size)]
    dist.all_gather(shape_list, shape_tensor)
    # padding result part tensor to max length
    shape_max = torch.tensor(shape_list).max()
    part_send = torch.zeros(shape_max, dtype=torch.uint8, device='cuda')
    part_send[:shape_tensor[0]] = part_tensor
    part_recv_list = [
        part_tensor.new_zeros(shape_max) for _ in range(world_size)
    ]
    # gather all result part
    dist.all_gather(part_recv_list, part_send)

    if rank == 0:
        part_list = []
        for recv, shape in zip(part_recv_list, shape_list):
            part_list.append(
                pickle.loads(recv[:shape[0]].cpu().numpy().tobytes()))
        # sort the results
        ordered_results = []
        for res in zip(*part_list):
            ordered_results.extend(list(res))
        # the dataloader may pad some samples
        ordered_results = ordered_results[:size]
        return ordered_results


class MultipleKVAction(argparse.Action):
    """
    argparse action to split an argument into KEY=VALUE form
    on the first = and append to a dictionary.
    """

    def _is_int(self, val):
        try:
            _ = int(val)
            return True
        except Exception:
            return False

    def _is_float(self, val):
        try:
            _ = float(val)
            return True
        except Exception:
            return False

    def _is_bool(self, val):
        return val.lower() in ['true', 'false']

    def __call__(self, parser, namespace, values, option_string=None):
        options = {}
        for val in values:
            parts = val.split('=')
            key = parts[0].strip()
            if len(parts) > 2:
                val = '='.join(parts[1:])
            else:
                val = parts[1].strip()
            # try parsing val to bool/int/float first
            if self._is_bool(val):
                import json
                val = json.loads(val.lower())
            elif self._is_int(val):
                val = int(val)
            elif self._is_float(val):
                val = float(val)
            options[key] = val
        setattr(namespace, self.dest, options)


def parse_args():
    parser = argparse.ArgumentParser(
        description='MMDet test (and eval) a model')
    parser.add_argument('config', help='test config file path')
    parser.add_argument('checkpoint', help='checkpoint file')
    parser.add_argument('--out', help='output result file in pickle format')
    parser.add_argument(
        '--format_only',
        action='store_true',
        help='Format the output results without performing evaluation. It is '
        'useful when you want to format the result to a specific format and '
        'submit it to the test server')
    parser.add_argument(
        '--eval',
        type=str,
        nargs='+',
        help='evaluation metrics, which depends on the dataset, e.g., "bbox",'
        ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC')
    parser.add_argument('--show', action='store_true', help='show results')
    parser.add_argument(
        '--gpu_collect',
        action='store_true',
        help='whether to use gpu to collect results.')
    parser.add_argument(
        '--tmpdir',
        help='tmp directory used for collecting results from multiple '
        'workers, available when gpu_collect is not specified')
    parser.add_argument(
        '--options', nargs='+', action=MultipleKVAction, help='custom options')
    parser.add_argument(
        '--launcher',
        choices=['none', 'pytorch', 'slurm', 'mpi'],
        default='none',
        help='job launcher')
    parser.add_argument('--local_rank', type=int, default=0)
    args = parser.parse_args()
    if 'LOCAL_RANK' not in os.environ:
        os.environ['LOCAL_RANK'] = str(args.local_rank)
    return args


def main():
    args = parse_args()

    assert args.out or args.eval or args.format_only or args.show, \
        ('Please specify at least one operation (save/eval/format/show the '
         'results) with the argument "--out", "--eval", "--format_only" '
         'or "--show"')

    if args.eval and args.format_only:
        raise ValueError('--eval and --format_only cannot be both specified')

    if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
        raise ValueError('The output file must be a pkl file.')

    cfg = mmcv.Config.fromfile(args.config)
    # set cudnn_benchmark
    if cfg.get('cudnn_benchmark', False):
        torch.backends.cudnn.benchmark = True
    cfg.model.pretrained = None
    cfg.data.test.test_mode = True

    # init distributed env first, since logger depends on the dist info.
    if args.launcher == 'none':
        distributed = False
    else:
        distributed = True
        init_dist(args.launcher, **cfg.dist_params)

    # build the dataloader
    # TODO: support multiple images per gpu (only minor changes are needed)
    dataset = build_dataset(cfg.data.test)
    data_loader = build_dataloader(
        dataset,
        imgs_per_gpu=1,
        workers_per_gpu=cfg.data.workers_per_gpu,
        dist=distributed,
        shuffle=False)

    # build the model and load checkpoint
    model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
    fp16_cfg = cfg.get('fp16', None)
    if fp16_cfg is not None:
        wrap_fp16_model(model)
    checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
    # old versions did not save class info in checkpoints, this workaround is
    # for backward compatibility
    if 'CLASSES' in checkpoint['meta']:
        model.CLASSES = checkpoint['meta']['CLASSES']
    else:
        model.CLASSES = dataset.CLASSES

    if not distributed:
        model = MMDataParallel(model, device_ids=[0])
        outputs = single_gpu_test(model, data_loader, args.show)
    else:
        model = MMDistributedDataParallel(
            model.cuda(),
            device_ids=[torch.cuda.current_device()],
            broadcast_buffers=False)
        outputs = multi_gpu_test(model, data_loader, args.tmpdir,
                                 args.gpu_collect)

    rank, _ = get_dist_info()
    if rank == 0:
        if args.out:
            print('\nwriting results to {}'.format(args.out))
            mmcv.dump(outputs, args.out)
        kwargs = {} if args.options is None else args.options
        if args.format_only:
            dataset.format_results(outputs, **kwargs)
        if args.eval:
            dataset.evaluate(outputs, args.eval, **kwargs)


if __name__ == '__main__':
    main()

Code analysis

COCO data analysis

  • Analyzing annotations.json

The following applies to data in COCO format; for your own data you may need to split the dataset to suit your needs. Note that every entry in the annotations field of annotations.json must contain the fields shown below; if the segmentation field is missing, add it and assign an empty list.

{'segmentation': [[312.29, 562.89, 402.25, 511.49, 400.96, 425.38, 398.39, ...]],
 'area': 54652.9556,
 'iscrowd': 0,
 'image_id': 480023,
 'bbox': [116.95, 305.86, 285.3, 266.03],
 'category_id': 58,
 'id': 86}
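
A sketch of the fix mentioned above, adding an empty segmentation list to every annotation that lacks one (annotations.json stands for your own annotation file):

import json

with open('annotations.json') as f:
    ann = json.load(f)
for item in ann['annotations']:
    # COCO-style loaders expect the key to exist; an empty list is sufficient
    item.setdefault('segmentation', [])
with open('annotations.json', 'w') as f:
    json.dump(ann, f)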

The annotation file can then be loaded and inspected as follows:

# read the annotations file
import json

with open('annotations.json') as f:
    a = json.load(f)
a.keys()  # dict_keys(['info', 'images', 'license', 'categories', 'annotations'])

# build a dictionary mapping category ids to names
category_dic = dict([(i['id'], i['name']) for i in a['categories']])
category_dic
"""the 80 COCO categories
{1: 'person',
 2: 'bicycle',
 3: 'car',
 4: 'motorcycle',
 5: 'airplane',
 6: 'bus',
 7: 'train',
 8: 'truck',
 9: 'boat',
 10: 'traffic light',
 11: 'fire hydrant',
 13: 'stop sign', ...
"""
# count the number of annotations per category
counts_label = dict([(i['name'], 0) for i in a['categories']])
for i in a['annotations']:
    counts_label[category_dic[i['category_id']]] += 1
counts_label
"""
{'person': 185316,
 'bicycle': 4955,
 'car': 30785,
 'motorcycle': 6021,
 'airplane': 3833,
 'bus': 4327,
 'train': 3159,
 'truck': 7050,
 'boat': 7590,
 'traffic light': 9159, ...
"""
# drop annotations whose category_id is 0
a_copy = a.copy()
a_copy["annotations"] = list(filter(lambda x: x["category_id"] != 0, a["annotations"]))
len(a_copy["annotations"])
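
To reuse the filtered copy for training, write it back out as a new annotation file (the filename here is our choice):

import json

with open('annotations_filtered.json', 'w') as f:
    json.dump(a_copy, f)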

Main training script: tools/train.py

from __future__ import division
import os
import sys
import torch
import argparse

_FILE_PATH = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(_FILE_PATH, '../'))

from mmdet import __version__
from mmcv import Config
from mmdet.apis import (get_root_logger, init_dist, set_random_seed,
                        train_detector)  # registration of all components happens here
from mmdet.datasets import build_dataset
from mmdet.models import build_detector


# parses the command-line arguments
def parse_args():
    # create a parser object
    parser = argparse.ArgumentParser(description='Train a detector')
    # add_argument registers a command-line argument or option; help holds the help text
    parser.add_argument('--config', help='train config file path',
                        default='../configs/guided_anchoring/ga_rpn_r50_caffe_fpn_1x.py')
    # parser.add_argument('config', help='train config file path')
    parser.add_argument('--work_dir', help='the dir to save logs and models')
    parser.add_argument('--resume_from', help='the checkpoint file to resume from')
    # action defines how the value is stored; here it is a boolean flag.
    # With distributed training and --validate set, every checkpoint created
    # during training is evaluated.
    # (Without distributed training, --validate has no effect, because
    # mmdet.apis._non_dist_train, which train_detector calls, ignores the
    # validate argument.)
    parser.add_argument('--validate', action='store_true',
                        help='whether to evaluate the checkpoint during training')
    # number of GPUs to use, 1 by default
    parser.add_argument('--gpus', type=int, default=1,
                        help='number of gpus to use '
                        '(only applicable to non-distributed training)')
    # type specifies the argument type
    parser.add_argument('--seed', type=int, default=None, help='random seed')
    # job launcher for distributed training; the default 'none' means no distributed training
    parser.add_argument('--launcher', choices=['none', 'pytorch', 'slurm', 'mpi'],
                        default='none', help='job launcher')
    # passed in by torch.distributed.launch; local_rank is the index of the GPU
    # used by the current process
    parser.add_argument('--local_rank', type=int, default=0)
    parser.add_argument('--autoscale-lr', action='store_true',
                        help='automatically scale lr with the number of gpus')
    # parse the command line; afterwards arguments are read via args.xxx,
    # e.g. args.local_rank, args.work_dir
    args = parser.parse_args()
    # distributed-training environment variable
    if 'LOCAL_RANK' not in os.environ:
        os.environ['LOCAL_RANK'] = str(args.local_rank)
    return args


def main():
    # get the command-line arguments, essentially the path of the config file
    args = parse_args()
    # 1. Parse the configuration (python, yaml, json) and update it as needed,
    #    including the pretrained model file, distributed settings, etc.
    # read the config file
    cfg = Config.fromfile(args.config)
    # set cudnn_benchmark
    # Enable when the input size is fixed, which gives a speedup; it is usually
    # off and only enabled for fixed-scale networks such as SSD512.
    # Rules of thumb:
    # 1. if the input data dimensions and type barely change, setting
    #    torch.backends.cudnn.benchmark = True improves efficiency;
    # 2. if the input changes on every iteration, cuDNN searches for the best
    #    configuration every time, which actually lowers efficiency.
    if cfg.get('cudnn_benchmark', False):
        torch.backends.cudnn.benchmark = True
    # update configs according to CLI args
    if args.work_dir is not None:
        # directory for training artifacts; if not given on the command line,
        # it is taken from the work_dir key of the python config file
        cfg.work_dir = args.work_dir
    if args.resume_from is not None:
        # checkpoint to resume from; None means no resuming
        cfg.resume_from = args.resume_from
    cfg.gpus = args.gpus
    if args.autoscale_lr:
        # apply the linear scaling rule (https://arxiv.org/abs/1706.02677)
        cfg.optimizer['lr'] = cfg.optimizer['lr'] * cfg.gpus / 8
    # init distributed env first, since logger depends on the dist info.
    if args.launcher == 'none':
        distributed = False
    else:
        distributed = True
        init_dist(args.launcher, **cfg.dist_params)
    # init logger before other steps
    logger = get_root_logger(cfg.log_level)
    # log_level is set in the config file (value "INFO"); it controls the
    # output level, and the string below is printed once per training batch
    logger.info('Distributed training: {}'.format(distributed))
    # set random seeds; a fixed seed makes experiments easier to benchmark
    if args.seed is not None:
        logger.info('Set random seed to {}'.format(args.seed))
        set_random_seed(args.seed)
    # 2. Build the model from the config. build_detector calls build, which calls
    #    build_from_cfg() to fetch the model class from the registry by its type key
    #    and instantiate it with the config parameters (the config covers only a small
    #    part of each constructor's parameters; the model structure cannot be changed
    #    arbitrarily). All detector components (backbone, neck, detect head, losses,
    #    ...) are built here.
    model = build_detector(cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
    # 3. Build the datasets. build_dataset() is implemented in mmdet/datasets/builder.py
    #    and likewise delegates to build_from_cfg().
    datasets = [build_dataset(cfg.data.train)]
    if len(cfg.workflow) == 2:
        datasets.append(build_dataset(cfg.data.val))
    if cfg.checkpoint_config is not None:
        # save mmdet version, config file content and class names in
        # checkpoints as meta data ("meta" in the sense of source information)
        cfg.checkpoint_config.meta = dict(
            mmdet_version=__version__,
            config=cfg.text,
            CLASSES=datasets[0].CLASSES)
    # add an attribute for visualization convenience.
    # model has no CLASSES attribute by default, but python allows assigning
    # one dynamically; since the configured dataset is not known in advance,
    # the value is assigned at runtime.
    model.CLASSES = datasets[0].CLASSES
    # to train on your own data, modify the files described in the
    # "Training on your own data" section
    # 4. Train the detector: pass in the model built by build_detector(), the
    #    datasets built by build_dataset(), the cfg, etc.
    train_detector(model, datasets, cfg, distributed=distributed,
                   validate=args.validate, logger=logger)


if __name__ == '__main__':
    main()

Configuration file

  • configs/faster_rcnn_r50_fpn_1x.py
# https://zhuanlan.zhihu.com/p/102072353
# model settings
model = dict(
    type='FasterRCNN',  # model type
    pretrained='torchvision://resnet50',  # pretrained model: imagenet resnet50
    backbone=dict(
        type='ResNet',  # backbone type
        depth=50,  # number of layers
        num_stages=4,  # number of resnet stages
        out_indices=(0, 1, 2, 3),  # indices of the stages whose outputs are used
        frozen_stages=1,  # number of frozen stages (their parameters are not updated); -1 updates all stages
        style='pytorch'),  # network style: with 'pytorch' the stride-2 layer is the 3x3 conv; with 'caffe' it is the first 1x1 conv
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=dict(
        type='SharedFCBBoxHead',
        num_fcs=2,
        in_channels=256,  # number of input channels
        fc_out_channels=1024,  # number of output channels
        roi_feat_size=7,
        num_classes=11,  # number of classes + 1 (the +1 is the background class)
        target_means=[0., 0., 0., 0.],
        target_stds=[0.1, 0.1, 0.2, 0.2],
        reg_class_agnostic=False,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)))
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100)
    # soft-nms is also supported for rcnn testing
    # e.g., nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.05)
)
# dataset settings
dataset_type = 'CocoDataset'
data_root = "./data/"
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(492, 658), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(492, 658),  # input image size: longer side, shorter side
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    imgs_per_gpu=8,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/train_annotations.json',
        img_prefix=data_root + 'images/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/val_annotations.json',
        img_prefix=data_root + 'images/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/val_annotations.json',
        img_prefix=data_root + 'images/',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',  # optimization policy
    warmup='linear',  # warmup policy for the initial learning rate; 'linear' increases it linearly
    warmup_iters=500,  # the learning rate ramps up over the first 500 iterations
    warmup_ratio=1.0 / 3,  # starting learning rate ratio
    step=[8, 11])  # decay the learning rate at epochs 8 and 11
checkpoint_config = dict(interval=1)  # save a checkpoint every epoch
# yapf:disable
log_config = dict(
    interval=50,  # log every 50 batches
    hooks=[
        dict(type='TextLoggerHook'),  # console logging style
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 12  # maximum number of epochs
dist_params = dict(backend='nccl')  # distributed parameters
log_level = 'INFO'  # logging verbosity level
work_dir = './work_dirs/faster_rcnn_r50_fpn_1x'  # directory for log and model files
load_from = None  # path of a model to load; None means starting from the pretrained model
resume_from = None  # path of a checkpoint to resume training from
workflow = [('train', 1)]  # current workflow

Training on your own data

  • Modify mmdet/datasets/coco.py

class CocoDataset(CustomDataset):
    CLASSES = ('person', 'bicycle', 'car', ...)  # change to your own classes

    def load_annotations(self, ann_file):
        self.coco = COCO(ann_file)
        self.cat_ids = self.coco.getCatIds()
        ...

  • Modify mmdet/core/evaluation/class_names.py

def coco_classes():
    return ['person', 'bicycle', ...]  # change to your own classes

  • Modify the config file (including num_classes = number of classes + 1); a consistency check is sketched below.
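
A quick sanity check that the two class lists modified above stay in sync; a sketch assuming the mmdetection 1.x module layout shown in this section:

from mmdet.datasets.coco import CocoDataset
from mmdet.core.evaluation.class_names import coco_classes

assert list(CocoDataset.CLASSES) == list(coco_classes()), \
    'class lists in coco.py and class_names.py must match'
print('num_classes for the config:', len(CocoDataset.CLASSES) + 1)  # +1 for background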

Useful resources

mmdetection_modify.tar.gz