1. Introduction and Installation

This document introduces the TensorFlow Object Detection API (link). The API detects objects in images or video streams, either with a pre-trained recognition model or with a model trained on classes you define yourself. First, we need to install TensorFlow and its dependencies. On a CPU-only machine, pip3 install tensorflow is enough. On a machine with a GPU, computation is much faster; install as follows:

  • Installation on Ubuntu: link
  • Installation on Windows: link

You may also need Python 3.7 and the following Python libraries installed beforehand:

  pip3 install pillow
  pip3 install lxml
  pip3 install jupyter
  pip3 install matplotlib

Next, clone the TensorFlow object detection API project from GitHub, or download it as a zip archive:

  git clone https://github.com/tensorflow/models.git

Once the repository is downloaded, go to that directory and open a command prompt:

On Windows, download protoc-3.4.0-win32.zip (link); after extracting it, the bin directory contains the protoc executable.

Next, from the models directory, compile the .proto model definition files with protoc:

  1. "C:/Program Files/protoc/bin/protoc" object_detection/protos/*.proto --python_out=.

Note the directory you run the compilation from: the command above assumes you are inside the models directory, so that the object_detection/protos path resolves correctly.
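One quick way to sanity-check the compilation is to import one of the generated modules; this assumes you are still in the models directory and the *_pb2.py files were written next to the .proto files:

  python3 -c "from object_detection.protos import string_int_label_map_pb2"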

Finally, in the models/object_detection directory, open a terminal/cmd.exe, start jupyter notebook, and open object_detection_tutorial.ipynb. Choose Run All, and you should get a result like this:
(Figure 1: sample detection output from object_detection_tutorial.ipynb)

2. Object Detection on a Video Stream

This section shows how to turn the example code into an object detection program that runs on a webcam feed. First, create a new file detect1.py in the object_detection directory. If you would rather work inside the notebook, that is fine too.
The approach is to download Google's pre-trained model ssd_mobilenet_v1_coco_11_06_2017 (link), which is trained on roughly 90 classes, including people, cups, handbags, and so on. The code below captures the video stream with OpenCV and loads the SSD model with TensorFlow to run detection.
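If the model archive is not already in the object_detection directory, fetch and unpack it first; the URL is the one referenced in the code below:

  wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
  tar -xzf ssd_mobilenet_v1_coco_11_06_2017.tar.gz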
Next, we use the opencv package to grab frames from the camera. If you have not installed Python OpenCV yet, see link:

  import cv2
  cap = cv2.VideoCapture(0)
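Before wiring in the model, it is worth confirming that OpenCV can actually read frames from your camera. A minimal test loop (device index 0 is an assumption; an external camera may need index 1 or 2):

  import cv2

  cap = cv2.VideoCapture(0)  # try another index if 0 is not your camera
  while True:
      ret, frame = cap.read()
      if not ret:  # no frame returned: camera missing or in use elsewhere
          break
      cv2.imshow('camera test', frame)
      if cv2.waitKey(25) & 0xFF == ord('q'):  # press 'q' to quit
          break
  cap.release()
  cv2.destroyAllWindows()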

The full code:

  import os
  import cv2
  import numpy as np
  import tensorflow as tf
  from utils import label_map_util
  from utils import visualization_utils as vis_util

  cap = cv2.VideoCapture(0)

  # http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
  MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
  MODEL_FILE = MODEL_NAME + '.tar.gz'
  # Path to the frozen detection graph. This is the actual model used for object detection.
  PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
  # List of strings used to add the correct label to each box.
  PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

  detection_graph = tf.Graph()
  with detection_graph.as_default():
      od_graph_def = tf.GraphDef()
      with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
          serialized_graph = fid.read()
          od_graph_def.ParseFromString(serialized_graph)
          tf.import_graph_def(od_graph_def, name='')

  category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

  with detection_graph.as_default():
      with tf.Session(graph=detection_graph) as sess:
          while True:
              ret, image_np = cap.read()
              # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
              image_np_expanded = np.expand_dims(image_np, axis=0)
              image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
              # Each box represents a part of the image where a particular object was detected.
              boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
              # Each score represents the level of confidence for each of the objects.
              # The score is shown on the result image, together with the class label.
              scores = detection_graph.get_tensor_by_name('detection_scores:0')
              classes = detection_graph.get_tensor_by_name('detection_classes:0')
              num_detections = detection_graph.get_tensor_by_name('num_detections:0')
              # Actual detection.
              (boxes, scores, classes, num_detections) = sess.run(
                  [boxes, scores, classes, num_detections],
                  feed_dict={image_tensor: image_np_expanded})
              # Visualization of the results of a detection.
              vis_util.visualize_boxes_and_labels_on_image_array(
                  image_np,
                  np.squeeze(boxes),
                  np.squeeze(classes).astype(np.int32),
                  np.squeeze(scores),
                  category_index,
                  use_normalized_coordinates=True,
                  line_thickness=8)
              cv2.imshow('object detection', cv2.resize(image_np, (800, 600)))
              # Press 'q' to quit
              if cv2.waitKey(25) & 0xFF == ord('q'):
                  cap.release()  # release the camera before exiting
                  cv2.destroyAllWindows()
                  break
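visualize_boxes_and_labels_on_image_array handles the drawing, but if you want to act on the detections programmatically, you can filter the raw outputs by score yourself. A minimal sketch that could be added to detect1.py (the 0.5 threshold is an arbitrary choice):

  def high_confidence_detections(boxes, scores, classes, category_index, min_score=0.5):
      """Return (label, score, box) tuples for detections scoring above min_score.

      Boxes are normalized [ymin, xmin, ymax, xmax] coordinates, as produced
      by the detection graph above.
      """
      results = []
      for box, score, cls in zip(np.squeeze(boxes), np.squeeze(scores),
                                 np.squeeze(classes)):
          if score >= min_score:
              results.append((category_index[int(cls)]['name'], float(score), box))
      return results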

3. Custom Object Detection

In this example, we will train a model to recognize cranes. First, search Baidu (or any image search) for some sample images of cranes. The more samples, the more accurate the model; since this is only a demonstration, we download just a dozen or so images. Then go to the models/research directory and install the object_detection API locally:

  cd models/research
  python setup.py install

Create a new directory object_detection; for now, we keep everything related to training in it.
When this section is complete, the directory structure will look like this:
(Screenshot: final directory layout)
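Since the screenshot is not reproduced here, this is roughly the layout implied by the paths used in the rest of this section (directory names are taken from the commands and config below):

  object_detection/
    data/                                 # CSV files and .record files
    images/
      train/                              # training images plus XML annotations
      test/                               # test images plus XML annotations
    training/                             # pipeline config, label map, checkpoints
    ssd_mobilenet_v1_coco_11_06_2017/     # extracted pre-trained model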

The images directory holds pictures of the class you are interested in detecting, for example a crane:

(Sample image: 1.jpg)

3.1 Labeling the Dataset

Since this is a supervised learning approach, the samples must be labeled. Once the images are downloaded, we mark where the crane is in each one, i.e., we put a label on it. The labeling tool used here is LabelImg (link). Save the resulting XML annotation files together with the images in the images directory.
When installing LabelImg, you may also need to install some dependencies:

  pip3 install sip
  apt-get install pyqt5-dev-tools
  pip3 install -r requirements/requirements-linux-python3.txt
  make qt5py3
  pip3 install pandas
  python3 labelImg.py
  pip3 install pycocotools

Open the image folder and label the images:
(Screenshot: drawing a bounding box in LabelImg)
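LabelImg writes one Pascal VOC-style XML file per image; the fields below are exactly what the conversion script in the next section reads (the concrete values are only illustrative):

  <annotation>
    <filename>1.jpg</filename>
    <size>
      <width>800</width>
      <height>600</height>
      <depth>3</depth>
    </size>
    <object>
      <name>crane</name>
      <pose>Unspecified</pose>
      <truncated>0</truncated>
      <difficult>0</difficult>
      <bndbox>
        <xmin>120</xmin>
        <ymin>80</ymin>
        <xmax>560</xmax>
        <ymax>430</ymax>
      </bndbox>
    </object>
  </annotation>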

3.2 Building the TFRecord Training Files

3.2.1 Download the xml_to_csv tool

Download the xml_to_csv.py file from https://github.com/datitran/raccoon_dataset; it is the base tool for converting the XML annotation files to CSV.
Tweak it slightly to write into the local data directory:

  import os
  import glob
  import pandas as pd
  import xml.etree.ElementTree as ET

  def xml_to_csv(path):
      xml_list = []
      for xml_file in glob.glob(path + '/*.xml'):
          tree = ET.parse(xml_file)
          root = tree.getroot()
          for member in root.findall('object'):
              value = (root.find('filename').text,
                       int(root.find('size')[0].text),
                       int(root.find('size')[1].text),
                       member[0].text,
                       int(member[4][0].text),
                       int(member[4][1].text),
                       int(member[4][2].text),
                       int(member[4][3].text)
                       )
              xml_list.append(value)
      column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
      xml_df = pd.DataFrame(xml_list, columns=column_name)
      return xml_df

  def main():
      for directory in ['train', 'test']:
          image_path = os.path.join(os.getcwd(), 'images/{}'.format(directory))
          xml_df = xml_to_csv(image_path)
          xml_df.to_csv('data/{}_labels.csv'.format(directory), index=None)
          print('Successfully converted xml to csv.')

  main()

(Screenshot: train_labels.csv and test_labels.csv generated in the data directory)

3.2.2 Converting the CSV files to TFRecord files

generate_tfrecord.py from the same raccoon_dataset repository converts each CSV file into a TFRecord file; run it once per split:

  python3 generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record --image_dir=images/train
  python3 generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record --image_dir=images/test
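Before these commands will work, one edit to generate_tfrecord.py is needed: its class_text_to_int function maps label text to a numeric class id, and it ships configured for the raccoon class. A sketch of the change, assuming the images were labeled 'crane' in LabelImg:

  # In generate_tfrecord.py: map our label text to the class id.
  # The id returned here must match the id in training/object-detection.pbtxt.
  def class_text_to_int(row_label):
      if row_label == 'crane':
          return 1
      else:
          return None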

The result looks like this:
(Screenshot: TFRecord files generated in the data directory)

3.3 Training the Model

In this section we start training our custom model. We need the training and test TFRecords from above, plus a model configuration file and a pre-trained base model. With these, we can use TensorFlow's transfer learning to continue training from a pre-trained model toward the detector we need (link). Transfer learning converges quickly and needs relatively little training data. TensorFlow provides a number of pre-trained models with matching configuration files; this example uses the ssd_mobilenet_v1_coco_11_06_2017 model and the corresponding configuration file ssd_mobilenet_v1_pets.config:

3.3.1 The pre-trained model files

The model files package the base detection network; building on it, we use the object_detection API to train the detector we are interested in. Download the model directly:

  wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz

After downloading, extract it into the ssd_mobilenet_v1_coco_11_06_2017 directory:
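A standard tar invocation does this, assuming the archive was downloaded to the current directory:

  tar -xzf ssd_mobilenet_v1_coco_11_06_2017.tar.gz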

3.3.2 The model configuration file

  wget https://raw.githubusercontent.com/tensorflow/models/master/object_detection/samples/configs/ssd_mobilenet_v1_pets.config

Put the downloaded configuration file in the training directory.

In the original configuration file, first search for every occurrence of PATH_TO_BE_CONFIGURED and replace it.
The batch size also needs adjusting; it is set to 24 here, but other models may need a different batch size, and if you run into memory errors you may have to reduce it. Finally, set the checkpoint path, change num_classes to 1, change num_examples to 12, and set label_map_path: "training/object-detection.pbtxt":

  # SSD with Mobilenet v1, configured for Oxford-IIIT Pets Dataset.
  # Users should configure the fine_tune_checkpoint field in the train config as
  # well as the label_map_path and input_path fields in the train_input_reader and
  # eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
  # should be configured.
  model {
    ssd {
      # Set the number of classes to 1
      num_classes: 1
      box_coder {
        faster_rcnn_box_coder {
          y_scale: 10.0
          x_scale: 10.0
          height_scale: 5.0
          width_scale: 5.0
        }
      }
      matcher {
        argmax_matcher {
          matched_threshold: 0.5
          unmatched_threshold: 0.5
          ignore_thresholds: false
          negatives_lower_than_unmatched: true
          force_match_for_each_row: true
        }
      }
      similarity_calculator {
        iou_similarity {
        }
      }
      anchor_generator {
        ssd_anchor_generator {
          num_layers: 6
          min_scale: 0.2
          max_scale: 0.95
          aspect_ratios: 1.0
          aspect_ratios: 2.0
          aspect_ratios: 0.5
          aspect_ratios: 3.0
          aspect_ratios: 0.3333
        }
      }
      image_resizer {
        fixed_shape_resizer {
          height: 300
          width: 300
        }
      }
      box_predictor {
        convolutional_box_predictor {
          min_depth: 0
          max_depth: 0
          num_layers_before_predictor: 0
          use_dropout: false
          dropout_keep_probability: 0.8
          kernel_size: 1
          box_code_size: 4
          apply_sigmoid_to_scores: false
          conv_hyperparams {
            activation: RELU_6,
            regularizer {
              l2_regularizer {
                weight: 0.00004
              }
            }
            initializer {
              truncated_normal_initializer {
                stddev: 0.03
                mean: 0.0
              }
            }
            batch_norm {
              train: true,
              scale: true,
              center: true,
              decay: 0.9997,
              epsilon: 0.001,
            }
          }
        }
      }
      feature_extractor {
        type: 'ssd_mobilenet_v1'
        min_depth: 16
        depth_multiplier: 1.0
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
      loss {
        classification_loss {
          weighted_sigmoid {
          }
        }
        localization_loss {
          weighted_smooth_l1 {
          }
        }
        hard_example_miner {
          num_hard_examples: 3000
          iou_threshold: 0.99
          loss_type: CLASSIFICATION
          max_negatives_per_positive: 3
          min_negatives_per_image: 0
        }
        classification_weight: 1.0
        localization_weight: 1.0
      }
      normalize_loss_by_num_matches: true
      post_processing {
        batch_non_max_suppression {
          score_threshold: 1e-8
          iou_threshold: 0.6
          max_detections_per_class: 100
          max_total_detections: 100
        }
        score_converter: SIGMOID
      }
    }
  }
  train_config: {
    batch_size: 24
    optimizer {
      rms_prop_optimizer: {
        learning_rate: {
          exponential_decay_learning_rate {
            initial_learning_rate: 0.004
            decay_steps: 800720
            decay_factor: 0.95
          }
        }
        momentum_optimizer_value: 0.9
        decay: 0.9
        epsilon: 1.0
      }
    }
    fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
    from_detection_checkpoint: true
    load_all_detection_checkpoint_vars: true
    # Note: The below line limits the training process to 200K steps, which we
    # empirically found to be sufficient enough to train the pets dataset. This
    # effectively bypasses the learning rate schedule (the learning rate will
    # never decay). Remove the below line to train indefinitely.
    num_steps: 200000
    data_augmentation_options {
      random_horizontal_flip {
      }
    }
    data_augmentation_options {
      ssd_random_crop {
      }
    }
  }
  train_input_reader: {
    tf_record_input_reader {
      input_path: "data/train.record"
    }
    label_map_path: "training/object-detection.pbtxt"
  }
  eval_config: {
    metrics_set: "coco_detection_metrics"
    num_examples: 12
  }
  eval_input_reader: {
    tf_record_input_reader {
      input_path: "data/test.record"
    }
    label_map_path: "training/object-detection.pbtxt"
    shuffle: false
    num_readers: 1
  }

The content of object-detection.pbtxt in the training directory (the class name must match the label used when annotating and in generate_tfrecord.py):

  item {
    id: 1
    name: 'crane'
  }

In the models/research directory:

  sudo python3 setup.py build
  sudo python3 setup.py install
  export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

3.3.3 Training

Now for the exciting part: start training!

  python3 model_main.py --logtostderr --model_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

(model_main.py takes --model_dir; the --train_dir flag belongs to the older legacy train.py script.)

If you see output like the following, training is running:

(Screenshot: training log output)
After training for a while, the output looks like this:

(Screenshot: training log after some time)
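To monitor progress, TensorBoard can be pointed at the training directory where model_main.py writes its event files:

  tensorboard --logdir=training/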

In TensorBoard, the total_loss curve of the training run looks like this:
(Screenshot: TensorBoard total_loss curve)

3.4 Testing the Model

In this section we test the trained model to see whether it performs as expected. First, we export the model's computation graph so that a test program can load it directly. We can use export_inference_graph.py from object_detection; it only needs the checkpoint and the pipeline config file to export the graph. The checkpoint number (10856 below) should be replaced with the latest model.ckpt-* step number in your training directory.

  python3 export_inference_graph.py \
      --input_type image_tensor \
      --pipeline_config_path training/ssd_mobilenet_v1_pets.config \
      --trained_checkpoint_prefix training/model.ckpt-10856 \
      --output_directory crane_inference_graph
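The exported directory contains a frozen_inference_graph.pb, so the webcam script from section 2 can be pointed at the custom detector by changing a few lines. A sketch, assuming the directory layout used throughout this section:

  # In detect1.py: load the exported crane detector instead of the COCO model.
  MODEL_NAME = 'crane_inference_graph'
  PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
  # Point at our single-class label map instead of mscoco_label_map.pbtxt.
  PATH_TO_LABELS = os.path.join('training', 'object-detection.pbtxt')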