1. Introduction and Installation

This document introduces the TensorFlow Object Detection API (link). The API detects objects in images or video streams, either with a pre-trained recognition model or with a model trained on classes you define yourself. First, we need to install TensorFlow and its dependencies. On a CPU-only machine, pip3 install tensorflow is enough. On a machine with a GPU, computation is much faster; install as follows:

  • Installation on Ubuntu: link
  • Installation on Windows: link

You may also need Python 3.7 and the following Python libraries installed beforehand:

  pip3 install pillow
  pip3 install lxml
  pip3 install jupyter
  pip3 install matplotlib

Next, clone the TensorFlow object detection API project from GitHub, or download it as a zip archive:

  git clone https://github.com/tensorflow/models.git

Once the repository is downloaded, go to that directory and open a command prompt:

On Windows, download protoc-3.4.0-win32.zip (link); after extracting it, the bin directory contains the protoc executable.

Next, from the models directory, compile the .proto model definition files with protoc:

  1. "C:/Program Files/protoc/bin/protoc" object_detection/protos/*.proto --python_out=.

Note the directory you run the compilation from: the command above assumes you are inside the models directory, so that the object_detection/protos path resolves correctly.
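One quick way to sanity-check the compilation is to import one of the generated modules; this assumes you are still in the models directory and the *_pb2.py files were written next to the .proto files:

  python3 -c "from object_detection.protos import string_int_label_map_pb2"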

Finally, in the models/object_detection directory, open a terminal/cmd.exe, start jupyter notebook, and open object_detection_tutorial.ipynb. Choose Run All, and you should get a result like this:
(Figure 1: sample detection output from object_detection_tutorial.ipynb)

2. Object Detection on a Video Stream

This section shows how to turn the example code into an object detection program that runs on a webcam feed. First, create a new file detect1.py in the object_detection directory. If you would rather work inside the notebook, that is fine too.
The approach is to download Google's pre-trained model ssd_mobilenet_v1_coco_11_06_2017 (link), which is trained on roughly 90 classes, including people, cups, handbags, and so on. The code below captures the video stream with OpenCV and loads the SSD model with TensorFlow to run detection.
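If the model archive is not already in the object_detection directory, fetch and unpack it first; the URL is the one referenced in the code below:

  wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
  tar -xzf ssd_mobilenet_v1_coco_11_06_2017.tar.gz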
Next, we use the opencv package to grab frames from the camera. If you have not installed Python OpenCV yet, see link:

  import cv2
  cap = cv2.VideoCapture(0)
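Before wiring in the model, it is worth confirming that OpenCV can actually read frames from your camera. A minimal test loop (device index 0 is an assumption; an external camera may need index 1 or 2):

  import cv2

  cap = cv2.VideoCapture(0)  # try another index if 0 is not your camera
  while True:
      ret, frame = cap.read()
      if not ret:  # no frame returned: camera missing or in use elsewhere
          break
      cv2.imshow('camera test', frame)
      if cv2.waitKey(25) & 0xFF == ord('q'):  # press 'q' to quit
          break
  cap.release()
  cv2.destroyAllWindows()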

The full code:

  import os
  import cv2
  import numpy as np
  import tensorflow as tf
  from utils import label_map_util
  from utils import visualization_utils as vis_util

  cap = cv2.VideoCapture(0)

  # http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
  MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
  MODEL_FILE = MODEL_NAME + '.tar.gz'
  # Path to the frozen detection graph. This is the actual model used for object detection.
  PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
  # List of strings used to add the correct label to each box.
  PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

  detection_graph = tf.Graph()
  with detection_graph.as_default():
      od_graph_def = tf.GraphDef()
      with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
          serialized_graph = fid.read()
          od_graph_def.ParseFromString(serialized_graph)
          tf.import_graph_def(od_graph_def, name='')

  category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

  with detection_graph.as_default():
      with tf.Session(graph=detection_graph) as sess:
          while True:
              ret, image_np = cap.read()
              # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
              image_np_expanded = np.expand_dims(image_np, axis=0)
              image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
              # Each box represents a part of the image where a particular object was detected.
              boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
              # Each score represents the level of confidence for each of the objects.
              # The score is shown on the result image, together with the class label.
              scores = detection_graph.get_tensor_by_name('detection_scores:0')
              classes = detection_graph.get_tensor_by_name('detection_classes:0')
              num_detections = detection_graph.get_tensor_by_name('num_detections:0')
              # Actual detection.
              (boxes, scores, classes, num_detections) = sess.run(
                  [boxes, scores, classes, num_detections],
                  feed_dict={image_tensor: image_np_expanded})
              # Visualization of the results of a detection.
              vis_util.visualize_boxes_and_labels_on_image_array(
                  image_np,
                  np.squeeze(boxes),
                  np.squeeze(classes).astype(np.int32),
                  np.squeeze(scores),
                  category_index,
                  use_normalized_coordinates=True,
                  line_thickness=8)
              cv2.imshow('object detection', cv2.resize(image_np, (800, 600)))
              # Press 'q' to quit
              if cv2.waitKey(25) & 0xFF == ord('q'):
                  cap.release()  # release the camera before exiting
                  cv2.destroyAllWindows()
                  break
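visualize_boxes_and_labels_on_image_array handles the drawing, but if you want to act on the detections programmatically, you can filter the raw outputs by score yourself. A minimal sketch that could be added to detect1.py (the 0.5 threshold is an arbitrary choice):

  def high_confidence_detections(boxes, scores, classes, category_index, min_score=0.5):
      """Return (label, score, box) tuples for detections scoring above min_score.

      Boxes are normalized [ymin, xmin, ymax, xmax] coordinates, as produced
      by the detection graph above.
      """
      results = []
      for box, score, cls in zip(np.squeeze(boxes), np.squeeze(scores),
                                 np.squeeze(classes)):
          if score >= min_score:
              results.append((category_index[int(cls)]['name'], float(score), box))
      return results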

3. Custom Object Detection

In this example, we will train a model to recognize cranes. First, search Baidu (or any image search) for some sample images of cranes. The more samples, the more accurate the model; since this is only a demonstration, we download just a dozen or so images. Then go to the models/research directory and install the object_detection API locally:

  cd models/research
  python setup.py install

Create a new directory object_detection; for now, we keep everything related to training in it.
When this section is complete, the directory structure will look like this:
(Screenshot: final directory layout)
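Since the screenshot is not reproduced here, this is roughly the layout implied by the paths used in the rest of this section (directory names are taken from the commands and config below):

  object_detection/
    data/                                 # CSV files and .record files
    images/
      train/                              # training images plus XML annotations
      test/                               # test images plus XML annotations
    training/                             # pipeline config, label map, checkpoints
    ssd_mobilenet_v1_coco_11_06_2017/     # extracted pre-trained model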

The images directory holds pictures of the class you are interested in detecting, for example a crane:

(Sample image: 1.jpg)

3.1 Labeling the Dataset

Since this is a supervised learning approach, the samples must be labeled. Once the images are downloaded, we mark where the crane is in each one, i.e., we put a label on it. The labeling tool used here is LabelImg (link). Save the resulting XML annotation files together with the images in the images directory.
When installing LabelImg, you may also need to install some dependencies:

  pip3 install sip
  apt-get install pyqt5-dev-tools
  pip3 install -r requirements/requirements-linux-python3.txt
  make qt5py3
  pip3 install pandas
  python3 labelImg.py
  pip3 install pycocotools

Open the image folder and label the images:
(Screenshot: drawing a bounding box in LabelImg)
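LabelImg writes one Pascal VOC-style XML file per image; the fields below are exactly what the conversion script in the next section reads (the concrete values are only illustrative):

  <annotation>
    <filename>1.jpg</filename>
    <size>
      <width>800</width>
      <height>600</height>
      <depth>3</depth>
    </size>
    <object>
      <name>crane</name>
      <pose>Unspecified</pose>
      <truncated>0</truncated>
      <difficult>0</difficult>
      <bndbox>
        <xmin>120</xmin>
        <ymin>80</ymin>
        <xmax>560</xmax>
        <ymax>430</ymax>
      </bndbox>
    </object>
  </annotation>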

3.2 Building the TFRecord Training Files

3.2.1 Download the xml_to_csv tool

Download the xml_to_csv.py file from https://github.com/datitran/raccoon_dataset; it is the base tool for converting the XML annotation files to CSV.
Tweak it slightly to write into the local data directory:

  import os
  import glob
  import pandas as pd
  import xml.etree.ElementTree as ET

  def xml_to_csv(path):
      xml_list = []
      for xml_file in glob.glob(path + '/*.xml'):
          tree = ET.parse(xml_file)
          root = tree.getroot()
          for member in root.findall('object'):
              value = (root.find('filename').text,
                       int(root.find('size')[0].text),
                       int(root.find('size')[1].text),
                       member[0].text,
                       int(member[4][0].text),
                       int(member[4][1].text),
                       int(member[4][2].text),
                       int(member[4][3].text)
                       )
              xml_list.append(value)
      column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
      xml_df = pd.DataFrame(xml_list, columns=column_name)
      return xml_df

  def main():
      for directory in ['train', 'test']:
          image_path = os.path.join(os.getcwd(), 'images/{}'.format(directory))
          xml_df = xml_to_csv(image_path)
          xml_df.to_csv('data/{}_labels.csv'.format(directory), index=None)
          print('Successfully converted xml to csv.')

  main()

(Screenshot: train_labels.csv and test_labels.csv generated in the data directory)

3.2.2 Converting the CSV files to TFRecord files

generate_tfrecord.py from the same raccoon_dataset repository converts each CSV file into a TFRecord file; run it once per split:

  python3 generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record --image_dir=images/train
  python3 generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record --image_dir=images/test
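Before these commands will work, one edit to generate_tfrecord.py is needed: its class_text_to_int function maps label text to a numeric class id, and it ships configured for the raccoon class. A sketch of the change, assuming the images were labeled 'crane' in LabelImg:

  # In generate_tfrecord.py: map our label text to the class id.
  # The id returned here must match the id in training/object-detection.pbtxt.
  def class_text_to_int(row_label):
      if row_label == 'crane':
          return 1
      else:
          return None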

The result looks like this:
(Screenshot: TFRecord files generated in the data directory)

3.3 Training the Model

In this section we start training our custom model. We need the training and test TFRecords from above, plus a model configuration file and a pre-trained base model. With these, we can use TensorFlow's transfer learning to continue training from a pre-trained model toward the detector we need (link). Transfer learning converges quickly and needs relatively little training data. TensorFlow provides a number of pre-trained models with matching configuration files; this example uses the ssd_mobilenet_v1_coco_11_06_2017 model and the corresponding configuration file ssd_mobilenet_v1_pets.config:

3.3.1 The pre-trained model files

The model files package the base detection network; building on it, we use the object_detection API to train the detector we are interested in. Download the model directly:

  wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz

After downloading, extract it into the ssd_mobilenet_v1_coco_11_06_2017 directory:
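A standard tar invocation does this, assuming the archive was downloaded to the current directory:

  tar -xzf ssd_mobilenet_v1_coco_11_06_2017.tar.gz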

3.3.2 The model configuration file

  wget https://raw.githubusercontent.com/tensorflow/models/master/object_detection/samples/configs/ssd_mobilenet_v1_pets.config

Put the downloaded configuration file in the training directory.

In the original configuration file, first search for every occurrence of PATH_TO_BE_CONFIGURED and replace it.
The batch size also needs adjusting; it is set to 24 here, but other models may need a different batch size, and if you run into memory errors you may have to reduce it. Finally, set the checkpoint path, change num_classes to 1, change num_examples to 12, and set label_map_path: "training/object-detection.pbtxt":

  # SSD with Mobilenet v1, configured for Oxford-IIIT Pets Dataset.
  # Users should configure the fine_tune_checkpoint field in the train config as
  # well as the label_map_path and input_path fields in the train_input_reader and
  # eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
  # should be configured.
  model {
    ssd {
      # Set the number of classes to 1
      num_classes: 1
      box_coder {
        faster_rcnn_box_coder {
          y_scale: 10.0
          x_scale: 10.0
          height_scale: 5.0
          width_scale: 5.0
        }
      }
      matcher {
        argmax_matcher {
          matched_threshold: 0.5
          unmatched_threshold: 0.5
          ignore_thresholds: false
          negatives_lower_than_unmatched: true
          force_match_for_each_row: true
        }
      }
      similarity_calculator {
        iou_similarity {
        }
      }
      anchor_generator {
        ssd_anchor_generator {
          num_layers: 6
          min_scale: 0.2
          max_scale: 0.95
          aspect_ratios: 1.0
          aspect_ratios: 2.0
          aspect_ratios: 0.5
          aspect_ratios: 3.0
          aspect_ratios: 0.3333
        }
      }
      image_resizer {
        fixed_shape_resizer {
          height: 300
          width: 300
        }
      }
      box_predictor {
        convolutional_box_predictor {
          min_depth: 0
          max_depth: 0
          num_layers_before_predictor: 0
          use_dropout: false
          dropout_keep_probability: 0.8
          kernel_size: 1
          box_code_size: 4
          apply_sigmoid_to_scores: false
          conv_hyperparams {
            activation: RELU_6,
            regularizer {
              l2_regularizer {
                weight: 0.00004
              }
            }
            initializer {
              truncated_normal_initializer {
                stddev: 0.03
                mean: 0.0
              }
            }
            batch_norm {
              train: true,
              scale: true,
              center: true,
              decay: 0.9997,
              epsilon: 0.001,
            }
          }
        }
      }
      feature_extractor {
        type: 'ssd_mobilenet_v1'
        min_depth: 16
        depth_multiplier: 1.0
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
      loss {
        classification_loss {
          weighted_sigmoid {
          }
        }
        localization_loss {
          weighted_smooth_l1 {
          }
        }
        hard_example_miner {
          num_hard_examples: 3000
          iou_threshold: 0.99
          loss_type: CLASSIFICATION
          max_negatives_per_positive: 3
          min_negatives_per_image: 0
        }
        classification_weight: 1.0
        localization_weight: 1.0
      }
      normalize_loss_by_num_matches: true
      post_processing {
        batch_non_max_suppression {
          score_threshold: 1e-8
          iou_threshold: 0.6
          max_detections_per_class: 100
          max_total_detections: 100
        }
        score_converter: SIGMOID
      }
    }
  }
  train_config: {
    batch_size: 24
    optimizer {
      rms_prop_optimizer: {
        learning_rate: {
          exponential_decay_learning_rate {
            initial_learning_rate: 0.004
            decay_steps: 800720
            decay_factor: 0.95
          }
        }
        momentum_optimizer_value: 0.9
        decay: 0.9
        epsilon: 1.0
      }
    }
    fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
    from_detection_checkpoint: true
    load_all_detection_checkpoint_vars: true
    # Note: The below line limits the training process to 200K steps, which we
    # empirically found to be sufficient enough to train the pets dataset. This
    # effectively bypasses the learning rate schedule (the learning rate will
    # never decay). Remove the below line to train indefinitely.
    num_steps: 200000
    data_augmentation_options {
      random_horizontal_flip {
      }
    }
    data_augmentation_options {
      ssd_random_crop {
      }
    }
  }
  train_input_reader: {
    tf_record_input_reader {
      input_path: "data/train.record"
    }
    label_map_path: "training/object-detection.pbtxt"
  }
  eval_config: {
    metrics_set: "coco_detection_metrics"
    num_examples: 12
  }
  eval_input_reader: {
    tf_record_input_reader {
      input_path: "data/test.record"
    }
    label_map_path: "training/object-detection.pbtxt"
    shuffle: false
    num_readers: 1
  }

The content of object-detection.pbtxt in the training directory (the class name must match the label used when annotating and in generate_tfrecord.py):

  item {
    id: 1
    name: 'crane'
  }

In the models/research directory:

  sudo python3 setup.py build
  sudo python3 setup.py install
  export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

3.3.3 Training

Now for the exciting part: start training!

  python3 model_main.py --logtostderr --model_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

(model_main.py takes --model_dir; the --train_dir flag belongs to the older legacy train.py script.)

If you see output like the following, training is running:

(Screenshot: training log output)
After training for a while, the output looks like this:

(Screenshot: training log after some time)
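To monitor progress, TensorBoard can be pointed at the training directory where model_main.py writes its event files:

  tensorboard --logdir=training/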

In TensorBoard, the total_loss curve of the training run looks like this:
(Screenshot: TensorBoard total_loss curve)

3.4 Testing the Model

In this section we test the trained model to see whether it performs as expected. First, we export the model's computation graph so that a test program can load it directly. We can use export_inference_graph.py from object_detection; it only needs the checkpoint and the pipeline config file to export the graph. The checkpoint number (10856 below) should be replaced with the latest model.ckpt-* step number in your training directory.

  python3 export_inference_graph.py \
      --input_type image_tensor \
      --pipeline_config_path training/ssd_mobilenet_v1_pets.config \
      --trained_checkpoint_prefix training/model.ckpt-10856 \
      --output_directory crane_inference_graph
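The exported directory contains a frozen_inference_graph.pb, so the webcam script from section 2 can be pointed at the custom detector by changing a few lines. A sketch, assuming the directory layout used throughout this section:

  # In detect1.py: load the exported crane detector instead of the COCO model.
  MODEL_NAME = 'crane_inference_graph'
  PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
  # Point at our single-class label map instead of mscoco_label_map.pbtxt.
  PATH_TO_LABELS = os.path.join('training', 'object-detection.pbtxt')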