Summary of issues from PR #1127 - "Deep Learning for Image Processing"

Git repository: https://github.com/zldrobit/yolov5/tree/tf-android

Export the pb and tflite files (fp16 and int8)

PYTHONPATH=. python models/tf.py --weights weights/yolov5s.pt --cfg models/yolov5s.yaml --img 320
PYTHONPATH=. python3 models/tf.py --weights weights/yolov5s.pt --cfg models/yolov5s.yaml --tfl-int8 --source /data/dataset/coco/coco2017/train2017 --ncalib 100

Run detection with the exported files

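python3 detect.py --weights weights/yolov5s.pb --img 320
python3 detect.py --weights weights/yolov5s_saved_model --img 320
python3 detect.py --weights weights/yolov5s-fp16.tflite --img 320
python3 detect.py --weights weights/yolov5s-int8.tflite --img 320 --tfl-int8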

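Alternatively, put the tflite model into the assets folder of the Android project, then update the image input size and the corresponding model output size in DetectorFactory.java, and finally build and run the Android project.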

This reduces inference time on a Snapdragon 820 by roughly:
- 75% for the fp16 model on CPU (4 threads), from 1.9s to 0.5s
- 70% for the fp16 model on GPU, from 1.3s to 0.4s
- 70% for the int8 model on CPU (4 threads), from 1.7s to less than 0.5s
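
The percentages above follow from the quoted before/after timings. A minimal check of that arithmetic, using only the numbers in this note (the `reduction` helper is a hypothetical name introduced for illustration):

```python
def reduction(before_s: float, after_s: float) -> float:
    """Fraction of inference time saved when going from before_s to after_s."""
    return 1.0 - after_s / before_s


# Timings quoted above for a Snapdragon 820, in seconds.
timings = {
    "fp16 CPU (4 threads)": (1.9, 0.5),
    "fp16 GPU": (1.3, 0.4),
    "int8 CPU (4 threads)": (1.7, 0.5),  # quoted as "less than 0.5s", so a lower bound
}

for name, (before, after) in timings.items():
    print(f"{name}: {reduction(before, after):.0%} less time")
```

The computed values (about 74%, 69% and 71%) sit close to the rounded figures quoted in the discussion.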

The code has been tested successfully with PyTorch 1.8 and TensorFlow 2.4.0 and 2.4.1.
Note: install the requirements first before running the conversion again.

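The auto argument in the dataloaders controls whether images are padded to a square. It defaults to True, which keeps the minimal rectangular padding; setting it to False pads to the full square size. https://github.com/ultralytics/yolov5/pull/1127#issuecomment-708084192
- PyTorch can run inference on images whose height and width differ, but TensorFlow and TFLite need the image padded until height and width match before inference.
- int8 model quantization has to be calibrated on preprocessed (resized/normalized) images, so auto must be set to False in that case.
- auto=False ensures that the padded image size equals new_shape in letterbox, so that the input image sizes after preprocessing are the same. The input image size of TensorFlow and TFLite models is fixed, so this setting is required. https://github.com/ultralytics/yolov5/pull/1127#issuecomment-713223999
  - letterbox: resizes the image while preserving its aspect ratio
Rectangular inference (height != width) helps inference speed. PyTorch first resizes the long side of the image to the specified length, then pads the short side to a multiple of 32.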
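
The effect of auto on the padded size can be sketched with plain arithmetic. This is a simplified illustration rather than the repository's letterbox code: `letterbox_shape` is a hypothetical helper, and the stride of 32 is taken from the note above.

```python
def letterbox_shape(h, w, new_shape=320, auto=True, stride=32):
    """Return ((resized_h, resized_w), (padded_h, padded_w)) for an h x w image."""
    # Scale so the long side fits new_shape while keeping the aspect ratio.
    r = min(new_shape / h, new_shape / w)
    resized_h, resized_w = round(h * r), round(w * r)
    # Padding needed to reach the full new_shape x new_shape square.
    dh, dw = new_shape - resized_h, new_shape - resized_w
    if auto:
        # Minimum rectangle: only pad up to the next multiple of stride.
        dh, dw = dh % stride, dw % stride
    return (resized_h, resized_w), (resized_h + dh, resized_w + dw)


print(letterbox_shape(480, 640, auto=True))   # ((240, 320), (256, 320))
print(letterbox_shape(480, 640, auto=False))  # ((240, 320), (320, 320))
```

With auto=True the padded size depends on the aspect ratio of each image, while auto=False always produces the same 320x320 input, which is what a fixed-size TensorFlow or TFLite model expects.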
The code that generates yolov5s.tflite is commented out in this project, because the default inference precision in TFLite Android is fp16. Uncomment it if you want that tflite model.
The --ncalib parameter is used for int8 model quantization (the number of calibration images).
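
int8 calibration feeds a limited number of preprocessed images through the model. A minimal sketch of such a calibration generator, assuming NumPy is available and using random arrays in place of real letterboxed COCO images (the function and variable names are hypothetical, not taken from the PR):

```python
import numpy as np


def representative_dataset(images, ncalib=100):
    """Yield up to ncalib preprocessed samples, one batch of size 1 at a time."""
    for n, img in enumerate(images):
        if n >= ncalib:
            break
        # Same preprocessing as inference: float32 in [0, 1], NHWC with a batch dim.
        x = img.astype(np.float32) / 255.0
        yield [x[np.newaxis, ...]]


# Stand-in data: random 320x320 RGB images instead of letterboxed COCO images.
rng = np.random.default_rng(0)
fake_images = [rng.integers(0, 256, size=(320, 320, 3), dtype=np.uint8) for _ in range(8)]

samples = list(representative_dataset(fake_images, ncalib=5))
print(len(samples), samples[0][0].shape, samples[0][0].dtype)
```

In TensorFlow, a generator like this is typically assigned to the `representative_dataset` attribute of a `tf.lite.TFLiteConverter` as a zero-argument callable. Whether the PR wires it up in exactly this way is not confirmed here.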
To export a model whose height and width differ, set the --img parameter to the corresponding width and height, e.g. --img 320 192
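The tflite model exported from a custom-trained YOLOv5 may predict shifted box positions. Suggestion 1: compare the anchors in the yaml file with those in the model (print them with the command below, or use the method provided at this link). If the anchors do not match, re-export the tflite model using the yaml file that contains the anchors you trained with. Suggestion 2: check that the image size used for training equals the image size used when exporting the tflite model.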
```
import torch
print(torch.load('your_custom_model.pt')['model'].model[-1].anchors)  # anchors stored in the checkpoint
```
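On the size of exported tflite models: the model size does not depend on the number of classes. zldrobit mentioned that the size of the full model should be close to that of the fp16 model, because it is stored in fp16 precision.
- The person who asked reported best.pt size = 15.7 MB; best-fp16.tflite size = 15.6 MB; best-int8.tflite size = 8.3 MB.
- Mine, however, are best.pt size = 27 MB; best-fp16.tflite size = 7.06 MB; best-int8.tflite size = 7.15 MB. This looks odd to me, but on the PC the model predictions are fine apart from a few missed detections.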
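
A rough weights-only estimate helps judge whether reported file sizes are plausible. The sketch below assumes a parameter count of about 7.5 million for YOLOv5s, which is an approximate figure and not a number from the discussion; `estimated_size_mb` is a hypothetical helper.

```python
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}


def estimated_size_mb(num_params: int, precision: str) -> float:
    """Weights-only size estimate in MB (10**6 bytes); ignores graph metadata."""
    return num_params * BYTES_PER_WEIGHT[precision] / 1e6


num_params = 7_500_000  # assumption: rough YOLOv5s parameter count
for precision in ("fp32", "fp16", "int8"):
    print(precision, estimated_size_mb(num_params, precision), "MB")
```

Under that assumption the fp16 and int8 estimates (15 MB and 7.5 MB) are in the same range as the first set of reported sizes, while an fp16 file of about 7 MB would be smaller than expected for a model of this size.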
As of 2020-10-27, the code could only convert v2 / v3 versions of YOLOv5. https://github.com/ultralytics/yolov5/pull/1127#issuecomment-717232187
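Netron shows that the tflite file contains many 'Dequantize' ops. These ops dequantize the fp16 convolution weights, and they do not slow down inference on the GPU.
- Can I take this to mean that, when an fp16 model runs on the CPU, these ops dequantize the fp16 weights so that the model finally runs as an fp32 model on the CPU?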
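
What a dequantize step does numerically can be illustrated with NumPy casts. This only shows the fp16 storage and fp32 compute idea behind the question above; it is not the TFLite kernel, and the weight tensor is random stand-in data.

```python
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((16, 3, 3, 3)).astype(np.float32)

# Store the weights in half precision, as an fp16 tflite file would.
weights_fp16 = weights_fp32.astype(np.float16)

# "Dequantize": cast back to fp32 before the convolution runs in fp32.
dequantized = weights_fp16.astype(np.float32)

print(weights_fp16.nbytes / weights_fp32.nbytes)  # 0.5
print(dequantized.dtype)                          # float32
max_error = float(np.max(np.abs(dequantized - weights_fp32)))
print(max_error)  # small rounding error from the fp16 round trip
```

The stored weights take half the space, and the cast back to fp32 recovers them up to a small rounding error.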
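To run the tflite model in the Android app, exporting with **--no-tfl-detect** is recommended. Otherwise the exported model only works in fp32/fp16 CPU mode.
- Use the following command to export an fp16 model that can run on the GPU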
```
PYTHONPATH=. python3 models/tf.py --weights weights/yolov5s.pt --cfg models/yolov5s.yaml --no-tfl-detect
```
The tflite file has 4 outputs: 'identity' and 'identity_1' through 'identity_3'. The latter three are the inputs of the detection layer; they are kept mainly to make debugging easier.
NMS cannot simply be exported to a TFLite model, but the author implemented NMS in the Android project and also added it to the TF SavedModel and GraphDef.
- What is GraphDef here? See the article "TensorFlow: GraphDef, MetaGraph, CheckPoint".
- The reason NMS cannot be added directly to the tflite model: calculation errors of the TFLite GPU delegate and int8 quantization problems make exporting the detection layer in TFLite unfeasible.
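
Since NMS has to run outside the TFLite model, it helps to recall what the step does. Below is a minimal greedy NMS in plain Python, written as a generic sketch; it is not the author's Android implementation, and the 0.45 IoU threshold is an assumed default.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first almost completely and is suppressed, while the third box does not overlap and is kept.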
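Two paths for converting a PyTorch model to tflite
1. PyTorch > ONNX > TF > TFLite
2. PyTorch > TF (tf.keras TF 2.0 API) > TFLite
- zldrobit considers the second path better. The first path relies on extra transpose ops in onnx-tensorflow to swap between the NCHW and NHWC channel orders, which takes about 1/3 of the inference time. In addition, the code exported by the first path is TensorFlow 2 code, which is incompatible with TensorFlow 1.x.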
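
The layout swap mentioned above is a plain axis permutation. A small NumPy illustration of the two transposes follows; it only shows the operation itself and says nothing about how onnx-tensorflow inserts it into a graph.

```python
import numpy as np

# A batch of one 3-channel 320x320 image in PyTorch's NCHW layout.
x_nchw = np.zeros((1, 3, 320, 320), dtype=np.float32)

# NCHW -> NHWC: move the channel axis to the end (TensorFlow's default layout).
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))

# NHWC -> NCHW: move the channel axis back to position 1.
x_back = np.transpose(x_nhwc, (0, 3, 1, 2))

print(x_nhwc.shape)  # (1, 320, 320, 3)
print(x_back.shape)  # (1, 3, 320, 320)
```

A model built directly in NHWC, as in the second path, does not need these permutations at inference time.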
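On the COCO 2017 dataset needed when exporting the int8 model: download and unzip the images from http://images.cocodataset.org/zips/val2017.zip or http://images.cocodataset.org/zips/train2017.zip (noted as 100G+; likely only needed when a large calibration set is required).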
Others recommended a project that runs YOLOv5 models on ncnn; it comes with Chinese documentation.