Offline quantization

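All models below are torchvision pretrained models, quantized with the offline mode of LSQ in mnncompress. Quantization uses a batch size of 64 over 10 batches, i.e. 640 training images. Models marked with * skip quantization of the first layer or the first two layers to improve accuracy. All latencies are measured on a Huawei P20 Pro with a single thread.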

| Model | Original model (accuracy, size) | Compressed model (accuracy, size) | ARMV7 (ms) | ARMV8 (ms) |
| --- | --- | --- | --- | --- |
| ResNet-18 | 69.758%, 45M | 69.740%, 12M | 196.6 --> 208.8 | 187.6 --> 167.0 |
| ResNet-50 | 76.130%, 98M | 76.030%, 25M | 606.0 --> 470.5 | 550.8 --> 379.6 |
| SqueezeNet 1.0 | 58.092%, 4.8M | 57.800%, 1.3M | 122.1 --> 104.1 | 120.8 --> 88.3 |
| ShuffleNet V2 x1.0 | 69.362%, 8.7M | 68.616%, 2.3M | 33.3 --> 33.8 | 29.0 --> 26.0 |
| MobileNet V2 | 71.878%, 14M | 71.150%, 3.5M | 69.1 --> 50.2 | 62.2 --> 42.1 |
| *MobileNet V3 Large | 74.042%, 21M | 73.030%, 5.4M | 68.6 --> 63.1 | 64.9 --> 50.1 |
| MNASNet 1.0 | 73.456%, 17M | 72.692%, 4.3M | 70.6 --> 51.2 | 63.0 --> 42.9 |
| *EfficientNet-B0 | 77.692%, 21M | 70.486%, 5.3M | 134.3 --> 113.0 | 128.7 --> 100.4 |
| EfficientNet-B1 | 78.642%, 30M | 73.546%, 7.8M | 199.5 --> 166.9 | 185.2 --> 145.9 |
| regnet_x_400mf | 72.834%, 21.3M | 72.660%, 8.0M | 83.2 --> 67.6 | 75.1 --> 58.0 |
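To compare rows at a glance, it can help to turn each "before --> after" pair into a ratio. The following is a minimal sketch that does this for three rows copied from the offline quantization table above. The names `ROWS`, `summarize`, and `SUMMARY` are illustrative only and are not part of mnncompress.

```python
# Rows copied from the offline quantization table:
# (model, original size in M, compressed size in M,
#  ARMV8 latency before in ms, ARMV8 latency after in ms)
ROWS = [
    ("ResNet-18", 45.0, 12.0, 187.6, 167.0),
    ("ResNet-50", 98.0, 25.0, 550.8, 379.6),
    ("MobileNet V2", 14.0, 3.5, 62.2, 42.1),
]


def summarize(rows):
    """Return the size reduction ratio and latency speedup for each row."""
    summary = []
    for name, size_before, size_after, ms_before, ms_after in rows:
        summary.append(
            {
                "model": name,
                "size_ratio": size_before / size_after,  # >1 means smaller file
                "speedup": ms_before / ms_after,  # >1 means faster inference
            }
        )
    return summary


SUMMARY = summarize(ROWS)

for row in SUMMARY:
    print(
        f"{row['model']:<14} size x{row['size_ratio']:.2f}  "
        f"ARMV8 speedup x{row['speedup']:.2f}"
    )
```

A ratio below 1 in the `speedup` field would flag a row where the compressed model is slower, as happens for ResNet-18 on ARMV7 in the table.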

Quantization-aware training

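Quantization-aware training is used to improve the accuracy of the quantized model; its inference speed is the same as that of the offline-quantized version. We recommend trying offline quantization first and switching to quantization-aware training only when the accuracy is insufficient. The baseline models below are torchvision pretrained models. Some models, such as EfficientNet, are expensive to train, so no quantization-aware training results are given for them. Users who trained their own models know the original training hyperparameters and can generally obtain better results.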

| Model | Original model (accuracy, size) | Compressed model (accuracy, size) | Notes |
| --- | --- | --- | --- |
| ResNet-18 | 69.758%, 45M | 69.840%, 12M | Training for one epoch is enough |
| MobileNet V2 | 71.878%, 14M | 71.762%, 3.5M | GitHub: pytorch/vision references/classification. Reproduction command: `torchrun --nproc_per_node=8 train.py --model mobilenet_v2 --data-path /mnt/data/ --epochs 100 --lr 0.01 --wd 0.00004 --lr-step-size 30 --lr-gamma 0.1 --pretrained --quant --sync-bn -b 128` |
| *MobileNet V3 Large | 74.042%, 21M | 73.924%, 5.4M | First layer skipped. GitHub: pytorch/vision references/classification. Reproduction command: `torchrun --nproc_per_node=8 train.py --opt rmsprop --auto-augment imagenet --random-erase 0.2 --model mobilenet_v3_large --data-path /mnt/data/ --epochs 100 --batch-size 128 --lr 0.01 --wd 0.00001 --lr-step-size 30 --lr-gamma 0.1 --pretrained --quant --sync-bn` |

Pruning

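| Model | Original model (accuracy, size) | Compressed model (accuracy, size, sparsity) | ARMV7 (ms) | ARMV8 (ms) |
| --- | --- | --- | --- | --- |
| MobileNet V2 | 71.878%, 14M | 71.272%, 2.8M, 50% SIMDOC sparsity | 69.1 --> 64.9 | 62.2 --> 58.8 |
| *MobileNet V3 Large | 74.042%, 21M | 73.568%, 4.2M, 50% SIMDOC sparsity | 68.6 --> 66.5 | 64.9 --> 62.3 |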

Low-rank decomposition

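| Model | Original model (accuracy, size) | Compressed model (accuracy, size) | ARMV7 (ms) | ARMV8 (ms) |
| --- | --- | --- | --- | --- |
| MobileNet V2 | 71.878%, 14M | 69.874%, 11M | 69.1 --> 60.6 | 62.2 --> 54.6 |
| MobileNet V3 Large | 74.042%, 21M | 72.748%, 18M | 68.6 --> 59.7 | 64.9 --> 55.7 |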
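As background for the size reductions above, the sketch below shows the generic parameter arithmetic behind low-rank decomposition. A dense weight matrix of shape (out, in) is replaced by two factors of shapes (out, rank) and (rank, in), which only saves parameters when the rank is small enough. This is not the mnncompress implementation. The layer shape and rank are assumptions chosen for illustration, and all function names are hypothetical.

```python
def dense_params(out_features, in_features):
    """Parameter count of a dense weight matrix W with shape (out, in)."""
    return out_features * in_features


def low_rank_params(out_features, in_features, rank):
    """Parameter count after factoring W ~= U @ V,
    with U of shape (out, rank) and V of shape (rank, in)."""
    return rank * (out_features + in_features)


def max_useful_rank(out_features, in_features):
    """Largest rank for which the factored form has strictly fewer
    parameters than the dense matrix: rank * (out + in) < out * in."""
    return (out_features * in_features - 1) // (out_features + in_features)


# Illustrative layer shape and rank (assumptions, not taken from the tables).
OUT_FEATURES, IN_FEATURES, RANK = 1000, 1280, 256

before = dense_params(OUT_FEATURES, IN_FEATURES)
after = low_rank_params(OUT_FEATURES, IN_FEATURES, RANK)

print(f"dense parameters:    {before}")
print(f"low-rank parameters: {after}")
print(f"reduction ratio:     {before / after:.2f}")
print(f"max useful rank:     {max_useful_rank(OUT_FEATURES, IN_FEATURES)}")
```

In practice the rank is a trade-off: a lower rank shrinks the model further but discards more of the original weights, which matches the larger accuracy drop seen in the low-rank rows compared with quantization.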

User cases

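| Model | Compression method | Original model metrics | Compressed model metrics |
| --- | --- | --- | --- |
| Neural rendering relight model | Low-rank decomposition | 30.1, 138M, 238ms | 29.98, 58M, 169ms / 29.68, 17M, 116ms |
| Speech recognition AOA V3 | EMA quantization-aware training | CER 18.3, 50.6M | CER 18.26, 18.7M |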