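Offline quantization
The models below are torchvision pretrained models, quantized with the offline mode of LSQ in mnncompress. Quantization uses a batch size of 64 over 10 batches, i.e. 640 training images. Models marked with an asterisk (*) in the table skip quantization of the first layer or the first two layers to improve accuracy. All latencies were measured on a Huawei P20 Pro with a single thread.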
Model | Original model (accuracy, size) | Compressed model (accuracy, size) | ARMV7 latency (ms), original -> compressed | ARMV8 latency (ms), original -> compressed |
---|---|---|---|---|
ResNet-18 | 69.758%, 45M | 69.740%, 12M | 196.6 -> 208.8 | 187.6 -> 167.0 |
ResNet-50 | 76.130%, 98M | 76.030%, 25M | 606.0 -> 470.5 | 550.8 -> 379.6 |
SqueezeNet 1.0 | 58.092%, 4.8M | 57.800%, 1.3M | 122.1 -> 104.1 | 120.8 -> 88.3 |
ShuffleNet V2 x1.0 | 69.362%, 8.7M | 68.616%, 2.3M | 33.3 -> 33.8 | 29.0 -> 26.0 |
MobileNet V2 | 71.878%, 14M | 71.150%, 3.5M | 69.1 -> 50.2 | 62.2 -> 42.1 |
*MobileNet V3 Large | 74.042%, 21M | 73.030%, 5.4M | 68.6 -> 63.1 | 64.9 -> 50.1 |
MNASNet 1.0 | 73.456%, 17M | 72.692%, 4.3M | 70.6 -> 51.2 | 63.0 -> 42.9 |
*EfficientNet-B0 | 77.692%, 21M | 70.486%, 5.3M | 134.3 -> 113.0 | 128.7 -> 100.4 |
EfficientNet-B1 | 78.642%, 30M | 73.546%, 7.8M | 199.5 -> 166.9 | 185.2 -> 145.9 |
regnet_x_400mf | 72.834%, 21.3M | 72.660%, 8.0M | 83.2 -> 67.6 | 75.1 -> 58.0 |
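To make the table easier to read at a glance, the following sketch derives the size reduction and latency speedup for a few rows. The numbers are copied from the table; the tuple layout and the `summarize` helper are illustrative assumptions, not part of mnncompress.

```python
# Rows copied from the offline quantization table:
# (model, original size (M), compressed size (M),
#  ARMV7 ms before, ARMV7 ms after, ARMV8 ms before, ARMV8 ms after)
ROWS = [
    ("ResNet-18", 45.0, 12.0, 196.6, 208.8, 187.6, 167.0),
    ("ResNet-50", 98.0, 25.0, 606.0, 470.5, 550.8, 379.6),
    ("MobileNet V2", 14.0, 3.5, 69.1, 50.2, 62.2, 42.1),
]


def summarize(row):
    """Return size ratio and speedups; a speedup below 1 means slower."""
    name, size0, size1, v7_before, v7_after, v8_before, v8_after = row
    return {
        "model": name,
        "size_ratio": size0 / size1,
        "armv7_speedup": v7_before / v7_after,
        "armv8_speedup": v8_before / v8_after,
    }


summaries = [summarize(r) for r in ROWS]
for s in summaries:
    print(
        f"{s['model']:<14} size x{s['size_ratio']:.2f}  "
        f"ARMV7 x{s['armv7_speedup']:.2f}  ARMV8 x{s['armv8_speedup']:.2f}"
    )
```

Note that a smaller model is not always faster: for ResNet-18 the ARMV7 speedup comes out below 1, matching the 196.6 -> 208.8 entry in the table.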
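Quantization-aware training
Quantization-aware training is used to improve the accuracy of the quantized model; its inference speed is the same as that of the offline-quantized model. We recommend trying offline quantization first and turning to quantization-aware training only when the accuracy is insufficient. The baseline models in the data below are torchvision pretrained models. For some models, such as EfficientNet, training is expensive, so no quantization-aware training results are given. Users who trained a model themselves know its training hyperparameters, so their results can generally be better.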
Model | Original model (accuracy, size) | Compressed model (accuracy, size) | Notes |
---|---|---|---|
ResNet-18 | 69.758%, 45M | 69.840%, 12M | One epoch of training is enough |
MobileNet V2 | 71.878%, 14M | 71.762%, 3.5M | GitHub: pytorch/vision reference/classification, reproduction command: torchrun --nproc_per_node=8 train.py --model mobilenet_v2 --data-path /mnt/data/ --epochs 100 --lr 0.01 --wd 0.00004 --lr-step-size 30 --lr-gamma 0.1 --pretrained --quant --sync-bn -b 128 |
*MobileNet V3 Large | 74.042%, 21M | 73.924%, 5.4M | First layer skipped. GitHub: pytorch/vision reference/classification, reproduction command: torchrun --nproc_per_node=8 train.py --opt rmsprop --auto-augment imagenet --random-erase 0.2 --model mobilenet_v3_large --data-path /mnt/data/ --epochs 100 --batch-size 128 --lr 0.01 --wd 0.00001 --lr-step-size 30 --lr-gamma 0.1 --pretrained --quant --sync-bn |
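The recommendation to try offline quantization first and fall back to quantization-aware training can be sketched as a simple rule. The accuracies are copied from the two quantization tables; the `choose_method` helper and the 0.5-point tolerance are hypothetical and only illustrate the decision, they are not part of mnncompress.

```python
# Accuracies (%) copied from the offline and quantization-aware training tables.
BASELINE = {"ResNet-18": 69.758, "MobileNet V2": 71.878, "MobileNet V3 Large": 74.042}
OFFLINE = {"ResNet-18": 69.740, "MobileNet V2": 71.150, "MobileNet V3 Large": 73.030}
QAT = {"ResNet-18": 69.840, "MobileNet V2": 71.762, "MobileNet V3 Large": 73.924}


def choose_method(model, max_drop):
    """Prefer offline quantization; use QAT when the offline drop is too large.

    max_drop is a hypothetical tolerance in percentage points.
    Returns (method, accuracy drop of the chosen method).
    """
    offline_drop = BASELINE[model] - OFFLINE[model]
    if offline_drop <= max_drop:
        return "offline", offline_drop
    return "qat", BASELINE[model] - QAT[model]


for name in BASELINE:
    method, drop = choose_method(name, max_drop=0.5)
    print(f"{name}: {method}, drop {drop:.3f} points")
```

With this tolerance, ResNet-18 stays with offline quantization, while both MobileNet variants move to quantization-aware training, which brings their accuracy drop from roughly 0.7 to 1.0 points down to about 0.12 points.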
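Pruning
Model | Original model (accuracy, size) | Compressed model (accuracy, size, sparsity) | ARMV7 latency (ms), original -> compressed | ARMV8 latency (ms), original -> compressed |
---|---|---|---|---|
MobileNet V2 | 71.878%, 14M | 71.272%, 2.8M, 50% SIMDOC sparsity | 69.1 -> 64.9 | 62.2 -> 58.8 |
*MobileNet V3 Large | 74.042%, 21M | 73.568%, 4.2M, 50% SIMDOC sparsity | 68.6 -> 66.5 | 64.9 -> 62.3 |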
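Low-rank decomposition
Model | Original model (accuracy, size) | Compressed model (accuracy, size) | ARMV7 latency (ms), original -> compressed | ARMV8 latency (ms), original -> compressed |
---|---|---|---|---|
MobileNet V2 | 71.878%, 14M | 69.874%, 11M | 69.1 -> 60.6 | 62.2 -> 54.6 |
MobileNet V3 Large | 74.042%, 21M | 72.748%, 18M | 68.6 -> 59.7 | 64.9 -> 55.7 |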
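Since MobileNet V2 appears in the offline quantization, pruning, and low-rank decomposition tables, the three methods can be compared side by side. The sketch below uses only numbers from those tables (ARMV8 latency); the `compare` helper and dictionary layout are illustrative assumptions.

```python
# MobileNet V2 figures copied from the tables: (accuracy %, size (M), ARMV8 ms).
ORIGINAL = (71.878, 14.0, 62.2)
METHODS = {
    "offline quantization": (71.150, 3.5, 42.1),
    "pruning (50% SIMDOC sparsity)": (71.272, 2.8, 58.8),
    "low-rank decomposition": (69.874, 11.0, 54.6),
}


def compare(original, compressed):
    """Return accuracy drop (points) and fractional size/latency savings."""
    acc0, size0, ms0 = original
    acc1, size1, ms1 = compressed
    return {
        "acc_drop": acc0 - acc1,
        "size_saved": 1 - size1 / size0,
        "latency_saved": 1 - ms1 / ms0,
    }


report = {name: compare(ORIGINAL, vals) for name, vals in METHODS.items()}
fastest = min(METHODS, key=lambda m: METHODS[m][2])
smallest = min(METHODS, key=lambda m: METHODS[m][1])

for name, r in report.items():
    print(
        f"{name:<30} drop {r['acc_drop']:.3f}  "
        f"size -{r['size_saved']:.0%}  latency -{r['latency_saved']:.0%}"
    )
print("fastest on ARMV8:", fastest)
print("smallest model:", smallest)
```

On these figures, offline quantization gives the lowest ARMV8 latency for MobileNet V2, while 50% SIMDOC sparse pruning gives the smallest model.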
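User cases
Model | Compression method | Original model metrics | Compressed model metrics |
---|---|---|---|
Neural rendering relight model | Low-rank decomposition | 30.1, 138M, 238ms | 29.98, 58M, 169ms / 29.68, 17M, 116ms |
Speech recognition AOA V3 | EMA quantization-aware training | CER 18.3, 50.6M | CER 18.26, 18.7M |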