(实验性)在 PyTorch 中使用 Eager 模式进行静态量化
原文: https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html
注意
单击此处的下载完整的示例代码
由编辑:赛斯·魏德曼
本教程介绍了如何进行训练后的静态量化,并说明了两种更先进的技术-每通道量化和量化感知训练-可以进一步提高模型的准确性。 请注意,目前仅支持 CPU 量化,因此在本教程中我们将不使用 GPU / CUDA。
在本教程结束时,您将看到 PyTorch 中的量化如何导致模型大小显着减小同时提高速度。 此外,您将在此处看到如何轻松应用中显示的一些高级量化技术,从而使量化后的模型获得的准确性降低得多。
警告:我们使用了许多其他 PyTorch 仓库中的样板代码,例如,定义MobileNetV2模型架构,定义数据加载器等。 我们当然鼓励您阅读它; 但是如果要使用量化功能,请随时跳至“ 4。 训练后静态量化”部分。
我们将从进行必要的导入开始:
import numpy as npimport torchimport torch.nn as nnimport torchvisionfrom torch.utils.data import DataLoaderfrom torchvision import datasetsimport torchvision.transforms as transformsimport osimport timeimport sysimport torch.quantization# # Setup warningsimport warningswarnings.filterwarnings(action='ignore',category=DeprecationWarning,module=r'.*')warnings.filterwarnings(action='default',module=r'torch.quantization')# Specify random seed for repeatable resultstorch.manual_seed(191009)
1.模型架构
我们首先定义 MobileNetV2 模型体系结构,并进行了一些值得注意的修改以实现量化:
- 用
nn.quantized.FloatFunctional代替加法运算 - 在网络的开头和结尾处插入
QuantStub和DeQuantStub。 - 用 ReLU 替换 ReLU6
注意:此代码取自此处。
from torch.quantization import QuantStub, DeQuantStubdef _make_divisible(v, divisor, min_value=None):"""This function is taken from the original tf repo.It ensures that all layers have a channel number that is divisible by 8It can be seen here:https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py:param v::param divisor::param min_value::return:"""if min_value is None:min_value = divisornew_v = max(min_value, int(v + divisor / 2) // divisor * divisor)# Make sure that round down does not go down by more than 10%.if new_v < 0.9 * v:new_v += divisorreturn new_vclass ConvBNReLU(nn.Sequential):def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):padding = (kernel_size - 1) // 2super(ConvBNReLU, self).__init__(nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),nn.BatchNorm2d(out_planes, momentum=0.1),# Replace with ReLUnn.ReLU(inplace=False))class InvertedResidual(nn.Module):def __init__(self, inp, oup, stride, expand_ratio):super(InvertedResidual, self).__init__()self.stride = strideassert stride in [1, 2]hidden_dim = int(round(inp * expand_ratio))self.use_res_connect = self.stride == 1 and inp == ouplayers = []if expand_ratio != 1:# pwlayers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))layers.extend([# dwConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),# pw-linearnn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),nn.BatchNorm2d(oup, momentum=0.1),])self.conv = nn.Sequential(*layers)# Replace torch.add with floatfunctionalself.skip_add = nn.quantized.FloatFunctional()def forward(self, x):if self.use_res_connect:return self.skip_add.add(x, self.conv(x))else:return self.conv(x)class MobileNetV2(nn.Module):def __init__(self, num_classes=1000, width_mult=1.0, inverted_residual_setting=None, round_nearest=8):"""MobileNet V2 main classArgs:num_classes (int): Number of classeswidth_mult (float): Width multiplier - adjusts number of channels in each layer by this amountinverted_residual_setting: Network structureround_nearest (int): Round the number of channels in each layer to be a multiple of this numberSet to 1 to turn off rounding"""super(MobileNetV2, self).__init__()block = InvertedResidualinput_channel = 32last_channel = 1280if inverted_residual_setting is None:inverted_residual_setting = [# t, c, n, s[1, 16, 1, 1],[6, 24, 2, 2],[6, 32, 3, 2],[6, 64, 4, 2],[6, 96, 3, 1],[6, 160, 3, 2],[6, 320, 1, 1],]# only check the first element, assuming user knows t,c,n,s are requiredif len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:raise ValueError("inverted_residual_setting should be non-empty ""or a 4-element list, got {}".format(inverted_residual_setting))# building first layerinput_channel = _make_divisible(input_channel * width_mult, round_nearest)self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)features = [ConvBNReLU(3, input_channel, stride=2)]# building inverted residual blocksfor t, c, n, s in inverted_residual_setting:output_channel = _make_divisible(c * width_mult, round_nearest)for i in range(n):stride = s if i == 0 else 1features.append(block(input_channel, output_channel, stride, expand_ratio=t))input_channel = output_channel# building last several layersfeatures.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1))# make it nn.Sequentialself.features = nn.Sequential(*features)self.quant = QuantStub()self.dequant = DeQuantStub()# building classifierself.classifier = nn.Sequential(nn.Dropout(0.2),nn.Linear(self.last_channel, num_classes),)# weight initializationfor m in self.modules():if isinstance(m, nn.Conv2d):nn.init.kaiming_normal_(m.weight, mode='fan_out')if m.bias is not None:nn.init.zeros_(m.bias)elif isinstance(m, nn.BatchNorm2d):nn.init.ones_(m.weight)nn.init.zeros_(m.bias)elif isinstance(m, nn.Linear):nn.init.normal_(m.weight, 0, 0.01)nn.init.zeros_(m.bias)def forward(self, x):x = self.quant(x)x = self.features(x)x = x.mean([2, 3])x = self.classifier(x)x = self.dequant(x)return x# Fuse Conv+BN and Conv+BN+Relu modules prior to quantization# This operation does not change the numericsdef fuse_model(self):for m in self.modules():if type(m) == ConvBNReLU:torch.quantization.fuse_modules(m, ['0', '1', '2'], inplace=True)if type(m) == InvertedResidual:for idx in range(len(m.conv)):if type(m.conv[idx]) == nn.Conv2d:torch.quantization.fuse_modules(m.conv, [str(idx), str(idx + 1)], inplace=True)
2.助手功能
接下来,我们定义一些帮助程序功能以帮助模型评估。 这些主要来自这里。
class AverageMeter(object):"""Computes and stores the average and current value"""def __init__(self, name, fmt=':f'):self.name = nameself.fmt = fmtself.reset()def reset(self):self.val = 0self.avg = 0self.sum = 0self.count = 0def update(self, val, n=1):self.val = valself.sum += val * nself.count += nself.avg = self.sum / self.countdef __str__(self):fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'return fmtstr.format(**self.__dict__)def accuracy(output, target, topk=(1,)):"""Computes the accuracy over the k top predictions for the specified values of k"""with torch.no_grad():maxk = max(topk)batch_size = target.size(0)_, pred = output.topk(maxk, 1, True, True)pred = pred.t()correct = pred.eq(target.view(1, -1).expand_as(pred))res = []for k in topk:correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)res.append(correct_k.mul_(100.0 / batch_size))return resdef evaluate(model, criterion, data_loader, neval_batches):model.eval()top1 = AverageMeter('Acc@1', ':6.2f')top5 = AverageMeter('Acc@5', ':6.2f')cnt = 0with torch.no_grad():for image, target in data_loader:output = model(image)loss = criterion(output, target)cnt += 1acc1, acc5 = accuracy(output, target, topk=(1, 5))print('.', end = '')top1.update(acc1[0], image.size(0))top5.update(acc5[0], image.size(0))if cnt >= neval_batches:return top1, top5return top1, top5def load_model(model_file):model = MobileNetV2()state_dict = torch.load(model_file)model.load_state_dict(state_dict)model.to('cpu')return modeldef print_size_of_model(model):torch.save(model.state_dict(), "temp.p")print('Size (MB):', os.path.getsize("temp.p")/1e6)os.remove('temp.p')
3.定义数据集和数据加载器
作为最后的主要设置步骤,我们为训练和测试集定义了数据加载器。
ImageNet 数据
我们为本教程创建的特定数据集仅包含来自 ImageNet 数据的 1000 张图像,每个类别都有一张(该数据集的大小刚好超过 250 MB,可以相对轻松地下载)。 此自定义数据集的 URL 为:
https://s3.amazonaws.com/pytorch-tutorial-assets/imagenet_1k.zip
要使用 Python 在本地下载此数据,可以使用:
import requestsurl = 'https://s3.amazonaws.com/pytorch-tutorial-assets/imagenet_1k.zip`filename = '~/Downloads/imagenet_1k_data.zip'r = requests.get(url)with open(filename, 'wb') as f:f.write(r.content)
为了运行本教程,我们下载了这些数据,并使用 Makefile 中的这些行将其移到正确的位置。
另一方面,要使用整个 ImageNet 数据集运行本教程中的代码,可以在后面的之后使用torchvision下载数据。 例如,要下载训练集并对其进行一些标准转换,可以使用:
import torchvisionimport torchvision.transforms as transformsimagenet_dataset = torchvision.datasets.ImageNet('~/.data/imagenet',split='train',download=True,transforms.Compose([transforms.RandomResizedCrop(224),transforms.RandomHorizontalFlip(),transforms.ToTensor(),transforms.Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225]),])
下载完数据后,我们在下面显示了一些函数,这些函数定义了用于读取该数据的数据加载器。 这些功能主要来自此处。
def prepare_data_loaders(data_path):traindir = os.path.join(data_path, 'train')valdir = os.path.join(data_path, 'val')normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225])dataset = torchvision.datasets.ImageFolder(traindir,transforms.Compose([transforms.RandomResizedCrop(224),transforms.RandomHorizontalFlip(),transforms.ToTensor(),normalize,]))dataset_test = torchvision.datasets.ImageFolder(valdir,transforms.Compose([transforms.Resize(256),transforms.CenterCrop(224),transforms.ToTensor(),normalize,]))train_sampler = torch.utils.data.RandomSampler(dataset)test_sampler = torch.utils.data.SequentialSampler(dataset_test)data_loader = torch.utils.data.DataLoader(dataset, batch_size=train_batch_size,sampler=train_sampler)data_loader_test = torch.utils.data.DataLoader(dataset_test, batch_size=eval_batch_size,sampler=test_sampler)return data_loader, data_loader_test
接下来,我们将加载经过预先训练的 MobileNetV2 模型。 我们在中提供从torchvision 中下载数据的 URL。
data_path = 'data/imagenet_1k'saved_model_dir = 'data/'float_model_file = 'mobilenet_pretrained_float.pth'scripted_float_model_file = 'mobilenet_quantization_scripted.pth'scripted_quantized_model_file = 'mobilenet_quantization_scripted_quantized.pth'train_batch_size = 30eval_batch_size = 30data_loader, data_loader_test = prepare_data_loaders(data_path)criterion = nn.CrossEntropyLoss()float_model = load_model(saved_model_dir + float_model_file).to('cpu')
接下来,我们将“融合模块”; 通过节省内存访问量,这可以使模型更快,同时还可以提高数值精度。 尽管这可以用于任何模型,但在量化模型中尤为常见。
print('\n Inverted Residual Block: Before fusion \n\n', float_model.features[1].conv)float_model.eval()# Fuses modulesfloat_model.fuse_model()# Note fusion of Conv+BN+Relu and Conv+Reluprint('\n Inverted Residual Block: After fusion\n\n',float_model.features[1].conv)
出:
Inverted Residual Block: Before fusionSequential((0): ConvBNReLU((0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)(2): ReLU())(1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)(2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))Inverted Residual Block: After fusionSequential((0): ConvBNReLU((0): ConvReLU2d((0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)(1): ReLU())(1): Identity()(2): Identity())(1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1))(2): Identity())
最后,要获得“基准”精度,让我们看看带有融合模块的未量化模型的精度
num_eval_batches = 10print("Size of baseline model")print_size_of_model(float_model)top1, top5 = evaluate(float_model, criterion, data_loader_test, neval_batches=num_eval_batches)print('Evaluation accuracy on %d images, %2.2f'%(num_eval_batches * eval_batch_size, top1.avg))torch.jit.save(torch.jit.script(float_model), saved_model_dir + scripted_float_model_file)
Out:
Size of baseline modelSize (MB): 13.981375..........Evaluation accuracy on 300 images, 78.00
我们看到 300 张图像的准确率达到 78%,这是 ImageNet 的坚实基础,尤其是考虑到我们的模型只有 14.0 MB 时。
这将是我们比较的基准。 接下来,让我们尝试不同的量化方法
4.训练后静态量化
训练后的静态量化不仅涉及像动态量化中那样将权重从 float 转换为 int,而且还执行额外的步骤,即首先通过网络馈送一批数据并计算不同激活的结果分布(具体而言,这是 通过在记录此数据的不同点插入<cite>观察者</cite>模块来完成)。 然后使用这些分布来确定在推理时如何具体量化不同的激活(一种简单的技术将简单地将整个激活范围划分为 256 个级别,但我们也支持更复杂的方法)。 重要的是,此附加步骤使我们能够在操作之间传递量化值,而不是在每次操作之间将这些值转换为浮点数,然后再转换为整数,从而显着提高了速度。
num_calibration_batches = 10myModel = load_model(saved_model_dir + float_model_file).to('cpu')myModel.eval()# Fuse Conv, bn and relumyModel.fuse_model()# Specify quantization configuration# Start with simple min/max range estimation and per-tensor quantization of weightsmyModel.qconfig = torch.quantization.default_qconfigprint(myModel.qconfig)torch.quantization.prepare(myModel, inplace=True)# Calibrate firstprint('Post Training Quantization Prepare: Inserting Observers')print('\n Inverted Residual Block:After observer insertion \n\n', myModel.features[1].conv)# Calibrate with the training setevaluate(myModel, criterion, data_loader, neval_batches=num_calibration_batches)print('Post Training Quantization: Calibration done')# Convert to quantized modeltorch.quantization.convert(myModel, inplace=True)print('Post Training Quantization: Convert done')print('\n Inverted Residual Block: After fusion and quantization, note fused modules: \n\n',myModel.features[1].conv)print("Size of model after quantization")print_size_of_model(myModel)top1, top5 = evaluate(myModel, criterion, data_loader_test, neval_batches=num_eval_batches)print('Evaluation accuracy on %d images, %2.2f'%(num_eval_batches * eval_batch_size, top1.avg))
Out:
QConfig(activation=functools.partial(<class 'torch.quantization.observer.MinMaxObserver'>, reduce_range=True), weight=functools.partial(<class 'torch.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric))Post Training Quantization Prepare: Inserting ObserversInverted Residual Block:After observer insertionSequential((0): ConvBNReLU((0): ConvReLU2d((0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32(activation_post_process): MinMaxObserver(min_val=None, max_val=None))(1): ReLU((activation_post_process): MinMaxObserver(min_val=None, max_val=None)))(1): Identity()(2): Identity())(1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1)(activation_post_process): MinMaxObserver(min_val=None, max_val=None))(2): Identity())..........Post Training Quantization: Calibration donePost Training Quantization: Convert doneInverted Residual Block: After fusion and quantization, note fused modules:Sequential((0): ConvBNReLU((0): QuantizedConvReLU2d(32, 32, kernel_size=(3, 3), stride=(1, 1), scale=0.15092508494853973, zero_point=0, padding=(1, 1), groups=32)(1): Identity()(2): Identity())(1): QuantizedConv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), scale=0.1737997829914093, zero_point=72)(2): Identity())Size of model after quantizationSize (MB): 3.58906..........Evaluation accuracy on 300 images, 63.33
对于这个量化模型,我们发现在这 300 张相同的图像上,准确率仅低至〜62%。 但是,我们确实将模型的大小减小到了 3.6 MB 以下,几乎减少了 4 倍。
此外,我们可以简单地通过使用不同的量化配置来显着提高准确性。 我们使用推荐的配置对 x86 架构进行量化,重复相同的练习。 此配置执行以下操作:
- 量化每个通道的权重
- 使用直方图观察器,该直方图观察器收集激活的直方图,然后以最佳方式选择量化参数。
per_channel_quantized_model = load_model(saved_model_dir + float_model_file)per_channel_quantized_model.eval()per_channel_quantized_model.fuse_model()per_channel_quantized_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')print(per_channel_quantized_model.qconfig)torch.quantization.prepare(per_channel_quantized_model, inplace=True)evaluate(per_channel_quantized_model,criterion, data_loader, num_calibration_batches)torch.quantization.convert(per_channel_quantized_model, inplace=True)top1, top5 = evaluate(per_channel_quantized_model, criterion, data_loader_test, neval_batches=num_eval_batches)print('Evaluation accuracy on %d images, %2.2f'%(num_eval_batches * eval_batch_size, top1.avg))torch.jit.save(torch.jit.script(per_channel_quantized_model), saved_model_dir + scripted_quantized_model_file)
Out:
QConfig(activation=functools.partial(<class 'torch.quantization.observer.HistogramObserver'>, reduce_range=True), weight=functools.partial(<class 'torch.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric))....................Evaluation accuracy on 300 images, 77.33
仅更改这种量化配置方法,就可以将准确性提高到 76%以上! 尽管如此,这仍比上述 78%的基准差 1-2%。 因此,让我们尝试量化意识的训练。
5.量化意识训练
量化意识训练(QAT)是通常导致最高准确性的量化方法。 使用 QAT,在训练的正向和反向过程中,所有权重和激活都被“伪量化”:也就是说,浮点值会四舍五入以模拟 int8 值,但所有计算仍将使用浮点数进行。 因此,在训练过程中进行所有权重调整,同时“意识到”模型将最终被量化的事实。 因此,在量化之后,此方法通常比动态量化或训练后静态量化具有更高的准确性。
实际执行 QAT 的总体工作流程与之前非常相似:
- 我们可以使用与以前相同的模型:量化意识训练不需要额外的准备。
- 我们需要使用
qconfig来指定要在权重和激活之后插入哪种伪量化,而不是指定观察者
我们首先定义一个训练函数:
def train_one_epoch(model, criterion, optimizer, data_loader, device, ntrain_batches):model.train()top1 = AverageMeter('Acc@1', ':6.2f')top5 = AverageMeter('Acc@5', ':6.2f')avgloss = AverageMeter('Loss', '1.5f')cnt = 0for image, target in data_loader:start_time = time.time()print('.', end = '')cnt += 1image, target = image.to(device), target.to(device)output = model(image)loss = criterion(output, target)optimizer.zero_grad()loss.backward()optimizer.step()acc1, acc5 = accuracy(output, target, topk=(1, 5))top1.update(acc1[0], image.size(0))top5.update(acc5[0], image.size(0))avgloss.update(loss, image.size(0))if cnt >= ntrain_batches:print('Loss', avgloss.avg)print('Training: * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5))returnprint('Full imagenet train set: * Acc@1 {top1.global_avg:.3f} Acc@5 {top5.global_avg:.3f}'.format(top1=top1, top5=top5))return
我们像以前一样融合模块
qat_model = load_model(saved_model_dir + float_model_file)qat_model.fuse_model()optimizer = torch.optim.SGD(qat_model.parameters(), lr = 0.0001)qat_model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
最后,prepare_qat执行“伪量化”,为量化感知训练准备模型
torch.quantization.prepare_qat(qat_model, inplace=True)print('Inverted Residual Block: After preparation for QAT, note fake-quantization modules \n',qat_model.features[1].conv)
Out:
Inverted Residual Block: After preparation for QAT, note fake-quantization modulesSequential((0): ConvBNReLU((0): ConvBnReLU2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False(activation_post_process): FakeQuantize(fake_quant_enabled=True, observer_enabled=True, scale=None, zero_point=None(activation_post_process): MovingAverageMinMaxObserver(min_val=None, max_val=None))(weight_fake_quant): FakeQuantize(fake_quant_enabled=True, observer_enabled=True, scale=None, zero_point=None(activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=None, max_val=None)))(1): Identity()(2): Identity())(1): ConvBn2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False(activation_post_process): FakeQuantize(fake_quant_enabled=True, observer_enabled=True, scale=None, zero_point=None(activation_post_process): MovingAverageMinMaxObserver(min_val=None, max_val=None))(weight_fake_quant): FakeQuantize(fake_quant_enabled=True, observer_enabled=True, scale=None, zero_point=None(activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=None, max_val=None)))(2): Identity())
高精度训练量化模型需要在推断时对数字进行精确建模。 因此,对于量化感知训练,我们通过以下方式修改训练循环:
- 在训练快要结束时切换批处理规范以使用运行均值和方差,以更好地匹配推理数字。
- 我们还冻结了量化器参数(比例和零点),并对权重进行了微调。
num_train_batches = 20# Train and check accuracy after each epochfor nepoch in range(8):train_one_epoch(qat_model, criterion, optimizer, data_loader, torch.device('cpu'), num_train_batches)if nepoch > 3:# Freeze quantizer parametersqat_model.apply(torch.quantization.disable_observer)if nepoch > 2:# Freeze batch norm mean and variance estimatesqat_model.apply(torch.nn.intrinsic.qat.freeze_bn_stats)# Check the accuracy after each epochquantized_model = torch.quantization.convert(qat_model.eval(), inplace=False)quantized_model.eval()top1, top5 = evaluate(quantized_model,criterion, data_loader_test, neval_batches=num_eval_batches)print('Epoch %d :Evaluation accuracy on %d images, %2.2f'%(nepoch, num_eval_batches * eval_batch_size, top1.avg))
Out:
....................Loss tensor(2.0660, grad_fn=<DivBackward0>)Training: * Acc@1 53.000 Acc@5 77.167..........Epoch 0 :Evaluation accuracy on 300 images, 78.67....................Loss tensor(2.0398, grad_fn=<DivBackward0>)Training: * Acc@1 56.000 Acc@5 77.667..........Epoch 1 :Evaluation accuracy on 300 images, 74.67....................Loss tensor(2.0917, grad_fn=<DivBackward0>)Training: * Acc@1 52.833 Acc@5 77.333..........Epoch 2 :Evaluation accuracy on 300 images, 75.33....................Loss tensor(1.9406, grad_fn=<DivBackward0>)Training: * Acc@1 55.000 Acc@5 79.333..........Epoch 3 :Evaluation accuracy on 300 images, 77.67....................Loss tensor(1.8255, grad_fn=<DivBackward0>)Training: * Acc@1 59.833 Acc@5 82.000..........Epoch 4 :Evaluation accuracy on 300 images, 77.00....................Loss tensor(1.8275, grad_fn=<DivBackward0>)Training: * Acc@1 58.167 Acc@5 80.167..........Epoch 5 :Evaluation accuracy on 300 images, 76.67....................Loss tensor(1.9429, grad_fn=<DivBackward0>)Training: * Acc@1 56.333 Acc@5 79.833..........Epoch 6 :Evaluation accuracy on 300 images, 76.33....................Loss tensor(1.8643, grad_fn=<DivBackward0>)Training: * Acc@1 57.333 Acc@5 81.000..........Epoch 7 :Evaluation accuracy on 300 images, 75.67
在这里,我们只对少数几个时期执行量化感知训练。 尽管如此,量化感知训练在整个 imagenet 数据集上的准确性仍超过 71%,接近 71.9%的浮点准确性。
有关量化意识训练的更多信息:
- QAT 是后期训练量化技术的超集,可以进行更多调试。 例如,我们可以分析模型的准确性是否受到权重或激活量化的限制。
- 由于我们使用伪量化来对实际量化算术的数值建模,因此我们还可以在浮点中模拟量化模型的准确性。
- 我们也可以轻松地模拟训练后量化。
量化加速
最后,让我们确认一下我们上面提到的内容:量化模型实际上执行推理的速度更快吗? 让我们测试一下:
def run_benchmark(model_file, img_loader):elapsed = 0model = torch.jit.load(model_file)model.eval()num_batches = 5# Run the scripted model on a few batches of imagesfor i, (images, target) in enumerate(img_loader):if i < num_batches:start = time.time()output = model(images)end = time.time()elapsed = elapsed + (end-start)else:breaknum_images = images.size()[0] * num_batchesprint('Elapsed time: %3.0f ms' % (elapsed/num_images*1000))return elapsedrun_benchmark(saved_model_dir + scripted_float_model_file, data_loader_test)run_benchmark(saved_model_dir + scripted_quantized_model_file, data_loader_test)
Out:
Elapsed time: 16 msElapsed time: 10 ms
在 MacBook Pro 上本地运行此程序,常规模型的运行时间为 61 毫秒,而量化模型的运行时间仅为 20 毫秒,这说明了量化模型与浮点模型相比,典型的 2-4 倍加速。
结论
在本教程中,我们展示了两种量化方法-训练后静态量化和量化感知训练-描述它们在“幕后”进行的操作以及如何在 PyTorch 中使用它们。
谢谢阅读! 与往常一样,我们欢迎您提供任何反馈,因此,如果有任何问题,请在此处创建一个问题。
脚本的总运行时间:(9 分钟 43.065 秒)
Download Python source code: static_quantization_tutorial.py Download Jupyter notebook: static_quantization_tutorial.ipynb
由狮身人面像画廊生成的画廊
