[Paper Notes] CVPR2018 Interpretable Explanations of Black Boxes by Meaningful Perturbation

Interpretable Explanations of Black Boxes by Meaningful Perturbation

Motivation & Contribution

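Motivation:

Most existing algorithms for explaining classifiers are largely heuristic, and what their outputs actually mean remains unclear [19, 16, 8, 7, 9].

The paper redefines interpretability from a formal standpoint: the goal is to find principles or methods that can explain an arbitrary black-box function $f$ (for example, a neural network classifier). Such functions are usually learned automatically from data, and we want to understand what they have learned to do and how they do it. Answering the "what" question means determining the properties of the input-to-output mapping (fixing the network architecture?). Answering the "how" question means studying the internal mechanisms through which the mapping achieves these properties (the choice of network parameters, etc.?). The paper focuses mainly on the "what" question and argues that it can be answered by interpretable rules that describe the input-output relationship captured by $f$.

The proposed method is model-agnostic and is based on explicit, interpretable image perturbations.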

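Contributions:

- A general framework of explanations as meta-predictors, extending the work of [18].
- Several pitfalls in designing automatic explanation systems are identified; in particular, neural network "artifacts" turn out to be a major attractor for explanations.
- Network saliency is reinterpreted within the proposed framework, which yields a natural generalization of gradient-based saliency techniques.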

Related Work

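This work builds on the gradient-based method of [15], which backpropagates the gradient for a class label to the image layer.

Another family of techniques incorporates network activations into the visualization: CAM and Grad-CAM. [CAM and Grad-CAM are used to explain CNN models; both produce a class activation map, which resembles a heatmap.]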

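Note that CAM and Grad-CAM visualizations can only explain why the CNN classifies an image the way it does; they cannot explain why the CNN is able to localize the class-relevant regions.

How Global Average Pooling works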

Let the number of classes be $n$. The last layer produces $n$ feature maps; the mean over all pixels of each feature map is computed and fed into a fully connected layer with $n$ neurons. This raises two questions; the first is addressed here.

**Why are there $n$ feature maps?**
The paper's explanation is that "the feature maps can be easily interpreted as categories confidence maps."
The premise is that this design works well in practice. The interpretation is that each feature map mainly extracts features related to one class: for example, the $i$-th feature map mainly picks up the airplane-related parts of the image, while the $j$-th feature map mainly picks up the car-related parts.
After training on CIFAR-10, the paper visualizes the last-layer feature maps (the figure is not reproduced here).

CAM

A CNN generally consists of a feature extractor and a classifier. The feature extractor extracts image features, and the classifier makes its decision based on them. A commonly used classifier is an MLP; the current mainstream design follows the feature extractor with GAP plus a fully connected layer whose size equals the number of classes.

The last-layer feature maps of a CNN carry the richest class semantics (they can be understood as highly abstract class features), so CAM bases its visualization on them.

CAM replaces the CNN's classifier with GAP plus a fully connected layer whose size equals the number of classes (called the classification layer below) and retrains the model. Suppose the last layer has $n$ feature maps, denoted $A^1, A^2, \ldots, A^n$. Each neuron in the classification layer has $n$ weights and corresponds to one class. Let the weights of the $c$-th neuron be $w_1^c, w_2^c, \ldots, w_n^c$; the class activation map (CAM) of class $c$ is then generated as:

$$L_{CAM}^{c} = \sum_{i=1}^{n} w_{i}^{c} A^{i} \tag{1.0}$$

The process is illustrated below:
[Figure omitted: illustration of how the class activation map is generated]

The resulting class activation map has the same size as the last-layer feature maps; upsampling it then gives a class activation map of the same size as the original image.
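The weighted sum and the upsampling step described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function names, the toy shapes (4 feature maps of size 7x7, 3 classes), and the use of nearest-neighbour upsampling in place of whatever interpolation a real implementation would choose are not taken from the notes.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Weighted sum of the last-layer feature maps for one class.

    feature_maps: shape (n, h, w), the maps A^1 ... A^n
    fc_weights:   shape (num_classes, n), classification-layer weights w_i^c
    """
    w_c = fc_weights[class_idx]                     # shape (n,)
    return np.tensordot(w_c, feature_maps, axes=1)  # shape (h, w)

def upsample_nearest(cam, factor):
    """Nearest-neighbour upsampling (a simple stand-in for interpolation)."""
    return np.kron(cam, np.ones((factor, factor)))

rng = np.random.default_rng(0)
A = rng.random((4, 7, 7))       # hypothetical feature maps, n = 4
W = rng.normal(size=(3, 4))     # hypothetical weights for 3 classes
cam = class_activation_map(A, W, class_idx=1)
cam_full = upsample_nearest(cam, factor=32)   # 7x7 -> 224x224
print(cam.shape, cam_full.shape)
```

Each block of 32x32 output pixels simply repeats one feature-map pixel, which makes the correspondence between a feature-map location and an image region explicit.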

Why this computation yields the class-relevant regions

Let GAP denote the global average pooling function and keep the notation above. Let the classification score of class $c$ be $S_c$, the weights applied after GAP be $w_i^c$, the feature map size be $c_1 \times c_2$ with $Z = c_1 \times c_2$, and the pixel value in row $k$, column $j$ of the $i$-th feature map be $A_{kj}^{i}$. Then

$$
\begin{aligned}
S_{c} &= \sum_{i=1}^{n} w_{i}^{c}\, \mathrm{GAP}\left(A^{i}\right) \\
&= \sum_{i=1}^{n} w_{i}^{c} \frac{1}{Z} \sum_{k=1}^{c_{1}} \sum_{j=1}^{c_{2}} A_{kj}^{i} \\
&= \frac{1}{Z} \sum_{i=1}^{n} \sum_{k=1}^{c_{1}} \sum_{j=1}^{c_{2}} w_{i}^{c} A_{kj}^{i}
\end{aligned}
$$

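A pixel in a feature map corresponds to a region of the original image, and its value represents the features extracted from that region. The derivation above shows that $S_c$ is determined by the feature-map pixel values and the weights: when the product of a pixel value and its weight is greater than 0, it pushes the sample toward class $c$, meaning the CNN considers that region of the original image to contain class-relevant features. The CAM formula (Eq. 1.0) thus evaluates, for every feature-map pixel, whether it carries class-relevant features; if it does, upsampling tells us which part of the original image that pixel corresponds to.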
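The identity behind this argument, namely that the class score equals the spatial average of the class activation map, can be checked numerically. The sketch below uses random toy data; the shapes and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c1, c2, num_classes = 4, 5, 6, 3
A = rng.random((n, c1, c2))             # feature maps A^i
W = rng.normal(size=(num_classes, n))   # weights w_i^c
Z = c1 * c2
c = 2                                   # class under inspection

# S_c = sum_i w_i^c * GAP(A^i)
score_via_gap = W[c] @ A.mean(axis=(1, 2))

# S_c = (1/Z) * sum_{k,j} sum_i w_i^c * A^i_{kj}
cam_c = np.tensordot(W[c], A, axes=1)   # per-pixel contribution map
score_via_cam = cam_c.sum() / Z

print(np.isclose(score_via_gap, score_via_cam))
```

Positive entries of `cam_c` are exactly the pixels that push the score of class `c` up, which is why the map can be read as a localization of class evidence.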

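GAP is motivated in the same way: during training the network learns to judge which regions of the original image contain class-relevant features. Because GAP removes the extra fully connected layers and introduces no parameters of its own, it can also lower the risk of overfitting.

The visualizations also show that when the CNN classifies correctly, it is indeed because it attends to the right class-relevant features in the original image.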

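Comparison with related work: what are the advantages of the proposed method?

- CAM, Grad-CAM, and their variants are not model-agnostic. Most are limited to neural networks (all except [15, 1]) and need either a modified network architecture [19, 16, 8, 22] or access to intermediate layers [22, 14, 1, 20] to produce a visualization.
- Methods that occlude parts of the input and watch for a large change in the classification result are limited by their approximate nature, since the occluded regions are fixed or chosen greedily. The proposed method is differentiable and accounts for the joint effect of different image regions.
- The proposed local explanation paradigm and saliency method are related to the LIME framework: both use the function's output on inputs drawn from a neighborhood around an input $x_0$, where these inputs are generated by perturbing the image. However, LIME needs 5000 iterations to converge to a coarse heatmap defined by superpixels, whereas the proposed method needs only 300 iterations.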

How to do it? (Method)

Basic principle

A black box is a mapping function $f: \mathcal{X} \rightarrow \mathcal{Y}$ from an input space $\mathcal{X}$ to an output space $\mathcal{Y}$, typically obtained through an opaque learning process.

Input: a color image $x: \Lambda \rightarrow \mathbb{R}^{3}$ over the discrete domain $\Lambda = \{1, \ldots, H\} \times \{1, \ldots, W\}$.

Output: $y \in \mathcal{Y}$, here a boolean (with values $-1$ and $+1$) indicating whether the image contains a particular object.

Explanations as meta-predictors

Definitions

Suppose the following rule is used to explain a robin classifier:

$Q_{1}(x; f) = \{x \in \mathcal{X}_c \Leftrightarrow f(x) = +1\}$, where $\mathcal{X}_c \subset \mathcal{X}$ is the subset of all robin images.

Because $f$ is imperfect, any such rule holds only approximately.

The faithfulness of an explanation is therefore measured by its expected prediction error:

$\mathcal{L}_{1} = \mathbb{E}\left[1 - \delta_{Q_{1}(x; f)}\right]$, where $\delta_{Q}$ is the indicator function of the event $Q$.

$Q_1$ implicitly requires a distribution $p(x)$ over the space of possible images $\mathcal{X}$.

$\mathcal{L}_1$ is simply the expected prediction error of the classifier.

Unless we did not already know that $f$ was trained as a robin classifier, $Q_1$ is not very insightful; it is nevertheless interpretable, because $\mathcal{X}_c$ is.
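To make the expected prediction error concrete, it can be estimated by sampling. The sketch below is a toy setup, not the paper's experiment: "images" are scalars drawn uniformly from $[-1, 1]$, the class set and the imperfect classifier are hypothetical thresholds, and `meta_predictor_error` is an illustrative helper name.

```python
import numpy as np

def meta_predictor_error(f, in_class, samples):
    """Monte Carlo estimate of L_1 = E[1 - delta_{Q_1(x; f)}]."""
    preds = np.array([f(x) for x in samples]) == 1
    truth = np.array([in_class(x) for x in samples])
    rule_holds = preds == truth     # the event: x in X_c  <=>  f(x) = +1
    return 1.0 - rule_holds.mean()

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=10_000)  # stand-in for x ~ p(x)
in_class = lambda x: x > 0.0                   # hypothetical class set X_c
f = lambda x: 1 if x > 0.1 else -1             # imperfect toy classifier

err = meta_predictor_error(f, in_class, samples)
print(f"estimated L_1 = {err:.3f}")
```

The rule fails only for samples between 0 and 0.1, so the estimate should sit near 0.05, which is exactly the misclassification rate of the toy classifier.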

Learning explanations

One benefit of formulating explanations as meta-predictors is that their faithfulness can be measured as prediction accuracy. A machine learning algorithm can then discover explanations automatically by picking, out of a large pool of candidate rules $\mathcal{Q}$, the rule $Q$ that best applies to a particular classifier $f$.

Finding the best $Q$ resembles a conventional learning problem and can be cast as regularized empirical risk minimization:

$$\min_{Q \in \mathcal{Q}} \lambda \mathcal{R}(Q) + \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\left(Q(x_i; f), x_i, f\right), \quad x_i \sim p(x)$$

The regularizer $\mathcal{R}(Q)$ has two goals: to let the explanation $Q$ generalize beyond the $n$ samples considered in the optimization, and to select a $Q$ that is as simple as possible.

Maximally informative explanations

Simplicity and interpretability alone are usually not enough to find a good explanation; a convincing explanation must also be informative. The paper therefore considers a variant of its rotation-invariance rule $Q_2$:

$Q_{3}\left(x, x'; f, \theta\right) = \left\{x \sim_{\theta} x' \Rightarrow f(x) = f\left(x'\right)\right\}$

Here $x \sim_{\theta} x'$ means that $x$ and $x'$ are related by a rotation of angle $\leq \theta$. An explanation that holds for a larger angle implies the ones for smaller angles, and $\theta = 0$ is trivially satisfied. The regularizer $\mathcal{R}(Q_3(\cdot; \theta)) = -\theta$ can therefore be used to select the largest possible angle, so that the explanation is as informative as possible.

Local explanations

A local explanation is a rule $Q(x; f, x_0)$ that predicts the response of $f$ at $x_0$ and in its neighborhood.

If $f$ is smooth at $x_0$, it is natural to construct $Q$ from the first-order Taylor expansion of $f$:

$$f(x) \approx f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle$$

Visualizing $S_{1}(x_0) = \nabla f(x_0)$ as a saliency map, large gradient values identify the pixels that strongly affect the network output. One problem with this approach is that it breaks down for linear classifiers.

If $f(x) = \langle w, x \rangle + b$, then $S_1(x_0) = \nabla f(x_0) = w$ does not depend on the image $x_0$, so it cannot serve as an explanation of a linear classifier.

More intuitively, the expansion above studies how $f$ changes under an arbitrary displacement $\Delta x = x - x_0$ from $x_0$, and for a linear classifier that change is the same wherever the starting point $x_0$ lies. For a non-linear black box $f$ such as a neural network, the problem is reduced but not eliminated, which explains why the saliency map still responds strongly in places where no obvious information can be found in the image (see the figure below).

[Figure omitted: saliency map example]
For saliency, the interesting question is which image regions affect the output of $f$. It is therefore natural to perturb the whole image $x_0$ by deleting subregions $R$ of it. If deletion is modeled by multiplying $x_0$ point-wise by a mask $m$, this amounts to studying the function $f(x_0 \odot m)$. Its Taylor expansion at $m = (1, 1, \ldots, 1)$ gives $S_{2}(x_0) = \left. \frac{d f(x_0 \odot m)}{d m} \right|_{m = (1, \ldots, 1)} = \nabla f(x_0) \odot x_0$.

For a linear classifier $f$, the saliency becomes $S_{2}(x_0) = w \odot x_0$, which is large for pixels where $x_0$ and $w$ are both large. The next section refines this idea.
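The difference between the two saliency definitions on a linear classifier is easy to verify numerically. The following sketch uses a hypothetical 3-dimensional linear classifier and a finite-difference gradient (the helper `numerical_gradient` is an illustrative name, and the weights and inputs are made up).

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

w = np.array([2.0, -1.0, 0.5])
b = 0.3
f = lambda x: w @ x + b               # linear classifier f(x) = <w, x> + b

x_a = np.array([1.0, 0.0, 4.0])       # two different "images"
x_b = np.array([-3.0, 2.0, 0.0])

s1_a, s1_b = numerical_gradient(f, x_a), numerical_gradient(f, x_b)
s2_a, s2_b = s1_a * x_a, s1_b * x_b   # S_2 = grad f(x_0) ⊙ x_0

print(np.allclose(s1_a, s1_b))        # S_1 is the same for both inputs
print(s2_a, s2_b)                     # S_2 changes with the input
```

`S_1` equals `w` for every input, so it says nothing about a particular image, while `S_2` scales each weight by the corresponding input value.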

Saliency revisited

Meaningful image perturbations

Since the aim of saliency is to identify which regions of an image $x_0$ the black box uses to produce the output value $f(x_0)$, we can do so by observing how the value of $f(x)$ changes as different regions $R$ of $x_0$ are deleted.

For example, if $f(x_0) = +1$ denotes a robin image, we expect $f(x) = +1$ as well (unless the chosen $R$ deletes the robin from the image).

Given that $x$ is a perturbation of $x_0$, we want the explanation to characterize the relationship between $f$ and $x_0$.

Turning this idea into practice raises several issues.

The first is to pin down what "deleting" information means. The aim is to simulate natural or plausible imaging effects, which leads to more meaningful perturbations and hence more meaningful explanations.
- The image generation process is generally not accessible, so three proxies are used instead: replacing the region $R$ with a constant value, injecting noise, and blurring the image.

Formally, let $m: \Lambda \rightarrow [0, 1]$ be a mask that assigns a scalar value $m(u)$ to each pixel $u \in \Lambda$. The perturbation operator is then defined as:

$$
[\Phi(x_0; m)](u) =
\begin{cases}
m(u)\, x_0(u) + (1 - m(u))\, \mu_0, & \text{constant} \\
m(u)\, x_0(u) + (1 - m(u))\, \eta(u), & \text{noise} \\
\int g_{\sigma_0 m(u)}(v - u)\, x_0(v)\, dv, & \text{blur}
\end{cases}
$$

Here $\mu_0$ is an average color, $\eta(u)$ are Gaussian noise samples for each pixel, and $\sigma_0$ is the maximum isotropic standard deviation of the Gaussian blur kernel $g_\sigma$; the paper uses $\sigma_0 = 10$.
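A minimal NumPy sketch of the three perturbation types follows. It makes several simplifying assumptions that are not from the notes: the image is a single-channel 2-D array, $m = 1$ keeps a pixel and $m = 0$ fully perturbs it in all three modes, and the blur case blends the image with one maximally blurred copy rather than using a different kernel width at every pixel. All function names are illustrative.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D array with edge padding."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def perturb(x0, m, mode, mu0=0.0, sigma0=10.0, rng=None):
    """Phi(x0; m): m = 1 keeps a pixel, m = 0 fully perturbs it."""
    if mode == "constant":
        return m * x0 + (1 - m) * mu0
    if mode == "noise":
        if rng is None:
            rng = np.random.default_rng()
        return m * x0 + (1 - m) * rng.normal(size=x0.shape)
    if mode == "blur":
        # Simplification: blend with a single blurred copy of the image.
        return m * x0 + (1 - m) * gaussian_blur(x0, sigma0)
    raise ValueError(f"unknown mode: {mode}")

x0 = np.random.default_rng(0).random((32, 32))   # toy grayscale image
m = np.ones_like(x0)
m[8:16, 8:16] = 0.0                              # delete one square region R
for mode in ("constant", "noise", "blur"):
    x = perturb(x0, m, mode, sigma0=3.0)
    print(mode, np.allclose(x[m == 1], x0[m == 1]))
```

Because the mask is continuous, intermediate values of `m` give partially perturbed pixels, which is what later allows the mask to be optimized by gradient descent.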

Deletion and preservation

To summarize, given an image $x_0$, the goal is to explain the black box by deleting image regions and observing the effect. One way to do this is to find the deletion region that is maximally informative.

Deletion game: formulation

The black box $f(x) \in \mathbb{R}^{C}$ produces a vector of scores for different hypotheses about the image content (for example, the softmax output of a neural network).
The goal is to find the smallest deletion mask $m$ that makes the score drop significantly, $f_{c}(\Phi(x_0; m)) \ll f_{c}(x_0)$, where $c$ is the target class.

Finding the optimal $m^{*}$ can be written as:

$$m^{*} = \underset{m \in [0,1]^{\Lambda}}{\operatorname{argmin}}\ \lambda \|\mathbf{1} - m\|_{1} + f_{c}\left(\Phi(x_0; m)\right)$$

$\lambda$ encourages most of the mask to stay switched off, so that only a small subset of $x_0$ is deleted.
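The deletion objective can be illustrated on a toy problem where the gradient is available in closed form. The sketch below is not the paper's setup: the "black box" is assumed to be a logistic score $f_c(x) = \sigma(\langle w, x \rangle + b)$ on a 4-dimensional input, deletion uses the constant perturbation with $\mu_0 = 0$ (so $\Phi(x_0; m) = m \odot x_0$), and the optimizer is plain projected gradient descent with made-up hyperparameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deletion_mask(x0, w, b, lam=0.05, lr=0.1, steps=300):
    """Projected gradient descent on lam * ||1 - m||_1 + f_c(m * x0)."""
    m = np.ones_like(x0)                       # start with nothing deleted
    for _ in range(steps):
        s = sigmoid(w @ (m * x0) + b)          # f_c(Phi(x0; m))
        # For m in [0, 1], ||1 - m||_1 = sum(1 - m), whose gradient is -1.
        grad = -lam + s * (1.0 - s) * w * x0
        m = np.clip(m - lr * grad, 0.0, 1.0)   # project back onto [0, 1]
    return m

x0 = np.array([1.0, 1.0, 1.0, 1.0])
w = np.array([4.0, 0.1, 0.1, -0.5])    # feature 0 carries most of the evidence
b = -1.0

m = deletion_mask(x0, w, b)
print(np.round(m, 2))
print(sigmoid(w @ x0 + b), sigmoid(w @ (m * x0) + b))
```

In this toy case the mask switches off only the one feature that carries most of the evidence, and the score drops well below its original value, which is the behaviour the deletion game is designed to find.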

Preservation game: formulation

Find the smallest image region that must be kept to preserve the score, $f_{c}(\Phi(x_0; m)) \geq f_{c}(x_0)$:

$$m^{*} = \underset{m}{\operatorname{argmin}}\ \lambda \|m\|_{1} - f_{c}\left(\Phi(x_0; m)\right)$$

The main difference is that the deletion game removes enough evidence to stop the network from recognizing the object in the image, whereas the preservation game finds the smallest subset of the image that is sufficient to retain the classification score.

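Iterated gradients

Omitted (similar to the earlier work [15]).

Dealing with artifacts

Committing to a single representative perturbation carries the risk of triggering artifacts, that is, inputs that drive the neural network to produce nonsensical or unexpected outputs.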

[Figure omitted: effect of artifacts on the learned deletion mask]

The figure above shows how artifacts affect the learning of the deletion mask. Ideally the mask would delete the region carrying the most information, but because of artifacts it wrongly learns to delete the region where the artifact lies.

To address this, the paper proposes two ideas.

- For the deletion game, do not rely on the details of a single learned mask $m$; instead, apply the mask stochastically, up to a small random jitter.
- Regularize $m$ with the total-variation (TV) norm and upsample it from a low-resolution version.

With these two ideas, the objective above becomes:

$$\min_{m \in [0,1]^{\Lambda}} \lambda_{1} \|\mathbf{1} - m\|_{1} + \lambda_{2} \sum_{u \in \Lambda} \|\nabla m(u)\|_{\beta}^{\beta} + \mathbb{E}_{\tau}\left[f_{c}\left(\Phi(x_0(\cdot - \tau); m)\right)\right]$$

$M(v) = \sum_{u} g_{\sigma_{m}}(v / s - u)\, m(u)$ is the upsampled mask, and $g_{\sigma_{m}}$ is a 2D Gaussian kernel. This objective can be optimized with SGD.
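The two regularization ingredients, the TV term and the Gaussian upsampling of a low-resolution mask, can be sketched directly from their formulas. The code below is an illustration under assumptions: forward differences approximate $\nabla m(u)$, the default `beta=2.0` is an arbitrary choice, the Gaussian weights are normalised per output pixel so that a constant mask stays constant, and the function names are hypothetical.

```python
import numpy as np

def tv_norm(m, beta=2.0):
    """sum_u ||grad m(u)||_beta^beta using forward differences."""
    dy = np.abs(np.diff(m, axis=0)) ** beta
    dx = np.abs(np.diff(m, axis=1)) ** beta
    return dy.sum() + dx.sum()

def upsample_mask(m, s, sigma_m):
    """M(v) = sum_u g_sigma(v / s - u) m(u), evaluated directly.

    The 2-D Gaussian is separable, so the sum factorises into two
    matrix products, one per image axis.
    """
    h, w = m.shape
    vy = np.arange(h * s) / s     # fine-grid rows in coarse-grid units
    vx = np.arange(w * s) / s
    gy = np.exp(-0.5 * ((vy[:, None] - np.arange(h)[None, :]) / sigma_m) ** 2)
    gx = np.exp(-0.5 * ((vx[:, None] - np.arange(w)[None, :]) / sigma_m) ** 2)
    gy /= gy.sum(axis=1, keepdims=True)   # assumption: normalise the weights
    gx /= gx.sum(axis=1, keepdims=True)
    return gy @ m @ gx.T

m = np.zeros((7, 7))              # low-resolution mask
m[2:5, 2:5] = 1.0
M = upsample_mask(m, s=8, sigma_m=1.0)
print(M.shape, tv_norm(m), float(M.min()), float(M.max()))
```

Optimizing the small mask `m` and only ever applying its smooth upsampled version `M` keeps the mask from developing the fine, pixel-level structure that tends to exploit network artifacts.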