Statistical Learning - Fisher Information Matrix - Foundations of Machine Learning (《机器学习基础》)

Definition

For a probability density function $p_\theta(x)$, the log-likelihood function is $\ell_\theta(x) = \log p_\theta(x)$.

The Fisher information is defined as:

$$
I_\theta := \mathbb{E}_\theta\left[\dot{\ell}_\theta \dot{\ell}_\theta^\top\right]
$$

where $\dot{\ell}_\theta = \nabla_\theta \log p_\theta(X)$ is the gradient of the log-likelihood (the score).

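:::info The log-likelihood $\ell_\theta(x)$ itself measures the amount of information, while the Fisher matrix describes how that amount of information changes with $\theta$. :::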
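
As a rough illustration of the definition, the sketch below estimates the Fisher information by Monte Carlo for a model chosen purely for convenience: $X \sim \mathcal{N}(\theta, 1)$ with unknown mean $\theta$, where the score is $x - \theta$ and the exact value is $I_\theta = 1$. The model, the function names `score` and `fisher_monte_carlo`, the sample size, and the seed are all assumptions made for this example.

```python
import random


def score(theta, x):
    """Score of N(theta, 1): d/dtheta log p_theta(x) = x - theta."""
    return x - theta


def fisher_monte_carlo(theta, n_samples=100_000, seed=0):
    """Estimate I_theta = E_theta[score^2] by averaging over simulated draws."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(theta, 1.0)      # draw X ~ N(theta, 1)
        total += score(theta, x) ** 2  # scalar case: outer product is a square
    return total / n_samples


estimate = fisher_monte_carlo(theta=1.5)
print("Monte Carlo estimate of I_theta:", estimate)
```

The estimate should land close to 1 regardless of the value of `theta`, since for this model the information about the mean does not depend on where the mean is.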

Some properties

:::info The expectation of $\dot{\ell}_\theta$ is 0 :::

$$
\begin{aligned}
\mathbb{E}_\theta\left[\dot{\ell}_\theta\right] &= \int p_\theta(x) \nabla_\theta \log p_\theta(x)\,dx = \int \frac{\nabla p_\theta(x)}{p_\theta(x)}\, p_\theta(x)\,dx \\
&= \int \nabla p_\theta(x)\,dx \stackrel{(\star)}{=} \nabla \int p_\theta(x)\,dx = \nabla 1 = 0
\end{aligned}
$$

where step $(\star)$ assumes that integration and differentiation can be interchanged.

:::info The Fisher information matrix can also be defined through the Hessian of $\log p_\theta(x)$ :::

$$
\nabla^2 \log p_\theta(x) = \frac{\nabla^2 p_\theta(x)}{p_\theta(x)} - \frac{\nabla p_\theta(x) \nabla p_\theta(x)^\top}{p_\theta(x)^2} = \frac{\nabla^2 p_\theta(x)}{p_\theta(x)} - \dot{\ell}_\theta \dot{\ell}_\theta^\top
$$

Taking expectations and rearranging gives an equivalent expression for the Fisher information matrix:

$$
\begin{aligned}
I_\theta &= \mathbb{E}_\theta\left[\dot{\ell}_\theta \dot{\ell}_\theta^\top\right] = -\int p_\theta(x) \nabla^2 \log p_\theta(x)\,dx + \int \nabla^2 p_\theta(x)\,dx \\
&= -\mathbb{E}_\theta\left[\nabla^2 \log p_\theta(X)\right] + \nabla^2 \underbrace{\int p_\theta(x)\,dx}_{=1} = -\mathbb{E}_\theta\left[\nabla^2 \log p_\theta(X)\right]
\end{aligned}
$$

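:::info $I_\theta$ is also the variance of the score $\dot{\ell}_\theta$ (the larger the variance, the more information is collected) :::

Because $\mathbb{E}_\theta[\dot{\ell}_\theta] = 0$,

$$
\operatorname{Cov}_\theta\left(\dot{\ell}_\theta\right) = \mathbb{E}_\theta\left[\dot{\ell}_\theta \dot{\ell}_\theta^\top\right] - \mathbb{E}_\theta\left[\dot{\ell}_\theta\right] \mathbb{E}_\theta\left[\dot{\ell}_\theta\right]^\top = I_\theta
$$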
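
The properties in this section can be checked exactly on a small discrete model, where expectations are finite sums and no sampling is needed. The sketch below uses a three-outcome categorical distribution parameterized by $\theta = (p_1, p_2)$ with $p_3 = 1 - p_1 - p_2$. This model, the value of `theta`, and the helper names are assumptions chosen for illustration only.

```python
def probs(theta):
    """Outcome probabilities (p1, p2, p3) with p3 = 1 - p1 - p2."""
    p1, p2 = theta
    return [p1, p2, 1.0 - p1 - p2]


def score(theta, x):
    """Gradient of log p_theta(x) with respect to theta = (p1, p2)."""
    p = probs(theta)
    if x == 2:  # log p3 = log(1 - p1 - p2) depends on both coordinates
        return [-1.0 / p[2], -1.0 / p[2]]
    g = [0.0, 0.0]
    g[x] = 1.0 / p[x]
    return g


def hessian(theta, x):
    """Hessian of log p_theta(x) with respect to theta = (p1, p2)."""
    p = probs(theta)
    if x == 2:
        h = -1.0 / p[2] ** 2
        return [[h, h], [h, h]]
    H = [[0.0, 0.0], [0.0, 0.0]]
    H[x][x] = -1.0 / p[x] ** 2
    return H


theta = (0.2, 0.5)
p = probs(theta)
outcomes = range(3)
dims = range(2)

# E_theta[score]: should be the zero vector.
mean_score = [sum(p[x] * score(theta, x)[i] for x in outcomes) for i in dims]

# E_theta[score score^T]: the outer-product form of the Fisher matrix.
fisher_outer = [[sum(p[x] * score(theta, x)[i] * score(theta, x)[j] for x in outcomes)
                 for j in dims] for i in dims]

# -E_theta[Hessian of log p]: the Hessian form of the Fisher matrix.
fisher_hess = [[-sum(p[x] * hessian(theta, x)[i][j] for x in outcomes)
                for j in dims] for i in dims]

print("E[score]       =", mean_score)
print("E[score score^T] =", fisher_outer)
print("-E[Hessian]      =", fisher_hess)
```

Up to floating-point rounding, the mean score vanishes and the two matrices agree, so the outer-product form is also the covariance of the score.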

Summary

$$
I_\theta = \mathbb{E}_\theta\left[\dot{\ell}_\theta \dot{\ell}_\theta^\top\right] = -\mathbb{E}_\theta\left[\nabla^2 \log p_\theta(X)\right]
$$

:::info As the number of i.i.d. samples $X_1, \dots, X_n$ increases, the Fisher information grows linearly: $n$ samples carry information $n I_\theta$ :::
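
The linear growth follows in one line from the Hessian form, assuming the samples are i.i.d. draws from $p_\theta$. The notation $I_\theta^{(n)}$ for the information in $n$ samples is introduced here only for this derivation. Since the joint log-likelihood is $\sum_{i=1}^n \log p_\theta(X_i)$,

$$
I_\theta^{(n)} = -\mathbb{E}_\theta\left[\nabla^2 \sum_{i=1}^n \log p_\theta(X_i)\right] = \sum_{i=1}^n \left(-\mathbb{E}_\theta\left[\nabla^2 \log p_\theta(X_i)\right]\right) = n I_\theta
$$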

Example

Example 8.1 (Canonical exponential family): In a canonical exponential family model, we have $\log p_\theta(x) = \langle\theta, \phi(x)\rangle - A(\theta)$, where $\phi$ is the sufficient statistic and $A$ is the log-partition function. Because $\dot{\ell}_\theta = \phi(x) - \nabla A(\theta)$ and $\nabla^2 \log p_\theta(x) = -\nabla^2 A(\theta)$ is constant in $x$, we obtain

$$
I_\theta = \nabla^2 A(\theta)
$$
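
A concrete instance may help here. The sketch below takes the Bernoulli distribution in canonical form, $p_\theta(x) = \exp(\theta x - A(\theta))$ for $x \in \{0, 1\}$ with $\phi(x) = x$ and $A(\theta) = \log(1 + e^\theta)$, and compares $\mathbb{E}_\theta[\dot{\ell}_\theta^2]$ with a finite-difference approximation of $A''(\theta)$. The choice of Bernoulli, the value of `theta`, the step size `h`, and the function names are assumptions made for this example.

```python
import math


def log_partition(theta):
    """A(theta) = log(1 + exp(theta)) for the canonical Bernoulli family."""
    return math.log1p(math.exp(theta))


def sigmoid(theta):
    """A'(theta) = E_theta[phi(X)] = P(X = 1)."""
    return 1.0 / (1.0 + math.exp(-theta))


def fisher_from_score(theta):
    """E_theta[(phi(X) - A'(theta))^2], summed exactly over x in {0, 1}."""
    mu = sigmoid(theta)
    return (1.0 - mu) * (0.0 - mu) ** 2 + mu * (1.0 - mu) ** 2


def second_derivative(f, t, h=1e-4):
    """Central finite-difference approximation of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h ** 2


theta = 0.7
i_score = fisher_from_score(theta)                # outer-product form
i_hess = second_derivative(log_partition, theta)  # numerical A''(theta)

print("E[score^2]   =", i_score)
print("A''(theta)   =", i_hess)
```

Both numbers should agree with $\sigma(\theta)(1 - \sigma(\theta))$, the variance of a Bernoulli variable with mean $\sigma(\theta)$, up to the finite-difference error.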

Further reading

Fisher Information - Stanford University https://web.stanford.edu › stats311 › Lectures › lec-09
What is the intuitive meaning of Fisher information? - Zhihu https://www.zhihu.com/question/26561604
Likelihood estimation, the Hessian matrix, and Fisher information through the eyes of a deep-model researcher

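To close, here is a quote from an answer by an expert on Zhihu:

:::info Here is one way to think about it. In information geometry, a family of probability density functions can be viewed as a Riemannian manifold homeomorphic to the parameter space, and the Fisher information matrix can be viewed as a Riemannian metric on this statistical manifold. One can show that this metric is the one induced on the manifold by the ambient Euclidean space. Further computation shows that the manifold of the one-dimensional normal family has constant curvature -1/2, so it is a hyperbolic manifold. This idea seems to have been first proposed by Rao, the Rao of Cramér-Rao. :::

Author: 刘大
Link: https://www.zhihu.com/question/26561604/answer/93809082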