Definition
For a probability density function $p_\theta(x)$, the log-likelihood function is $\ell_\theta(x)=\log p_\theta(x)$.
The Fisher information is defined as
$$
I_\theta=\mathbb{E}_\theta\left[\dot{\ell}_\theta \dot{\ell}_\theta^{\top}\right]
$$
where $\dot{\ell}_\theta=\nabla_\theta \log p_\theta(X)$ is the score.
:::info
The log-likelihood $\ell_\theta$ itself measures the amount of information; the Fisher matrix describes how that information changes.
:::
Some properties
:::info
The score $\dot{\ell}_\theta$ has expectation 0.
:::
$$
\begin{aligned}
\mathbb{E}_\theta\left[\dot{\ell}_\theta\right] &=\int p_\theta(x) \nabla_\theta \log p_\theta(x)\, dx=\int \frac{\nabla p_\theta(x)}{p_\theta(x)}\, p_\theta(x)\, dx \\
&=\int \nabla p_\theta(x)\, dx \stackrel{(\star)}{=} \nabla \int p_\theta(x)\, dx=\nabla 1=0
\end{aligned}
$$
Here step $(\star)$ assumes that integration and differentiation can be interchanged.
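The zero-mean property can be checked numerically. The sketch below is only an illustration under an assumed model that is not in the original notes: a Gaussian $N(\mu, \sigma^2)$ with known $\sigma$, where the score with respect to $\mu$ is $(x-\mu)/\sigma^2$. The function name `gaussian_score`, the parameter values, and the sample size are all hypothetical choices.

```python
import numpy as np

def gaussian_score(x, mu, sigma):
    """Score of N(mu, sigma^2) with respect to mu: d/dmu log p(x) = (x - mu) / sigma^2."""
    return (x - mu) / sigma**2

# Assumed example values (not from the notes).
rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0

# Draw samples from the model itself, then evaluate the score at the true parameter.
x = rng.normal(mu, sigma, size=200_000)
scores = gaussian_score(x, mu, sigma)

score_mean = scores.mean()                  # Monte Carlo estimate of E[score], expected near 0
score_second_moment = (scores**2).mean()    # Monte Carlo estimate of E[score^2], expected near 1 / sigma^2
print(score_mean, score_second_moment)
```

Up to Monte Carlo error, the sample mean of the score should be close to 0, and its second moment should be close to $1/\sigma^2$, the Fisher information for $\mu$ in this model.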
:::info
The Fisher information matrix can also be defined through the Hessian of $\log p_\theta$.
:::
$$
\nabla^{2} \log p_\theta(x)=\frac{\nabla^{2} p_\theta(x)}{p_\theta(x)}-\frac{\nabla p_\theta(x) \nabla p_\theta(x)^{\top}}{p_\theta(x)^{2}}=\frac{\nabla^{2} p_\theta(x)}{p_\theta(x)}-\dot{\ell}_\theta \dot{\ell}_\theta^{\top}
$$
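Rearranging for $\dot{\ell}_\theta \dot{\ell}_\theta^{\top}$ and taking expectations gives an equivalent expression for the Fisher information matrix: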
$$
\begin{aligned}
I_\theta &=\mathbb{E}_\theta\left[\dot{\ell}_\theta \dot{\ell}_\theta^{\top}\right]=-\int p_\theta(x) \nabla^{2} \log p_\theta(x)\, dx+\int \nabla^{2} p_\theta(x)\, dx \\
&=-\mathbb{E}\left[\nabla^{2} \log p_\theta(x)\right]+\nabla^{2} \underbrace{\int p_\theta(x)\, dx}_{=1}=-\mathbb{E}\left[\nabla^{2} \log p_\theta(x)\right]
\end{aligned}
$$
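:::info
Because the score has mean zero, $I_\theta$ is also the variance (covariance matrix) of the score $\dot{\ell}_\theta$. The larger this variance, the more information has been collected.
:::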
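As a small sanity check of the equivalence between the score form and the Hessian form, the sketch below evaluates both expectations exactly for an assumed Bernoulli($\theta$) model, which is not part of the original notes. Here $\log p_\theta(x)=x\log\theta+(1-x)\log(1-\theta)$, and the function name `bernoulli_fisher_both_ways` and the value $\theta=0.3$ are hypothetical.

```python
import numpy as np

def bernoulli_fisher_both_ways(theta):
    """Return (E[score^2], -E[Hessian]) for Bernoulli(theta), summing exactly over x in {0, 1}."""
    xs = np.array([0.0, 1.0])
    probs = np.array([1.0 - theta, theta])
    # First and second derivatives of log p_theta(x) with respect to theta.
    score = xs / theta - (1.0 - xs) / (1.0 - theta)
    hessian = -xs / theta**2 - (1.0 - xs) / (1.0 - theta) ** 2
    outer_form = np.sum(probs * score**2)      # E[score^2]
    hessian_form = -np.sum(probs * hessian)    # -E[d^2/dtheta^2 log p]
    return outer_form, hessian_form

theta = 0.3  # assumed example value
outer_form, hessian_form = bernoulli_fisher_both_ways(theta)
closed_form = 1.0 / (theta * (1.0 - theta))    # known Fisher information of Bernoulli(theta)
print(outer_form, hessian_form, closed_form)
```

Both forms should agree with the closed-form value $1/(\theta(1-\theta))$ up to floating-point error.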
Summary
$$
I_\theta=\mathbb{E}_\theta\left[\dot{\ell}_\theta \dot{\ell}_\theta^{\top}\right]=-\mathbb{E}_\theta\left[\nabla^{2} \log p_\theta(X)\right]
$$
:::info
As the number of independent samples $X$ increases, $I_\theta$ grows linearly.
:::
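The linear growth can be made explicit with a short derivation. This is a sketch that assumes the samples $X_1,\dots,X_n$ are i.i.d. from $p_\theta$ and that the regularity conditions used above hold; the notation $I_\theta^{(n)}$ for the information in $n$ samples is introduced here only for illustration.

$$
\begin{aligned}
\log p_\theta(x_1,\dots,x_n) &=\sum_{i=1}^{n} \log p_\theta(x_i) \\
I_\theta^{(n)} &=-\mathbb{E}_\theta\left[\nabla^{2} \sum_{i=1}^{n} \log p_\theta(X_i)\right]=\sum_{i=1}^{n}\left(-\mathbb{E}_\theta\left[\nabla^{2} \log p_\theta(X_i)\right]\right)=n\, I_\theta
\end{aligned}
$$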
Example
Example 8.1 (Canonical exponential family): In a canonical exponential family model, we have $\log p_\theta(x)=\langle\theta, \phi(x)\rangle-A(\theta)$, where $\phi$ is the sufficient statistic and $A$ is the log-partition function. Because $\dot{\ell}_\theta=\phi(x)-\nabla A(\theta)$ and $\nabla^{2} \log p_\theta(x)=-\nabla^{2} A(\theta)$ is a constant, we obtain
$$
I_\theta=\nabla^{2} A(\theta)
$$
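To make this concrete, the sketch below checks $I_\theta=\nabla^2 A(\theta)$ for one assumed member of the family that the notes do not discuss: the Poisson distribution in canonical form, with $\theta=\log(\text{rate})$, $\phi(x)=x$, $A(\theta)=e^\theta$, and the base measure $1/x!$ written out explicitly. Since $\dot{\ell}_\theta=\phi(x)-\nabla A(\theta)$ has mean zero, $I_\theta$ equals the variance of $\phi(X)$. The helper `log_partition`, the value of `theta`, the step size, and the truncation point are hypothetical choices.

```python
import math
import numpy as np

def log_partition(theta):
    """Log-partition function of the Poisson family in canonical form: A(theta) = exp(theta)."""
    return math.exp(theta)

theta = 0.7                  # assumed natural parameter, theta = log(rate)
rate = math.exp(theta)

# Second derivative of A, approximated by central finite differences.
h = 1e-4
hessian_A = (log_partition(theta + h) - 2.0 * log_partition(theta) + log_partition(theta - h)) / h**2

# Variance of the sufficient statistic phi(x) = x under the Poisson pmf,
# computed from a truncated sum over x = 0..99 (the remaining tail is negligible here).
xs = np.arange(100)
log_pmf = xs * theta - log_partition(theta) - np.array([math.lgamma(k + 1.0) for k in xs])
pmf = np.exp(log_pmf)
mean_phi = np.sum(pmf * xs)
var_phi = np.sum(pmf * (xs - mean_phi) ** 2)

print(hessian_A, var_phi, rate)
```

All three printed values should agree closely: the finite-difference Hessian of $A$, the variance of the sufficient statistic, and the Poisson rate $e^\theta$.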
Further reading
- Fisher Information - Stanford University https://web.stanford.edu › stats311 › Lectures › lec-09
- What is the intuitive meaning of Fisher information? - Zhihu https://www.zhihu.com/question/26561604
- Likelihood estimation, the Hessian matrix, and Fisher information from a deep-learning researcher's perspective
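To close, here is a quote from an answer by an expert on Zhihu:
:::info One way to think about it: in information geometry, a family of probability density functions can be viewed as a Riemannian manifold homeomorphic to the parameter space, and the Fisher information matrix can be viewed as the Riemannian metric on this statistical manifold. One can show that this metric is the one induced on the manifold by the ambient Euclidean space. Further computation shows that the manifold corresponding to the family of one-dimensional normal distributions has constant curvature -1/2, so it is a hyperbolic manifold. This idea seems to have been proposed first by Rao, the Rao of Cramér-Rao. :::
Author: 刘大 Link: https://www.zhihu.com/question/26561604/answer/93809082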
