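Original article: https://blog.csdn.net/zengxiantao1994/java/article/details/72787849
I had come across maximum likelihood estimation (MLE) many times without really understanding the principle behind it. While studying Bayesian classification recently, I gained a new understanding of MLE, which I summarize below.
Bayesian decision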
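Let us start with Bayesian classification. The classic Bayes formula is:<br /> ![](https://cdn.nlark.com/yuque/__latex/0f3b61cc8c5bfdc4e940aca90d2beacf.svg#card=math&code=P%28y%7Cx%29%3D%5Cfrac%7BP%28x%7Cy%29P%28y%29%7D%7BP%28x%29%7D&height=41&width=141)<br /> where:<br />![](https://cdn.nlark.com/yuque/__latex/78dda268c762099f96748fcdc563f797.svg#card=math&code=P%28y%29&height=18&width=30): the prior probability, i.e. the probability of each class;<br />![](https://cdn.nlark.com/yuque/__latex/62fcdeeb463ca8efecd7b011ad85f662.svg#card=math&code=P%28x%7Cy%29&height=18&width=42): the class-conditional probability, i.e. the probability that an event occurs given a certain class;<br /> ![](https://cdn.nlark.com/yuque/__latex/fe9c25dd98e0e2b44985de7c0f39f53c.svg#card=math&code=P%28y%7Cx%29&height=18&width=42): the posterior probability, i.e. the probability that a sample belongs to a certain class given that the event has been observed. With the posterior we can classify samples: the larger the posterior, the more likely the sample belongs to that class, and the more reason we have to assign it to that class.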
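Example
Consider an intuitive example. **Given**: in summer, the probability that a man in a certain park wears sandals is 1/2, the probability that a woman wears sandals is 2/3, and the male-to-female ratio in the park is usually 2:1. **Question**: if you randomly meet a person wearing sandals in the park, what are the probabilities that this person is male or female?<br /> This is exactly the situation described above: an event has occurred, and we ask for the probability that it belongs to a given class, i.e. the posterior probability.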
Let: ![](https://cdn.nlark.com/yuque/__latex/ad578bf6e8481eba8044c2b2db657292.svg#card=math&code=y_1%3D%E7%94%B7%E6%80%A7%EF%BC%8C%20y_2%3D%E5%A5%B3%E6%80%A7%EF%BC%8Cx%3D%E7%A9%BF%E5%87%89%E9%9E%8B&height=21&width=222) (y_1 = male, y_2 = female, x = wearing sandals)<br /> From the given information, the priors are p(y_1)=2/3 and p(y_2)=1/3, and the class-conditional probabilities are p(x|y_1)=1/2 and p(x|y_2)=2/3:<br />![](https://cdn.nlark.com/yuque/__latex/c2b0e6dbc8b4f36a956a8b35a97647c9.svg#card=math&code=%E5%85%88%E9%AA%8C%E6%A6%82%E7%8E%87%EF%BC%9A%20p%28y_1%29%3D2%2F3%EF%BC%8Cp%28y_2%291%2F3%0A%5C%5C%0A%E7%B1%BB%E6%9D%A1%E4%BB%B6%E6%A6%82%E7%8E%87%20%EF%BC%9A%20p%28x%7Cy_1%29%3D1%2F2%EF%BC%8Cp%28x%7Cy_2%29%3D2%2F3&height=47&width=643)<br /> The two classes (male and female) are mutually exclusive and exhaustive, so by the law of total probability<br /> ![](https://cdn.nlark.com/yuque/__latex/8c1ec0bf9528b56b9c3caa5fdddf5374.svg#card=math&code=p%28x%29%3Dp%28x%7Cy_1%29p%28y_1%29%2Bp%28x%7Cy_2%29p%28y_2%29%3D5%2F9&height=18&width=267)<br />(if we only care about classification, it suffices to compare the posteriors, so the value of p(x) does not matter).<br /> Bayes' formula then gives:<br />![](https://cdn.nlark.com/yuque/__latex/67136a6b7ff2d40d5f6f4ca7c8cb296d.svg#card=math&code=p%28y_1%7Cx%29%3D%5Cfrac%7Bp%28x%7Cy_1%29p%28y_1%29%7D%7Bp%28x%29%7D%3D%5Cfrac%7B1%2F2%20%2A%202%2F3%7D%7B5%2F9%7D%3D3%2F5%0A%5C%5C%0Ap%28y_2%7Cx%29%3D%5Cfrac%7Bp%28x%7Cy_2%29p%28y_2%29%7D%7Bp%28x%29%7D%3D%5Cfrac%7B2%2F3%20%2A%201%2F3%7D%7B5%2F9%7D%3D2%2F5%0A&height=86&width=643)
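The arithmetic of the sandals example can be checked with a short Python sketch. The function name `posteriors` and the dictionary keys are illustrative choices, not part of the original article; exact fractions are used so the result matches the hand calculation.

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Return P(y|x) for every class y via Bayes' formula."""
    # Evidence p(x), computed with the law of total probability.
    evidence = sum(likelihoods[y] * priors[y] for y in priors)
    return {y: likelihoods[y] * priors[y] / evidence for y in priors}

# Numbers taken from the example: male:female = 2:1,
# P(sandals | male) = 1/2, P(sandals | female) = 2/3.
class_priors = {"male": Fraction(2, 3), "female": Fraction(1, 3)}
class_likelihoods = {"male": Fraction(1, 2), "female": Fraction(2, 3)}

post = posteriors(class_priors, class_likelihoods)
print(post["male"], post["female"])  # 3/5 2/5
```

Because the evidence p(x) is the same for both classes, comparing the numerators alone would already give the same classification decision.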
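Motivation
In practice we are rarely this lucky. Often all we have is a finite number of samples, while the prior probability ![](https://cdn.nlark.com/yuque/__latex/86f5051624ca823c6b97a5528a0d24aa.svg#card=math&code=p%28y_i%29&height=18&width=32) and the class-conditional probability (the population distribution of each class) ![](https://cdn.nlark.com/yuque/__latex/ae4ca59bae0a97f39ed311bfb3ac8e09.svg#card=math&code=p%28x%7Cy_i%29&height=18&width=45) are both unknown. To classify using only the available samples, one feasible approach is to first estimate the prior and the class-conditional probability, and then apply the Bayes classifier.
Estimating the prior probabilities is relatively easy: (1) the class of each training sample is known (supervised learning); (2) one can rely on experience; (3) one can use the frequency with which each class appears in the training samples.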
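As a minimal sketch of the third option, the priors can be estimated as relative class frequencies in a labelled training set. The helper name `estimate_priors` and the toy labels are hypothetical and only serve as an illustration.

```python
from collections import Counter

def estimate_priors(labels):
    """Estimate P(y) as the relative frequency of each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Hypothetical training labels: 4 males and 2 females.
train_labels = ["male", "male", "female", "male", "female", "male"]
estimated = estimate_priors(train_labels)
print(estimated)
```

This only works well when the class proportions in the training set reflect the proportions in the population being classified.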
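Estimating the class-conditional probability is very hard. The reasons include: the probability density function contains all the information about a random variable; the samples may be few; the dimension of the feature vector x may be large; and so on. In short, estimating the class-conditional density directly is difficult. The way out is to turn the estimation of a completely unknown density ![](https://cdn.nlark.com/yuque/__latex/ae4ca59bae0a97f39ed311bfb3ac8e09.svg#card=math&code=p%28x%7Cy_i%29&height=18&width=45) into the estimation of parameters. The density estimation problem thus becomes a parameter estimation problem, and maximum likelihood estimation is one method of parameter estimation. Of course, the choice of the density model matters a great deal: if the model is correct, we obtain an accurate estimate as the number of samples tends to infinity; if the model itself is wrong, the estimated parameters are of little use.
An important premise
As mentioned above, parameter estimation is only a simplification used when solving the real problem (because estimating the class-conditional density directly is very hard). Samples to which maximum likelihood estimation is applied must therefore satisfy some assumptions.
Important premise: the distribution of the training samples is representative of the true distribution. The samples in each sample set are independent and identically distributed random variables (the i.i.d. condition), and there are sufficiently many training samples.
Maximum likelihood estimation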
The principle of maximum likelihood estimation can be illustrated with a picture, as shown in the figure below:
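In summary, the goal of maximum likelihood estimation is to use the observed sample results to infer the parameter value that is most likely (with maximum probability) to have produced them.
Principle: maximum likelihood estimation is a statistical method built on the maximum likelihood principle, and is an application of probability theory in statistics. It provides a way to estimate model parameters from observed data, i.e. "the model is fixed, the parameters are unknown". After a number of trials, we observe the results and pick the parameter value under which the observed sample has the highest probability; this procedure is called maximum likelihood estimation.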
Since the samples in each sample set are i.i.d., we can consider the sample set D of a single class and use it to estimate the parameter vector θ. Denote the known sample set by:<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393136903-3cf1020d-0a21-4ca4-91e1-7773aa005425.png#align=left&display=inline&height=57&margin=%5Bobject%20Object%5D&name=image.png&originHeight=57&originWidth=185&size=3166&status=done&style=none&width=185)<br /> Likelihood function: the joint probability density function ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393149462-e17f60d3-9597-4d9b-83b3-141005f4be95.png#align=left&display=inline&height=34&margin=%5Bobject%20Object%5D&name=image.png&originHeight=34&originWidth=86&size=2111&status=done&style=none&width=86) is called the likelihood function of θ with respect to ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393156352-8069f7fa-4709-4bad-8bb6-0d3fea9e514c.png#align=left&display=inline&height=36&margin=%5Bobject%20Object%5D&name=image.png&originHeight=36&originWidth=138&size=2422&status=done&style=none&width=138).<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393162607-e08f15ad-ad3d-4471-8ca8-8f5793856de5.png#align=left&display=inline&height=67&margin=%5Bobject%20Object%5D&name=image.png&originHeight=67&originWidth=510&size=7541&status=done&style=none&width=510)<br />
If ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393182450-c76d3ad1-d69e-4155-a0b3-24bc9d18f031.png#align=left&display=inline&height=29&margin=%5Bobject%20Object%5D&name=image.png&originHeight=29&originWidth=24&size=709&status=done&style=none&width=24) is the value of θ in the parameter space that maximizes the likelihood function ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393189513-f5aa6c52-fe3b-42dc-a13e-50f7ed7dff9c.png#align=left&display=inline&height=38&margin=%5Bobject%20Object%5D&name=image.png&originHeight=38&originWidth=52&size=1460&status=done&style=none&width=52), then ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393193492-9425db47-2578-4a47-954a-a1fc00857a46.png#align=left&display=inline&height=29&margin=%5Bobject%20Object%5D&name=image.png&originHeight=29&originWidth=24&size=709&status=done&style=none&width=24) should be the "most likely" parameter value, and ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393200184-fb60f66b-273d-49d9-99ce-ce395083cc5b.png#align=left&display=inline&height=29&margin=%5Bobject%20Object%5D&name=image.png&originHeight=29&originWidth=24&size=709&status=done&style=none&width=24) is the maximum likelihood estimator of θ. It is a function of the sample set, written:<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393205802-f253534d-6360-47d2-b4f1-548ed3d7f5b6.png#align=left&display=inline&height=107&margin=%5Bobject%20Object%5D&name=image.png&originHeight=107&originWidth=389&size=14397&status=done&style=none&width=389)
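To make the "argmax over the parameter space" idea concrete, here is a small sketch that evaluates the likelihood on a grid of candidate values. The Bernoulli (coin-flip) model, the sample, and the grid resolution are all assumptions chosen for illustration; they do not appear in the original article.

```python
def likelihood(theta, samples):
    """Likelihood of i.i.d. Bernoulli samples (1 = success, 0 = failure)."""
    value = 1.0
    for x in samples:
        # Each sample contributes p(x | theta); independence gives a product.
        value *= theta if x == 1 else (1.0 - theta)
    return value

samples = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 7 successes out of 10 trials

# Candidate parameter values 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]
theta_hat = max(grid, key=lambda t: likelihood(t, samples))
print(theta_hat)  # 0.7
```

The grid search is only a brute-force stand-in for the argmax; it returns the observed success frequency 7/10, which is the value one would expect for this model.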
Maximizing the likelihood function
ML estimation: find the value of θ that maximizes the probability of observing this set of samples.<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393220486-137f5274-c4ae-4368-83d7-2c17b72ade26.png#align=left&display=inline&height=76&margin=%5Bobject%20Object%5D&name=image.png&originHeight=76&originWidth=396&size=6733&status=done&style=none&width=396)<br /> In practice, to simplify the analysis, the log-likelihood function is defined:<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393236359-82032703-38f6-4354-ab3f-28fadfdf4b63.png#align=left&display=inline&height=43&margin=%5Bobject%20Object%5D&name=image.png&originHeight=43&originWidth=157&size=3261&status=done&style=none&width=157)<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393251679-60841336-69d4-4ca2-9507-9ae68973b282.png#align=left&display=inline&height=99&margin=%5Bobject%20Object%5D&name=image.png&originHeight=99&originWidth=758&size=14483&status=done&style=none&width=758)<br /> 1. Only one unknown parameter (θ is a scalar)
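If the likelihood function satisfies the regularity conditions of being continuous and differentiable, the maximum likelihood estimator is a solution of the likelihood equation, which is obtained by setting the derivative of the log-likelihood with respect to θ to zero.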
2. Several unknown parameters (θ is a vector)
Then θ can be written as an unknown vector with S components:<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393269971-2fdaeef6-876c-4b7a-9ac3-27f910df1036.png#align=left&display=inline&height=40&margin=%5Bobject%20Object%5D&name=image.png&originHeight=40&originWidth=179&size=2668&status=done&style=none&width=179)<br /> Define the gradient operator:<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393279162-5535f39e-fc5d-4a77-aea9-d8e8ee920632.png#align=left&display=inline&height=85&margin=%5Bobject%20Object%5D&name=image.png&originHeight=85&originWidth=270&size=5158&status=done&style=none&width=270)
If the likelihood function is continuously differentiable, the maximum likelihood estimator is a solution of the following equation.<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393290058-2bd41725-2ca1-443b-8e6e-1d09b408fc3c.png#align=left&display=inline&height=70&margin=%5Bobject%20Object%5D&name=image.png&originHeight=70&originWidth=446&size=7742&status=done&style=none&width=446)
A solution of this equation is only an estimate; it approaches the true value only as the number of samples tends to infinity.
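The following sketch solves a likelihood equation numerically and shows the estimate moving closer to the true value as the sample grows. It assumes an exponential model with rate `lam` (not used in the original article), for which the derivative of the log-likelihood is N/lam - sum(x). The function name, the bisection bracket `[1e-9, 1e6]`, and the seed are illustrative assumptions.

```python
import random

def mle_rate_bisection(samples, lo=1e-9, hi=1e6, iters=200):
    """Solve the likelihood equation N/lam - sum(x) = 0 by bisection.

    Assumes the root lies inside the bracket [lo, hi].
    """
    n, total = len(samples), sum(samples)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        score = n / mid - total  # derivative of the log-likelihood at mid
        if score > 0:
            lo = mid  # log-likelihood still increasing: root lies to the right
        else:
            hi = mid  # log-likelihood decreasing: root lies to the left
    return (lo + hi) / 2.0

rng = random.Random(0)  # fixed seed so the run is repeatable
true_rate = 2.0
for n in (100, 100_000):
    data = [rng.expovariate(true_rate) for _ in range(n)]
    est = mle_rate_bisection(data)
    closed_form = n / sum(data)  # analytic solution of the same equation
    print(n, est, closed_form)
```

For this model the equation also has the closed-form solution N / sum(x), so the bisection result can be compared against it directly; with the larger sample both values should land much closer to the true rate.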
Examples of maximum likelihood estimation
Example 1: suppose the samples follow the normal distribution ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393308759-dd2d6572-344e-4e2c-bab3-1017e03868bc.png#align=left&display=inline&height=47&margin=%5Bobject%20Object%5D&name=image.png&originHeight=47&originWidth=100&size=2387&status=done&style=none&width=100). Then the likelihood function is:<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393312738-b572c106-fd6c-4c62-9ebe-db6f3731dbcf.png#align=left&display=inline&height=75&margin=%5Bobject%20Object%5D&name=image.png&originHeight=75&originWidth=521&size=11089&status=done&style=none&width=521)
Its logarithm is:<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393320448-d5b8da9e-4f2d-4034-8c48-5776ac30f76a.png#align=left&display=inline&height=72&margin=%5Bobject%20Object%5D&name=image.png&originHeight=72&originWidth=537&size=9801&status=done&style=none&width=537)<br /> Taking derivatives gives the system of equations:<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393334030-14242933-5ac8-456a-ba27-609129a25a1e.png#align=left&display=inline&height=163&margin=%5Bobject%20Object%5D&name=image.png&originHeight=163&originWidth=451&size=17503&status=done&style=none&width=451)<br /> Solving them jointly gives:<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393342589-15b76912-4b11-42eb-8fd5-9006c321dc93.png#align=left&display=inline&height=152&margin=%5Bobject%20Object%5D&name=image.png&originHeight=152&originWidth=213&size=7957&status=done&style=none&width=213)
The likelihood equations have a unique solution: ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393355944-57e0a3fc-89a5-453e-b0c5-e9596ac705d1.png#align=left&display=inline&height=50&margin=%5Bobject%20Object%5D&name=image.png&originHeight=50&originWidth=101&size=2215&status=done&style=none&width=101), and it must be the maximum point, because as ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393363967-cfd5608c-5cdd-48a4-aafa-8cfa2fc0291d.png#align=left&display=inline&height=40&margin=%5Bobject%20Object%5D&name=image.png&originHeight=40&originWidth=86&size=1281&status=done&style=none&width=86) or ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393368235-3c65e22e-b845-479a-a4a5-6d28beb7bd49.png#align=left&display=inline&height=32&margin=%5Bobject%20Object%5D&name=image.png&originHeight=32&originWidth=119&size=2371&status=done&style=none&width=119), the non-negative function ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393374027-d20f2634-e147-4638-a4ea-9dfd544dbab1.png#align=left&display=inline&height=41&margin=%5Bobject%20Object%5D&name=image.png&originHeight=41&originWidth=139&size=2737&status=done&style=none&width=139). Hence the maximum likelihood estimates of μ and ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393379556-f725ba46-751f-4c86-ac5d-cb1b130a2860.png#align=left&display=inline&height=28&margin=%5Bobject%20Object%5D&name=image.png&originHeight=28&originWidth=35&size=651&status=done&style=none&width=35) are ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393383657-d20d614b-1f7b-454a-92fc-2365c6648b93.png#align=left&display=inline&height=50&margin=%5Bobject%20Object%5D&name=image.png&originHeight=50&originWidth=101&size=2215&status=done&style=none&width=101).
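A minimal sketch of Example 1 in Python, using the standard closed-form result for the normal distribution: the estimate of the mean is the sample mean, and the estimate of the variance is the average squared deviation divided by N (not N-1). The function name `gaussian_mle` and the toy data are illustrative assumptions.

```python
def gaussian_mle(samples):
    """Closed-form maximum likelihood estimates (mu, sigma^2) for i.i.d. normal data."""
    n = len(samples)
    mu_hat = sum(samples) / n
    # The MLE of the variance divides by n, not n - 1, so it is slightly biased.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, sigma2_hat

mu_hat, sigma2_hat = gaussian_mle([2, 4, 4, 4, 5, 5, 7, 9])
print(mu_hat, sigma2_hat)  # 5.0 4.0
```

The division by N is what makes this variance estimate only asymptotically unbiased; the familiar N-1 version is a different (unbiased) estimator, not the maximum likelihood one.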
Example 2: suppose the samples follow the uniform distribution on [a, b]. Then the probability density function of X is:<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393397346-4da48e3e-329f-4dd0-a1a3-690b81547f1d.png#align=left&display=inline&height=115&margin=%5Bobject%20Object%5D&name=image.png&originHeight=115&originWidth=255&size=6778&status=done&style=none&width=255)<br /> For the sample ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393404486-070056fb-7a5f-4fac-88b8-70b785c1fe43.png#align=left&display=inline&height=38&margin=%5Bobject%20Object%5D&name=image.png&originHeight=38&originWidth=173&size=2873&status=done&style=none&width=173):<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393407271-f2a33f49-60b1-483c-aecb-1a993d96f931.png#align=left&display=inline&height=114&margin=%5Bobject%20Object%5D&name=image.png&originHeight=114&originWidth=444&size=9760&status=done&style=none&width=444)
Clearly L(a,b), as a function of the two variables a and b, is discontinuous, so derivatives cannot be used here. Instead we must go back to the definition of maximum likelihood estimation and find the maximum of L(a,b) directly. To make L(a,b) as large as possible, b-a should be as small as possible; but b cannot be smaller than ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393425921-f2a0e68a-a068-4422-8132-b13060b084ac.png#align=left&display=inline&height=34&margin=%5Bobject%20Object%5D&name=image.png&originHeight=34&originWidth=182&size=3198&status=done&style=none&width=182), otherwise L(a,b)=0. Similarly, a cannot be larger than ![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393431805-fa267ea7-9b05-4a0e-bb22-e2bbb0ca1503.png#align=left&display=inline&height=43&margin=%5Bobject%20Object%5D&name=image.png&originHeight=43&originWidth=180&size=3158&status=done&style=none&width=180). Therefore the maximum likelihood estimates of a and b are:<br />![image.png](https://cdn.nlark.com/yuque/0/2020/png/456650/1593393435349-70cde3cc-28b8-4804-875a-d3369eafa2ff.png#align=left&display=inline&height=100&margin=%5Bobject%20Object%5D&name=image.png&originHeight=100&originWidth=242&size=7835&status=done&style=none&width=242)
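The argument for the uniform case can be checked numerically with a short sketch. The function names and the sample values are hypothetical; the likelihood is (1/(b-a))^N when every sample lies in [a, b] and 0 otherwise, as described in the text.

```python
def uniform_likelihood(a, b, samples):
    """Likelihood L(a, b) of i.i.d. samples under Uniform[a, b]."""
    if a >= b or min(samples) < a or max(samples) > b:
        return 0.0  # some sample falls outside [a, b]
    return (1.0 / (b - a)) ** len(samples)

def uniform_mle(samples):
    """The tightest interval that still contains every sample."""
    return min(samples), max(samples)

samples = [2.5, 3.1, 4.0, 2.9, 3.6]
a_hat, b_hat = uniform_mle(samples)
print(a_hat, b_hat)  # 2.5 4.0

# A wider interval has lower likelihood; a narrower one has likelihood 0.
print(uniform_likelihood(a_hat, b_hat, samples) > uniform_likelihood(2.0, 4.5, samples))
print(uniform_likelihood(2.6, 4.0, samples))
```

No derivative is involved here: the estimates come purely from comparing likelihood values, which is why this example falls outside the "set the derivative to zero" recipe.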
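Summary
General steps for finding a maximum likelihood estimator:
(1) Write down the likelihood function;
(2) Take the logarithm of the likelihood function and simplify;
(3) Take the derivative;
(4) Solve the likelihood equation.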
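As a worked illustration of these four steps, consider a Bernoulli model with success probability θ and samples x_i in {0, 1}. This model and the symbols L, N and k (the number of successes) are assumptions introduced here for illustration, not taken from the original article.

```latex
\begin{aligned}
&\text{(1) Likelihood:} &
L(\theta) &= \prod_{i=1}^{N} \theta^{x_i}(1-\theta)^{1-x_i}
           = \theta^{k}(1-\theta)^{N-k}, \qquad k = \sum_{i=1}^{N} x_i \\
&\text{(2) Logarithm:} &
\ln L(\theta) &= k\ln\theta + (N-k)\ln(1-\theta) \\
&\text{(3) Derivative:} &
\frac{d\,\ln L(\theta)}{d\theta} &= \frac{k}{\theta} - \frac{N-k}{1-\theta} \\
&\text{(4) Likelihood equation:} &
\frac{k}{\theta} - \frac{N-k}{1-\theta} = 0
\;&\Longrightarrow\; k(1-\theta) = (N-k)\,\theta
\;\Longrightarrow\; \hat{\theta} = \frac{k}{N}
\end{aligned}
```

The resulting estimate is simply the observed frequency of successes, which holds for 0 < k < N, where the derivative is well defined on the open interval (0, 1).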
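Properties of maximum likelihood estimation:
1. It is simpler than many other estimation methods;
2. Convergence: the estimator is unbiased or asymptotically unbiased, and its convergence properties improve as the number of samples increases;
3. If the assumed class-conditional probability model is correct, it usually gives good results. If the assumed model is wrong, however, the estimates can be very poor.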