Naive Bayes
Suppose we have a classification dataset with $K$ classes: $D=\{(X_i,y_i)\}_{i=1}^N$, where $X_i\in\mathbb{R}^p$ and $y_i\in\{1,2,\cdots,K\}$.
Let the set of samples belonging to class $k$ be $D_k$, with $|D_k|=N_k$ and $\sum_{k=1}^K N_k=N$.
The naive Bayes assumption
By Bayes' theorem, for a new sample $X=(X^{(1)},X^{(2)},\cdots,X^{(p)})^T$,
$$
P(y|X)=\frac{P(X,y)}{P(X)}=\frac{P(X|y)P(y)}{P(X)}
$$
Our goal is to learn $P(y=k|X)$ from the samples and assign a sample to the class with the largest posterior probability. By the equation above, comparing $P(y=k|X)$ across classes depends only on the numerator $P(X|y=k)P(y=k)$, since the denominator $P(X)$ is the same for every class. Therefore we only need to learn $P(X|y=k)$ and $P(y=k)$.
We generally assume $y$ follows a categorical distribution (Categorical Distribution), i.e.
$$
P(y=k)=\lambda_k,\quad\sum_{k=1}^K\lambda_k=1
$$
For $P(X|y=k)$, by the chain rule of probability,
$$
\begin{aligned}
P(X|y=k)&=P(x_1,x_2,\cdots,x_p|y=k)\\
&=P(x_1|y=k)\,P(x_2,\cdots,x_p|x_1,y=k)\\
&=P(x_1|y=k)\,P(x_2|x_1,y=k)\cdots P(x_p|x_1,x_2,\cdots,x_{p-1},y=k)
\end{aligned}
$$
If the $j$-th dimension $x_j$ can take $S_j$ possible values, then in theory $P(X|y=k)$ requires on the order of $\prod_{j=1}^p S_j$ parameters per class (for example, $p$ binary features already need $2^p-1$ parameters), which is infeasible to estimate. To simplify, we assume that the dimensions of the random variable $X$ are conditionally independent given the class; this is the naive Bayes assumption, also called the conditional independence assumption:
$$
P(X|y=k)=\prod_{j=1}^pP(x_j|y=k)
$$
The final classifier can then be written as
$$
\hat{y}=\arg\max_{k}\,P(y=k)\prod_{j=1}^pP(x_j|y=k)
$$
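As a minimal sketch of this decision rule (not the estimation procedure, which follows below), the argmax is usually evaluated in log space to avoid underflow. The names `log_prior` and `log_cond` are hypothetical containers for the learned $\log P(y=k)$ and $\log P(x_j=v\mid y=k)$:

```python
import math

def predict(x, log_prior, log_cond):
    """Naive Bayes decision rule: argmax_k log P(y=k) + sum_j log P(x_j | y=k).

    x         : list of feature values (x_1, ..., x_p)
    log_prior : dict  k -> log P(y=k)
    log_cond  : dict  (k, j, value) -> log P(x_j = value | y=k)
    """
    best_class, best_score = None, -math.inf
    for k, lp in log_prior.items():
        score = lp + sum(log_cond[(k, j, v)] for j, v in enumerate(x))
        if score > best_score:
            best_class, best_score = k, score
    return best_class
```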
Maximum likelihood estimation
- Estimating $P(y)$, i.e. the parameters $\lambda_k$
Let the log-likelihood be $L(\lambda)=\log P(Y)$. Then
$$
\begin{aligned}
L(\lambda)&=\log\prod_{k=1}^K[P(y=k)]^{N_k}\\
&=\sum_{k=1}^KN_k\log\lambda_k
\end{aligned}
$$
$$
\begin{gathered}
\max_\lambda L(\lambda)=\max_\lambda\sum_{k=1}^KN_k\log\lambda_k\\
s.t.\quad\sum_{k=1}^K\lambda_k=1
\end{gathered}
$$
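This constrained problem can be solved with a Lagrange multiplier; a short derivation (the multiplier $\alpha$ is introduced only for this step):

$$
\begin{aligned}
\mathcal{L}(\lambda,\alpha)&=\sum_{k=1}^K N_k\log\lambda_k+\alpha\Big(1-\sum_{k=1}^K\lambda_k\Big)\\
\frac{\partial\mathcal{L}}{\partial\lambda_k}&=\frac{N_k}{\lambda_k}-\alpha=0
\;\Longrightarrow\;\lambda_k=\frac{N_k}{\alpha}\\
\sum_{k=1}^K\lambda_k&=1\;\Longrightarrow\;\alpha=\sum_{k=1}^K N_k=N
\;\Longrightarrow\;\hat\lambda_k=\frac{N_k}{N}
\end{aligned}
$$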
Thus $\hat\lambda_k=\frac{N_k}{N}$: the class distribution is estimated by the frequency of each class in the sample, as sketched below.
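A minimal sketch of this frequency estimate, assuming the labels are given as a plain Python list `y` (the function name `estimate_prior` is illustrative):

```python
from collections import Counter

def estimate_prior(y):
    """Estimate P(y=k) = N_k / N from a list of class labels."""
    counts = Counter(y)          # N_k for each class k
    n = len(y)                   # N
    return {k: n_k / n for k, n_k in counts.items()}

# Example: estimate_prior([1, 1, 2, 3, 3, 3]) -> {1: 1/3, 2: 1/6, 3: 1/2}
```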
- Estimating $P(X|y=k)$
There are two cases, depending on whether $x_j$ is a discrete or a continuous random variable.
$x_j$ is a discrete random variable
Assume each dimension of $X$ follows a Categorical Distribution, and that the $j$-th dimension takes values in the set $\{m_{j1},m_{j2},\cdots,m_{jS_j}\}$. Let $N_{jl|k}$ be the number of class-$k$ samples whose $j$-th dimension equals $m_{jl}$, so that $\sum_{l=1}^{S_j}N_{jl|k}=N_k$. The log-likelihood $L(\lambda)=\log P(X|y=k)$ is
$$
\begin{gathered}
P(x_j=m_{jl}|y=k)=\lambda_{jl|k},\quad\sum_{l=1}^{S_j}\lambda_{jl|k}=1\\
\begin{aligned}
L(\lambda)&=\log\prod_{j=1}^pP(x_j|y=k)\\
&=\sum_{j=1}^p\log P(x_j|y=k)\\
&=\sum_{j=1}^p\log\left(\prod_{l=1}^{S_j}[P(x_j=m_{jl}|y=k)]^{N_{jl|k}}\right)\\
&=\sum_{j=1}^p\sum_{l=1}^{S_j}N_{jl|k}\log\lambda_{jl|k}
\end{aligned}
\end{gathered}
$$
$$
\begin{gathered}
\max_\lambda L(\lambda)=\max_\lambda\sum_{j=1}^p\sum_{l=1}^{S_j}N_{jl|k}\log\lambda_{jl|k}\\
s.t.\quad\sum_{l=1}^{S_j}\lambda_{jl|k}=1,\quad j=1,2,\cdots,p
\end{gathered}
$$
Solving with Lagrange multipliers, exactly as for the prior, gives $\hat\lambda_{jl|k}=\frac{N_{jl|k}}{N_k}$: the conditional distribution of each dimension is estimated by the frequency of that value among the samples of the given class, as sketched below.
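A minimal sketch of this estimate, assuming `X` is a list of feature vectors (lists of categorical values) and `y` the matching class labels; `estimate_conditionals` is an illustrative name:

```python
from collections import Counter, defaultdict

def estimate_conditionals(X, y):
    """Estimate P(x_j = v | y = k) = N_{jl|k} / N_k for categorical features."""
    class_counts = Counter(y)                       # N_k
    value_counts = defaultdict(Counter)             # (k, j) -> counts of values
    for xi, k in zip(X, y):
        for j, v in enumerate(xi):
            value_counts[(k, j)][v] += 1            # N_{jl|k}
    return {
        (k, j, v): c / class_counts[k]
        for (k, j), counter in value_counts.items()
        for v, c in counter.items()
    }
```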
$x_j$ is a continuous random variable
Assume each dimension of $X$ follows a Gaussian Distribution given the class, then proceed with maximum likelihood exactly as in the discrete case; the estimates are the per-class sample mean and variance of each dimension.
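A minimal sketch of the Gaussian case, assuming the same `(X, y)` layout as above; the mean and variance below are the maximum-likelihood estimates per class and dimension, and `log_gaussian` stands in for $\log P(x_j=v\mid y=k)$ in the decision rule:

```python
import math
from collections import defaultdict

def estimate_gaussians(X, y):
    """MLE of (mu, sigma^2) for each feature dimension within each class."""
    grouped = defaultdict(list)
    for xi, k in zip(X, y):
        grouped[k].append(xi)
    params = {}
    for k, rows in grouped.items():
        n = len(rows)
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / n
            var = sum((v - mu) ** 2 for v in col) / n   # MLE divides by N_k, not N_k - 1
            params[(k, j)] = (mu, var)
    return params

def log_gaussian(v, mu, var):
    """log N(v; mu, var), used in place of log P(x_j = v | y = k)."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mu) ** 2 / var)
```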
Bayesian estimation
With maximum likelihood estimation, some count in the sample may be zero (a class, or a feature value within a class, never appears), so the corresponding estimated probability becomes zero and wipes out the whole product in the classifier. Bayesian estimation avoids this; the results are as follows.
$$
\begin{gathered}
P(y=k)=\frac{N_k+\lambda}{N+K\lambda}\\
P(x_j=m_{jl}|y=k)=\frac{N_{jl|k}+\lambda}{N_k+S_j\lambda}
\end{gathered}
$$
where $\lambda\ge 0$. This is equivalent to adding a count of $\lambda$ to the observed frequency of every possible value of the random variable. When $\lambda=0$ it reduces to maximum likelihood estimation; when $\lambda=1$ it is called Laplace smoothing.
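A minimal sketch of the smoothed estimates, reusing the hypothetical `(X, y)` layout from above; here $S_j$ is taken as the number of values observed for dimension $j$ in the data, which is an assumption when some values never occur at all:

```python
from collections import Counter, defaultdict

def estimate_smoothed(X, y, lam=1.0):
    """Additive smoothing: P(y=k) = (N_k + lam) / (N + K*lam),
    P(x_j=v | y=k) = (N_{jl|k} + lam) / (N_k + S_j*lam).
    lam=0 gives the MLE; lam=1 is Laplace smoothing."""
    n, p = len(y), len(X[0])
    class_counts = Counter(y)                              # N_k
    K = len(class_counts)
    values = [set(xi[j] for xi in X) for j in range(p)]    # |values[j]| = S_j
    value_counts = defaultdict(Counter)
    for xi, k in zip(X, y):
        for j, v in enumerate(xi):
            value_counts[(k, j)][v] += 1                   # N_{jl|k}
    prior = {k: (n_k + lam) / (n + K * lam) for k, n_k in class_counts.items()}
    cond = {
        (k, j, v): (value_counts[(k, j)][v] + lam) / (class_counts[k] + len(values[j]) * lam)
        for k in class_counts for j in range(p) for v in values[j]
    }
    return prior, cond
```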