Principle
What we ultimately want is the posterior distribution $P(Y|X)$: once we have it, we can predict $Y$ from the features $X$ of a new sample. It cannot be learned directly, so we instead use the training set to learn the joint distribution $P(X,Y)$. Given $P(X,Y)$, we only need to compute $P(X)$, and the definition of conditional probability then yields $P(Y|X)$. Learning the joint distribution directly is not feasible either, so we go one step further and learn $P(Y)$ and $P(X|Y)$, from which the product rule $P(X,Y)=P(X|Y)P(Y)$ gives $P(X,Y)$.
The overall chain of reasoning is:

$$P(Y|X) \longleftarrow P(X,Y),\ P(X) \longleftarrow P(Y),\ P(X|Y)$$

where $P(X)$ is obtained by summing $P(X|Y)P(Y)$ over all classes.
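
The chain above can be traced numerically for a single fixed observation. The following is only a minimal sketch: the class names `c1`, `c2` and all the probabilities are made-up illustrative values, not taken from the text.

```python
from fractions import Fraction

# Hypothetical numbers for one fixed observation x and two classes.
prior = {"c1": Fraction(3, 5), "c2": Fraction(2, 5)}         # P(Y=c)
likelihood = {"c1": Fraction(1, 10), "c2": Fraction(3, 10)}  # P(X=x | Y=c)

# Product rule: P(X=x, Y=c) = P(X=x | Y=c) * P(Y=c)
joint = {c: likelihood[c] * prior[c] for c in prior}

# Marginal: P(X=x) = sum over classes of P(X=x, Y=c)
evidence = sum(joint.values())

# Conditional probability: P(Y=c | X=x) = P(X=x, Y=c) / P(X=x)
posterior = {c: joint[c] / evidence for c in joint}
print(posterior)
```

Exact fractions are used here so that each step can be checked by hand; the posterior values sum to 1 by construction.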
Now let us look at the learning problem. We are given:
- Input space: $\mathcal{X} \subseteq \mathbb{R}^n$, a set of $n$-dimensional vectors, i.e. every sample has $n$ features
- Output space: $\mathcal{Y} = \{c_1, c_2, \dots, c_K\}$, i.e. there are $K$ classes
- $X$ is a random vector defined on the input space, $Y$ is a random variable defined on the output space, and $P(X,Y)$ is the joint distribution of $X$ and $Y$
- The training set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$ is drawn i.i.d. from $P(X,Y)$

The posterior probability of class $c_k$ is then:

$$
\begin{aligned}
P(Y=c_k|X=x) &= \frac{P(X=x,Y=c_k)}{P(X=x)} & (1) \\
&= \frac{P(X=x|Y=c_k)P(Y=c_k)}{\sum_{i} P(X=x|Y=c_i)P(Y=c_i)} & (2) \\
&= \frac{P(X^{(1)}=x^{(1)},\dots,X^{(n)}=x^{(n)}|Y=c_k)P(Y=c_k)}{\sum_{i} P(X^{(1)}=x^{(1)},\dots,X^{(n)}=x^{(n)}|Y=c_i)P(Y=c_i)} & (3) \\
&= \frac{P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}|Y=c_k)}{\sum_{i} P(Y=c_i)\prod_j P(X^{(j)}=x^{(j)}|Y=c_i)} & (4)
\end{aligned}
$$

The step from (3) to (4) assumes that the features $X^{(1)},\dots,X^{(n)}$ are conditionally independent given the class, i.e.

$$P(X^{(1)}=x^{(1)},\dots,X^{(n)}=x^{(n)}|Y=c_k)=\prod_{j=1}^n P(X^{(j)}=x^{(j)}|Y=c_k)$$

This is a fairly strong assumption, and it is what gives naive Bayes its name. Without it, if $x^{(j)}$ (the $j$-th feature of a sample) can take $S_j$ values and $Y$ can take $K$ values, the number of parameters is $K\prod_{j=1}^n S_j$, which grows exponentially with $n$ and cannot realistically be estimated directly.
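
To get a feel for how much the independence assumption saves, the sketch below compares the number of probability-table entries with and without it. The function name `param_counts` is hypothetical, and the counts are raw table entries, not free parameters after normalization.

```python
from math import prod

def param_counts(num_classes, values_per_feature):
    """Return (entries without independence, entries with conditional independence)."""
    # One entry per (class, full feature combination): K * prod(S_j)
    full_joint = num_classes * prod(values_per_feature)
    # One entry per (class, feature, value): K * sum(S_j)
    naive = num_classes * sum(values_per_feature)
    return full_joint, naive

small = param_counts(2, [3, 3])       # two features with three values each
large = param_counts(2, [3] * 20)     # twenty such features
print(small, large)
```

With two three-valued features the difference is small (18 versus 12 entries), but with twenty of them the full table needs $2 \cdot 3^{20}$ entries while the naive Bayes tables need only 120.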
The problem therefore reduces to finding the largest $P(Y=c_k|X=x)$; the class $c_k$ that attains it is the class we predict:

$$y=f(x)=\arg\max_{c_k} \frac{P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}|Y=c_k)}{\sum_{i} P(Y=c_i)\prod_j P(X^{(j)}=x^{(j)}|Y=c_i)}$$

For a given sample the denominator $P(X=x)$ is the same for every class, so we only need to compare the numerators $P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}|Y=c_k)$ for $k=1,2,\dots,K$. Their exact normalized values are not needed, so we can drop the denominator and compute only:

$$y=f(x)=\arg\max_{c_k} P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}|Y=c_k)$$

The class $c_k$ with the largest value is the predicted class of the sample.
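
A quick way to convince yourself that the denominator does not affect the decision is to compare the arg max before and after normalizing. This is a small sketch with made-up scores; taking logarithms (a common trick to avoid underflow when many probabilities are multiplied, and an addition that the text itself does not discuss) leaves the arg max unchanged as well.

```python
import math

def argmax_class(scores):
    """Return the key with the largest value."""
    return max(scores, key=scores.get)

# Hypothetical numerators P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c) for three classes.
unnormalized = {"c1": 0.02, "c2": 0.06, "c3": 0.01}

# Dividing by the shared denominator P(X=x) rescales every score equally.
evidence = sum(unnormalized.values())
posterior = {c: s / evidence for c, s in unnormalized.items()}

# log is strictly increasing, so it preserves the ordering too.
log_scores = {c: math.log(s) for c, s in unnormalized.items()}

print(argmax_class(unnormalized), argmax_class(posterior), argmax_class(log_scores))
```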
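What maximizing the posterior means
Why does the book say that "naive Bayes assigns an instance to the class with the largest posterior probability, which is equivalent to minimizing the expected risk"?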
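
One way to see this, sketched here under the assumption of a 0-1 loss $L(Y, f(X))$ (1 if the prediction is wrong, 0 if it is right): the expected risk is $R(f)=E_X\left[\sum_{k=1}^{K} L(c_k, f(X))\,P(Y=c_k|X)\right]$, and it can be minimized separately for each $X=x$. The notation $\mathcal{Y}$ for the set of classes and $I(\cdot)$ for the indicator function is used for illustration.

$$
\begin{aligned}
f(x) &= \arg\min_{y \in \mathcal{Y}} \sum_{k=1}^{K} L(c_k, y)\, P(Y=c_k|X=x) \\
&= \arg\min_{y \in \mathcal{Y}} \sum_{k=1}^{K} I(y \neq c_k)\, P(Y=c_k|X=x) \\
&= \arg\min_{y \in \mathcal{Y}} \bigl(1 - P(Y=y|X=x)\bigr) \\
&= \arg\max_{c_k} P(Y=c_k|X=x)
\end{aligned}
$$

Under this loss, minimizing the expected risk and picking the class with the largest posterior are therefore the same rule.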
Maximum likelihood estimation
My own understanding
From the analysis above, we first need to compute two things:
- $P(Y=c_k)$: the frequency of each class in the training set
- $P(X^{(j)}=x^{(j)}|Y=c_k)$: the conditional distribution, i.e. how often each value of each feature occurs within each class

With these two quantities we can make predictions.
Formal statement
Input
- Training data $T=\{(x_1, y_1), (x_2, y_2),\dots,(x_N, y_N)\}$, where $x_i=(x_i^{(1)}, x_i^{(2)},\dots,x_i^{(n)})^T$ and $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample; $x_i^{(j)} \in \{a_{j1}, a_{j2}, \dots, a_{jS_j}\}$, where $a_{jm}$ is the $m$-th possible value of the $j$-th feature, $j=1,2,\dots,n$, $m=1,2,\dots,S_j$; $y_i \in \{c_1, c_2, \dots, c_K\}$, i.e. there are $K$ classes in total.
- The instance $x$ to be classified
Output
- The class of instance $x$
Learning steps
- Compute the prior and conditional probabilities

$$
\begin{aligned}
&P(Y=c_k) = \frac{\displaystyle\sum_{i=1}^{N}I(y_i=c_k)}{N} \\
&P(X^{(j)}=a_{jm}|Y=c_k)=\frac{\displaystyle\sum_{i=1}^N I(x_i^{(j)}=a_{jm},y_i=c_k)}{\displaystyle\sum_{i=1}^N I(y_i=c_k)} \\
&(k=1,2,\dots,K;\ m=1,2,\dots,S_j;\ j=1,2,\dots,n)
\end{aligned}
$$

  $I(\cdot)$ is the indicator function, which in practice is just a counter: it equals 1 when its condition holds and 0 otherwise. For example, $I(y_i=c_k)$ is 1 when $y_i$ equals $c_k$. Combined with the sum, it counts the number of samples with $y_i=c_k$.
- For the given instance $x=(x^{(1)},\dots,x^{(n)})^T$, compute

$$P(Y=c_k) \displaystyle\prod_{j=1}^n P(X^{(j)}=x^{(j)}|Y=c_k), \qquad k=1,2,\dots,K$$

- Choose the class with the largest value

$$y=\arg\max_{c_k}P(Y=c_k) \displaystyle\prod_{j=1}^n P(X^{(j)}=x^{(j)}|Y=c_k)$$
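
The learning steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `fit` and `predict`, the dictionary layout, and the tiny weather-style data set are all assumptions made for the example, not part of the original text.

```python
from collections import Counter
from fractions import Fraction

def fit(samples, labels):
    """Estimate P(Y=c) and P(X^(j)=a | Y=c) by counting (maximum likelihood)."""
    class_counts = Counter(labels)  # sum_i I(y_i = c)
    prior = {c: Fraction(cnt, len(labels)) for c, cnt in class_counts.items()}
    # joint_counts[(j, a, c)] = sum_i I(x_i^(j) = a, y_i = c)
    joint_counts = Counter()
    for x, y in zip(samples, labels):
        for j, value in enumerate(x):
            joint_counts[(j, value, y)] += 1
    cond = {key: Fraction(cnt, class_counts[key[2]])
            for key, cnt in joint_counts.items()}
    return prior, cond

def predict(prior, cond, x):
    """Return the class maximizing P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c)."""
    scores = {}
    for c, p in prior.items():
        score = p
        for j, value in enumerate(x):
            # A (feature, value, class) combination never seen gets probability 0.
            score *= cond.get((j, value, c), Fraction(0))
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy data: two categorical features, two classes.
samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes"]

prior, cond = fit(samples, labels)
label, scores = predict(prior, cond, ("rainy", "mild"))
print(label, scores)
```

Note that the class `no` receives a score of exactly 0 here because `rainy` never occurs with it in the toy data; a single unseen combination wipes out the whole product.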
Example
The training data is given below, where $X^{(1)}, X^{(2)}$ are the features and $Y$ is the class label:
No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
$X^{(1)}$ | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 |
$X^{(2)}$ | S | M | M | S | S | S | M | M | L | L | L | M | M | L | L |
$Y$ | -1 | -1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 |

What is the class of $x=(2, S)^T$?
Solution:
- Compute the prior and conditional probabilities

$$
\begin{aligned}
&P(Y=1)=\frac{9}{15},\ P(Y=-1)=\frac{6}{15} \\
&P(X^{(1)}=1|Y=1)=\frac{2}{9},\ P(X^{(1)}=2|Y=1)=\frac{3}{9},\ P(X^{(1)}=3|Y=1)=\frac{4}{9} \\
&P(X^{(2)}=S|Y=1)=\frac{1}{9},\ P(X^{(2)}=M|Y=1)=\frac{4}{9},\ P(X^{(2)}=L|Y=1)=\frac{4}{9} \\
&P(X^{(1)}=1|Y=-1)=\frac{3}{6},\ P(X^{(1)}=2|Y=-1)=\frac{2}{6},\ P(X^{(1)}=3|Y=-1)=\frac{1}{6} \\
&P(X^{(2)}=S|Y=-1)=\frac{3}{6},\ P(X^{(2)}=M|Y=-1)=\frac{2}{6},\ P(X^{(2)}=L|Y=-1)=\frac{1}{6}
\end{aligned}
$$

- Compute the score of each class for $x=(2,S)^T$

$$
\begin{aligned}
&P(Y=1)P(X^{(1)}=2|Y=1)P(X^{(2)}=S|Y=1)=\dfrac{9}{15} \cdot \dfrac{3}{9} \cdot \dfrac{1}{9} = \dfrac{1}{45} \\
&P(Y=-1)P(X^{(1)}=2|Y=-1)P(X^{(2)}=S|Y=-1)=\dfrac{6}{15} \cdot \dfrac{2}{6} \cdot \dfrac{3}{6} = \dfrac{1}{15}
\end{aligned}
$$

- Take the largest value, $\dfrac{1}{15}$

The final result is $y=-1$.
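
The hand calculation can be double-checked with a short script. This is a sketch that hard-codes the table above; the helper name `score` is chosen for illustration.

```python
from fractions import Fraction

# Training data from the table (15 samples).
x1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
x2 = list("SMMSSSMMLLLMMLL")
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

def score(c, query):
    """P(Y=c) * prod_j P(X^(j)=query[j] | Y=c), estimated by counting."""
    n_c = y.count(c)
    result = Fraction(n_c, len(y))  # prior P(Y=c)
    for feature, value in zip((x1, x2), query):
        matches = sum(1 for f, label in zip(feature, y)
                      if f == value and label == c)
        result *= Fraction(matches, n_c)  # P(X^(j)=value | Y=c)
    return result

scores = {c: score(c, (2, "S")) for c in (1, -1)}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The script reproduces the two values $\frac{1}{45}$ and $\frac{1}{15}$ and selects $y=-1$.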
Bayesian estimation
The maximum likelihood estimates above can produce probabilities equal to 0, which would zero out the whole product. Bayesian estimation corrects this by adding a constant $\lambda \ge 0$ to every count:

$$
\begin{aligned}
&P_{\lambda}(Y=c_k) = \frac{\displaystyle\sum_{i=1}^{N}I(y_i=c_k)+\lambda}{N+K\lambda} \\
&P_{\lambda}(X^{(j)}=a_{jm}|Y=c_k)=\frac{\displaystyle\sum_{i=1}^N I(x_i^{(j)}=a_{jm},y_i=c_k) +\lambda}{\displaystyle\sum_{i=1}^N I(y_i=c_k)+S_j\lambda} \\
&(k=1,2,\dots,K;\ m=1,2,\dots,S_j;\ j=1,2,\dots,n)
\end{aligned}
$$

When $\lambda=1$, this is called Laplace smoothing.
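
The effect of the correction is easy to see on a single count. The helper below is a minimal sketch; the name `smoothed` and its parameters are hypothetical, and `lam` is assumed to be an integer or a `Fraction` so that the arithmetic stays exact.

```python
from fractions import Fraction

def smoothed(count, total, num_values, lam=1):
    """(count + lam) / (total + num_values * lam), the smoothed estimate."""
    return Fraction(count + lam, total + num_values * lam)

# A value never seen among 9 samples of a class, for a feature with 3 values:
unseen = smoothed(0, 9, 3)        # no longer 0
# lam = 0 falls back to the plain frequency estimate.
mle = smoothed(1, 9, 3, lam=0)
# The smoothed estimates over all 3 values still sum to 1.
total = smoothed(1, 9, 3) + smoothed(4, 9, 3) + smoothed(4, 9, 3)
print(unseen, mle, total)
```

Adding `num_values * lam` to the denominator is what keeps the smoothed values a proper probability distribution.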
Example
Using Laplace smoothing, we walk through the same example again:
Solution:
- Compute the prior and conditional probabilities

$$
\begin{aligned}
&P(Y=1)=\frac{10}{17},\ P(Y=-1)=\frac{7}{17} \\
&P(X^{(1)}=1|Y=1)=\frac{3}{12},\ P(X^{(1)}=2|Y=1)=\frac{4}{12},\ P(X^{(1)}=3|Y=1)=\frac{5}{12} \\
&P(X^{(2)}=S|Y=1)=\frac{2}{12},\ P(X^{(2)}=M|Y=1)=\frac{5}{12},\ P(X^{(2)}=L|Y=1)=\frac{5}{12} \\
&P(X^{(1)}=1|Y=-1)=\frac{4}{9},\ P(X^{(1)}=2|Y=-1)=\frac{3}{9},\ P(X^{(1)}=3|Y=-1)=\frac{2}{9} \\
&P(X^{(2)}=S|Y=-1)=\frac{4}{9},\ P(X^{(2)}=M|Y=-1)=\frac{3}{9},\ P(X^{(2)}=L|Y=-1)=\frac{2}{9}
\end{aligned}
$$

  Take $Y$ as an example: it has 2 possible values, so $K=2$, and with Laplace smoothing $\lambda=1$. The numerator therefore gains 1 and the denominator gains $K\lambda=2$, going from 15 to 17. The conditional probabilities work the same way, with $S_j\lambda=3$ added to each denominator.
- Compute the score of each class for $x=(2,S)^T$

$$
\begin{aligned}
&P(Y=1)P(X^{(1)}=2|Y=1)P(X^{(2)}=S|Y=1)=\dfrac{10}{17} \cdot \dfrac{4}{12} \cdot \dfrac{2}{12} = \dfrac{5}{153} \approx 0.0327 \\
&P(Y=-1)P(X^{(1)}=2|Y=-1)P(X^{(2)}=S|Y=-1)=\dfrac{7}{17} \cdot \dfrac{3}{9} \cdot \dfrac{4}{9} = \dfrac{28}{459} \approx 0.0610
\end{aligned}
$$

- Take the largest value, $\dfrac{28}{459} \approx 0.0610$

The final result is $y=-1$.
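
As with the unsmoothed case, the numbers can be checked in code. The sketch below hard-codes the same 15 training samples and applies the smoothed formulas with $\lambda=1$; the helper name `smoothed_score` and the parameter `lam` are chosen for illustration, and $K$ and $S_j$ are inferred from the distinct values present in the data.

```python
from fractions import Fraction

# Same training data as in the first example.
x1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
x2 = list("SMMSSSMMLLLMMLL")
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

def smoothed_score(c, query, lam=1):
    """Smoothed P(Y=c) * prod_j P(X^(j)=query[j] | Y=c)."""
    n_c = y.count(c)
    num_classes = len(set(y))  # K
    result = Fraction(n_c + lam, len(y) + num_classes * lam)
    for feature, value in zip((x1, x2), query):
        matches = sum(1 for f, label in zip(feature, y)
                      if f == value and label == c)
        num_values = len(set(feature))  # S_j
        result *= Fraction(matches + lam, n_c + num_values * lam)
    return result

scores = {c: smoothed_score(c, (2, "S")) for c in (1, -1)}
prediction = max(scores, key=scores.get)
print({c: round(float(s), 4) for c, s in scores.items()}, prediction)
```

The script gives $\frac{5}{153}$ and $\frac{28}{459}$, matching the hand calculation, and the prediction is again $y=-1$.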
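When naive Bayes classification applies
Judging from the procedure above:
- The input features must be discrete, because the probabilities here are estimated by counting occurrences of each value; continuous features would first have to be discretized or modeled with a density such as a Gaussian
- The output can have more than two classes
- It does not require a large amount of data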