Problem statement
Suppose we have some data samples $x_i$ belonging to two classes, with class labels $y_i \in \{0, 1\}$. Plotted as a scatter plot in a coordinate system, they look like the figure below:

(figure: scatter plot of the two classes)

In $p$-dimensional space, how can we separate them with a $(p-1)$-dimensional hyperplane?
The core: the sigmoid function
The sigmoid function is defined as

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

and its value is interpreted as the probability that a data sample $x$ belongs to class $y = 1$.

The function has the following properties:

$$\sigma(-z) = 1 - \sigma(z)$$

so its graph is symmetric about the point $(0, \tfrac{1}{2})$. Correspondingly,

$$1 - \sigma(z) = \frac{1}{1 + e^{z}}$$

Now consider the following function:

$$P(y = 1 \mid x) = \sigma(w \cdot x + b) = \frac{1}{1 + e^{-(w \cdot x + b)}}$$

and, correspondingly,

$$P(y = 0 \mid x) = 1 - \sigma(w \cdot x + b)$$

where $w \cdot x + b = 0$ is a hyperplane that classifies the dataset correctly: the class-1 samples lie on the side where $w \cdot x + b > 0$ and the class-0 samples on the other side.

Observe that if a data sample falls on the positive side of the hyperplane, then $w \cdot x + b > 0$ and hence $\sigma(w \cdot x + b) > 0.5$; otherwise $w \cdot x + b < 0$ and $\sigma(w \cdot x + b) < 0.5$.

So picture the logistic regression model as a black box: the input is a data sample $x$, the output is a probability value $P(y = 1 \mid x)$. If this probability is greater than 0.5, the sample is assigned to class 1; otherwise it is assigned to class 0.
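This black-box view can be sketched in a few lines of Python (the weight vector `w` and bias `b` are placeholders here; how to train them is covered below):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """P(y = 1 | x) for a sample x under parameters w, b."""
    return sigmoid(np.dot(w, x) + b)

def predict(x, w, b):
    """Assign class 1 when the probability exceeds 0.5, else class 0."""
    return 1 if predict_proba(x, w, b) > 0.5 else 0
```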
Training objective
For a single data sample, the class is either 0 or 1, so the label follows a Bernoulli (0-1) distribution:

$$P(y \mid x) = p^{\,y}(1 - p)^{1 - y}, \qquad p = \sigma(w \cdot x + b)$$

Combining this with the discussion above, it is easy to see that a correctly classified sample has $P(y_i \mid x_i) > 0.5$, while a misclassified sample has $P(y_i \mid x_i) < 0.5$.

The better the classification, the more samples are classified correctly, and therefore the larger the product $\prod_{i=1}^{n} P(y_i \mid x_i)$ becomes.

So the model is trained to maximize the value of the following function:

$$L(w, b) = \prod_{i=1}^{n} P(y_i \mid x_i) = \prod_{i=1}^{n} p_i^{\,y_i}(1 - p_i)^{1 - y_i}, \qquad p_i = \sigma(w \cdot x_i + b)$$

and the target parameters are:

$$w^{*},\, b^{*} = \arg\max_{w,\, b} L(w, b)$$

Taking logarithms, the derivation gives:

$$\ln L(w, b) = \sum_{i=1}^{n} \bigl[\, y_i \ln p_i + (1 - y_i)\ln(1 - p_i) \,\bigr]$$

Taking partial derivatives with respect to $w$ and $b$ separately yields:

$$\frac{\partial \ln L}{\partial w} = \sum_{i=1}^{n}(y_i - p_i)\,x_i, \qquad \frac{\partial \ln L}{\partial b} = \sum_{i=1}^{n}(y_i - p_i)$$

Iterating many times with gradient ascent then yields the optimal parameters.
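A minimal batch gradient-ascent sketch of this update, assuming `X` is an `(n, p)` NumPy feature matrix and `y` an `(n,)` vector of 0/1 labels (the defaults mirror the α and iteration count used below):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, alpha=0.001, iters=10000):
    """Maximize the log-likelihood by batch gradient ascent.

    Returns the weight vector w and the bias b.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(iters):
        p_hat = sigmoid(X @ w + b)   # current p_i = P(y = 1 | x_i)
        error = y - p_hat            # the (y_i - p_i) factor in the gradient
        w += alpha * (X.T @ error)   # dlnL/dw = sum_i (y_i - p_i) x_i
        b += alpha * error.sum()     # dlnL/db = sum_i (y_i - p_i)
    return w, b
```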
Dataset and final training results
The dataset is as follows:
```
x1         x2         y
-0.017612  14.053064  0
-1.395634  4.662541   1
-0.752157  6.538620   0
-1.322371  7.152853   0
0.423363   11.054677  0
0.406704   7.067335   1
0.667394   12.741452  0
-2.460150  6.866805   1
0.569411   9.548755   0
-0.026632  10.427743  0
0.850433   6.920334   1
1.347183   13.175500  0
1.176813   3.167020   1
-1.781871  9.097953   0
-0.566606  5.749003   1
0.931635   1.589505   1
-0.024205  6.151823   1
-0.036453  2.690988   1
-0.196949  0.444165   1
1.014459   5.754399   1
1.985298   3.230619   1
-1.693453  -0.557540  1
-0.576525  11.778922  0
-0.346811  -1.678730  1
-2.124484  2.672471   1
1.217916   9.597015   0
-0.733928  9.098687   0
-3.642001  -1.618087  1
0.315985   3.523953   1
1.416614   9.619232   0
-0.386323  3.989286   1
0.556921   8.294984   1
1.224863   11.587360  0
-1.347803  -2.406051  1
1.196604   4.951851   1
0.275221   9.543647   0
0.470575   9.332488   0
-1.889567  9.542662   0
-1.527893  12.150579  0
-1.185247  11.309318  0
-0.445678  3.297303   1
1.042222   6.105155   1
-0.618787  10.320986  0
1.152083   0.548467   1
0.828534   2.676045   1
-1.237728  10.549033  0
-0.683565  -2.166125  1
0.229456   5.921938   1
-0.959885  11.555336  0
0.492911   10.993324  0
0.184992   8.721488   0
-0.355715  10.325976  0
-0.397822  8.058397   0
0.824839   13.730343  0
1.507278   5.027866   1
0.099671   6.835839   1
-0.344008  10.717485  0
1.785928   7.718645   1
-0.918801  11.560217  0
-0.364009  4.747300   1
-0.841722  4.119083   1
0.490426   1.960539   1
-0.007194  9.075792   0
0.356107   12.447863  0
0.342578   12.281162  0
-0.810823  -1.466018  1
2.530777   6.476801   1
1.296683   11.607559  0
0.475487   12.040035  0
-0.783277  11.009725  0
0.074798   11.023650  0
-1.337472  0.468339   1
-0.102781  13.763651  0
-0.147324  2.874846   1
0.518389   9.887035   0
1.015399   7.571882   0
-1.658086  -0.027255  1
1.319944   2.171228   1
2.056216   5.019981   1
-0.851633  4.375691   1
-1.510047  6.061992   0
-1.076637  -3.181888  1
1.821096   10.283990  0
3.010150   8.401766   1
-1.099458  1.688274   1
-0.834872  -1.733869  1
-0.846637  3.849075   1
1.400102   12.628781  0
1.752842   5.468166   1
0.078557   0.059736   1
0.089392   -0.715300  1
1.825662   12.693808  0
0.197445   9.744638   0
0.126117   0.922311   1
-0.679797  1.220530   1
0.677983   2.556666   1
0.761349   10.693862  0
-2.168791  0.143632   1
1.388610   9.341997   0
0.317029   14.739025  0
```
With the gradient-ascent step size set to α = 0.001 and 10000 training iterations, the final parameters are:

w1 = 1.25358296
w2 = -2.00267269
b  = 14.75214744

The result is shown in the figure below:
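As a quick sanity check, the decision boundary implied by these parameters is the line $w_1 x_1 + w_2 x_2 + b = 0$, i.e. $x_2 = -(w_1 x_1 + b)/w_2$; the sketch below (plotting omitted) just computes that line:

```python
# Decision boundary for sigma(w1*x1 + w2*x2 + b) = 0.5,
# i.e. the line w1*x1 + w2*x2 + b = 0, using the trained parameters above.
w1, w2, b = 1.25358296, -2.00267269, 14.75214744

def boundary_x2(x1):
    """x2 on the decision boundary for a given x1."""
    return -(w1 * x1 + b) / w2

# e.g. at x1 = 0 the boundary crosses x2 = -b / w2, roughly 7.37
```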
Strengths and weaknesses
- Strengths: computationally inexpensive; easy to understand and implement.
- Weaknesses: prone to underfitting; classification accuracy may be low.
