Problem Statement

Suppose we have data samples $x_i$ belonging to two classes, each sample $x_i$ carrying a class label $y_i \in \{0, 1\}$. Plotted as a scatter plot in a coordinate system, they look like this:
Figure_1.png
In $p$-dimensional space, how can we separate them with a $(p-1)$-dimensional hyperplane?

The Core: the Sigmoid Function

The sigmoid function is defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

It represents the probability that a data sample $x$ belongs to class $y = 1$.
The function has the following properties:

  • $\sigma(z) \in (0, 1)$ for all $z$
  • $\sigma(0) = \frac{1}{2}$, with $\sigma(z) \to 1$ as $z \to +\infty$ and $\sigma(z) \to 0$ as $z \to -\infty$
  • the graph of $\sigma(z)$ is symmetric about the point $(0, \frac{1}{2})$

Correspondingly, we have:

$$1 - \sigma(z) = \sigma(-z)$$

Now consider the following function:

$$z = w^{T}x + b$$

Correspondingly:

$$P(y = 1 \mid x) = \sigma(w^{T}x + b) = \frac{1}{1 + e^{-(w^{T}x + b)}}$$

Here the hyperplane $w^{T}x + b = 0$ is one that classifies the dataset correctly: samples of class 1 lie above the hyperplane, and samples of class 0 lie below it.
Consider what happens if a data sample $x$ falls above the hyperplane: then $w^{T}x + b > 0$, and therefore $\sigma(w^{T}x + b) > 0.5$; conversely, for a sample below the hyperplane, $\sigma(w^{T}x + b) < 0.5$.
So think of the logistic regression model as a black box: the input is a data sample $x$, and the output is a probability, namely $P(y = 1 \mid x)$. If this probability is greater than 0.5, the sample is assigned to class 1; otherwise it is assigned to class 0.
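As a sketch, this black box can be written in a few lines of plain Python; the names `sigmoid` and `predict` are illustrative, not from the original text:

```python
import math

def sigmoid(z):
    """Map any real z into (0, 1); branch on the sign of z for numerical stability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))

def predict(x, w, b):
    """Return (P(y=1 | x), predicted class) for a sample x under parameters w, b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    return p, 1 if p > 0.5 else 0
```

A sample above the hyperplane ($w^{T}x + b > 0$) gets a probability above 0.5 and hence class 1, exactly as described above.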

Training Objective

For a single data sample, the class is either 0 or 1, so it follows a Bernoulli (0-1) distribution:

$$p_i = P(y_i \mid x_i) = \sigma(z_i)^{y_i}\left(1 - \sigma(z_i)\right)^{1 - y_i}, \quad z_i = w^{T}x_i + b$$

Combining this with the earlier discussion, it is easy to see that a correctly classified sample has $P(y_i \mid x_i) > 0.5$, while a misclassified sample has $P(y_i \mid x_i) < 0.5$.
The better the classification, the more samples are classified correctly, and the larger the value $\prod_{i=1}^{n} p_i$ becomes.
So the training objective of the model is to maximize the following function:

$$L(w, b) = \ln\left(\prod_{i=1}^{n} p_i\right)$$

and the target parameters are therefore:

$$\hat{w}, \hat{b} = \arg\max_{w,\, b} L(w, b)$$

After some derivation (expanding $p_i$ with the Bernoulli formula above), we obtain:

$$L(w, b) = \sum_{i=1}^{n} \ln p_i$$

$$L(w, b) = \sum_{i=1}^{n} \left[ y_i \ln \sigma(z_i) + (1 - y_i) \ln\left(1 - \sigma(z_i)\right) \right], \quad z_i = w^{T}x_i + b$$

Substituting $\sigma(z_i) = \frac{1}{1 + e^{-z_i}}$ and simplifying:

$$L(w, b) = \sum_{i=1}^{n} \left[ y_i (w^{T}x_i + b) - \ln\left(1 + e^{w^{T}x_i + b}\right) \right]$$

Taking the partial derivatives of $L(w, b)$ with respect to $w$ and $b$ separately, we get:

$$\frac{\partial L}{\partial w} = \sum_{i=1}^{n} \left( y_i - \sigma(w^{T}x_i + b) \right) x_i$$

$$\frac{\partial L}{\partial b} = \sum_{i=1}^{n} \left( y_i - \sigma(w^{T}x_i + b) \right)$$
Iterating with gradient ascent then yields the optimal parameters $\hat{w}, \hat{b}$.
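The gradient ascent loop described above — repeatedly stepping $w$ and $b$ along these gradients — can be sketched in plain Python. The function name `train` and its defaults are illustrative, not from the original text:

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))

def train(xs, ys, alpha=0.001, iters=10000):
    """Batch gradient ascent on the log-likelihood L(w, b)."""
    p = len(xs[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(iters):
        # err_i = y_i - sigmoid(w.x_i + b): the common factor in both gradients
        errs = [y - sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
                for x, y in zip(xs, ys)]
        for j in range(p):
            w[j] += alpha * sum(e * x[j] for e, x in zip(errs, xs))
        b += alpha * sum(errs)
    return w, b
```

Because we are maximizing the log-likelihood, the update *adds* $\alpha$ times the gradient (gradient ascent) rather than subtracting it as in gradient descent.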

Dataset and Final Training Results

The dataset is as follows:

```
x1 x2 y
-0.017612 14.053064 0
-1.395634 4.662541 1
-0.752157 6.538620 0
-1.322371 7.152853 0
0.423363 11.054677 0
0.406704 7.067335 1
0.667394 12.741452 0
-2.460150 6.866805 1
0.569411 9.548755 0
-0.026632 10.427743 0
0.850433 6.920334 1
1.347183 13.175500 0
1.176813 3.167020 1
-1.781871 9.097953 0
-0.566606 5.749003 1
0.931635 1.589505 1
-0.024205 6.151823 1
-0.036453 2.690988 1
-0.196949 0.444165 1
1.014459 5.754399 1
1.985298 3.230619 1
-1.693453 -0.557540 1
-0.576525 11.778922 0
-0.346811 -1.678730 1
-2.124484 2.672471 1
1.217916 9.597015 0
-0.733928 9.098687 0
-3.642001 -1.618087 1
0.315985 3.523953 1
1.416614 9.619232 0
-0.386323 3.989286 1
0.556921 8.294984 1
1.224863 11.587360 0
-1.347803 -2.406051 1
1.196604 4.951851 1
0.275221 9.543647 0
0.470575 9.332488 0
-1.889567 9.542662 0
-1.527893 12.150579 0
-1.185247 11.309318 0
-0.445678 3.297303 1
1.042222 6.105155 1
-0.618787 10.320986 0
1.152083 0.548467 1
0.828534 2.676045 1
-1.237728 10.549033 0
-0.683565 -2.166125 1
0.229456 5.921938 1
-0.959885 11.555336 0
0.492911 10.993324 0
0.184992 8.721488 0
-0.355715 10.325976 0
-0.397822 8.058397 0
0.824839 13.730343 0
1.507278 5.027866 1
0.099671 6.835839 1
-0.344008 10.717485 0
1.785928 7.718645 1
-0.918801 11.560217 0
-0.364009 4.747300 1
-0.841722 4.119083 1
0.490426 1.960539 1
-0.007194 9.075792 0
0.356107 12.447863 0
0.342578 12.281162 0
-0.810823 -1.466018 1
2.530777 6.476801 1
1.296683 11.607559 0
0.475487 12.040035 0
-0.783277 11.009725 0
0.074798 11.023650 0
-1.337472 0.468339 1
-0.102781 13.763651 0
-0.147324 2.874846 1
0.518389 9.887035 0
1.015399 7.571882 0
-1.658086 -0.027255 1
1.319944 2.171228 1
2.056216 5.019981 1
-0.851633 4.375691 1
-1.510047 6.061992 0
-1.076637 -3.181888 1
1.821096 10.283990 0
3.010150 8.401766 1
-1.099458 1.688274 1
-0.834872 -1.733869 1
-0.846637 3.849075 1
1.400102 12.628781 0
1.752842 5.468166 1
0.078557 0.059736 1
0.089392 -0.715300 1
1.825662 12.693808 0
0.197445 9.744638 0
0.126117 0.922311 1
-0.679797 1.220530 1
0.677983 2.556666 1
0.761349 10.693862 0
-2.168791 0.143632 1
1.388610 9.341997 0
0.317029 14.739025 0
```

With the gradient ascent step size set to α = 0.001 and 10000 training iterations, we finally obtain:

```
w1 = 1.25358296
w2 = -2.00267269
b = 14.75214744
```
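As a sanity check, the trained parameters reported above can be plugged back into the sigmoid to classify individual rows of the dataset; the helper `classify` below is illustrative, not from the original text:

```python
import math

# Trained parameters from the gradient ascent run (alpha = 0.001, 10000 iterations)
w1, w2, b = 1.25358296, -2.00267269, 14.75214744

def classify(x1, x2):
    """Apply the learned hyperplane w1*x1 + w2*x2 + b and threshold P(y=1|x) at 0.5."""
    z = w1 * x1 + w2 * x2 + b
    p = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
    return p, 1 if p > 0.5 else 0

# First row of the dataset, (-0.017612, 14.053064), whose true label is 0
p, label = classify(-0.017612, 14.053064)
```

The decision boundary itself is the line $w_1 x_1 + w_2 x_2 + b = 0$, i.e. $x_2 = -(w_1 x_1 + b)/w_2$, which is what the figure below draws through the scatter plot.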

The result is shown in the figure below:
Figure_1.png

Pros and Cons

  • Pros: low computational cost; easy to understand and implement.
  • Cons: prone to underfitting; classification accuracy may not be high.