Perceptron
After putting it off for quite a while, I recently decided to pick machine learning back up according to plan. I did spend time on it last semester, but in a way that was too scattered and fragmented. I watched Andrew Ng's machine learning videos and Zhejiang University's machine learning course: the former is an excellent introductory guide but needs further study to fully understand (in terms of the mathematical formalism), while the latter is detailed and in-depth but, among traditional machine learning algorithms, covers only support vector machines. I figure that to fully commit to a field, one must develop a comprehensive and deep sense of it. So, restarting my ML study.
Problem Formulation
(Mainly) targets linearly separable problems.
This is a simple binary classification problem: given a dataset whose inputs are $n$-dimensional vectors and whose outputs are scalar labels, $-1$ or $1$, construct a linear function that produces the correct label for every input.
Dataset (assumed linearly separable):
$$\{(x^1, y^1), (x^2, y^2), \dots, (x^N, y^N)\},\qquad x = (x_1, x_2, \dots, x_n),\ y \in \{1, -1\}$$
Decision function (with the input dimension augmented by a leading constant):
$$f(x) = \operatorname{sign}(w^Tx),\qquad w = (w_0, w_1, \dots, w_n),\ x = (1, x_1, x_2, \dots, x_n)$$
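As a concrete reading of the above, here is a minimal Python sketch (the function name `predict` is mine, not from any library; I map $\operatorname{sign}(0)$ to $+1$ as a convention):

```python
import numpy as np

def predict(w, x):
    """Decision function f(x) = sign(w^T x).

    w: weight vector of shape (n+1,); w[0] is the bias term w_0.
    x: raw input of shape (n,); a constant 1 is prepended to match
       the augmented form x = (1, x_1, ..., x_n).
    """
    x_aug = np.concatenate(([1.0], x))
    return 1 if w @ x_aug >= 0 else -1
```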
Goal
When the updates finally stop (denote the weights at that point by $w^f$), no sample is misclassified.
PLA (Perceptron Learning Algorithm)
1. Initialize the parameters $w^0$
2. Check whether any point is misclassified; if there is none, stop
3. Update the parameters: $w^{t+1} = w^t + y^ix^i$, where $(x^i, y^i)$ is a misclassified sample; go back to step 2 (a code sketch follows below)
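Here is a minimal sketch of these steps in Python, under the augmented-input convention above (the function name and the `max_iters` safety guard are my own additions):

```python
import numpy as np

def pla(X, y, max_iters=10_000):
    """Perceptron Learning Algorithm on an augmented dataset.

    X: (N, n+1) array; each row is (1, x_1, ..., x_n).
    y: (N,) array of labels in {+1, -1}.
    Returns the final weights. On linearly separable data the loop
    is guaranteed to stop; max_iters only guards the other case.
    """
    w = np.zeros(X.shape[1])               # step 1: w^0 = 0
    for _ in range(max_iters):
        preds = np.where(X @ w >= 0, 1, -1)
        wrong = np.flatnonzero(preds != y)
        if wrong.size == 0:                # step 2: no mistakes, stop
            return w
        i = wrong[0]                       # any misclassified sample
        w = w + y[i] * X[i]                # step 3: w^{t+1} = w^t + y^i x^i
    return w
```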
A Brief Explanation
Where the update rule comes from:
When a sample is misclassified, $y\,(w^t)^Tx \leq 0$. To make the misclassification less severe, we want to increase the value of $y\,w^Tx$, i.e.

$$y(w^{t+1})^Tx \geq y(w^t)^Tx$$

Let $w^{t+1} = w^t + \Delta w$; then we need $y\,(\Delta w)^Tx \geq 0$. A natural choice is $\Delta w = yx$, i.e. $w^{t+1} = w^t + yx$.
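One line of algebra confirms that this choice indeed increases the quantity, using $y^2 = 1$:

$$y(w^{t+1})^Tx = y(w^t + yx)^Tx = y(w^t)^Tx + \|x\|^2 \geq y(w^t)^Tx$$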
Convergence of the algorithm:
Take any perfect separator $w^f$, so that $y^i(w^f)^Tx^i > 0$ for every sample. First, its inner product with the weights grows with every update:

$$\begin{align}
(w^f)^Tw^{t+1} &= (w^f)^T(w^t + y^ix^i)\\
&= (w^f)^Tw^t + y^i(w^f)^Tx^i\\
&\geq (w^f)^Tw^t + \min_i\big(y^i(w^f)^Tx^i\big)
\end{align}$$

Second, the squared norm of the weights grows slowly, because the update uses a misclassified sample, for which $y^i(w^t)^Tx^i \leq 0$:

$$\begin{align}
\|w^{t+1}\|^2 &= \|w^t + y^ix^i\|^2\\
&= \|w^t\|^2 + 2y^i(w^t)^Tx^i + \|y^ix^i\|^2\\
&\leq \|w^t\|^2 + \|x^i\|^2\\
&\leq \|w^t\|^2 + \max_i\|x^i\|^2
\end{align}$$
Assume $w^0 = 0$. Telescoping the first bound over $t$ updates gives

$$\begin{align}
(w^f)^Tw^{t} &\geq (w^f)^Tw^{t-1} + \min_i\big(y^i(w^f)^Tx^i\big) \\
&\geq (w^f)^Tw^{t-2} + 2\min_i\big(y^i(w^f)^Tx^i\big) \\
&\;\;\vdots \\
&\geq (w^f)^Tw^{0} + t\min_i\big(y^i(w^f)^Tx^i\big) \\
&= t\min_i\big(y^i(w^f)^Tx^i\big)
\tag{1}
\end{align}$$

and telescoping the second bound likewise gives

$$\|w^t\|^2 \leq \|w^0\|^2 + t\max_i\|x^i\|^2 = t\max_i\|x^i\|^2 \tag{2}$$
Combining (1) and (2), we have
$$\begin{align}
\cos(w^f, w^{t}) &= \frac{(w^f)^Tw^{t}}{\|w^f\|\,\|w^{t}\|} \\
&\geq \frac{t\,\min_i\big(y^i(w^f)^Tx^i\big)}{\|w^f\|\,\sqrt{t\max_i\|x^i\|^2}}\\
&= \sqrt{t}\;\frac{\min_i\big(y^i(w^f)^Tx^i\big)}{\|w^f\|\,\max_i\|x^i\|} \tag{3}
\end{align}$$
Let
$$R^2 = \max_i\|x^i\|^2,\qquad \rho = \frac{\min_i\big(y^i(w^f)^Tx^i\big)}{\|w^f\|}$$
Then from (3), since $\cos(w^f, w^t) \leq 1$,

$$\sqrt{t}\,\frac{\rho}{R} \leq 1 \quad\Longrightarrow\quad t \leq \frac{R^2}{\rho^2}$$

So the number of parameter updates has a constant upper bound, i.e., the algorithm converges.
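As a quick numeric sanity check of this bound (a sketch on a made-up toy dataset; the separator `w_f` below is one I pick by hand, so the labels are separable by construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy separable data: labels come from a hand-picked separator w_f.
w_f = np.array([0.5, 2.0, -1.0])                   # (w_0, w_1, w_2)
X = np.c_[np.ones(100), rng.uniform(-1.0, 1.0, (100, 2))]
y = np.where(X @ w_f >= 0, 1, -1)

# Run PLA and count the updates t.
w, t = np.zeros(3), 0
while True:
    wrong = np.flatnonzero(np.where(X @ w >= 0, 1, -1) != y)
    if wrong.size == 0:
        break
    w = w + y[wrong[0]] * X[wrong[0]]
    t += 1

R2 = np.max(np.sum(X ** 2, axis=1))                # R^2 = max_i ||x^i||^2
rho = np.min(y * (X @ w_f)) / np.linalg.norm(w_f)  # rho
print(f"updates t = {t}, bound R^2/rho^2 = {R2 / rho**2:.1f}")
```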
Modified PLA
1. Initialize the parameters $w^0$ and keep a copy as the best weights found so far, $\hat w = w^0$
2. Check whether any point is misclassified; if there is none, stop
3. Compute a candidate update $w^{t+1} = w^t + y^ix^i$, where $(x^i, y^i)$ is a misclassified sample
4. If the number of misclassified points under $w^{t+1}$ is smaller than under $\hat w$, update $\hat w = w^{t+1}$; go back to step 2 (see the code sketch below)
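A sketch of this variant, under the same conventions as above (the naming and the random choice of a misclassified sample are my own assumptions; $\hat w$ is kept as `best_w`):

```python
import numpy as np

def pocket_pla(X, y, max_iters=1000):
    """Modified PLA: keep the best weights seen so far.

    X: (N, n+1) augmented inputs; y: (N,) labels in {+1, -1}.
    Unlike plain PLA, this also behaves sensibly on data that is
    not linearly separable, since a worse w never replaces best_w.
    """
    def n_mistakes(w):
        return int(np.sum(np.where(X @ w >= 0, 1, -1) != y))

    rng = np.random.default_rng()
    w = np.zeros(X.shape[1])                   # step 1
    best_w, best_err = w.copy(), n_mistakes(w)
    for _ in range(max_iters):
        wrong = np.flatnonzero(np.where(X @ w >= 0, 1, -1) != y)
        if wrong.size == 0:                    # step 2: separated, stop
            return w
        i = rng.choice(wrong)                  # step 3: candidate update
        w = w + y[i] * X[i]
        err = n_mistakes(w)
        if err < best_err:                     # step 4: fewer mistakes
            best_w, best_err = w.copy(), err
    return best_w
```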
Note:
Lin Hsuan-Tien's Machine Learning Foundations and Li Hang's Statistical Learning Methods take different approaches to the problem.
The former's optimization objective is to minimize the number of misclassified points; moving from linearly separable to non-separable datasets, it gives the two algorithms above, whose ideas are, respectively, to classify every sample correctly and to keep the weights that misclassify the fewest samples. The latter's objective is to minimize the total distance from all misclassified points to the hyperplane (the assumed linear function).
When the data is linearly separable, the two lead to the same conclusion. The former is intuitive and visual, but its proofs call for sharper observation and thought; the latter is more systematic and procedural (the loss function derived from its objective can be differentiated).
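For reference, the loss in Statistical Learning Methods sums the functional distances of the misclassified set $M$, and stochastic gradient descent on a single misclassified point recovers exactly the PLA update above (with learning rate $\eta = 1$):

$$L(w) = -\sum_{(x^i,\,y^i)\in M} y^i\,w^Tx^i,\qquad w \leftarrow w + \eta\,y^ix^i$$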