# Multivariate Linear Regression
## Multiple Features
Notation:
$$n$$ = number of features

$$x^{(i)}$$ = input (features) of $$i^{th}$$ training example

$$x^{(i)}_{j}$$ = value of feature $$j$$ in $$i^{th}$$ training example

*example:*

![image-20210416105906694](https://raw.githubusercontent.com/RainGivingU/Picture-warehouse/master/img/20210416105913.png)

$$n = 4$$				$$x^{(2)} = \begin{bmatrix} 1416\\ 3\\ 2\\ 40 \end{bmatrix}$$				$$x^{(2)}_{4} = 40$$
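The notation can be checked with a small NumPy sketch. Only the second row is taken from the example above; the other rows are made-up placeholder values:

```python
import numpy as np

# Design matrix: one row per training example, one column per feature.
X = np.array([
    [2104, 5, 1, 45],   # placeholder values
    [1416, 3, 2, 40],   # x^(2) from the example above
    [1534, 3, 2, 30],   # placeholder values
])

n = X.shape[1]      # n: number of features
x2 = X[1]           # x^(2): the 2nd training example (0-indexed row 1)
x2_4 = X[1, 3]      # x^(2)_4: value of feature 4 in the 2nd example
```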
Hypothesis: $$h_{\theta}(x)=\theta_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\cdots+\theta_{n} x_{n}$$
For convenience of notation, define $$x_{0} = 1$$
$$h_{\theta}(x)=\theta_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\cdots+\theta_{n} x_{n} = \theta^{T}x$$
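In vectorized form the hypothesis is just a dot product. A minimal NumPy sketch (the parameter and feature values are made up):

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])  # theta_0, theta_1, theta_2 (hypothetical)
x = np.array([1.0, 5.0, 7.0])      # x_0 = 1 prepended for convenience

h = theta @ x                      # theta^T x
# identical to theta[0] + theta[1] * x[1] + theta[2] * x[2]
```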
## Gradient Descent for Multiple Variables
Hypothesis: $$h_{\theta}(x)=\theta_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\cdots+\theta_{n} x_{n} = \theta^{T}x$$
Parameters: $$\theta_{0}, \theta_{1}, \ldots, \theta_{n}$$
Rather than treating the parameters as $$n+1$$ separate numbers $$\theta_{j}$$, think of them as a single $$(n+1)$$-dimensional vector $$\theta$$; the cost function $$J$$ below is likewise a function of that vector.
Cost Function: $$J\left(\theta_{0}, \theta_{1}, \ldots, \theta_{n}\right)=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$$
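A sketch of the cost function and one batch gradient-descent update in NumPy (the tiny dataset is made up, chosen so that $$y = 2x$$ exactly):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1/2m) * sum((h(x^(i)) - y^(i))^2); X includes the x_0 = 1 column."""
    m = len(y)
    residual = X @ theta - y
    return residual @ residual / (2 * m)

def gradient_step(theta, X, y, alpha):
    """One simultaneous update of all theta_j (batch gradient descent)."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m
    return theta - alpha * grad

# Made-up data: y = 2*x with zero intercept, so theta should approach [0, 2].
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])

theta = np.zeros(2)
for _ in range(2000):
    theta = gradient_step(theta, X, y, alpha=0.1)
```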

## Feature Scaling

Feature scaling brings the value ranges of the different features close to each other, which makes gradient descent converge faster and more reliably; the scaling does not need to be exact.
Numerically, aim to keep every feature roughly in the range $$-1 \leq x_{i} \leq 1$$ (give or take an order of magnitude).
A recommended way to compute the scaling is mean normalization:
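The mean-normalization recipe can be sketched in NumPy, here taking the spread $$s_{i}$$ to be the standard deviation; the housing-style numbers are made up:

```python
import numpy as np

def mean_normalize(X):
    """Mean normalization: x_j := (x_j - mu_j) / s_j, taking s_j to be the
    standard deviation of each column (the range max - min also works)."""
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s

# Hypothetical raw features with very different scales (size, bedrooms).
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [ 852.0, 2.0]])
Xn = mean_normalize(X)
# Each column of Xn now has mean 0 and standard deviation 1.
```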
$$x_{i} := \frac{x_{i}-\mu_{i}}{s_{i}}$$

where $$\mu_{i}$$ is the mean of the feature and $$s_{i}$$ can be the range (max minus min), the standard deviation, and so on.

## Learning Rate

If $$\alpha$$ is too small: slow convergence.

If $$\alpha$$ is too large: $$J(\theta)$$ may not decrease on every iteration and thus may not converge.

## Features and Polynomial Regression

We can improve our features and the form of our hypothesis function in a couple different ways.

### Combine multiple features into one

![image-20210416122420359](https://raw.githubusercontent.com/RainGivingU/Picture-warehouse/master/img/20210416122420.png)

Sometimes a new feature derived from the existing features gives better predictions than the raw features themselves.

*example:*
$$
area = frontage \cdot depth \\
x_{1} = frontage \\
x_{2} = depth \\
x = x_{1} \cdot x_{2} \\
h_{\theta}(x) = \theta_{0} + \theta_{1}x
$$
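The frontage/depth combination above as a sketch; all values and parameters are made up:

```python
import numpy as np

frontage = np.array([50.0, 30.0, 40.0])  # made-up lot frontages
depth = np.array([20.0, 25.0, 10.0])     # made-up lot depths

x = frontage * depth                     # single combined feature: land area

theta_0, theta_1 = 10.0, 0.5             # hypothetical parameters
h = theta_0 + theta_1 * x                # h_theta(x) = theta_0 + theta_1 * x
```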
### Polynomial Regression

We can **change the behavior or curve** of our hypothesis function by making it a quadratic, cubic or square root function (or any other form).

For example, if our hypothesis function is $$h_{\theta}(x) = \theta_{0} + \theta_{1}x_{1}$$,
then we can create additional features based on $$x_{1}$$ to get the quadratic function $$h_{\theta}(x) = \theta_{0} + \theta_{1}x_{1} + \theta_{2}x_{1}^{2}$$,

or the cubic function $$h_{\theta}(x) = \theta_{0} + \theta_{1}x_{1} + \theta_{2}x_{1}^{2}+ \theta_{3}x_{1}^{3}$$.

In the cubic version, we have created new features $$x_{2} = x_{1}^{2}$$ and $$x_{3} = x_{1}^{3}$$.

To make it a square root function, we could do: $$h_{\theta}(x)=\theta_{0}+\theta_{1} x_{1}+\theta_{2} \sqrt{x_{1}}$$
One important thing to keep in mind: if you choose your features this way, then feature scaling becomes very important.

e.g. if $$x_{1}$$ has range $$1$$ - $$1000$$, then the range of $$x_{1}^{2}$$ becomes $$1$$ - $$1000000$$ and that of $$x_{1}^{3}$$ becomes $$1$$ - $$1000000000$$.
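The range blow-up can be seen directly; a small sketch with a made-up feature:

```python
import numpy as np

x1 = np.array([1.0, 10.0, 100.0, 1000.0])     # raw feature with range 1 - 1000
X_poly = np.column_stack([x1, x1**2, x1**3])  # new features x2 = x1^2, x3 = x1^3

# Without scaling, the three columns span wildly different ranges:
ranges = X_poly.max(axis=0)                   # 1e3 vs 1e6 vs 1e9
```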
