Multiple Features
Gradient descent for multiple variables
Computing Parameters Analytically
- Normal Equation
- Normal Equation Noninvertibility

Multiple Features

Multiple variables = multiple features
feature vector
Parameters
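
In the usual notation for this setting (assumed here, since the notes give only the labels), with n features and the convention x_0 = 1, the feature vector, the parameter vector and the hypothesis can be written as:

```latex
x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^{n+1},
\qquad
\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1},
\qquad
h_\theta(x) = \theta^{T} x = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n
```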

Gradient descent for multiple variables

cost function
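
As a minimal sketch, assuming the standard squared-error cost J(θ) = 1/(2m) · Σ (h_θ(x^(i)) − y^(i))² over m training examples, the cost can be computed as follows. The function names and the toy data are illustrative assumptions, not part of the notes.

```python
def hypothesis(theta, x):
    """Linear hypothesis h_theta(x) = theta^T x; x must already include x0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    return sum((hypothesis(theta, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * m)


# Toy data (made up for illustration): y = 1 + 2*x1 exactly.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]

perfect = cost([1.0, 2.0], X, y)  # a perfect fit gives zero cost
zeros = cost([0.0, 0.0], X, y)    # all-zero parameters give (1 + 9 + 25) / 6
```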

Gradient descent

When n = 1:

When n ≥ 1:
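
For reference, the standard batch gradient descent updates for linear regression, written under the assumed notation of m training examples, learning rate α and x_0 = 1, are:

```latex
% n = 1: repeat until convergence, updating both parameters simultaneously
\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}

% n >= 1: repeat until convergence, for j = 0, ..., n simultaneously
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

With x_0 = 1, the n = 1 rules are the j = 0 and j = 1 cases of the general rule.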

Algorithm procedure
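
The procedure can be sketched in plain Python as below. This is an illustration under assumptions, not a reference implementation: the learning rate, the iteration count, the function names and the toy data are all made up for the example.

```python
def predict(theta, x):
    """h_theta(x) = theta^T x, where x already includes x0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Batch gradient descent for linear regression.

    X is a list of rows, each starting with x0 = 1; returns the fitted theta.
    """
    m, n_params = len(y), len(X[0])
    theta = [0.0] * n_params
    for _ in range(iterations):
        errors = [predict(theta, x) - yi for x, yi in zip(X, y)]
        # The gradient for every theta_j is computed from the old theta ...
        gradient = [sum(e * x[j] for e, x in zip(errors, X)) / m
                    for j in range(n_params)]
        # ... and only then are all parameters updated simultaneously.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Toy data (made up): y = 1 + 2*x1 + 3*x2
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
theta = gradient_descent(X, y)  # approaches [1, 2, 3]
```

Computing the whole gradient before touching theta is what makes the update simultaneous; updating θ_0 first and then using the new value for θ_1 would be a different algorithm.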

Feature Scaling

Reason: When features have very different ranges, the contours of the cost function J(θ) become highly skewed ellipses, and gradient descent takes a long time to reach the global minimum.

Normalization

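Normalization (rescaling to a fixed range):

Rescale the data to decimals in (0, 1) or (-1, 1).
Convert dimensional quantities into dimensionless ones (pure numbers), so that indicators with different units or magnitudes can be compared and weighted. This also simplifies computation.

Dimensionless: the units of a physical quantity have been removed by some transformation, which simplifies computation.

Standardization: In machine learning we may need to handle different kinds of data, for example audio signals and image pixel values, and these data may be high-dimensional. After standardization, each feature has mean 0 (the feature's mean in the original data is subtracted from every value) and standard deviation 1. This method is widely used in many machine learning algorithms (for example support vector machines, logistic regression and neural networks).

Centering: the mean becomes 0; there is no requirement on the standard deviation.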

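Difference between normalization and standardization:

Normalization brings the feature values of the samples onto a common scale by mapping the data into the interval [0, 1] or [-1, 1]. It is determined only by the extreme values of the variable; min-max (interval) scaling is one form of normalization.
Standardization processes the data column by column of the feature matrix. It computes the z-score, so each feature ends up with mean 0 and standard deviation 1 (the result follows a standard normal distribution only if the feature was normally distributed to begin with). It depends on the overall sample distribution, and every sample point influences the result. What the two have in common: both remove the error caused by differing units, and both are linear transformations that scale the vector X proportionally and then shift it.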

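Difference between standardization and centering:

Standardization subtracts the mean from the raw score and then divides by the standard deviation;
Centering only subtracts the mean from the raw score. The usual procedure is therefore to center first and then divide by the standard deviation.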
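
The three transformations can be compared side by side in a short sketch. It uses only the Python standard library; the function names and the sample values are assumptions made for illustration, and the code assumes the values are not all equal (otherwise it would divide by zero).

```python
import statistics


def min_max_normalize(values):
    """Map values linearly onto [0, 1] using only the minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def center(values):
    """Subtract the mean; the spread is left unchanged."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]


def standardize(values):
    """Center, then divide by the population standard deviation (z-score)."""
    std = statistics.pstdev(values)
    return [v / std for v in center(values)]


sizes = [1000.0, 1500.0, 2000.0, 3000.0]  # made-up feature values
normalized = min_max_normalize(sizes)      # [0.0, 0.25, 0.5, 1.0]
centered = center(sizes)                   # mean 0, original spread
standardized = standardize(sizes)          # mean 0, standard deviation 1
```

Note that `min_max_normalize` depends only on the two extreme values, while `standardize` depends on every sample through the mean and standard deviation.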

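Why normalize/standardize?
As noted above, normalization and standardization are essentially linear transformations. A linear transformation with a positive scale factor preserves the ordering and relative spacing of the values within a feature, so rescaling does not invalidate the data and can instead improve how well algorithms behave on it. These properties are what make normalization/standardization legitimate.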

fit(), transform() and fit_transform()
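
These names follow the method convention of scikit-learn transformers such as `StandardScaler`: `fit()` learns the statistics from the data, `transform()` applies them, and `fit_transform()` does both on the same data. The sketch below is not scikit-learn code. It is a hypothetical pure-Python class (`SimpleStandardScaler`, a name invented here) that mimics the convention to show why the steps are kept separate.

```python
import statistics


class SimpleStandardScaler:
    """Hypothetical stand-in that mimics the fit/transform convention."""

    def fit(self, values):
        # Learn the statistics from the data; nothing is transformed yet.
        self.mean_ = statistics.fmean(values)
        self.std_ = statistics.pstdev(values)
        return self

    def transform(self, values):
        # Apply the statistics learned by fit() to any data.
        return [(v - self.mean_) / self.std_ for v in values]

    def fit_transform(self, values):
        # Shorthand for fit() followed by transform() on the same data.
        return self.fit(values).transform(values)


train = [2.0, 4.0, 6.0, 8.0]  # made-up training feature
test = [5.0, 10.0]            # made-up unseen data

scaler = SimpleStandardScaler()
train_scaled = scaler.fit_transform(train)  # statistics come from train
test_scaled = scaler.transform(test)        # reuses train's mean and std
```

The design point is that unseen data is scaled with the statistics learned from the training data, so only `transform()` is called on it, never `fit()`.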

Learning Rate

Debugging gradient descent. Make a plot with number of iterations on the x-axis. Now plot the cost function, J(θ) over the number of iterations of gradient descent. If J(θ) ever increases, then you probably need to decrease α.

Automatic convergence test. Declare convergence if J(θ) decreases by less than E in one iteration, where E is some small value such as 10^-3. However, in practice it’s difficult to choose this threshold value.

It has been proven that if learning rate α is sufficiently small, then J(θ) will decrease on every iteration.
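
The effect of α can be checked numerically by recording J(θ) after every iteration. This is a sketch under assumptions: the toy data, the two learning rates and the helper names are invented for illustration, and `epsilon = 1e-3` is an assumed threshold for the convergence test.

```python
def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum of squared errors."""
    m = len(y)
    return sum((sum(t * xi for t, xi in zip(theta, x)) - yi) ** 2
               for x, yi in zip(X, y)) / (2 * m)


def cost_history(X, y, alpha, iterations):
    """Run batch gradient descent and record J(theta) after every update."""
    m = len(y)
    theta = [0.0] * len(X[0])
    history = []
    for _ in range(iterations):
        errors = [sum(t * xi for t, xi in zip(theta, x)) - yi
                  for x, yi in zip(X, y)]
        theta = [t - alpha * sum(e * x[j] for e, x in zip(errors, X)) / m
                 for j, t in enumerate(theta)]
        history.append(cost(theta, X, y))
    return history


def converged_at(history, epsilon=1e-3):
    """First index where J decreased by less than epsilon, else None."""
    for i in range(1, len(history)):
        if 0 <= history[i - 1] - history[i] < epsilon:
            return i
    return None


X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]  # made-up data: y = 2*x1

small = cost_history(X, y, alpha=0.1, iterations=50)  # J keeps decreasing
large = cost_history(X, y, alpha=1.0, iterations=50)  # J grows: alpha too big

small_is_decreasing = all(b <= a for a, b in zip(small, small[1:]))
large_blows_up = large[-1] > large[0]
```

Plotting `small` and `large` against the iteration index gives the debugging plot described above.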

Features and Polynomial Regression

We can improve our features and the form of our hypothesis function in a couple of different ways.

We can combine multiple features into one. For example, we can combine x1 and x2 into a new feature x3 by taking x1⋅x2.

We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic or square root function (or any other form).

Note: Because the features are raised to different powers, their ranges differ widely, so feature scaling becomes important.
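
A short sketch of why scaling matters here: expanding one feature into powers produces columns with very different ranges, which mean normalization then brings back to a comparable scale. The helper names and sample values are assumptions for illustration.

```python
def polynomial_features(x, degree):
    """Expand a single feature x into [x, x^2, ..., x^degree]."""
    return [x ** d for d in range(1, degree + 1)]


def scale_columns(rows):
    """Mean-normalize each column: (value - mean) / (max - min)."""
    scaled_columns = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        spread = max(col) - min(col)
        scaled_columns.append([(v - mean) / spread for v in col])
    # Transpose back so that each row is one training example again.
    return [list(row) for row in zip(*scaled_columns)]


sizes = [10.0, 20.0, 30.0]  # made-up values of one feature
raw = [polynomial_features(s, 3) for s in sizes]
# Raw columns span very different ranges: x up to 30, x^3 up to 27000.
scaled = scale_columns(raw)
# After scaling, every value lies in [-1, 1].
```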

Computing Parameters Analytically

Normal Equation

The normal equation gives us a method to solve for θ analytically, in one step and without iteration.
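
The closed form is θ = (XᵀX)⁻¹Xᵀy, where X is the matrix whose rows are the training examples (each starting with x_0 = 1) and y is the vector of targets. The sketch below solves the equivalent linear system (XᵀX)θ = Xᵀy by Gauss-Jordan elimination instead of forming the inverse. It is a plain-Python illustration with made-up data, and it assumes XᵀX is invertible; the non-invertible case is not handled.

```python
def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y by Gauss-Jordan elimination.

    Assumes X^T X is invertible; rows of X start with x0 = 1.
    """
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Toy data (made up): y = 1 + 2*x1 + 3*x2
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
theta = normal_equation(X, y)  # close to [1, 2, 3]
```

Unlike gradient descent, this needs no learning rate and no iterations.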

WK2 Linear Regression with Multiple Variables