1.方程的θ参数次数:d
    image.png
    High bias (underfitting): both J{train}(\Theta)_Jtrain(Θ) and J{CV}(\Theta)_JCV(Θ) will be high. Also, J{CV}(\Theta) \approx J{train}(\Theta)JCV(Θ)≈Jtrain(Θ).
    High variance (overfitting): J{train}(\Theta)_Jtrain(Θ) will be low and J{CV}(\Theta)_JCV(Θ) will be much greater than J{train}(\Theta)_Jtrain(Θ).
    image.png
    2.方程的regulation参数λ
    image.png
    image.png
    image.png
    3.方程的训练数据数量m:learning curves

    • As the training set gets larger, the error for a quadratic function increases.
    • The error value will plateau out after a certain m, or training set size.

    image.png
    Experiencing high bias:
    Low training set size: causes J{train}(\Theta)_Jtrain(Θ) to be low and J{CV}(\Theta)_JCV(Θ) to be high.
    Large training set size: causes both J{train}(\Theta)_Jtrain(Θ) and J{CV}(\Theta)_JCV(Θ) to be high with J{train}(\Theta)_Jtrain(Θ)≈J{CV}(\Theta)_JCV(Θ).
    If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.
    image.png
    Experiencing high variance:
    Low training set size: J{train}(\Theta)_Jtrain(Θ) will be low and J{CV}(\Theta)_JCV(Θ) will be high.
    Large training set size: J{train}(\Theta)_Jtrain(Θ) increases with training set size and J{CV}(\Theta)_JCV(Θ) continues to decrease without leveling off. Also, J{train}(\Theta)_Jtrain(Θ) < J{CV}(\Theta)_JCV(Θ) but the difference between them remains significant.
    If a learning algorithm is suffering from high variance, getting more training data is likely to help.
    image.png
    4.神经网络的大小神经元数量、层数

    • A large neural network with more parameters is prone to overfitting. It is also computationally expensive. In this case you can use regularization (increase λ) to address the overfitting.

    image.png
    总结优化思路:
    image.png