Backpropagation

The backpropagation algorithm, based on the gradient descent algorithm, is designed to train the weights and biases of a neural network.

How does backpropagation work?
For training, each training tuple is first normalised to [0.0, 1.0].
The error is propagated backward by updating the weights and biases to reflect the error of the network's prediction of the dependent variable for each training tuple.

  • For unit $j$ in the output layer, its error $Err_j$ is computed by
    • $Err_j = O_j(1 - O_j)(T_j - O_j)$,
    • where $O_j$ is the actual output of unit $j$, and $T_j$ is the known target value of the given training tuple.
  • To compute the error of a hidden layer unit $j$, the weighted sum of the errors of the units connected to unit $j$ in the next layer is considered:
    • $Err_j = O_j(1 - O_j)\sum_k Err_k\, w_{jk}$,
    • where $w_{jk}$ is the weight of the connection from unit $j$ to a unit $k$ in the next higher layer, and $Err_k$ is the error of unit $k$.
  • The weights and biases are updated to reflect the propagated errors.
    • Each weight $w_{ij}$ in the network is updated by the following equations, where $\Delta w_{ij}$ is the change in weight $w_{ij}$:
      • $\Delta w_{ij} = (l)\, Err_j\, O_i$
      • $w_{ij} = w_{ij} + \Delta w_{ij}$
    • Each bias $\theta_j$ in the network is updated by the following equations, where $\Delta\theta_j$ is the change in bias $\theta_j$:
      • $\Delta\theta_j = (l)\, Err_j$
      • $\theta_j = \theta_j + \Delta\theta_j$
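The error and update equations above can be sketched as one backward pass over a tiny 2-3-1 sigmoid network. All names here (`w_ih`, `w_ho`, `b_h`, the 2-3-1 shape, and the sample tuple) are illustrative assumptions, not from the notes:

```python
import math
import random

random.seed(0)
l = 0.5                                        # learning rate

# Illustrative 2-3-1 network: weights and biases (theta) at small random values.
w_ih = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
b_h = [0.0] * 3                                # hidden biases theta_j
w_ho = [random.uniform(-0.5, 0.5) for _ in range(3)]
b_o = 0.0                                      # output bias

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x):
    """Actual outputs O_j of the hidden units and the output unit."""
    o_h = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(2)) + b_h[j])
           for j in range(3)]
    o_o = sigmoid(sum(o_h[j] * w_ho[j] for j in range(3)) + b_o)
    return o_h, o_o

x, t = [0.1, 0.9], 1.0                         # one normalised tuple, target T

o_h, o_o = forward(x)

# Output unit error: Err = O (1 - O)(T - O)
err_o = o_o * (1 - o_o) * (t - o_o)

# Hidden unit error: Err_j = O_j (1 - O_j) * sum_k Err_k w_jk
err_h = [o_h[j] * (1 - o_h[j]) * err_o * w_ho[j] for j in range(3)]

# Updates: delta_w_ij = l * Err_j * O_i ; delta_theta_j = l * Err_j
for j in range(3):
    w_ho[j] += l * err_o * o_h[j]
    b_h[j] += l * err_h[j]
    for i in range(2):
        w_ih[i][j] += l * err_h[j] * x[i]
b_o += l * err_o
```

After this single update the network's output for the same tuple moves closer to the target, which is the whole point of propagating the error backward.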

Learning rate: The parameter $l$ is the learning rate, typically a value between 0.0 and 1.0. If the learning rate is too small, learning (i.e. reduction in the error) will occur at a very slow pace. If the learning rate is too large, oscillation between inadequate solutions may occur. A rule of thumb is to set the learning rate to $1/t$, where $t$ is the number of iterations through the training set so far. In principle the network should *converge* towards a minimal error; if this is not happening, try reducing this parameter value and retraining.
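The $1/t$ rule of thumb amounts to a simple decaying schedule; a minimal sketch (the function name is illustrative):

```python
def learning_rate(t):
    """Learning rate after t iterations through the training set (t >= 1),
    following the 1/t rule of thumb."""
    return 1.0 / t

# Early iterations take large steps; later iterations fine-tune.
rates = [learning_rate(t) for t in (1, 2, 4, 10)]  # [1.0, 0.5, 0.25, 0.1]
```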
Updating schedules

  • Case updating: The backpropagation algorithm given here updates the weights and biases immediately after the presentation of each tuple.
  • Epoch updating: Alternatively, the weight and bias increments could be accumulated in variables, so that the weights and biases are updated after all the tuples in the training set have been presented.
  • Either way, we keep updating until the error is satisfactory, presenting each tuple of the training set many times over.
  • In theory, the mathematical derivation of backpropagation employs epoch updating, yet in practice, case updating is more common because it tends to yield more accurate results.
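The two schedules can be contrasted in skeleton form. Here `gradient` and `apply` are hypothetical placeholders standing in for "compute the weight/bias increment for one tuple" and "apply an increment", not a real API:

```python
def case_updating(w, training_set, gradient, apply):
    """Weights change immediately after each tuple is presented."""
    for tup in training_set:
        w = apply(w, gradient(w, tup))
    return w

def epoch_updating(w, training_set, gradient, apply):
    """Increments accumulate; weights change once, after the whole epoch."""
    total = None
    for tup in training_set:
        g = gradient(w, tup)
        total = g if total is None else total + g
    return apply(w, total)
```

Note the difference: under case updating, later tuples in the epoch see weights already nudged by earlier tuples, so the two schedules generally produce different results for the same data.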

> Figure: Backpropagation algorithm from the textbook
How do you decide to terminate training?

  • When the weights and biases are changing very little, i.e. all the $\Delta w_{ij}$s are small; or
  • Accuracy on the training set is good enough; or
  • A prespecified number of epochs has passed.
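The three termination criteria above can be combined in one loop. This is a hedged sketch: `train_epoch` is a hypothetical stand-in that performs one pass over the training set and reports the largest weight change and the training-set accuracy; the threshold values are arbitrary:

```python
def train(train_epoch, max_epochs=100, min_delta=1e-4, target_acc=0.95):
    """Run epochs until any of the three stopping criteria fires."""
    for epoch in range(1, max_epochs + 1):
        biggest_delta_w, accuracy = train_epoch(epoch)
        if biggest_delta_w < min_delta:      # 1. weights barely changing
            return epoch, "converged"
        if accuracy >= target_acc:           # 2. accuracy good enough
            return epoch, "accurate"
    return max_epochs, "epoch limit"         # 3. prespecified epochs passed
```

In practice the epoch limit acts as a safety net behind the other two tests, guaranteeing the loop terminates even if the network never converges.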