Structure of a multilayer neural network
- The inputs to the network correspond to the attributes measured for each training tuple
- Inputs are fed simultaneously into the units making up the input layer
- They are then weighted and fed simultaneously to a hidden layer
- The number of hidden layers is arbitrary (1 hidden layer in the example above).
- The number of hidden units is arbitrary (3 hidden units in the example above).
- The weighted outputs of the last hidden layer are input to units making up the output layer, which emits the network’s prediction
- The network is feed-forward: none of the weights cycles back to an input unit or to an output unit of a previous layer
- From a statistical point of view, networks perform nonlinear regression: given enough hidden units and enough training samples, they can closely approximate any function (see the sketch below)
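To make the feed-forward computation concrete, here is a minimal sketch in Python/NumPy of one forward pass through a network with one hidden layer. The layer sizes (4 inputs, 3 hidden units, 1 output), the sigmoid activation, and the random weight ranges are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: squashes a weighted sum into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative topology: 4 input units (one per attribute),
# 3 hidden units, 1 output unit.
rng = np.random.default_rng(0)
W_hidden = rng.uniform(-0.5, 0.5, size=(4, 3))   # input -> hidden weights
b_hidden = np.zeros(3)                           # hidden-layer biases
W_output = rng.uniform(-0.5, 0.5, size=(3, 1))   # hidden -> output weights
b_output = np.zeros(1)

def forward(x):
    # Inputs are fed to the input layer, weighted, and passed to the
    # hidden layer; the weighted hidden outputs feed the output layer,
    # which emits the network's prediction.
    h = sigmoid(x @ W_hidden + b_hidden)          # hidden-layer activations
    y = sigmoid(h @ W_output + b_output)          # output-layer prediction
    return y

x = np.array([0.2, 0.7, 0.1, 0.9])               # attributes of one training tuple
print(forward(x))                                 # a value in (0, 1)
```

Because no weight cycles back to an earlier layer, the prediction is computed in a single left-to-right pass.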
Defining a network topology
- Decide the network topology
- Specify the number of units in the input layer: usually *one input unit per attribute* in the data (but nominal attributes can have one input per value)
- the number of hidden layers (at least one, and commonly only one)
- the number of units in each hidden layer
- and the number of units in the output layer
- For the output layer, use one unit per response variable. In a typical binary classification problem only one output unit is used, and a threshold is applied to the output value to select the class label. The value can be interpreted as the probability of belonging to the positive class, so a neural network can also serve as a probabilistic model. For classification with more than two classes, one output unit per class is used and the class with the highest output value is selected as the label (see the sketch after this list).
- Choose an activation function for each hidden and output unit (explained later)
- Determine initial values for the weights, usually small random numbers.
- Once a network has been trained, if its accuracy is unacceptable, try a different network topology, a different set of initial weights, or different activation functions. This can be done by trial and error, although there are also methods that systematically search for a high-performing topology.
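As a rough illustration of these topology choices, the sketch below counts input units from the attributes (one per numeric attribute, one per value of a nominal attribute), uses a single hidden layer, and shows the two output conventions described above: one thresholded unit for binary classification and one unit per class otherwise. The attribute names, layer sizes, and the 0.5 threshold are assumptions made for the example, not prescriptions from the text.

```python
import numpy as np

# Assumed example data: two numeric attributes and one nominal attribute
# with three values, so the input layer gets 2 + 3 = 5 units.
numeric_attributes = ["age", "income"]
nominal_values = {"color": ["red", "green", "blue"]}

n_inputs = len(numeric_attributes) + sum(len(v) for v in nominal_values.values())
n_hidden = 4          # one hidden layer; size chosen by trial and error
n_classes = 3         # e.g. a three-class problem -> 3 output units

def one_hot(value, values):
    # One input unit per value of a nominal attribute.
    return [1.0 if value == v else 0.0 for v in values]

def output_to_label(outputs, class_names, threshold=0.5):
    # Binary case: a single output unit, thresholded; its value can be
    # read as the probability of the positive class.
    if len(outputs) == 1:
        return class_names[1] if outputs[0] >= threshold else class_names[0]
    # Multi-class case: one output unit per class; pick the highest value.
    return class_names[int(np.argmax(outputs))]

x = [0.35, 0.80] + one_hot("green", nominal_values["color"])
print(n_inputs, len(x))                                             # 5 5
print(output_to_label(np.array([0.9]), ["neg", "pos"]))             # pos
print(output_to_label(np.array([0.1, 0.7, 0.2]), ["A", "B", "C"]))  # B
```

If this hypothetical topology performs poorly after training, the hidden-layer size or the initial weights would be the first things to vary, as noted above.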