3. Deep Learning Basics - 3.9 Implementing a Multilayer Perceptron from Scratch - 《机器学习》

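With the experience from the earlier sections, implementing a multilayer perceptron follows much the same process. I only note the parts that differ significantly.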

Defining the model parameters

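Because the network now has more than one layer, there are two sets of parameters: a weight matrix and a bias vector for the hidden layer, and another pair for the output layer.
The number of neurons in the hidden layer is also a hyperparameter; here I follow the author's setting.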

import torch

# Define the model parameters
input_size, hidden_size, output_size = 784, 256, 10
elem_type = torch.float32
w1 = torch.normal(0, 0.01, (input_size, hidden_size), dtype=elem_type, requires_grad=True)
b1 = torch.zeros(hidden_size, dtype=elem_type, requires_grad=True)
w2 = torch.normal(0, 0.01, (hidden_size, output_size), dtype=elem_type, requires_grad=True)
b2 = torch.zeros(output_size, dtype=elem_type, requires_grad=True)
params = [w1, b1, w2, b2]
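
As a quick sanity check on the two parameter sets, the sketch below rebuilds them with the same sizes and counts the trainable values. The variable names `shapes` and `total` are introduced here only for illustration; they are not part of the original script.

```python
import torch

# Same sizes as in the notes: 784 inputs, 256 hidden units, 10 outputs.
input_size, hidden_size, output_size = 784, 256, 10

w1 = torch.normal(0, 0.01, (input_size, hidden_size), requires_grad=True)
b1 = torch.zeros(hidden_size, requires_grad=True)
w2 = torch.normal(0, 0.01, (hidden_size, output_size), requires_grad=True)
b2 = torch.zeros(output_size, requires_grad=True)
params = [w1, b1, w2, b2]

# Illustrative helpers (not in the original script).
shapes = [tuple(p.shape) for p in params]
total = sum(p.numel() for p in params)

print(shapes)
print(total)  # 784*256 + 256 + 256*10 + 10 = 203530
```

Almost all of the parameters sit in the first weight matrix, so changing the hidden size mostly changes the cost of that one matrix multiplication.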

Defining the activation function

Rather than using the built-in one for now, I write a simple implementation myself.

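# Define the activation function
def relu(x):
    """
    Apply ReLU to x element-wise.
    Args:
        x: input tensor
    Returns:
        A tensor of the same shape as x with negative entries replaced by 0
    Raises:
        None
    """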
    return torch.max(input=x, other=torch.tensor(0.0))
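
A small check, as a sketch, that this hand-written version agrees with PyTorch's built-in `torch.relu`. The sample values in `x` are arbitrary and chosen only for illustration.

```python
import torch

def relu(x):
    # Element-wise max(x, 0); the 0-dim tensor broadcasts against x.
    return torch.max(input=x, other=torch.tensor(0.0))

# Arbitrary sample values: two negatives, zero, one positive.
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
y = relu(x)

# Negative entries are clipped to 0, non-negative entries pass through.
print(y)

# Compare with the built-in implementation.
same = torch.equal(y, torch.relu(x))
print(same)
```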

Defining the model

With more than one layer, passing values through the network is also slightly more involved.

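# Define the model
def net(x):
    """
    Run the forward pass of the multilayer perceptron.
    Args:
        x: input batch; flattened to shape (batch_size, input_size)
    Returns:
        The output-layer values (unnormalized scores), shape (batch_size, output_size)
    Raises:
        None
    """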
    input_layer = x.view((-1, input_size))
    hidden_layer = relu(torch.matmul(input_layer, w1) + b1)
    output_layer = torch.matmul(hidden_layer, w2) + b2
    return output_layer
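
To make the flow of values concrete, here is a minimal sketch that traces tensor shapes through the same two-layer forward pass. The random batch of four single-channel 28 x 28 inputs is a hypothetical stand-in for real data, and `torch.relu` is used in place of the hand-written `relu` to keep the block short.

```python
import torch

input_size, hidden_size, output_size = 784, 256, 10
w1 = torch.normal(0, 0.01, (input_size, hidden_size))
b1 = torch.zeros(hidden_size)
w2 = torch.normal(0, 0.01, (hidden_size, output_size))
b2 = torch.zeros(output_size)

# Hypothetical batch: 4 inputs of shape 1 x 28 x 28 (1 * 28 * 28 = 784).
batch = torch.rand(4, 1, 28, 28)

flat = batch.reshape(-1, input_size)     # (4, 784)
hidden = torch.relu(flat @ w1 + b1)      # (4, 256), b1 broadcasts over rows
logits = hidden @ w2 + b2                # (4, 10), b2 broadcasts over rows

print(flat.shape, hidden.shape, logits.shape)
```

The output holds one unnormalized score per class for each input; a softmax is not applied here because PyTorch's cross-entropy loss expects raw scores.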

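Learning rate

At first I picked a learning rate of 0.1 myself and found that the accuracy was extremely low. I then noticed the author's note that this comes down to whether the loss is summed or averaged over the batch. I am mixing PyTorch's loss function with the optimizer provided by the author, which is an awkward combination: PyTorch's loss averages over the batch by default, so if the optimizer also divides the gradient by the batch size, each step ends up scaled down twice. Making lr larger brought things back to normal.

Script file: 3.9 多层感知机的从零开始实现.py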
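
The sum-versus-mean effect can be checked numerically. The sketch below compares gradients from `torch.nn.CrossEntropyLoss` with `reduction="mean"` (the default) and `reduction="sum"`. The `sgd` function is a hypothetical stand-in for the author's optimizer, written under the assumption that it divides the gradient by the batch size; the batch size of 256 in the last step is also an assumed value, not one taken from the notes.

```python
import torch

torch.manual_seed(0)
batch_size, num_classes = 8, 10
logits = torch.randn(batch_size, num_classes, requires_grad=True)
labels = torch.randint(0, num_classes, (batch_size,))

# reduction="mean" is the default for torch.nn.CrossEntropyLoss.
mean_loss = torch.nn.CrossEntropyLoss(reduction="mean")(logits, labels)
grad_mean, = torch.autograd.grad(mean_loss, logits)

sum_loss = torch.nn.CrossEntropyLoss(reduction="sum")(logits, labels)
grad_sum, = torch.autograd.grad(sum_loss, logits)

# The mean-reduced gradient equals the sum-reduced one divided by batch_size.
ratio_ok = torch.allclose(grad_mean * batch_size, grad_sum, atol=1e-6)
print(ratio_ok)

def sgd(params, lr, batch_size):
    # Hypothetical optimizer: ASSUMED to divide the gradient by batch_size.
    with torch.no_grad():
        for p in params:
            p -= lr * p.grad / batch_size

# With an already averaged gradient of 1.0, lr = 0.1 and an assumed batch
# size of 256, the parameter moves by only 0.1 / 256.
p = torch.zeros(1, requires_grad=True)
p.grad = torch.ones(1)
sgd([p], lr=0.1, batch_size=256)
print(p.item())
```

Under these assumptions the effective step is lr / batch_size times the averaged gradient, which is consistent with a small learning rate such as 0.1 barely moving the parameters and a larger lr restoring normal training.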