NiN Blocks
NiN, short for "network in network," builds a deep network by stacking small networks, each made up of a convolutional layer plus layers that play the role of fully connected layers. This raises a problem mentioned earlier: a convolutional layer outputs a 4-D tensor (sample, channel, height, width), while a fully connected layer's input and output are both 2-D arrays, so how do we bridge the two? This is where the 1×1 convolutional layer introduced earlier comes in: it acts as a fully connected layer applied at every spatial position, with channels as the input features, so the data can stay 4-D throughout.
(Figure: left, a traditional CNN; right, NiN.)
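As a quick sanity check (my own sketch, not from the text above), the following verifies that a 1×1 convolution is equivalent to a fully connected layer applied at every pixel; the layer sizes (3 in, 5 out) are arbitrary:

```python
import torch
from torch import nn

# A 1x1 convolution mixes channels at each spatial position,
# which is exactly a fully connected layer applied per pixel.
conv1x1 = nn.Conv2d(3, 5, kernel_size=1)

# Build a Linear layer with the same weights: reshape (out, in, 1, 1) to (out, in).
fc = nn.Linear(3, 5)
with torch.no_grad():
    fc.weight.copy_(conv1x1.weight.view(5, 3))
    fc.bias.copy_(conv1x1.bias)

x = torch.rand(2, 3, 4, 4)                      # (batch, channel, h, w)
y_conv = conv1x1(x)                             # (2, 5, 4, 4)
# Apply the Linear layer at every pixel: move channels last, then back.
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
print(torch.allclose(y_conv, y_fc, atol=1e-6))
```

The two outputs agree, which is why a NiN block can use 1×1 convolutions in place of fully connected layers without ever flattening the tensor.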
```python
import torch
from torch import nn, optim

def nin_block(in_channels, out_channels, kernel_size, stride, padding):
    blk = nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding),
        nn.ReLU(),
        # The two 1x1 convolutions act as per-pixel fully connected layers.
        # They follow the first convolution, so their input channel count
        # must be out_channels (not in_channels).
        nn.Conv2d(out_channels, out_channels, kernel_size=1),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1),
        nn.ReLU())
    return blk
```
The NiN Model
NiN makes its last NiN block output as many channels as there are label classes, then applies a global average pooling layer that averages all elements within each channel, and uses the result directly for classification. The global average pooling layer here is simply an average pooling layer whose window shape equals the spatial shape of its input. The benefit of this design is that it significantly reduces the number of model parameters, which helps mitigate overfitting.
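To make the parameter savings concrete, here is a small comparison of my own (the dense head is a hypothetical AlexNet-style alternative, not part of this model) counting parameters for a 10-class classifier on a (384, 5, 5) feature map:

```python
from torch import nn

# NiN's head: a 1x1 convolution down to 10 channels, followed by global
# average pooling (which has no parameters at all).
nin_head = nn.Conv2d(384, 10, kernel_size=1)

# A dense head instead flattens the feature map and learns one weight
# per (input element, class) pair.
dense_head = nn.Linear(384 * 5 * 5, 10)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(nin_head))    # 384*10 + 10 = 3,850
print(count(dense_head))  # 384*5*5*10 + 10 = 96,010
```

The 1×1-conv-plus-pooling head needs roughly 25× fewer parameters, which is the source of the regularization benefit described above.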
Making the pooling window match the spatial shape of the input can be done with `kernel_size=x.size()[2:]`, which keeps only the height and width dimensions, dropping the first two (batch, channel).
```python
import torch.nn.functional as F

class GlobalAvgPool2d(nn.Module):
    def __init__(self):
        super(GlobalAvgPool2d, self).__init__()
    def forward(self, x):
        # Pool over the full spatial extent of the input.
        return F.avg_pool2d(x, kernel_size=x.size()[2:])

class FlattenLayer(nn.Module):
    # Helper from earlier chapters: flattens (batch, c, 1, 1) to (batch, c).
    def forward(self, x):
        return x.view(x.shape[0], -1)

net = nn.Sequential(
    nin_block(1, 96, kernel_size=11, stride=4, padding=0),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nin_block(96, 256, kernel_size=5, stride=1, padding=2),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nin_block(256, 384, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Dropout(0.5),
    # The label classes of Fashion-MNIST number 10, so the last NiN block
    # outputs 10 channels.
    nin_block(384, 10, kernel_size=3, stride=1, padding=1),
    GlobalAvgPool2d(),
    FlattenLayer())
```
The output shape after each layer:
```python
X = torch.rand(1, 1, 224, 224)
for name, blk in net.named_children():
    X = blk(X)
    print(name, 'output shape: ', X.shape)
```

Result:

```
0 output shape:  torch.Size([1, 96, 54, 54])
1 output shape:  torch.Size([1, 96, 26, 26])
2 output shape:  torch.Size([1, 256, 26, 26])
3 output shape:  torch.Size([1, 256, 12, 12])
4 output shape:  torch.Size([1, 384, 12, 12])
5 output shape:  torch.Size([1, 384, 5, 5])
6 output shape:  torch.Size([1, 384, 5, 5])
7 output shape:  torch.Size([1, 10, 5, 5])
8 output shape:  torch.Size([1, 10, 1, 1])
9 output shape:  torch.Size([1, 10])
```
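These shapes follow from the usual convolution output-size formula; a small sketch of my own (not part of the original code) reproduces the first two:

```python
# Output size of a conv/pool layer: floor((n + 2*p - k) / s) + 1,
# for input size n, kernel k, stride s, padding p.
def out_size(n, k, s, p=0):
    return (n + 2 * p - k) // s + 1

# First NiN block: 11x11 conv, stride 4, no padding, on a 224x224 input.
print(out_size(224, k=11, s=4))        # 54
# First max pooling: 3x3 window, stride 2.
print(out_size(54, k=3, s=2))          # 26
# Second NiN block: 5x5 conv, stride 1, padding 2 preserves the size.
print(out_size(26, k=5, s=1, p=2))     # 26
```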
Training
```python
if __name__ == '__main__':
    batch_size = 128
    # load_data_fashion_mnist and train_ch5 are the helper functions
    # from earlier chapters.
    train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=224)
    lr, num_epochs = 0.002, 5
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    train_ch5(net, train_iter, test_iter, batch_size, optimizer,
              device="cuda", num_epochs=num_epochs)
```

Result:

```
training on cuda
epoch 1, loss 1.3128, train acc 0.528, test acc 0.733, time 88.1 sec
epoch 2, loss 0.5850, train acc 0.780, test acc 0.806, time 86.3 sec
epoch 3, loss 0.4897, train acc 0.819, test acc 0.823, time 86.9 sec
epoch 4, loss 0.4441, train acc 0.836, test acc 0.844, time 86.0 sec
epoch 5, loss 0.4098, train acc 0.848, test acc 0.853, time 89.2 sec
```
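The body of `train_ch5` is not shown here. As a rough sketch of what such a training loop does (my own minimal version, tracking only training loss and accuracy, not test accuracy or timing):

```python
import torch
from torch import nn

def train_sketch(net, train_iter, optimizer, device, num_epochs):
    # Move the model to the target device once, then loop over epochs.
    net = net.to(device)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        total_loss, total_correct, n = 0.0, 0, 0
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            loss = loss_fn(y_hat, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item() * y.size(0)
            total_correct += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.size(0)
        print(f'epoch {epoch + 1}, loss {total_loss / n:.4f}, '
              f'train acc {total_correct / n:.3f}')
```

A real version would additionally switch the model between `net.train()` and `net.eval()` and evaluate on the test iterator after each epoch, as the logged output above shows.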
