Official PyTorch tutorials - Building models with PyTorch - 《DeepLearning》

torch.nn.Module
Convolutional Layers

https://pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html

Learn how to use PyTorch and how to build the simplest possible network model.

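Tips: Layers with learnable parameters (fully connected layers, convolutional layers, and so on) are usually created in the constructor __init__(). Parameter-free operations such as ReLU can instead be called through torch.nn.functional inside forward(); dropout can as well, provided you pass training=self.training. Batch normalization carries learnable parameters and running statistics, so it is usually kept as a module in __init__(). forward() must be overridden: it implements the model's computation and defines how the layers connect to each other.
Once forward() is defined in a subclass of nn.Module, you do not need to write backward(): autograd derives the backward pass automatically.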
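
The tips above can be sketched as follows. This is a minimal illustration, assuming a hypothetical two-layer module named `TipsNet` with arbitrary layer sizes (neither is from the tutorial): the learnable layers are created in `__init__()`, ReLU is called through `torch.nn.functional` in `forward()`, and autograd supplies the backward pass.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TipsNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # Layers with learnable parameters are registered in the constructor.
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        # ReLU has no parameters, so the functional form is enough here.
        x = F.relu(self.fc1(x))
        return self.fc2(x)


net = TipsNet()
out = net(torch.randn(3, 4))  # calling the module runs forward()
loss = out.sum()              # any scalar works for this demonstration
loss.backward()               # no backward() was written; autograd derives it

print(tuple(out.shape))                  # (3, 2)
print(tuple(net.fc1.weight.grad.shape))  # (8, 4)
```

After `loss.backward()`, every parameter of the module holds a gradient in its `.grad` attribute, even though the class only defines `forward()`.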

torch.nn.Module

As a simple example, here is a small model with two linear layers and an activation function. We’ll create an instance of it and ask it to report on its parameters:

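import torch

class TinyModel(torch.nn.Module):
    def __init__(self):
        super(TinyModel, self).__init__()
        self.linear1 = torch.nn.Linear(100, 200)
        self.activation = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
        # dim=-1 normalizes over the last (class) dimension; omitting dim is deprecated
        self.softmax = torch.nn.Softmax(dim=-1)

    def forward(self, x):
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        x = self.softmax(x)
        return x

# Instantiate at module level, after the class definition is complete
tinymodel = TinyModel()
print("The model:")
print(tinymodel)

# Report the name and shape of every learnable parameter
print("Model params:")
for name, param in tinymodel.named_parameters():
    print(name, tuple(param.shape))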

This shows the fundamental structure of a PyTorch model: there is an __init__() method that defines the layers and other components of a model, and a forward() method where the computation gets done.
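
As a usage sketch of that structure, the snippet below calls the model on a batch of random inputs. The class is restated compactly so the snippet runs on its own; the batch size of 4 and the choice of `dim=-1` for the softmax are assumptions made for this illustration.

```python
import torch


class TinyModel(torch.nn.Module):
    # Restated from the listing above so this snippet runs on its own.
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(100, 200)
        self.activation = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
        self.softmax = torch.nn.Softmax(dim=-1)  # assumed: normalize over the last dimension

    def forward(self, x):
        return self.softmax(self.linear2(self.activation(self.linear1(x))))


tinymodel = TinyModel()
batch = torch.rand(4, 100)  # assumed batch of 4 inputs with 100 features each
probs = tinymodel(batch)    # invokes forward() through Module.__call__

print(tuple(probs.shape))   # (4, 10)
print(torch.allclose(probs.sum(dim=-1), torch.ones(4)))  # True: each row sums to 1
```

Calling `tinymodel(batch)` rather than `tinymodel.forward(batch)` is the usual convention, since the call also runs any hooks registered on the module.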

Convolutional Layers

Convolutional layers are built to handle data with a high degree of spatial correlation. They are very commonly used in computer vision, where they detect close groupings of features which they compose into higher-level features. They pop up in other contexts too: for example, in NLP applications, where a word’s immediate context can affect the meaning of a sentence.

import torch.nn.functional as F  # relu and max_pool2d live in torch.nn.functional
from torch import nn

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 16 channels of size 6x6 after conv2 and pooling
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # For a square window, a single number is enough
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # Flatten everything except the batch dimension
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

nn.Conv2d(1, 6, 5): 1 input image channel (black & white), 6 output channels, 5x5 square convolution kernel

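The first argument to a convolutional layer is the number of input channels (in_channels); the number of features we want the layer to learn is set by its number of filters, that is, the number of output channels (out_channels).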

LeNet5 is meant to take in a 1x32x32 black & white image. The first argument to a convolution layer’s constructor is the number of input channels. Here, it is 1. If we were building this model to look at 3 color channels, it would be 3.
A convolutional layer is like a window that scans over the image, looking for a pattern it recognizes. These patterns are called features, and one of the parameters of a convolution layer is the number of features we would like it to learn. This is the second argument to the constructor: the number of output features. Here, we ask the layer to learn 6 features.
The third argument is the window or kernel size. Here, 5 means a 5x5 kernel.
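
One way to see what the three constructor arguments control is to inspect the weight tensor they produce. The sketch below compares the grayscale layer from the text with a 3-channel variant; the variable names `gray` and `color` are chosen only for this illustration.

```python
from torch import nn

gray = nn.Conv2d(1, 6, 5)   # in_channels=1, out_channels=6, kernel_size=5
color = nn.Conv2d(3, 6, 5)  # same layer for a 3-channel (e.g. RGB) input

# The weight shape is (out_channels, in_channels, kernel_height, kernel_width).
print(tuple(gray.weight.shape))   # (6, 1, 5, 5)
print(tuple(color.weight.shape))  # (6, 3, 5, 5)
print(gray.kernel_size)           # (5, 5): an int kernel size is expanded to a square
```

Each of the 6 output features has its own kernel, and that kernel spans every input channel.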

The output of a convolutional layer is an activation map - a spatial representation of the presence of features in the input tensor.

conv1 will give us an output tensor of 6x28x28; 6 is the number of features, and 28 is the height and width of our map.
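
The 6x28x28 figure can be checked directly. This is a minimal sketch that feeds a random tensor (a stand-in for a real image) through a layer built like conv1; PyTorch adds a leading batch dimension, so the input is shaped 1x1x32x32.

```python
import torch
from torch import nn

conv1 = nn.Conv2d(1, 6, 5)
image = torch.randn(1, 1, 32, 32)  # (batch, channels, height, width); random stand-in for an image

activation_map = conv1(image)
print(tuple(activation_map.shape))  # (1, 6, 28, 28)

# With no padding and stride 1, each spatial side shrinks by kernel_size - 1.
side = 32 - 5 + 1
print(side)  # 28
```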

output of convolution -> ReLU activation function -> max pooling layer
We then pass the output of the convolution through a ReLU activation function (more on activation functions later), then through a max pooling layer.

Max pooling -> a lower-resolution version of the activation map
The max pooling layer takes features near each other in the activation map and groups them together. It does this by reducing the tensor, merging every 2x2 group of cells in the output into a single cell, and assigning that cell the maximum value of the 4 cells that went into it. This gives us a lower-resolution version of the activation map, with dimensions 6x14x14.
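
The 2x2 merging can be sketched on a small tensor with known values, then on a tensor shaped like the conv1 output. The last step continues through a second convolution and pooling stage, using the layer sizes from the LeNet listing, to show where the 16 * 6 * 6 input size of fc1 comes from. The input values are arbitrary and only the shapes matter.

```python
import torch
import torch.nn.functional as F
from torch import nn

# One 4x4 map with easy-to-follow values, shaped (batch, channels, height, width).
x = torch.arange(16.0).reshape(1, 1, 4, 4)
pooled = F.max_pool2d(x, 2)          # each 2x2 group collapses to its maximum
print(pooled.squeeze().tolist())     # [[5.0, 7.0], [13.0, 15.0]]

# The same pooling applied to a tensor shaped like the conv1 output.
activation_map = torch.randn(1, 6, 28, 28)
low_res = F.max_pool2d(activation_map, 2)
print(tuple(low_res.shape))          # (1, 6, 14, 14)

# Second stage: 14 - 3 + 1 = 12 after a 3x3 convolution, then 6 after pooling.
conv2 = nn.Conv2d(6, 16, 3)
final = F.max_pool2d(F.relu(conv2(low_res)), 2)
print(tuple(final.shape))            # (1, 16, 6, 6), i.e. 16 * 6 * 6 = 576 inputs for fc1
```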