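I. Defining custom models with the Module class
II. Defining custom layers with the Module class
- 1. Starting from the built-in layers
- 2. A simple custom-layer example

References:
CSDN: PyTorch tutorial, nn.Module in detail: defining custom models with the Module class
CSDN: PyTorch tutorial, nn.Module in detail: defining custom layers with the Module class
Zhihu: [PyTorch] nn.Module explained, and how to freeze the parameters of specific layers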

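I. Defining custom models with the Module class

Preface: for ordinary sequential models, PyTorch's torch.nn.Sequential class is enough, much as in Keras. More often, though, the model is more complex: multiple inputs and outputs, multiple branches, skip connections, or custom layers. In those cases you have to define the model yourself. This article explains in detail how to define a custom model with the **Module** class.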
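As a rough illustration of a model that Sequential cannot express directly, here is a minimal sketch of a block with a skip connection. The class name `ResidualBlock` and its sizes are hypothetical and chosen only for this example.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Hypothetical block: output = relu(conv(x) + x), a simple skip connection."""

    def __init__(self, channels):
        super().__init__()
        # A 3x3 kernel with padding=1 keeps the spatial size, so x and conv(x) can be added.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # The input is used twice: once through the conv, once as the shortcut.
        return torch.relu(self.conv(x) + x)


block = ResidualBlock(channels=8)
x = torch.randn(2, 8, 16, 16)
out = block(x)
print(out.shape)  # torch.Size([2, 8, 16, 16])
```

The addition in forward is the part a plain Sequential container has no way to describe, since it only chains each layer's output into the next layer.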

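1. Overview of the torch.nn.Module class

In my view, PyTorch is not as low-level as TensorFlow and not as high-level as Keras. Let's first compare a few small differences between Keras and PyTorch.

In Keras the more common practice is to subclass Layer to define a custom layer; subclassing Model to define a model is not recommended (see the official documentation for the reasons).
PyTorch does not really draw a line between Layer and Module. Custom layers, custom blocks, and custom models are all created by subclassing **Module**, and this is an important point. The Sequential class itself also inherits from Module.

Note: we could of course define a layer by subclassing torch.autograd.Function directly, but this is strongly discouraged, for reasons explained later.
Summary: almost every custom operation in PyTorch is implemented by subclassing nn.Module.
For now we only discuss using Module to build custom models; custom layers are left for later.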
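The claim that containers, layers, activations, and losses all derive from Module can be checked directly. The snippet below is a small sketch; the particular classes checked are just a sample.

```python
import torch.nn as nn

# A container, two layers, an activation and a loss: all are nn.Module subclasses.
classes = (nn.Sequential, nn.Linear, nn.Conv2d, nn.ReLU, nn.MSELoss)
checked = {cls.__name__: issubclass(cls, nn.Module) for cls in classes}

for name, is_module in checked.items():
    print(name, is_module)  # each line prints True
```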

2. A brief look at the torch.nn.Module class

First, a quick look at its definition:

class Module(object):
    def __init__(self): ...
    def forward(self, *input): ...
    def add_module(self, name, module): ...
    def cuda(self, device=None): ...
    def cpu(self): ...
    def __call__(self, *input, **kwargs): ...
    def parameters(self, recurse=True): ...
    def named_parameters(self, prefix='', recurse=True): ...
    def children(self): ...
    def named_children(self): ...
    def modules(self): ...
    def named_modules(self, memo=None, prefix=''): ...
    def train(self, mode=True): ...
    def eval(self): ...
    def zero_grad(self): ...
    def __repr__(self): ...
    def __dir__(self): ...
'''
Only some of the methods are listed here.
'''
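To make a few of the listed methods concrete, here is a short sketch using a single nn.Linear layer as the module. The layer sizes are arbitrary.

```python
import torch.nn as nn

layer = nn.Linear(4, 2)  # weight has shape (2, 4), bias has shape (2,)

# named_parameters() yields (name, Parameter) pairs.
param_shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
print(param_shapes)  # {'weight': (2, 4), 'bias': (2,)}

# parameters() yields the same tensors without their names.
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 10

# eval() and train() switch the module's `training` flag.
layer.eval()
eval_flag = layer.training
layer.train()
train_flag = layer.training
print(eval_flag, train_flag)  # False True
```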

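When defining our own network we subclass nn.Module and override two methods: the constructor **__init__** and the forward pass **forward**. A few guidelines:

Layers with learnable parameters (fully connected layers, convolutional layers, and so on) generally go in the constructor **__init__()**; layers without parameters may be placed there as well.
Layers without learnable parameters (such as **ReLU**, **dropout**, and pooling) may go in the constructor or be left out. If they are left out of __init__, use the nn.functional equivalents in forward instead. Two caveats: **BatchNorm** layers do have learnable parameters and running statistics, so they belong in the constructor, and F.dropout must be given training=self.training if it is to follow model.train() and model.eval().
The **forward** method must be overridden. It implements what the model does and is where the connections between the layers are made.

Let's start with a simple example.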

import torch
class MyNet(torch.nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()  # first statement: call the parent constructor
        self.conv1 = torch.nn.Conv2d(3, 32, 3, 1, 1)
        self.relu1 = torch.nn.ReLU()
        self.max_pooling1 = torch.nn.MaxPool2d(2, 1)
        self.conv2 = torch.nn.Conv2d(32, 32, 3, 1, 1)  # in_channels must match the 32 channels produced by conv1
        self.relu2 = torch.nn.ReLU()
        self.max_pooling2 = torch.nn.MaxPool2d(2, 1)
        self.dense1 = torch.nn.Linear(32 * 3 * 3, 128)  # expects a 3x3 feature map, i.e. a 5x5 input image
        self.dense2 = torch.nn.Linear(128, 10)
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.max_pooling1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.max_pooling2(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 32 * 3 * 3) before the fully connected layers
        x = self.dense1(x)
        x = self.dense2(x)
        return x
model = MyNet()
print(model)
'''
Output:
MyNet(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (max_pooling1): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu2): ReLU()
  (max_pooling2): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (dense2): Linear(in_features=128, out_features=10, bias=True)
)
'''

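Note: all the layers above are placed in the constructor __init__, but that only declares the layers; it says nothing about how they are connected. The connections are made in forward, and here they are still purely sequential. Now look at another example: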

import torch
import torch.nn.functional as F
class MyNet(torch.nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()  # first statement: call the parent constructor
        self.conv1 = torch.nn.Conv2d(3, 32, 3, 1, 1)
        self.conv2 = torch.nn.Conv2d(32, 32, 3, 1, 1)  # in_channels must match the 32 channels produced by conv1
        self.dense1 = torch.nn.Linear(32 * 3 * 3, 128)  # expects a 3x3 feature map, i.e. a 5x5 input image
        self.dense2 = torch.nn.Linear(128, 10)
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2, 1)  # kernel_size is required; 2 and stride 1 mirror the previous example
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2, 1)
        x = x.view(x.size(0), -1)  # flatten before the fully connected layers
        x = self.dense1(x)
        x = self.dense2(x)
        return x
model = MyNet()
print(model)
'''
Output:
MyNet(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (dense2): Linear(in_features=128, out_features=10, bias=True)
)
'''

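Note: this time the layers without trainable parameters are not placed in the constructor, so they do not appear in the printed model; they are applied in forward through the functions in nn.functional.

Summary: every layer placed in the constructor **__init__** is an "inherent attribute" of the model.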
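One practical consequence of this distinction concerns dropout. The sketch below uses a hypothetical `TinyNet` that applies both the module form and the functional form, to show which operations are registered on the model and how the functional form needs the training flag passed in by hand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)      # registered: part of the model
        self.drop = nn.Dropout(p=0.5)  # registered even though it has no parameters

    def forward(self, x):
        x = F.relu(self.fc(x))         # functional ReLU: not registered anywhere
        x = self.drop(x)               # follows model.train() / model.eval() automatically
        # The functional form only follows the mode if the flag is passed explicitly.
        return F.dropout(x, p=0.5, training=self.training)


model = TinyNet()
names = [name for name, _ in model.named_children()]
print(names)  # ['fc', 'drop']

model.eval()
x = torch.randn(3, 4)
# In eval mode both dropouts are disabled, so the output is just relu(fc(x)).
same = torch.equal(model(x), F.relu(model.fc(x)))
print(same)  # True
```

Calling F.dropout without the training argument would keep dropping activations even after model.eval(), because the function's training argument defaults to True.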

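3. Several ways to implement a torch.nn.Module

The example above was only a simple demonstration. The Module class is very flexible and can be used in many ways, which are introduced one by one below.

3.1 Wrapping layers with Sequential

That is, several layers are wrapped together into one larger layer (a block). An earlier article covered the Sequential class in detail, including its three common usages and the pros and cons of each; see Yuque: the torch.nn.Sequential class.

Layers can therefore be wrapped in any of these three ways.

Method 1

Yuque: the simplest sequential model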

import torch.nn as nn
from collections import OrderedDict
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv_block = nn.Sequential(
            nn.Conv2d(3, 32, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.dense_block = nn.Sequential(
            nn.Linear(32 * 3 * 3, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )
    # Connect the layers here; this is the forward pass
    def forward(self, x):
        conv_out = self.conv_block(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense_block(res)
        return out
model = MyNet()
print(model)
'''
Output:
MyNet(
  (conv_block): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense_block): Sequential(
    (0): Linear(in_features=288, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=10, bias=True)
  )
)
'''

As in the earlier article, the layers inside each wrapped block have no names here; by default they are numbered 0, 1, 2, and so on.
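The default numbering can be seen directly on a stand-alone Sequential. The sketch below rebuilds the conv block from Method 1 outside of any model.

```python
import torch.nn as nn

conv_block = nn.Sequential(
    nn.Conv2d(3, 32, 3, 1, 1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

# Unnamed layers get the string names '0', '1', '2'.
names = [name for name, _ in conv_block.named_children()]
print(names)  # ['0', '1', '2']

# Sequential supports integer indexing (including negative indices) and len().
first = type(conv_block[0]).__name__
last = type(conv_block[-1]).__name__
print(first, last, len(conv_block))  # Conv2d MaxPool2d 3
```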

Method 2

Yuque: giving each layer a name

import torch.nn as nn
from collections import OrderedDict
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv_block = nn.Sequential(
            OrderedDict(
                [
                    ("conv1", nn.Conv2d(3, 32, 3, 1, 1)),
                    ("relu1", nn.ReLU()),
                    ("pool", nn.MaxPool2d(2))
                ]
            ))
        self.dense_block = nn.Sequential(
            OrderedDict([
                ("dense1", nn.Linear(32 * 3 * 3, 128)),
                ("relu2", nn.ReLU()),
                ("dense2", nn.Linear(128, 10))
            ])
        )
    def forward(self, x):
        conv_out = self.conv_block(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense_block(res)
        return out
model = MyNet()
print(model)
'''
Output:
MyNet(
  (conv_block): Sequential(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu1): ReLU()
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense_block): Sequential(
    (dense1): Linear(in_features=288, out_features=128, bias=True)
    (relu2): ReLU()
    (dense2): Linear(in_features=128, out_features=10, bias=True)
  )
)
'''

Method 3

[model.add_module(self, name, module)](https://www.yuque.com/yuque-qsztn/va7nxh/znemg4#y1Jzx)

import torch  # the class below refers to torch.nn.* by its full name
import torch.nn as nn
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv_block=torch.nn.Sequential()
        self.conv_block.add_module("conv1",torch.nn.Conv2d(3, 32, 3, 1, 1))
        self.conv_block.add_module("relu1",torch.nn.ReLU())
        self.conv_block.add_module("pool1",torch.nn.MaxPool2d(2))
        self.dense_block = torch.nn.Sequential()
        self.dense_block.add_module("dense1",torch.nn.Linear(32 * 3 * 3, 128))
        self.dense_block.add_module("relu2",torch.nn.ReLU())
        self.dense_block.add_module("dense2",torch.nn.Linear(128, 10))
    def forward(self, x):
        conv_out = self.conv_block(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense_block(res)
        return out
model = MyNet()
print(model)
'''
Output:
MyNet(
  (conv_block): Sequential(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu1): ReLU()
    (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense_block): Sequential(
    (dense1): Linear(in_features=288, out_features=128, bias=True)
    (relu2): ReLU()
    (dense2): Linear(in_features=128, out_features=10, bias=True)
  )
)
'''

With methods 2 and 3, every layer inside each wrapped block has a name.
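Named layers can then be reached as attributes, and the names also show up as prefixes of the parameter names. A small sketch, reusing the dense block from Method 2 on its own:

```python
from collections import OrderedDict

import torch.nn as nn

dense_block = nn.Sequential(OrderedDict([
    ("dense1", nn.Linear(288, 128)),
    ("relu2", nn.ReLU()),
    ("dense2", nn.Linear(128, 10)),
]))

# A named layer is available as an attribute of the container.
print(dense_block.dense1)  # Linear(in_features=288, out_features=128, bias=True)

# Integer indexing still works and returns the same object.
same_object = dense_block[0] is dense_block.dense1
print(same_object)  # True

# Layer names become prefixes of the parameter names.
param_names = [name for name, _ in dense_block.named_parameters()]
print(param_names)
# ['dense1.weight', 'dense1.bias', 'dense2.weight', 'dense2.bias']
```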

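3.2 Using some common methods of the Module class

Important: although Sequential inherits from Module and the two have much in common, they also differ in several ways, mainly the following:
**Sequential** implements integer indexing, so a layer can be fetched with **model[index]**. **Module** does not implement integer indexing, so layers cannot be fetched that way. What to do instead? Module provides several methods:

def children(self): ...
def named_children(self): ...
def modules(self): ...
def named_modules(self, memo=None, prefix=''): ...
'''
Note: all of these methods return an iterator, so iterate over them with a for loop or call next().
'''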
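The difference can be demonstrated in a few lines. The `Wrapper` class below is a hypothetical minimal model with one Sequential block; it is only a sketch of the behaviour described above.

```python
import torch.nn as nn


class Wrapper(nn.Module):  # hypothetical minimal model
    def __init__(self):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(4, 8), nn.ReLU())

    def forward(self, x):
        return self.block(x)


model = Wrapper()

# Indexing works on the Sequential block ...
print(type(model.block[0]).__name__)  # Linear

# ... but not on a plain Module subclass.
indexing_failed = False
try:
    model[0]
except TypeError:
    indexing_failed = True
print(indexing_failed)  # True

# children() returns an iterator, so next() or list() gives access to the sub-modules.
first_child = next(model.children())
print(first_child is model.block)  # True
children = list(model.children())
print(type(children[0][1]).__name__)  # ReLU
```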

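1. The `model.children()` and `model.named_children()` methods

model.children() and model.named_children() both return an **iterator**.
**model.children()**: each iteration yields one direct child module. In the examples below each child is a **Sequential**, which can in turn be indexed with an integer to reach the individual layers inside it, such as the conv and dense layers.
**model.named_children()**: each iteration yields a tuple whose first element is the name and whose second element is the corresponding layer or Sequential.

2. The `model.children()` method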

import torch
import torch.nn as nn
from collections import OrderedDict
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv_block = torch.nn.Sequential()
        self.conv_block.add_module("conv1", torch.nn.Conv2d(3, 32, 3, 1, 1))
        self.conv_block.add_module("relu1", torch.nn.ReLU())
        self.conv_block.add_module("pool1", torch.nn.MaxPool2d(2))
        self.dense_block = torch.nn.Sequential()
        self.dense_block.add_module("dense1", torch.nn.Linear(32 * 3 * 3, 128))
        self.dense_block.add_module("relu2", torch.nn.ReLU())
        self.dense_block.add_module("dense2", torch.nn.Linear(128, 10))
    def forward(self, x):
        conv_out = self.conv_block(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense_block(res)
        return out
model = MyNet()
for i in model.children():
    print(i)
    print(type(i)) # check the type of each element: it is a Sequential, so the layers inside it can be fetched by integer index
'''
Output:
Sequential(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
<class 'torch.nn.modules.container.Sequential'>
Sequential(
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (relu2): ReLU()
  (dense2): Linear(in_features=128, out_features=10, bias=True)
)
<class 'torch.nn.modules.container.Sequential'>
'''

3. The `model.named_children()` method

import torch
import torch.nn as nn
from collections import OrderedDict
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv_block = torch.nn.Sequential()
        self.conv_block.add_module("conv1", torch.nn.Conv2d(3, 32, 3, 1, 1))
        self.conv_block.add_module("relu1", torch.nn.ReLU())
        self.conv_block.add_module("pool1", torch.nn.MaxPool2d(2))
        self.dense_block = torch.nn.Sequential()
        self.dense_block.add_module("dense1", torch.nn.Linear(32 * 3 * 3, 128))
        self.dense_block.add_module("relu2", torch.nn.ReLU())
        self.dense_block.add_module("dense2", torch.nn.Linear(128, 10))
    def forward(self, x):
        conv_out = self.conv_block(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense_block(res)
        return out
model = MyNet()
for i in model.named_children():
    print(i)
    print(type(i)) # check the type of each element: it is a tuple whose first element is the name
'''
Output:
('conv_block', Sequential(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
))
<class 'tuple'>
('dense_block', Sequential(
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (relu2): ReLU()
  (dense2): Linear(in_features=128, out_features=10, bias=True)
))
<class 'tuple'>
'''

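4. `model.modules()` and `model.named_modules()`

model.modules() and model.named_modules() both return an iterator.
Both methods walk through every component of the model (wrapped blocks, individual layers, custom layers, and so on), starting from the outermost module and going inward. modules() yields the module objects themselves, while named_modules() yields tuples whose first element is the name and whose second element is the module object.
How should the difference between children and modules be understood? In PyTorch, models, layers, activation functions, and loss functions can all be regarded as extensions of Module, so modules and named_modules recurse level by level: each custom block, and then each layer inside the block, is visited as a module. children is more literal: it yields only the direct "children" and does not recurse any deeper.

5. The `model.modules()` method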

import torch
import torch.nn as nn
from collections import OrderedDict
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv_block = torch.nn.Sequential()
        self.conv_block.add_module("conv1", torch.nn.Conv2d(3, 32, 3, 1, 1))
        self.conv_block.add_module("relu1", torch.nn.ReLU())
        self.conv_block.add_module("pool1", torch.nn.MaxPool2d(2))
        self.dense_block = torch.nn.Sequential()
        self.dense_block.add_module("dense1", torch.nn.Linear(32 * 3 * 3, 128))
        self.dense_block.add_module("relu2", torch.nn.ReLU())
        self.dense_block.add_module("dense2", torch.nn.Linear(128, 10))
    def forward(self, x):
        conv_out = self.conv_block(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense_block(res)
        return out
model = MyNet()
for i in model.modules():
    print(i)
    print("==================================================")
'''
Output:
MyNet(
  (conv_block): Sequential(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu1): ReLU()
    (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense_block): Sequential(
    (dense1): Linear(in_features=288, out_features=128, bias=True)
    (relu2): ReLU()
    (dense2): Linear(in_features=128, out_features=10, bias=True)
  )
)
==================================================
Sequential(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
==================================================
Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
==================================================
ReLU()
==================================================
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
==================================================
Sequential(
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (relu2): ReLU()
  (dense2): Linear(in_features=128, out_features=10, bias=True)
)
==================================================
Linear(in_features=288, out_features=128, bias=True)
==================================================
ReLU()
==================================================
Linear(in_features=128, out_features=10, bias=True)
==================================================
'''

6. The `model.named_modules()` method

import torch
import torch.nn as nn
from collections import OrderedDict
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv_block = torch.nn.Sequential()
        self.conv_block.add_module("conv1", torch.nn.Conv2d(3, 32, 3, 1, 1))
        self.conv_block.add_module("relu1", torch.nn.ReLU())
        self.conv_block.add_module("pool1", torch.nn.MaxPool2d(2))
        self.dense_block = torch.nn.Sequential()
        self.dense_block.add_module("dense1", torch.nn.Linear(32 * 3 * 3, 128))
        self.dense_block.add_module("relu2", torch.nn.ReLU())
        self.dense_block.add_module("dense2", torch.nn.Linear(128, 10))
    def forward(self, x):
        conv_out = self.conv_block(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense_block(res)
        return out
model = MyNet()
for i in model.named_modules():
    print(i)
    print("==================================================")
'''
Output:
('', MyNet(
  (conv_block): Sequential(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu1): ReLU()
    (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense_block): Sequential(
    (dense1): Linear(in_features=288, out_features=128, bias=True)
    (relu2): ReLU()
    (dense2): Linear(in_features=128, out_features=10, bias=True)
  )
))
==================================================
('conv_block', Sequential(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
))
==================================================
('conv_block.conv1', Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
==================================================
('conv_block.relu1', ReLU())
==================================================
('conv_block.pool1', MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False))
==================================================
('dense_block', Sequential(
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (relu2): ReLU()
  (dense2): Linear(in_features=128, out_features=10, bias=True)
))
==================================================
('dense_block.dense1', Linear(in_features=288, out_features=128, bias=True))
==================================================
('dense_block.relu2', ReLU())
==================================================
('dense_block.dense2', Linear(in_features=128, out_features=10, bias=True))
==================================================
'''

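Note: the four methods above were illustrated on a model built from wrapped blocks. They work just as well on a model without wrapping, and the results follow the same pattern, so they are not listed here.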
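For completeness, here is a minimal sketch of the unwrapped case. `FlatNet` is a hypothetical model whose layers are direct attributes; with no nesting, modules() differs from children() only by also yielding the model itself (under the empty name).

```python
import torch.nn as nn
import torch.nn.functional as F


class FlatNet(nn.Module):  # hypothetical model without Sequential wrappers
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, 1, 1)
        self.dense1 = nn.Linear(32 * 3 * 3, 128)
        self.dense2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = x.view(x.size(0), -1)
        return self.dense2(F.relu(self.dense1(x)))


model = FlatNet()
child_names = [name for name, _ in model.named_children()]
module_names = [name for name, _ in model.named_modules()]
print(child_names)   # ['conv1', 'dense1', 'dense2']
print(module_names)  # ['', 'conv1', 'dense1', 'dense2']
```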

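II. Defining custom layers with the Module class

Preface: the first part showed how to define a custom model by subclassing nn.Module: the layers are declared in the constructor __init__, and the connections between them are made in forward, which is the forward pass.
In fact, custom layers in PyTorch are also defined by subclassing nn.Module. As I said earlier, **pytorch** generally has no separate notion of a layer; a layer is treated as a model too, which differs from Keras. As mentioned before, we could also define a layer by subclassing torch.autograd.Function directly, but this is strongly discouraged, for reasons explained later. Remember one thing: Keras puts the emphasis on the Layer, PyTorch on the Module.
This part therefore focuses on how to implement a custom layer with the **nn.Module** class.

1. Starting from the built-in layers

1.1 Source code of the Linear layer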

import math
import torch
from torch.nn.parameter import Parameter
from .. import functional as F
from .. import init
from .module import Module
from ..._jit_internal import weak_module, weak_script_method
class Linear(Module):
    __constants__ = ['bias']
    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()
    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)
    @weak_script_method
    def forward(self, input):
        return F.linear(input, self.weight, self.bias)
    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )

1.2 Implementation of the Conv2d class

class Conv2d(_ConvNd):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1,
                 bias=True, padding_mode='zeros'):
        kernel_size = _pair(kernel_size)
        stride = _pair(stride)
        padding = _pair(padding)
        dilation = _pair(dilation)
        super(Conv2d, self).__init__(
            in_channels, out_channels, kernel_size, stride, padding, dilation,
            False, _pair(0), groups, bias, padding_mode)
    @weak_script_method
    def forward(self, input):
        if self.padding_mode == 'circular':
            expanded_padding = ((self.padding[1] + 1) // 2, self.padding[1] // 2,
                                (self.padding[0] + 1) // 2, self.padding[0] // 2)
            return F.conv2d(F.pad(input, expanded_padding, mode='circular'),
                            self.weight, self.bias, self.stride,
                            _pair(0), self.dilation, self.groups)
        return F.conv2d(input, self.weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

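1.3 Preliminary summary

As I said in an earlier article, there are two ways to build neural networks in torch:

High-level **API**: use torch.nn.****;
Low-level **API**: use the low-level functions torch.nn.functional.****;

We recommend the high-level **API**, for the following reason:
The high-level API is wrapped in classes, and a class can store parameters: the weight matrix and bias vector of a fully connected layer, for example, are kept as attributes of the object. The low-level API only performs the computation and has nowhere to keep this information, so the parameters would be lost. The high-level API nevertheless relies on the low-level functions for the actual computation, as the two layers above show:

**Linear** high-level layer -> low-level **F.linear()** function.
**Conv2d** high-level layer -> low-level **F.conv2d()** function.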
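The relationship between the two levels can be checked numerically. The following sketch compares an nn.Linear layer with a direct call to F.linear using the layer's own stored parameters; the sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(5, 3)  # high-level: owns weight (3, 5) and bias (3,)
x = torch.randn(10, 5)

out_module = layer(x)
# Low-level: the caller has to supply the parameters on every call.
out_functional = F.linear(x, layer.weight, layer.bias)
# The same computation written out by hand: x @ W^T + b.
out_manual = x @ layer.weight.t() + layer.bias

matches_functional = torch.allclose(out_module, out_functional)
matches_manual = torch.allclose(out_module, out_manual, atol=1e-6)
print(matches_functional, matches_manual)  # True True
```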

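1.4 Steps for defining a custom layer

Implementing a custom layer roughly takes the following steps:

Define a class that inherits from **Module** and implements two basic functions: the constructor __init__, and the layer's computation, that is, the forward function.
Define the layer's parameters in the constructor __init__: for example the weight and bias of a Linear layer, or the in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros' arguments of a Conv2d layer.
Implement the computation in the **forward** function. This is usually done with torch.nn.functional.*** functions, though we often need to write our own operations as well. If the layer has weights, they must be of type nn.Parameter; for the differences between Tensor, Variable (versions up to 0.3), and Parameter, see the relevant documentation. In short, a Parameter requires gradients by default and the other two do not. Where possible, also give the new layer a default parameter initialization, in case initialization is forgotten when the layer is used.
Supplement: the parameters we define are normally differentiable, and autograd derives the backward pass from forward automatically. If a custom operation is not differentiable, a backward function has to be implemented, which is done by subclassing torch.autograd.Function.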
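As a sketch of that last point, the example below wraps rounding, whose gradient is zero almost everywhere, in a torch.autograd.Function with a hand-written backward. The pass-through gradient (a "straight-through" estimator) and the name `RoundSTE` are choices made for this illustration only, not something the steps above prescribe.

```python
import torch


class RoundSTE(torch.autograd.Function):
    """Rounds in forward; passes the incoming gradient through unchanged in backward."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Hand-written gradient: treat the op as the identity for backpropagation.
        return grad_output


x = torch.tensor([0.2, 1.7, -0.6], requires_grad=True)
y = RoundSTE.apply(x)   # custom Functions are invoked through .apply
y.sum().backward()

print(y.detach())  # values 0., 2., -1.
print(x.grad)      # tensor([1., 1., 1.])
```

Such a Function can then be called inside the forward method of an ordinary nn.Module, which keeps the parameters and the rest of the layer logic in the Module.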

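Summary: this is the same as defining a custom model. The core in both cases is the constructor __init__ and the forward function; see Yuque: I. Defining custom models with the Module class, above.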

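2. A simple custom-layer example

Suppose I want a simple layer that computes y = (x^2 + bias) · w: square the input x, add a bias term, then multiply by the weight matrix w. How is this done? Following the steps above, we first define such a layer (that is, a class):

2.1 Defining the custom layer MyLayer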

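# File: my_layer.py
import torch
class MyLayer(torch.nn.Module):
    '''
    This layer computes y = (x**2 + bias) @ weights, so it has two parameters:
    the weight matrix weights
    the bias vector bias
    The input x has shape (in_features,)
    The output y has shape (out_features,), therefore
    bias has shape (in_features,), not (out_features,), because it is added to the input before the matrix product; note how this differs from Linear
    weights has shape (in_features, out_features), not (out_features, in_features), because forward computes input @ weights without a transpose; again unlike Linear
    '''
    def __init__(self, in_features, out_features, bias=True):
        super(MyLayer, self).__init__()  # as with a custom model, first call the parent constructor
        self.in_features = in_features
        self.out_features = out_features
        # weights are trainable, so wrap them in Parameter; torch.empty leaves the values uninitialized
        self.weight = torch.nn.Parameter(torch.empty(in_features, out_features))
        if bias:
            self.bias = torch.nn.Parameter(torch.empty(in_features))  # trainable, so also a Parameter
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()
    def reset_parameters(self):
        # default initialization, so the layer never computes on uninitialized memory
        torch.nn.init.uniform_(self.weight, -0.1, 0.1)
        if self.bias is not None:
            torch.nn.init.zeros_(self.bias)
    def forward(self, input):
        input_ = torch.pow(input, 2)
        if self.bias is not None:  # bias=False registers None, which cannot be added
            input_ = input_ + self.bias
        y = torch.matmul(input_, self.weight)
        return y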
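The reason the layer wraps its tensors in Parameter can be seen in a small stand-alone sketch. The `Demo` class below is hypothetical: it stores one Parameter, one plain tensor, and one reserved-but-empty parameter name, in the same way register_parameter('bias', None) is used above.

```python
import torch
import torch.nn as nn


class Demo(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(3, 2))  # registered, requires_grad=True by default
        self.t = torch.zeros(3, 2)                # plain attribute: invisible to optimizers
        self.register_parameter("bias", None)     # reserves the name without a tensor


demo = Demo()
registered = [name for name, _ in demo.named_parameters()]
print(registered)                                  # ['w']
print(demo.w.requires_grad, demo.t.requires_grad)  # True False
print(demo.bias is None)                           # True
```

Only `w` would be returned by model.parameters() and therefore updated by an optimizer; the plain tensor `t` would silently stay fixed.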

2.2 Defining a model and training it

import torch
from my_layer import MyLayer # the custom layer
N, D_in, D_out = 10, 5, 3  # 10 samples, 5 input features, 3 output features
# First define a model
class MyNet(torch.nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()  # first statement: call the parent constructor
        self.mylayer1 = MyLayer(D_in,D_out)
    def forward(self, x):
        x = self.mylayer1(x)
        return x
model = MyNet()
print(model)
'''
Output:
MyNet(
  (mylayer1): MyLayer()   # this is the custom layer we defined
)
'''

Now start training

# Create the input and target data
x = torch.randn(N, D_in)  # (10, 5)
y = torch.randn(N, D_out) # (10, 3)
# Define the loss function
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-4
# Build an optimizer object
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(10):
    # Step 1: forward pass, compute the prediction y_pred
    y_pred = model(x)
    # Step 2: compute the error between the prediction y_pred and the target
    loss = loss_fn(y_pred, y)
    print(f"epoch {t}, loss {loss.item()}")
    # Zero the gradients before the backward pass, because backward() accumulates them
    optimizer.zero_grad()
    # Step 3: backpropagate the error
    loss.backward()
    # Update all trainable parameters of the network in one step from the gradients
    optimizer.step()

A sample run printed the following (the exact numbers depend on the random data and the initial parameter values):

epoch 0, loss 29.241456985473633
epoch 1, loss 29.223047256469727
epoch 2, loss 29.20465850830078
epoch 3, loss 29.186279296875
epoch 4, loss 29.167924880981445
epoch 5, loss 29.14959716796875
epoch 6, loss 29.131284713745117
epoch 7, loss 29.112987518310547
epoch 8, loss 29.094717025756836
epoch 9, loss 29.076465606689453

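Summary:

This article showed in practice how to extend the **Module** base class to build custom models and custom layers. The two turn out to work in the same way, which is one reason PyTorch is so popular. A later article will cover defining a layer through Function. Note that:
**Function** and **Module** can both be used to extend **pytorch** to fit the needs of a network, but there are important differences between them, which will be discussed later.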
