Updated for PyTorch 0.4.1
You can find the code here.
PyTorch is an open source deep learning framework that provides a smart way to create ML models. Even though the documentation is well made, I still find that most people write bad and poorly organized PyTorch code.
Today we are going to see how to use the main building blocks of PyTorch: Module, Sequential, ModuleList and ModuleDict. We will start with an example and iteratively make it better.
All four of these classes are contained in torch.nn:

    import torch.nn as nn

    # nn.Module
    # nn.Sequential
    # nn.ModuleList
    # nn.ModuleDict

Module: the main building block

Module is the main building block: it defines the base class for all neural networks, and you must subclass it.
Let's create a classic CNN classifier as an example:

    import torch.nn.functional as F

    class MyCNNClassifier(nn.Module):
        def __init__(self, in_c, n_classes):
            super().__init__()
            self.conv1 = nn.Conv2d(in_c, 32, kernel_size=3, stride=1, padding=1)
            self.bn1 = nn.BatchNorm2d(32)
            self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
            self.bn2 = nn.BatchNorm2d(64)  # must match the 64 output channels of conv2
            self.fc1 = nn.Linear(64 * 28 * 28, 1024)  # 64 channels on a 28x28 feature map
            self.fc2 = nn.Linear(1024, n_classes)

        def forward(self, x):
            x = self.conv1(x)
            x = self.bn1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = self.bn2(x)
            x = F.relu(x)
            x = x.view(x.size(0), -1)  # flatten
            x = self.fc1(x)
            x = F.sigmoid(x)
            x = self.fc2(x)
            return x

    model = MyCNNClassifier(1, 10)
    print(model)

    MyCNNClassifier(
      (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (fc1): Linear(in_features=50176, out_features=1024, bias=True)
      (fc2): Linear(in_features=1024, out_features=10, bias=True)
    )

This is a very simple classifier: an encoding part made of two layers of 3x3 convs + batchnorm + relu, and a decoding part with two linear layers. If you are not new to PyTorch you may have seen this type of coding before, but there are two problems.
If we want to add a layer, we have to write a lot of code again, in both __init__ and in the forward function. Also, if we have some common block that we want to use in another model, e.g. the 3x3 conv + batchnorm + relu, we have to write it again.

Sequential: stack and merge layers

Sequential is a container of Modules that can be stacked together and run at the same time.
You can notice that we had to store everything into self. We can use Sequential to improve our code.

    class MyCNNClassifier(nn.Module):
        def __init__(self, in_c, n_classes):
            super().__init__()
            self.conv_block1 = nn.Sequential(
                nn.Conv2d(in_c, 32, kernel_size=3, stride=1, padding=1),
                nn.BatchNorm2d(32),
                nn.ReLU()
            )
            self.conv_block2 = nn.Sequential(
                nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
                nn.BatchNorm2d(64),
                nn.ReLU()
            )
            self.decoder = nn.Sequential(
                nn.Linear(64 * 28 * 28, 1024),  # 64 channels from conv_block2
                nn.Sigmoid(),
                nn.Linear(1024, n_classes)
            )

        def forward(self, x):
            x = self.conv_block1(x)
            x = self.conv_block2(x)
            x = x.view(x.size(0), -1)  # flatten
            x = self.decoder(x)
            return x

    model = MyCNNClassifier(1, 10)
    print(model)

    MyCNNClassifier(
      (conv_block1): Sequential(
        (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (conv_block2): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (decoder): Sequential(
        (0): Linear(in_features=50176, out_features=1024, bias=True)
        (1): Sigmoid()
        (2): Linear(in_features=1024, out_features=10, bias=True)
      )
    )

Much better, right?
Did you notice that conv_block1 and conv_block2 look almost the same? We can create a function that returns an nn.Sequential to simplify the code even more!

    def conv_block(in_f, out_f, *args, **kwargs):
        return nn.Sequential(
            nn.Conv2d(in_f, out_f, *args, **kwargs),
            nn.BatchNorm2d(out_f),
            nn.ReLU()
        )

Then we can call this function in our Module:

    class MyCNNClassifier(nn.Module):
        def __init__(self, in_c, n_classes):
            super().__init__()
            self.conv_block1 = conv_block(in_c, 32, kernel_size=3, padding=1)
            self.conv_block2 = conv_block(32, 64, kernel_size=3, padding=1)
            self.decoder = nn.Sequential(
                nn.Linear(64 * 28 * 28, 1024),  # 64 channels from conv_block2
                nn.Sigmoid(),
                nn.Linear(1024, n_classes)
            )

        def forward(self, x):
            x = self.conv_block1(x)
            x = self.conv_block2(x)
            x = x.view(x.size(0), -1)  # flatten
            x = self.decoder(x)
            return x

    model = MyCNNClassifier(1, 10)
    print(model)

    MyCNNClassifier(
      (conv_block1): Sequential(
        (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (conv_block2): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (decoder): Sequential(
        (0): Linear(in_features=50176, out_features=1024, bias=True)
        (1): Sigmoid()
        (2): Linear(in_features=1024, out_features=10, bias=True)
      )
    )

Even cleaner! conv_block1 and conv_block2 are still almost the same! We can merge them using nn.Sequential.

    class MyCNNClassifier(nn.Module):
        def __init__(self, in_c, n_classes):
            super().__init__()
            self.encoder = nn.Sequential(
                conv_block(in_c, 32, kernel_size=3, padding=1),
                conv_block(32, 64, kernel_size=3, padding=1)
            )
            self.decoder = nn.Sequential(
                nn.Linear(64 * 28 * 28, 1024),  # the encoder ends with 64 channels
                nn.Sigmoid(),
                nn.Linear(1024, n_classes)
            )

        def forward(self, x):
            x = self.encoder(x)
            x = x.view(x.size(0), -1)  # flatten
            x = self.decoder(x)
            return x

    model = MyCNNClassifier(1, 10)
    print(model)

    MyCNNClassifier(
      (encoder): Sequential(
        (0): Sequential(
          (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
        )
        (1): Sequential(
          (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
        )
      )
      (decoder): Sequential(
        (0): Linear(in_features=50176, out_features=1024, bias=True)
        (1): Sigmoid()
        (2): Linear(in_features=1024, out_features=10, bias=True)
      )
    )

self.encoder now holds both conv_blocks. We have decoupled the logic of our model, making it easier to read and reuse. Our conv_block function can be imported and used in another model.
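For example, the same block can be dropped into a completely different architecture. A minimal sketch, where MySegmentationHead and the import path are made up for illustration:

    # from my_blocks import conv_block  # hypothetical module holding the function above

    class MySegmentationHead(nn.Module):  # made-up model that reuses conv_block
        def __init__(self, in_c, n_classes):
            super().__init__()
            self.block = conv_block(in_c, 64, kernel_size=3, padding=1)
            self.out = nn.Conv2d(64, n_classes, kernel_size=1)  # 1x1 conv to the class maps

        def forward(self, x):
            return self.out(self.block(x))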

Dynamic Sequential: create multiple layers at once

What if we wanted to add new layers to self.encoder? Hard-coding them would be inconvenient:

    self.encoder = nn.Sequential(
        conv_block(in_c, 32, kernel_size=3, padding=1),
        conv_block(32, 64, kernel_size=3, padding=1),
        conv_block(64, 128, kernel_size=3, padding=1),
        conv_block(128, 256, kernel_size=3, padding=1),
    )

Wouldn't it be nice if we could define the sizes as an array and create all the layers automatically, without writing each one of them? Fortunately, we can create an array and pass it to Sequential:

    class MyCNNClassifier(nn.Module):
        def __init__(self, in_c, n_classes):
            super().__init__()
            self.enc_sizes = [in_c, 32, 64]
            conv_blocks = [conv_block(in_f, out_f, kernel_size=3, padding=1)
                           for in_f, out_f in zip(self.enc_sizes, self.enc_sizes[1:])]
            self.encoder = nn.Sequential(*conv_blocks)
            self.decoder = nn.Sequential(
                nn.Linear(64 * 28 * 28, 1024),  # the encoder ends with 64 channels
                nn.Sigmoid(),
                nn.Linear(1024, n_classes)
            )

        def forward(self, x):
            x = self.encoder(x)
            x = x.view(x.size(0), -1)  # flatten
            x = self.decoder(x)
            return x

    model = MyCNNClassifier(1, 10)
    print(model)

    MyCNNClassifier(
      (encoder): Sequential(
        (0): Sequential(
          (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
        )
        (1): Sequential(
          (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
        )
      )
      (decoder): Sequential(
        (0): Linear(in_features=50176, out_features=1024, bias=True)
        (1): Sigmoid()
        (2): Linear(in_features=1024, out_features=10, bias=True)
      )
    )

Let's break it down. We created an array, self.enc_sizes, that holds the sizes of our encoder. Then we created the array conv_blocks by iterating over the sizes. Since we have to give an 'in' size and an 'out' size for each layer, we zip the sizes array with itself shifted by one.
To make it clear, take a look at the following example:

    sizes = [1, 32, 64]
    for in_f, out_f in zip(sizes, sizes[1:]):
        print(in_f, out_f)

    1 32
    32 64

Then, since Sequential does not accept a list, we decompose it by using the * operator.
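As a quick sanity check (a minimal sketch):

    layers = [nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1)]
    # nn.Sequential(layers) raises a TypeError, since a list is not a Module
    model = nn.Sequential(*layers)  # unpacking the list works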
Tada! Now, if we just want to add a size, we can easily add a new number to the list. It is common practice to make the sizes a parameter:

    class MyCNNClassifier(nn.Module):
        def __init__(self, in_c, enc_sizes, n_classes):
            super().__init__()
            self.enc_sizes = [in_c, *enc_sizes]
            conv_blocks = [conv_block(in_f, out_f, kernel_size=3, padding=1)
                           for in_f, out_f in zip(self.enc_sizes, self.enc_sizes[1:])]
            self.encoder = nn.Sequential(*conv_blocks)
            self.decoder = nn.Sequential(
                nn.Linear(self.enc_sizes[-1] * 28 * 28, 1024),  # adapt to the last encoder size
                nn.Sigmoid(),
                nn.Linear(1024, n_classes)
            )

        def forward(self, x):
            x = self.encoder(x)
            x = x.view(x.size(0), -1)  # flatten
            x = self.decoder(x)
            return x

    model = MyCNNClassifier(1, [32, 64, 128], 10)
    print(model)

    MyCNNClassifier(
      (encoder): Sequential(
        (0): Sequential(
          (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
        )
        (1): Sequential(
          (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
        )
        (2): Sequential(
          (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
        )
      )
      (decoder): Sequential(
        (0): Linear(in_features=100352, out_features=1024, bias=True)
        (1): Sigmoid()
        (2): Linear(in_features=1024, out_features=10, bias=True)
      )
    )

We can do the same for the decoder part:

    def dec_block(in_f, out_f):
        return nn.Sequential(
            nn.Linear(in_f, out_f),
            nn.Sigmoid()
        )

    class MyCNNClassifier(nn.Module):
        def __init__(self, in_c, enc_sizes, dec_sizes, n_classes):
            super().__init__()
            self.enc_sizes = [in_c, *enc_sizes]
            self.dec_sizes = [self.enc_sizes[-1] * 28 * 28, *dec_sizes]  # flattened encoder output
            conv_blocks = [conv_block(in_f, out_f, kernel_size=3, padding=1)
                           for in_f, out_f in zip(self.enc_sizes, self.enc_sizes[1:])]
            self.encoder = nn.Sequential(*conv_blocks)
            dec_blocks = [dec_block(in_f, out_f)
                          for in_f, out_f in zip(self.dec_sizes, self.dec_sizes[1:])]
            self.decoder = nn.Sequential(*dec_blocks)
            self.last = nn.Linear(self.dec_sizes[-1], n_classes)

        def forward(self, x):
            x = self.encoder(x)
            x = x.view(x.size(0), -1)  # flatten
            x = self.decoder(x)
            x = self.last(x)  # no activation on the output
            return x

    model = MyCNNClassifier(1, [32, 64], [1024, 512], 10)
    print(model)

    MyCNNClassifier(
      (encoder): Sequential(
        (0): Sequential(
          (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
        )
        (1): Sequential(
          (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
        )
      )
      (decoder): Sequential(
        (0): Sequential(
          (0): Linear(in_features=50176, out_features=1024, bias=True)
          (1): Sigmoid()
        )
        (1): Sequential(
          (0): Linear(in_features=1024, out_features=512, bias=True)
          (1): Sigmoid()
        )
      )
      (last): Linear(in_features=512, out_features=10, bias=True)
    )

We follow the same pattern: we create a new block for the decoding part, linear + sigmoid, and we pass an array with the sizes. We had to add a self.last because we do not want to activate the output.
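This matters at training time: nn.CrossEntropyLoss applies log-softmax internally, so it expects the raw logits coming out of self.last. A minimal sketch, with made-up tensors, of how the model above would be used:

    import torch

    criterion = nn.CrossEntropyLoss()  # applies log-softmax itself
    logits = model(torch.rand(4, 1, 28, 28))  # raw, un-activated scores from self.last
    loss = criterion(logits, torch.randint(0, 10, (4,)))  # random targets, just for the sketch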
Now, we can even break down our model in two: an encoder and a decoder!

    class MyEncoder(nn.Module):
        def __init__(self, enc_sizes):
            super().__init__()
            self.conv_blocks = nn.Sequential(*[conv_block(in_f, out_f, kernel_size=3, padding=1)
                                               for in_f, out_f in zip(enc_sizes, enc_sizes[1:])])

        def forward(self, x):
            return self.conv_blocks(x)

    class MyDecoder(nn.Module):
        def __init__(self, dec_sizes, n_classes):
            super().__init__()
            self.dec_blocks = nn.Sequential(*[dec_block(in_f, out_f)
                                              for in_f, out_f in zip(dec_sizes, dec_sizes[1:])])
            self.last = nn.Linear(dec_sizes[-1], n_classes)

        def forward(self, x):
            return self.last(self.dec_blocks(x))

    class MyCNNClassifier(nn.Module):
        def __init__(self, in_c, enc_sizes, dec_sizes, n_classes):
            super().__init__()
            self.enc_sizes = [in_c, *enc_sizes]
            self.dec_sizes = [self.enc_sizes[-1] * 28 * 28, *dec_sizes]  # flattened encoder output
            self.encoder = MyEncoder(self.enc_sizes)
            self.decoder = MyDecoder(self.dec_sizes, n_classes)

        def forward(self, x):
            x = self.encoder(x)
            x = x.flatten(1)  # flatten
            x = self.decoder(x)
            return x

    model = MyCNNClassifier(1, [32, 64], [1024, 512], 10)
    print(model)

    MyCNNClassifier(
      (encoder): MyEncoder(
        (conv_blocks): Sequential(
          (0): Sequential(
            (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Sequential(
            (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
        )
      )
      (decoder): MyDecoder(
        (dec_blocks): Sequential(
          (0): Sequential(
            (0): Linear(in_features=50176, out_features=1024, bias=True)
            (1): Sigmoid()
          )
          (1): Sequential(
            (0): Linear(in_features=1024, out_features=512, bias=True)
            (1): Sigmoid()
          )
        )
        (last): Linear(in_features=512, out_features=10, bias=True)
      )
    )

Be aware that MyEncoder and MyDecoder could also be functions that return an nn.Sequential. I prefer to use the first pattern for models and the second for building blocks.
By dividing our module into submodules, it is easier to share the code, debug it and test it.
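For instance, the functional version of MyEncoder would look like this (a minimal sketch; make_encoder is a hypothetical name):

    def make_encoder(enc_sizes):
        # same layers as MyEncoder, but returned as a plain nn.Sequential
        return nn.Sequential(*[conv_block(in_f, out_f, kernel_size=3, padding=1)
                               for in_f, out_f in zip(enc_sizes, enc_sizes[1:])])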

ModuleList: when we need to iterate

ModuleList allows you to store Modules as a list. It can be useful when you need to iterate through the layers and store or use some information, as in a U-Net.
The main difference from Sequential is that ModuleList does not have a forward method, so the inner layers are not connected. Assuming we need each output of each layer in the decoder, we can store it like this:

    class MyModule(nn.Module):
        def __init__(self, sizes):
            super().__init__()
            self.layers = nn.ModuleList([nn.Linear(in_f, out_f)
                                         for in_f, out_f in zip(sizes, sizes[1:])])
            self.trace = []

        def forward(self, x):
            for layer in self.layers:
                x = layer(x)
                self.trace.append(x)
            return x

    import torch

    model = MyModule([1, 16, 32])
    model(torch.rand((4, 1)))
    [print(trace.shape) for trace in model.trace]

    torch.Size([4, 16])
    torch.Size([4, 32])
    [None, None]
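Be careful: a plain Python list would not work here. Modules stored in a regular list are not registered, so PyTorch never sees their parameters. A minimal sketch (Broken is a made-up counter-example):

    class Broken(nn.Module):  # made-up counter-example
        def __init__(self, sizes):
            super().__init__()
            self.layers = [nn.Linear(in_f, out_f)  # plain list: layers are NOT registered
                           for in_f, out_f in zip(sizes, sizes[1:])]

    print(len(list(MyModule([1, 16, 32]).parameters())))  # 4 (weight + bias per Linear)
    print(len(list(Broken([1, 16, 32]).parameters())))    # 0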

ModuleDict: when we need to choose

What if we want to switch to a LeakyReLU in our conv_block? We can use ModuleDict to create a dictionary of Modules and dynamically switch between them when we need to:

    def conv_block(in_f, out_f, activation='relu', *args, **kwargs):
        activations = nn.ModuleDict([
            ['lrelu', nn.LeakyReLU()],
            ['relu', nn.ReLU()]
        ])
        return nn.Sequential(
            nn.Conv2d(in_f, out_f, *args, **kwargs),
            nn.BatchNorm2d(out_f),
            activations[activation]
        )

    print(conv_block(1, 32, 'lrelu', kernel_size=3, padding=1))
    print(conv_block(1, 32, 'relu', kernel_size=3, padding=1))

    Sequential(
      (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): LeakyReLU(negative_slope=0.01)
    )
    Sequential(
      (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
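ModuleDict is also handy inside a Module, where the key can be chosen at runtime in forward. A minimal sketch with one head per task (the model and the task names are made up):

    class MultiHead(nn.Module):  # made-up example
        def __init__(self, in_f):
            super().__init__()
            self.heads = nn.ModuleDict({
                'classify': nn.Linear(in_f, 10),
                'regress': nn.Linear(in_f, 1),
            })

        def forward(self, x, task):
            return self.heads[task](x)  # pick the head by key at runtime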

Final implementation

Let's wrap it all up!

    def conv_block(in_f, out_f, activation='relu', *args, **kwargs):
        activations = nn.ModuleDict([
            ['lrelu', nn.LeakyReLU()],
            ['relu', nn.ReLU()]
        ])
        return nn.Sequential(
            nn.Conv2d(in_f, out_f, *args, **kwargs),
            nn.BatchNorm2d(out_f),
            activations[activation]
        )

    def dec_block(in_f, out_f):
        return nn.Sequential(
            nn.Linear(in_f, out_f),
            nn.Sigmoid()
        )

    class MyEncoder(nn.Module):
        def __init__(self, enc_sizes, *args, **kwargs):
            super().__init__()
            self.conv_blocks = nn.Sequential(*[conv_block(in_f, out_f, kernel_size=3, padding=1, *args, **kwargs)
                                               for in_f, out_f in zip(enc_sizes, enc_sizes[1:])])

        def forward(self, x):
            return self.conv_blocks(x)

    class MyDecoder(nn.Module):
        def __init__(self, dec_sizes, n_classes):
            super().__init__()
            self.dec_blocks = nn.Sequential(*[dec_block(in_f, out_f)
                                              for in_f, out_f in zip(dec_sizes, dec_sizes[1:])])
            self.last = nn.Linear(dec_sizes[-1], n_classes)

        def forward(self, x):
            return self.last(self.dec_blocks(x))

    class MyCNNClassifier(nn.Module):
        def __init__(self, in_c, enc_sizes, dec_sizes, n_classes, activation='relu'):
            super().__init__()
            self.enc_sizes = [in_c, *enc_sizes]
            self.dec_sizes = [self.enc_sizes[-1] * 28 * 28, *dec_sizes]  # flattened encoder output
            self.encoder = MyEncoder(self.enc_sizes, activation=activation)
            self.decoder = MyDecoder(self.dec_sizes, n_classes)

        def forward(self, x):
            x = self.encoder(x)
            x = x.flatten(1)  # flatten
            x = self.decoder(x)
            return x

    model = MyCNNClassifier(1, [32, 64], [1024, 512], 10, activation='lrelu')
    print(model)

    MyCNNClassifier(
      (encoder): MyEncoder(
        (conv_blocks): Sequential(
          (0): Sequential(
            (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): LeakyReLU(negative_slope=0.01)
          )
          (1): Sequential(
            (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): LeakyReLU(negative_slope=0.01)
          )
        )
      )
      (decoder): MyDecoder(
        (dec_blocks): Sequential(
          (0): Sequential(
            (0): Linear(in_features=50176, out_features=1024, bias=True)
            (1): Sigmoid()
          )
          (1): Sequential(
            (0): Linear(in_features=1024, out_features=512, bias=True)
            (1): Sigmoid()
          )
        )
        (last): Linear(in_features=512, out_features=10, bias=True)
      )
    )

Conclusion

You can find the code here.
So, to sum up:

  • Use Module when you have a big block composed of multiple smaller blocks
  • Use Sequential when you want to create a small block out of layers
  • Use ModuleList when you need to iterate through some layers or building blocks and do something with them
  • Use ModuleDict when you need to parametrize some blocks of your model, for example the activation function

That's all folks!
Thank you for reading.