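Transformer architecture diagram:

Transformer Code Walkthrough - Figure 1

Diagram of the parameters commonly used in the functions:

Transformer Code Walkthrough - Figure 2

1. Main function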

import torch.nn as nn
import torch.optim as optim

if __name__ == '__main__':
    # ------------------ Sentence input ------------------
    sentences = ['ich mochte ein bier P', 'S i want a beer', 'i want a beer E']
    # ------------------ Configuration -------------------
    # Transformer parameters. The padding index should be zero.
    ## Build the vocabularies
    # Encoder-side (source) vocabulary
    src_vocab = {'P': 0, 'ich': 1, 'mochte': 2, 'ein': 3, 'bier': 4}
    src_vocab_size = len(src_vocab)
    # Decoder-side (target) vocabulary
    tgt_vocab = {'P': 0, 'i': 1, 'want': 2, 'a': 3, 'beer': 4, 'S': 5, 'E': 6}
    tgt_vocab_size = len(tgt_vocab)
    src_len = 5  # length of the encoder input
    tgt_len = 5  # length of the decoder input
    ## Model parameters
    d_model = 512  # embedding size: dimension of the vector each token is mapped to
    d_ff = 2048  # feed-forward dimension: the size the Linear layer maps to inside the feed-forward network
    d_k = d_v = 64  # dimension of K (= Q) and V
    n_layers = 6  # number of stacked encoder layers (and decoder layers)
    n_heads = 8  # number of heads in multi-head attention
    # ------------------ Model ---------------------------
    model = Transformer()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    enc_inputs, dec_inputs, target_batch = make_batch(sentences)
    for epoch in range(20):
        optimizer.zero_grad()
        outputs, enc_self_attns, dec_self_attns, dec_enc_attns = model(enc_inputs, dec_inputs)
        loss = criterion(outputs, target_batch.contiguous().view(-1))
        print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss))
        loss.backward()
        optimizer.step()

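2. Sentence input

sentences = ['ich mochte ein bier P', 'S i want a beer', 'i want a beer E']

The German sentence ich mochte ein bier P is the encoder input, the English sentence S i want a beer is the decoder input, and i want a beer E is the ground-truth label on the decoder side. The loss is computed between i want a beer E and the decoder output.

S marks the start of a sentence and E marks its end.

P is a special symbol used to pad sentences:

The sentences fed to the network have different lengths, so a maximum length max_length is fixed for the input. A sentence shorter than max_length is padded with P; a sentence longer than max_length has the excess truncated, as shown in the figure:

Transformer Code Walkthrough - Figure 3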
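
The padding rule can be sketched in a few lines of plain Python. This is only an illustration and not part of the walkthrough's code: pad_or_truncate is a hypothetical helper, and the longer example sentence is made up for the demonstration.

```python
def pad_or_truncate(tokens, max_length, pad_token="P"):
    """Return exactly max_length tokens: cut the tail or append pad_token."""
    tokens = tokens[:max_length]  # drop anything beyond max_length
    return tokens + [pad_token] * (max_length - len(tokens))  # fill up with the pad symbol


short = pad_or_truncate("ich mochte ein bier".split(), max_length=5)
long = pad_or_truncate("ich mochte ein bier bitte sehr".split(), max_length=5)
print(short)  # ['ich', 'mochte', 'ein', 'bier', 'P']
print(long)   # ['ich', 'mochte', 'ein', 'bier', 'bitte']
```

With max_length=5, the four-word sentence receives one P, which matches the encoder input used in this walkthrough.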

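3. Configuration

The vocabularies map words to integer indices so that the model can process them.

Encoder-side vocabulary:

src_vocab = {'P': 0, 'ich': 1, 'mochte': 2, 'ein': 3, 'bier': 4}
src_vocab_size = len(src_vocab)

Decoder-side vocabulary:

tgt_vocab = {'P': 0, 'i': 1, 'want': 2, 'a': 3, 'beer': 4, 'S': 5, 'E': 6}
tgt_vocab_size = len(tgt_vocab)

The remaining parameters are explained in the comments of the main-function code above.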
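
The main function calls make_batch(sentences), which is not shown in this section. Below is a minimal sketch of what such a helper might look like, assuming it simply looks up every whitespace-separated token in the two vocabularies and wraps the result in a batch of size 1. The body of make_batch here is an assumption for illustration, not the author's code.

```python
import torch

sentences = ['ich mochte ein bier P', 'S i want a beer', 'i want a beer E']
src_vocab = {'P': 0, 'ich': 1, 'mochte': 2, 'ein': 3, 'bier': 4}
tgt_vocab = {'P': 0, 'i': 1, 'want': 2, 'a': 3, 'beer': 4, 'S': 5, 'E': 6}


def make_batch(sentences):
    # Hypothetical helper: map each token to its index, one sentence per batch.
    enc_inputs = [[src_vocab[w] for w in sentences[0].split()]]
    dec_inputs = [[tgt_vocab[w] for w in sentences[1].split()]]
    target_batch = [[tgt_vocab[w] for w in sentences[2].split()]]
    return (torch.tensor(enc_inputs, dtype=torch.long),
            torch.tensor(dec_inputs, dtype=torch.long),
            torch.tensor(target_batch, dtype=torch.long))


enc_inputs, dec_inputs, target_batch = make_batch(sentences)
print(enc_inputs)    # tensor([[1, 2, 3, 4, 0]])
print(dec_inputs)    # tensor([[5, 1, 2, 3, 4]])
print(target_batch)  # tensor([[1, 2, 3, 4, 6]])
```

Note that the decoder input and the label are the same sentence shifted by one token: the input starts with S, the label ends with E.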

4. Definition of the Transformer model

Instantiating the Transformer model:

model = Transformer()

Definition of the Transformer model:

## 1. Viewed as a whole, the network has three parts: the encoder, the decoder, and the output layer
class Transformer(nn.Module):
    def __init__(self):
        super(Transformer, self).__init__()
        self.encoder = Encoder()  ## encoder
        self.decoder = Decoder()  ## decoder
        self.projection = nn.Linear(d_model, tgt_vocab_size, bias=False)  ## output layer

    def forward(self, enc_inputs, dec_inputs):
        # enc_inputs: [batch_size, src_len], the encoder input
        # dec_inputs: [batch_size, tgt_len], the decoder input
        # enc_outputs is the main encoder output
        # enc_self_attns is the matrix obtained by applying softmax to the QK^T scores;
        # it represents how strongly each word relates to every other word
        enc_outputs, enc_self_attns = self.encoder(enc_inputs)
        # dec_outputs is the main decoder output, used by the linear projection below
        # dec_self_attns: relevance of each decoder input word to the other decoder input words
        # dec_enc_attns: relevance of each decoder word to each encoder word
        dec_outputs, dec_self_attns, dec_enc_attns = self.decoder(dec_inputs, enc_inputs, enc_outputs)
        # project dec_outputs to the target vocabulary size
        # dec_logits: [batch_size x tgt_len x tgt_vocab_size]
        dec_logits = self.projection(dec_outputs)
        return dec_logits.view(-1, dec_logits.size(-1)), enc_self_attns, dec_self_attns, dec_enc_attns

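4.1. Output layer: projection

The projection layer in def __init__(self):

self.projection = nn.Linear(d_model, tgt_vocab_size, bias=False)  ## output layer

Here d_model is the dimension of each token's output from the decoder; a softmax over tgt_vocab_size entries follows.

If the decoder outputs a 512-dimensional vector per token, that vector has to be mapped to the vocabulary size before the softmax, which then gives the probability of every word at the current time step and hence the most likely word.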
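
A small sketch of the projection step under stated assumptions: dec_outputs is a random stand-in for the real decoder output, and the explicit softmax and argmax are shown only for illustration. During training, nn.CrossEntropyLoss applies the log-softmax internally, which is why the model itself returns raw logits.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, tgt_vocab_size = 512, 7
batch_size, tgt_len = 1, 5

projection = nn.Linear(d_model, tgt_vocab_size, bias=False)
dec_outputs = torch.randn(batch_size, tgt_len, d_model)  # stand-in for the decoder output

dec_logits = projection(dec_outputs)        # [batch_size, tgt_len, tgt_vocab_size]
probs = torch.softmax(dec_logits, dim=-1)   # probability of each vocabulary word per position
predicted_ids = probs.argmax(dim=-1)        # index of the most likely word per position

print(dec_logits.shape)     # torch.Size([1, 5, 7])
print(predicted_ids.shape)  # torch.Size([1, 5])
```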

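4.2. The forward function

def forward(self, enc_inputs, dec_inputs):

enc_inputs and dec_inputs are the inputs of the whole Transformer; they correspond to Inputs and Outputs in the architecture diagram:

Transformer Code Walkthrough - Figure 4

enc_inputs is the encoder input, with shape batch_size × src_len, where src_len is the length of the source sentence;

dec_inputs is the decoder input, with shape batch_size × tgt_len, where tgt_len is the length of the decoder-side sentence.

The following three lines chain the data through the encoder, the decoder, and the output layer:

enc_outputs, enc_self_attns = self.encoder(enc_inputs)

enc_inputs passes through the encoder, which returns enc_outputs and enc_self_attns. enc_outputs is the main output; enc_self_attns is the matrix obtained by applying softmax to the QK^T scores and represents how strongly each word relates to every other word. What is returned here is a design choice: instead of these two values, one could return the outputs for all tokens, the output of one particular layer, or some intermediate quantities.

dec_outputs, dec_self_attns, dec_enc_attns = self.decoder(dec_inputs, enc_inputs, enc_outputs)

dec_inputs is the decoder input and enc_outputs is the encoder output (used for the interaction between decoder and encoder). These two are the main inputs of the decoder and correspond to the part marked with the red box in the architecture diagram:

Transformer Code Walkthrough - Figure 5

enc_inputs is the raw encoder input; it is passed along so that the padded positions on the encoder side can be identified.

dec_logits = self.projection(dec_outputs)

dec_outputs is the decoder output, which the projection layer maps to the vocabulary size.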
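
How the flattened logits returned by forward meet the labels in the loss can be sketched as follows. The random logits are a stand-in for the real model output; the label indices are those of i want a beer E in tgt_vocab.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, tgt_len, tgt_vocab_size = 1, 5, 7

dec_logits = torch.randn(batch_size, tgt_len, tgt_vocab_size)  # stand-in for projection output
target_batch = torch.tensor([[1, 2, 3, 4, 6]])                 # 'i want a beer E'

# CrossEntropyLoss expects [N, num_classes] logits and [N] class indices,
# so batch and time dimensions are merged on both sides.
flat_logits = dec_logits.view(-1, dec_logits.size(-1))  # [batch_size * tgt_len, tgt_vocab_size]
flat_targets = target_batch.contiguous().view(-1)       # [batch_size * tgt_len]

loss = nn.CrossEntropyLoss()(flat_logits, flat_targets)
print(flat_logits.shape, flat_targets.shape)
print(loss.item())
```

Each of the 5 rows of flat_logits is scored against one label, and the loss is the mean over those rows.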

4.3. Definition of the Encoder

# 2. The Encoder has three parts: word embedding, positional encoding, and the attention and feed-forward layers
class Encoder(nn.Module):  # Encoder inherits from nn.Module
    def __init__(self):  # initialization
        super(Encoder, self).__init__()
        self.src_emb = nn.Embedding(src_vocab_size, d_model)  # word embedding layer: a matrix of size src_vocab_size x d_model
        self.pos_emb = PositionalEncoding(d_model)  # positional encoding
        self.layers = nn.ModuleList([EncoderLayer() for _ in range(n_layers)])  # stack n_layers encoder layers with ModuleList

    def forward(self, enc_inputs):  # enc_inputs: [batch_size x source_len]
        # look up the embeddings by index; enc_outputs: [batch_size, src_len, d_model]
        enc_outputs = self.src_emb(enc_inputs)

        # add the positional encoding to the embeddings; see PositionalEncoding (item 3)
        enc_outputs = self.pos_emb(enc_outputs.transpose(0, 1)).transpose(0, 1)

        # get_attn_pad_mask records where the pad symbols are, for use by the later layers
        enc_self_attn_mask = get_attn_pad_mask(enc_inputs, enc_inputs)

        enc_self_attns = []
        for layer in self.layers:  # see EncoderLayer (item 5)
            enc_outputs, enc_self_attn = layer(enc_outputs, enc_self_attn_mask)
            enc_self_attns.append(enc_self_attn)
        return enc_outputs, enc_self_attns

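In the code, the Encoder is divided into three parts, as shown in the figure:

Transformer Code Walkthrough - Figure 6

4.3.1. Model initialization: def __init__(self)

self.src_emb = nn.Embedding(src_vocab_size, d_model)  # word embedding layer

The Embedding layer is stored in src_emb; it creates a matrix of size src_vocab_size × d_model.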

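self.pos_emb = PositionalEncoding(d_model)  # positional encoding

The positional encoding layer PositionalEncoding is stored in pos_emb. Here it is a fixed sine/cosine function; alternatively, an nn.Embedding, as used for the word vectors, would give a positional encoding that is learned and updated during training.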
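
The learnable alternative mentioned above might look like the sketch below. LearnedPositionalEmbedding is a hypothetical class, not part of the walkthrough's code, and for simplicity it assumes a batch-first input of shape [batch_size, seq_len, d_model] rather than the layout used by PositionalEncoding.

```python
import torch
import torch.nn as nn


class LearnedPositionalEmbedding(nn.Module):
    """Hypothetical learnable positional encoding: one trainable vector per position."""

    def __init__(self, d_model, max_len=5000, dropout=0.1):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)  # trainable table [max_len, d_model]
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, x):
        # x: [batch_size, seq_len, d_model]
        positions = torch.arange(x.size(1), device=x.device)  # 0, 1, ..., seq_len - 1
        # pos_emb(positions) is [seq_len, d_model] and broadcasts over the batch dimension
        return self.dropout(x + self.pos_emb(positions))


layer = LearnedPositionalEmbedding(d_model=512, max_len=128)
out = layer(torch.zeros(2, 5, 512))
print(out.shape)  # torch.Size([2, 5, 512])
```

Unlike the fixed sine/cosine table, the weights of pos_emb appear in model.parameters() and are therefore updated by the optimizer.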

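self.layers = nn.ModuleList([EncoderLayer() for _ in range(n_layers)])  # attention layers and feed-forward networks

The attention layer and the feed-forward network live inside EncoderLayer(); nn.ModuleList stacks n_layers of these encoder layers.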
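
Why nn.ModuleList is used instead of a plain Python list can be seen in a short sketch. TinyLayer is a hypothetical stand-in for EncoderLayer, chosen small so that the parameter counts are easy to check by hand.

```python
import torch.nn as nn

n_layers = 6


class TinyLayer(nn.Module):
    """Hypothetical stand-in for EncoderLayer: 4*4 weights + 4 biases = 20 parameters."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)


class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([TinyLayer() for _ in range(n_layers)])  # registered


class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [TinyLayer() for _ in range(n_layers)]  # NOT registered as submodules


registered = sum(p.numel() for p in WithModuleList().parameters())
unregistered = sum(p.numel() for p in WithPlainList().parameters())
print(registered, unregistered)  # 120 0
```

Only the ModuleList version exposes the layer weights through model.parameters(), which is what optim.Adam receives in the main function.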

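4.3.2. The forward function: forward(self, enc_inputs)

enc_inputs is the input of the Encoder, with size batch_size × source_len (source_len is the length of the input sentence, the same quantity as src_len).

enc_outputs = self.src_emb(enc_inputs)

src_emb is the nn.Embedding word embedding layer defined above. It uses each index to look up the embedding table, extracts the corresponding word vectors, and assembles them into the matrix enc_outputs of shape batch_size × src_len × d_model.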
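
The lookup can be checked directly with the sizes from this walkthrough. The snippet below is a standalone sketch; the input indices are those of ich mochte ein bier P in src_vocab.

```python
import torch
import torch.nn as nn

src_vocab_size, d_model = 5, 512
src_emb = nn.Embedding(src_vocab_size, d_model)

enc_inputs = torch.tensor([[1, 2, 3, 4, 0]])  # [batch_size=1, src_len=5]
enc_outputs = src_emb(enc_inputs)             # [1, 5, 512]

print(src_emb.weight.shape)  # torch.Size([5, 512])
print(enc_outputs.shape)     # torch.Size([1, 5, 512])

# Position 1 holds index 2 ('mochte'), so its vector is row 2 of the table.
same_row = torch.equal(enc_outputs[0, 1], src_emb.weight[2])
print(same_row)  # True
```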

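enc_outputs = self.pos_emb(enc_outputs.transpose(0, 1)).transpose(0, 1)

This line applies the positional encoding. Its input is the output of the word embedding layer src_emb, and the layer adds the positional encoding to the word vectors. The two transpose(0, 1) calls switch to the [seq_len, batch_size, d_model] layout that PositionalEncoding expects and then back again. The implementation of the positional encoding is described next.

Positional encoding

Implementing the positional encoding is simple: the code follows the formula directly. The code below is only one possible implementation.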

import math

import torch
import torch.nn as nn


# 3. PositionalEncoding implementation
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super(PositionalEncoding, self).__init__()

        self.dropout = nn.Dropout(p=dropout)

        pe = torch.zeros(max_len, d_model)  # pe has max_len rows and d_model columns
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)  # position of each token: [max_len, 1]
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))  # the exponential term
        pe[:, 0::2] = torch.sin(position * div_term)  # even indices
        pe[:, 1::2] = torch.cos(position * div_term)  # odd indices
        # at this point pe has shape [max_len, d_model]

        # after the next line pe has shape [max_len, 1, d_model]
        pe = pe.unsqueeze(0).transpose(0, 1)

        self.register_buffer('pe', pe)  # register a buffer: pe is stored with the module but is not a trainable parameter

    def forward(self, x):
        # x: [seq_len, batch_size, d_model], the word vectors after embedding
        x = x + self.pe[:x.size(0), :]  # add the positional encoding to the embedded word vectors
        return self.dropout(x)

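Positional encoding formulas:

(In the formulas, pos is the index of the word in the sentence, 2i denotes the even indices and 2i+1 the odd indices. If max_len is 128, pos runs over 0, 1, 2, …, 127. If d_model is 512, i runs from 0 to 255, so 2i takes the values 0, 2, 4, …, 510.)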

$$PE_{(pos,2i)}=\sin\left(pos/10000^{2i/d_{model}}\right)$$ is the positional encoding formula for the even indices,

$$PE_{(pos,2i+1)}=\cos\left(pos/10000^{2i/d_{model}}\right)$$ is the positional encoding formula for the odd indices.

The part common to both formulas is $pos/10000^{2i/d_{model}}$, in which $1/10000^{2i/d_{model}}$ can be rewritten as follows:

$$\frac{1}{10000^{2i/d_{model}}} = 10000^{-2i/d_{model}} = e^{\log_{e}\left(10000^{-2i/d_{model}}\right)} = e^{(-2i/d_{model})\cdot\log_{e}10000} = e^{2i\times\frac{-\log_{e}10000}{d_{model}}}$$
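
The rewriting of $1/10000^{2i/d_{model}}$ as an exponential can be checked numerically. The sketch below compares the direct power with the exponential form for d_model = 512; float64 is used here only to keep rounding differences negligible.

```python
import math

import torch

d_model = 512
two_i = torch.arange(0, d_model, 2, dtype=torch.float64)  # 0, 2, 4, ..., 510

direct = 1.0 / (10000.0 ** (two_i / d_model))               # 1 / 10000^(2i / d_model)
via_exp = torch.exp(two_i * (-math.log(10000.0) / d_model))  # e^(2i * (-log 10000 / d_model))

print(two_i.shape)                     # torch.Size([256])
print(torch.allclose(direct, via_exp))
```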

In the code, the factor $1/10000^{2i/d_{model}}$ is computed by the following line:

```python
div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
```

Here `torch.arange(0, d_model, 2).float()` supplies the values 2i, which are shared by index 2i and index 2i+1. `torch.arange(0, d_model, 2)` returns a one-dimensional tensor whose elements lie in [0, d_model) with a step of 2.

The following lines correspond to the formula for the even indices and the formula for the odd indices, respectively:

```python
pe[:, 0::2] = torch.sin(position * div_term)  # even indices
pe[:, 1::2] = torch.cos(position * div_term)  # odd indices
```

`pe[:, 0::2]` selects from 0 to the end with a step of 2, i.e. the even indices; `pe[:, 1::2]` selects from 1 to the end with a step of 2, i.e. the odd indices.

The next line adds a dimension to the pe matrix obtained above and transposes it:

```python
pe = pe.unsqueeze(0).transpose(0, 1)
```

The initial shape of pe is max_len × d_model. unsqueeze(0) inserts a dimension at position 0, so the shape becomes 1 × max_len × d_model. transpose(0, 1) swaps dimension 0 and dimension 1, so the shape becomes max_len × 1 × d_model.

Then comes the forward function of the positional encoding:

```python
    def forward(self, x):
        # x: [seq_len, batch_size, d_model], the word vectors after embedding
        x = x + self.pe[:x.size(0), :]  # add the positional encoding to the embedded word vectors x
        return self.dropout(x)
```

Here `pe[:x.size(0), :]` takes the first x.size(0) entries along dimension 0 and everything along dimension 1.

That completes the implementation of the positional encoding.

Pad mask matrix

Next is the implementation of the get_attn_pad_mask function called in the Encoder's forward function (attn stands for attention):

```python
enc_self_attn_mask = get_attn_pad_mask(enc_inputs, enc_inputs)
```

This function tells the subsequent layers which parts of the original input sentence were filled with the pad symbol, so that the influence of the pad symbol can be removed when self-attention and encoder-decoder attention are computed.

The sentences in a batch have different lengths. So that they can be assembled into a matrix that the model can process, a max_length is set: the part of a sentence beyond max_length is truncated and discarded, and sentences shorter than max_length are filled up with the pad symbol.

The attention value (relevance) between each token and the other tokens is computed with the formula $Attention(Q,K,V)=softmax\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V$, where $\frac{QK^{T}}{\sqrt{d_{k}}}$ is a similarity matrix. This matrix contains the similarity between each token and the pad symbol, but the pad symbol only fills up the sentence length, so these similarities are not needed. They therefore have to be removed from the similarity matrix, which requires knowing where they are located in the matrix; get_attn_pad_mask obtains these positions.

The process is illustrated below:

[Figure: image-20220516193406587.png]

In the figure, 20 is the similarity between 卷 and 卷, 5 is the similarity between 卷 and 起, and so on; 9 is the similarity between 卷 and the pad symbol. The value 9 does not need to be considered and is ignored in the subsequent computation. Marking the positions of the token-pad similarities in the matrix with 1 gives the mask matrix. get_attn_pad_mask is the function that produces this mask matrix.

Implementation of get_attn_pad_mask:

```python
def get_attn_pad_mask(seq_q, seq_k):
    batch_size, len_q = seq_q.size()
    batch_size, len_k = seq_k.size()
    pad_attn_mask = seq_k.data.eq(0).unsqueeze(1)  # batch_size x 1 x len_k, True (one) means masked
    return pad_attn_mask.expand(batch_size, len_q, len_k)  # batch_size x len_q x len_k
```

Note: seq_q and seq_k do not necessarily have the same shape. In a self-attention layer their shapes are the same. In an encoder-decoder attention layer, q comes from the decoder and k comes from the encoder, so the shapes differ. In addition, it is enough to tell the model about the pad symbols on the encoder side; the pad information of the decoder side is not used in the encoder-decoder attention layer.

The following line tells the model which positions of seq_k are pad symbols:

```python
pad_attn_mask = seq_k.data.eq(0).unsqueeze(1)  # batch_size x 1 x len_k, True (one) means masked
```

`.data` returns a tensor that holds the same values as seq_k but is detached from the autograd graph; no copy is made, the memory is shared with the original tensor;

`.eq()` compares tensors element by element: it returns True where the two elements at the same position are equal and False otherwise. `.eq(0)` compares every element with 0 and returns True where the element is 0. (In the main function, the pad symbol is mapped to 0 in both the encoder-side and the decoder-side vocabulary, so we look for the positions whose element is 0 and return True there. True can also be read as 1, which gives a mask matrix in which pad positions are 1 and all other positions are 0);

`.unsqueeze(1)` inserts a dimension at position 1, changing the shape from batch_size × len_k to batch_size × 1 × len_k.

```python
return pad_attn_mask.expand(batch_size, len_q, len_k)  # batch_size x len_q x len_k
```

`input.expand(sizes)` replicates the data of the input tensor along dimensions of size 1. sizes gives the desired size of each dimension; for a dimension that should not be expanded (or that is not of size 1), pass its original size or simply -1.

After the operations above, pad_attn_mask has shape batch_size × 1 × len_k. `pad_attn_mask.expand(batch_size, len_q, len_k)` replicates it len_q times along the dimension that had size 1, so the final shape is batch_size × len_q × len_k. This is the same shape as the matrix $\frac{QK^{T}}{\sqrt{d_{k}}}$: since we need to know which positions of that matrix hold token-pad similarities, the mask must have the same shape. This is also the reason why `.unsqueeze(1)` was used earlier to add a dimension.

That completes the implementation of get_attn_pad_mask.

Feed-forward network and self-attention layer

Continuing with the Encoder's forward function, here is the code that follows the call to get_attn_pad_mask:

```python
enc_self_attns = []
for layer in self.layers:  # see EncoderLayer (item 5)
    enc_outputs, enc_self_attn = layer(enc_outputs, enc_self_attn_mask)
    enc_self_attns.append(enc_self_attn)
return enc_outputs, enc_self_attns
```

This part is made up of the feed-forward networks and self-attention layers. The for loop feeds the output of each layer into the next layer, so it is enough to analyze the code of a single layer:

```python
enc_outputs, enc_self_attn = layer(enc_outputs, enc_self_attn_mask)
```

enc_outputs is the output of the previous encoder layer and enc_self_attn_mask is the mask matrix produced by get_attn_pad_mask. These two are the inputs of every layer; how the output is obtained is shown in the following class:

```python
# 5. EncoderLayer: has two parts, multi-head attention and the feed-forward network
class EncoderLayer(nn.Module):
    def __init__(self):  # initialization
        super(EncoderLayer, self).__init__()
        self.enc_self_attn = MultiHeadAttention()  # self-attention layer using multi-head attention, the core part
        self.pos_ffn = PoswiseFeedForwardNet()  # feed-forward network layer, i.e. Linear layers

    def forward(self, enc_inputs, enc_self_attn_mask):
        enc_outputs, attn = self.enc_self_attn(enc_inputs, enc_inputs, enc_inputs, enc_self_attn_mask)  # enc_inputs: [batch_size x seq_len_q x d_model]
        enc_outputs = self.pos_ffn(enc_outputs)  # enc_outputs: [batch_size x len_q x d_model]
        return enc_outputs, attn
```

In the initialization function, the self-attention layer enc_self_attn comes first; it uses multi-head attention and is the core of the whole code. It is followed by the feed-forward network layer, which is essentially a Linear (fully connected) layer.

In the forward function, the self-attention layer is applied first:

```python
enc_outputs, attn = self.enc_self_attn(enc_inputs, enc_inputs, enc_inputs, enc_self_attn_mask)
```

It takes 4 inputs. The first three, all enc_inputs, play the roles of the original Q, K and V matrices and all have shape [batch_size × seq_len_q × d_model]. The fourth input, enc_self_attn_mask, is the mask matrix obtained earlier. The implementation of this layer is analyzed next:

Self-attention layer

Implementation:

```python
# 6. MultiHeadAttention
class MultiHeadAttention(nn.Module):
    def __init__(self):
        super(MultiHeadAttention, self).__init__()
        # Q, K and V come in identical; linear layers project them with the parameter matrices Wq, Wk, Wv
        self.W_Q = nn.Linear(d_model, d_k * n_heads)
        self.W_K = nn.Linear(d_model, d_k * n_heads)
        self.W_V = nn.Linear(d_model, d_v * n_heads)
        self.linear = nn.Linear(n_heads * d_v, d_model)
        self.layer_norm = nn.LayerNorm(d_model)

    def forward(self, Q, K, V, attn_mask):

        # multi-head attention in steps: project and split into heads, compute the attention scores, then the attention values
        residual, batch_size = Q, Q.size(0)
        # (B, S, D) -proj-> (B, S, D) -split-> (B, S, H, W) -trans-> (B, H, S, W)

        # project first, then split into heads; q and k must have the same dimension after the split, hence d_k for both
        q_s = self.W_Q(Q).view(batch_size, -1, n_heads, d_k).transpose(1,2)  # q_s: [batch_size x n_heads x len_q x d_k]
        k_s = self.W_K(K).view(batch_size, -1, n_heads, d_k).transpose(1,2)  # k_s: [batch_size x n_heads x len_k x d_k]
        v_s = self.W_V(V).view(batch_size, -1, n_heads, d_v).transpose(1,2)  # v_s: [batch_size x n_heads x len_k x d_v]

        # attn_mask comes in as batch_size x len_q x len_k; the next line gives [batch_size x n_heads x len_q x len_k],
        # i.e. the pad information is repeated for each of the n heads
        attn_mask = attn_mask.unsqueeze(1).repeat(1, n_heads, 1, 1)

        # then apply ScaledDotProductAttention, see item 7
        # two results: context: [batch_size x n_heads x len_q x d_v], attn: [batch_size x n_heads x len_q x len_k]
        context, attn = ScaledDotProductAttention()(q_s, k_s, v_s, attn_mask)
        context = context.transpose(1, 2).contiguous().view(batch_size, -1, n_heads * d_v)  # context: [batch_size x len_q x n_heads * d_v]
        output = self.linear(context)
        return self.layer_norm(output + residual), attn  # output: [batch_size x len_q x d_model]
```

- In the initialization function `def __init__(self)`:

```python
self.W_Q = nn.Linear(d_model, d_k * n_heads)  # maps d_model to d_k * n_heads
self.W_K = nn.Linear(d_model, d_k * n_heads)  # maps d_model to d_k * n_heads
self.W_V = nn.Linear(d_model, d_v * n_heads)  # maps d_model to d_v * n_heads
```

W_Q and W_K both map d_model to d_k * n_heads. This guarantees that the resulting Q and K matrices have the same dimension (the attention formula contains $QK^{T}$; if the dimensions of Q and K differed, they could not be multiplied).

- In the forward function `forward(self, Q, K, V, attn_mask)`:

Shapes of the incoming data: Q: [batch_size × len_q × d_model], K: [batch_size × len_k × d_model], V: [batch_size × len_k × d_model].

The code below is very important. First, `W_Q(Q)` projects the Q matrix (see the definition of W_Q in the initialization function above). Then view splits the result into n_heads heads, each of dimension d_k. transpose(1,2) swaps dimension 1 and dimension 2 (dimensions are counted from 0).

```python
q_s = self.W_Q(Q).view(batch_size, -1, n_heads, d_k).transpose(1,2)  # q_s: [batch_size x n_heads x len_q x d_k]
k_s = self.W_K(K).view(batch_size, -1, n_heads, d_k).transpose(1,2)  # k_s: [batch_size x n_heads x len_k x d_k]
v_s = self.W_V(V).view(batch_size, -1, n_heads, d_v).transpose(1,2)  # v_s: [batch_size x n_heads x len_k x d_v]
```

Note that here
%8C%E5%BE%97%E5%88%B0%E7%9A%84q%E3%80%81k%E7%9F%A9%E9%98%B5%E7%9A%84%E7%BB%B4%E5%BA%A6%E6%98%AF%E7%9B%B8%E5%90%8C%E7%9A%84%EF%BC%8C%E5%AE%83%E4%BB%AC%E7%9A%84%E6%AF%8F%E4%B8%AA%E5%A4%B4%E9%83%BD%E6%98%AFd_k%E7%BB%B4%E5%BA%A6%E3%80%82%0A%0A%3E%20%E5%9B%A0%E4%B8%BA%E5%9C%A8%E8%AE%A1%E7%AE%97%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%80%BC%E7%9A%84%E5%85%AC%E5%BC%8F%E4%B8%AD%E6%9C%89%24QK%5E%7BT%7D%20%24%EF%BC%8C%E5%A6%82%E6%9E%9CQ%E3%80%81K%E7%9F%A9%E9%98%B5%E7%9A%84%E7%BB%B4%E5%BA%A6%E4%B8%8D%E5%90%8C%E5%B0%B1%E6%97%A0%E6%B3%95%E7%9B%B8%E4%B9%98%E3%80%82%0A%3E%0A%3E%20%E5%8D%95%E4%B8%AAq%E3%80%81k%E7%9F%A9%E9%98%B5%E7%9A%84%E7%BB%B4%E5%BA%A6%E6%98%AFbatch_size%20%C3%97%20len_q%20%C3%97%20d_model%E5%92%8Cbatch_size%20%C3%97%20len_k%20%C3%97%20d_model%EF%BC%8C%E5%85%B6%E4%B8%ADlen_q%20%3D%20len_k%20%3D%20%E5%8F%A5%E5%AD%90%E9%95%BF%E5%BA%A6%20seq_len%EF%BC%8Cn_heads%E8%A1%A8%E7%A4%BA%E6%9C%89n_heads%E4%B8%AAq%E3%80%81k%E7%9F%A9%E9%98%B5%EF%BC%8C%E6%89%80%E4%BB%A5q%E3%80%81k%E7%9A%84%E7%BB%B4%E5%BA%A6%E6%98%AF%E7%9B%B8%E5%90%8C%E7%9A%84%E3%80%82%E5%8F%AF%E4%BB%A5%E9%80%9A%E8%BF%87%E4%B8%8B%E5%9B%BE%E6%9D%A5%E7%90%86%E8%A7%A3%EF%BC%9A%0A%3E%0A%3E%20%3Cimg%20src%3D%22Transformer%E4%BB%A3%E7%A0%81%E8%AE%B2%E8%A7%A3.assets%2Fn_heads%E5%9B%BE%E7%A4%BA.png%22%20alt%3D%22n_heads%E5%9B%BE%E7%A4%BA%22%20style%3D%22zoom%3A%2030%25%3B%22%20%2F%3E%0A%0A%3E%20view()%E5%87%BD%E6%95%B0%E8%AE%B2%E8%A7%A3%EF%BC%9A%0A%3E%0A%3E%20view()%E5%87%BD%E6%95%B0%E8%A1%A8%E7%A4%BA%E9%87%8D%E6%96%B0%E8%B0%83%E6%95%B4Tensor%E7%9A%84%E5%BD%A2%E7%8A%B6%EF%BC%8C%E4%BE%8B%E5%A6%82%EF%BC%9A%0A%3E%0A%3E%20%60%60%60python%0A%3E%20a%3Dtorch.Tensor(%5B%5B%5B1%2C2%2C3%5D%2C%5B4%2C5%2C6%5D%5D%5D)%0A%3E%20print(a.view(3%2C2))%0A%3E%20’’’%0A%3E%20%E8%BE%93%E5%87%BA%EF%BC%9Atensor(%5B%5B1.%2C%202.%5D%2C%0A%3E%20%09%20%20%20%20%5B3.%2C%204.%5D%2C%0A%3E%20%20%20%20%20%20%20%20%20%5B5.%2C%206.%5D%5D)%0A%3E%20’’’%0A%3E%20%60%60%60%0A%3E%0A%3E%20%E5%85%B6%E4%B8%AD%E5%8F%82%E6%95%B0-1%E8%A1%A8%E7%A4%BA%E4%BF%9D%E8%AF%81%E5%85%83%E
7%B4%A0%E7%9A%84%E6%80%BB%E6%95%B0%E4%B8%8D%E5%8F%98%E7%9A%84%E5%89%8D%E6%8F%90%E4%B8%8B%EF%BC%8C%E8%87%AA%E5%8A%A8%E8%B0%83%E6%95%B4%E8%BF%99%E4%B8%AA%E7%BB%B4%E5%BA%A6%E4%B8%8A%E7%9A%84%E5%85%83%E7%B4%A0%E4%B8%AA%E6%95%B0%E3%80%82%E9%82%A3%E4%B9%88%60view(batch_size%2C%20-1%2C%20n_heads%2C%20d_k)%60%E4%B9%8B%E5%90%8Eq%E7%9F%A9%E9%98%B5%E7%9A%84%E7%BB%B4%E5%BA%A6%E5%8F%98%E4%B8%BAbatch_size%20%C3%97%20len_q%20%C3%97%20n_heads%20%C3%97%20d_k%E3%80%82%0A%3E%0A%3E%20%E4%B9%8B%E5%90%8E%E5%86%8Dtranspose(1%2C2)%E5%B0%B1%E5%BE%97%E5%88%B0%E6%9C%80%E7%BB%88%E7%9A%84q%E7%9F%A9%E9%98%B5%EF%BC%8C%E5%BD%A2%E7%8A%B6%E6%98%AFbatch_size%20%C3%97%20n_heads%20%C3%97%20len_q%20%C3%97%20d_k%EF%BC%8Ck%E3%80%81v%E7%9F%A9%E9%98%B5%E5%90%8C%E7%90%86%E3%80%82%0A%0A%E7%94%B1%E4%BA%8E%E5%88%86%E5%A4%B4%E5%8E%9F%E6%9D%A5%E7%9A%84%E5%8D%95%E4%B8%AAq%E3%80%81k%E3%80%81v%E7%9F%A9%E9%98%B5%E5%8F%98%E6%88%90%E4%BA%86n_heads%E4%B8%AA%EF%BC%8C%E6%89%80%E4%BB%A5%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%E4%B9%9F%E8%A6%81%E8%BF%9B%E8%A1%8C%E5%88%86%E5%A4%B4%EF%BC%8C%E5%88%86%E6%88%90n_heads%E4%B8%AA%E5%A4%B4%EF%BC%9A%0A%0A%60%60%60python%0Aattn_mask%20%3D%20attn_mask.unsqueeze(1).repeat(1%2C%20n_heads%2C%201%2C%201)%0A%60%60%60%0A%0A%E9%A6%96%E5%85%88%E5%AF%B9%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5attn_mask%E5%9C%A8%E7%AC%AC1%E7%BB%B4%E5%A4%84%E5%A2%9E%E5%8A%A0%E4%BA%86%E4%B8%80%E4%B8%AA%E7%BB%B4%E5%BA%A6%EF%BC%8C%E5%BD%A2%E7%8A%B6%E7%94%B1batch_size%20%C3%97%20len_q%20%C3%97%20len_k%E5%8F%98%E4%B8%BAbatch_size%20%C3%97%201%20%C3%97%20len_q%20%C3%97%20len_k%EF%BC%8C%E6%8E%A5%E7%9D%80%E7%94%A8repeat()%E5%87%BD%E6%95%B0%E5%9C%A8%E7%AC%AC1%E7%BB%B4%E9%87%8D%E5%A4%8Dn_heads%E6%AC%A1%EF%BC%8C%E5%BE%97%E5%88%B0%E6%9C%80%E7%BB%88%E7%9A%84%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%EF%BC%8C%E5%85%B6%E5%BD%A2%E7%8A%B6%E4%B8%BAbatch_size%20%C3%97%20n_heads%20%C3%97%20len_q%20%C3%97%20len_k%E3%80%82%0A%0A———%0A%0A%E4%B9%8B%E5%90%8E%E6%98%AF*%E8%AE%A1%E7%AE%97Attention%E5%80%BC%EF%BC%8C%E4%BC%A0%E5%85%A5%E5%89%8D%E9%9D%A2%E8%A
E%A1%E7%AE%97%E7%9A%84q%E3%80%81k%E3%80%81v%E7%9F%A9%E9%98%B5%E5%92%8C%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5attn_mask%EF%BC%9A%0A%0A%60%60%60python%0Acontext%2C%20attn%20%3D%20ScaledDotProductAttention()(q_s%2C%20k_s%2C%20v_s%2C%20attn_mask)%0A%60%60%60%0A%0A%E8%BF%99%E8%A1%8C%E4%BB%A3%E7%A0%81%E4%B9%9F%E5%B0%B1%E6%98%AF%E5%AE%9E%E7%8E%B0%E5%85%AC%E5%BC%8F%EF%BC%9A%24Attention(Q%2CK%2CV)%3Dsoftmax(%5Cfrac%7BQK%5E%7BT%7D%20%7D%7B%5Csqrt%7Bd%7Bk%7D%20%7D%20%7D%20)V%24%EF%BC%8C%E5%AE%9E%E7%8E%B0%E4%BB%A3%E7%A0%81%E5%A6%82%E4%B8%8B%EF%BC%9A%0A%0A%60%60%60python%0A%23%207.%20ScaledDotProductAttention%0Aclass%20ScaledDotProductAttention(nn.Module)%3A%0A%20%20%20%20def%20init(self)%3A%0A%20%20%20%20%20%20%20%20super(ScaledDotProductAttention%2C%20self).init()%0A%0A%20%20%20%20def%20forward(self%2C%20Q%2C%20K%2C%20V%2C%20attnmask)%3A%0A%20%20%20%20%20%20%20%20%23%20%E8%BE%93%E5%85%A5%E8%BF%9B%E6%9D%A5%E7%9A%84%E7%BB%B4%E5%BA%A6%E5%88%86%E5%88%AB%E6%98%AF%20%5Bbatch_size%20x%20n_heads%20x%20len_q%20x%20d_k%5D%20%20K%EF%BC%9A%20%5Bbatch_size%20x%20n_heads%20x%20len_k%20x%20d_k%5D%20%20V%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_k%20x%20d_v%5D%0A%20%20%20%20%20%20%20%20%23%20%E9%A6%96%E5%85%88%E7%BB%8F%E8%BF%87matmul%E5%87%BD%E6%95%B0%E5%BE%97%E5%88%B0%E7%9A%84scores%E5%BD%A2%E7%8A%B6%E6%98%AF%20%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_q%20x%20len_k%5D%0A%20%20%20%20%20%20%20%20scores%20%3D%20torch.matmul(Q%2C%20K.transpose(-1%2C%20-2))%20%2F%20np.sqrt(d_k)%0A%0A%20%20%20%20%20%20%20%20%23%20Fills%20elements%20of%20self%20tensor%20with%20value%20where%20mask%20is%20one.%0A%20%20%20%20%20%20%20%20scores.masked_fill(attnmask%2C%20-1e9)%20%0A%20%20%20%20%20%20%20%20attn%20%3D%20nn.Softmax(dim%3D-1)(scores)%0A%20%20%20%20%20%20%20%20context%20%3D%20torch.matmul(attn%2C%20V)%0A%20%20%20%20%20%20%20%20return%20context%2C%20attn%0A%60%60%60%0A%0A%E9%A6%96%E5%85%88%E8%AE%A1%E7%AE%97%24%5Cfrac%7BQK%5E%7BT%7D%20%7D%7B%5Csqrt%7Bd%7Bk%7D%20%7D%20%7D%24%EF%BC%9A%0A%0A%60%60%60python%0Asc
ores%20%3D%20torch.matmul(Q%2C%20K.transpose(-1%2C%20-2))%20%2F%20np.sqrt(dk)%0A%23%20torch.matmul%E8%A1%A8%E7%A4%BA%E7%9B%B8%E4%B9%98%EF%BC%8CK.transpose(-1%2C%20-2)%E5%B0%B1%E6%98%AFK%E7%9A%84%E8%BD%AC%E7%BD%AE%EF%BC%8Ctorch.matmul(Q%2C%20K.transpose(-1%2C%20-2))%E5%B0%B1%E6%98%AFQ%E4%B9%98%E4%BB%A5K%E7%9A%84%E8%BD%AC%E7%BD%AE%EF%BC%8Cnp.sqrt(d_k)%E6%98%AF%E6%A0%B9%E5%8F%B7d_k%E3%80%82%0A%60%60%60%0A%0A%E7%84%B6%E5%90%8E%E6%A0%B9%E6%8D%AE%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5attn_mask%E6%8A%8A%E8%A2%ABmask%E7%9A%84%E5%9C%B0%E6%96%B9%E7%BD%AE%E4%B8%BA%E6%97%A0%E9%99%90%E5%B0%8F%EF%BC%8Csoftmax%E4%B9%8B%E5%90%8E%E5%9F%BA%E6%9C%AC%E5%B0%B1%E6%98%AF0%EF%BC%8C%E5%AF%B9q%E7%9A%84%E5%8D%95%E8%AF%8D%E4%B8%8D%E8%B5%B7%E4%BD%9C%E7%94%A8%EF%BC%9A%0A%0A%60%60%60python%0A%23%20%E4%B9%9F%E5%B0%B1%E6%98%AF%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%E4%B8%AD%E5%93%AA%E4%B8%AA%E4%BD%8D%E7%BD%AE%E7%9A%84%E5%85%83%E7%B4%A0%E6%98%AF1%EF%BC%8C%E5%B0%B1%E6%8A%8Ascores%E4%B8%AD%E8%BF%99%E4%B8%AA%E4%BD%8D%E7%BD%AE%E7%9A%84%E5%85%83%E7%B4%A0%E8%AE%BE%E4%B8%BA%E6%97%A0%E9%99%90%E5%B0%8F%0Ascores.masked_fill(attnmask%2C%20-1e9)%20%0Aattn%20%3D%20nn.Softmax(dim%3D-1)(scores)%20%23%20dim%3D-1%E8%A1%A8%E7%A4%BA%E5%AF%B9%E6%AF%8F%E4%B8%80%E6%A8%AA%E8%A1%8C%E5%81%9Asoftmax%0A%60%60%60%0A%0Asoftmax%E4%B9%8B%E5%90%8E%EF%BC%8C%E5%86%8D%E4%B8%8E%E7%9F%A9%E9%98%B5v%E7%9B%B8%E4%B9%98%EF%BC%9A%0A%0A%60%60%60pythhon%0Acontext%20%3D%20torch.matmul(attn%2C%20V)%0A%60%60%60%0A%0A———%0A%0A%E8%AE%A1%E7%AE%97Attention%E4%B9%8B%E5%90%8E%EF%BC%8C%E7%BB%A7%E7%BB%AD%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E7%9A%84%E5%AE%9E%E7%8E%B0%E5%87%BD%E6%95%B0forward%EF%BC%8C%E5%90%8E%E9%9D%A2%E7%9A%84%E4%BB%A3%E7%A0%81%E5%B0%B1%E6%98%AF%E4%B8%80%E4%BA%9B%E5%B8%B8%E8%A7%84%E6%93%8D%E4%BD%9C%EF%BC%9A%0A%0A%60%60%60python%0A%20%20%20%20context%20%3D%20context.transpose(1%2C%202).contiguous().view(batchsize%2C%20-1%2C%20nheads%20%20d_v)%20%23%20context%3A%20%5Bbatch_size%20x%20len_q%20x%20n_heads%20%20dv%5D%0A%20%20%20%20output%20
%3D%20self.linear(context)%0Areturn%20self.layernorm(output%20%2B%20residual)%2C%20attn%20%23%20output%3A%20%5Bbatch_size%20x%20len_q%20x%20d_model%5D%0A%60%60%60%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%23%23%23%23%23%23%20%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%B1%82%0A%0A%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E7%BB%93%E6%9D%9F%E5%90%8E%EF%BC%8C%E5%B0%B1%E6%98%AFEncoder%E5%B1%82%E7%9A%84%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%B1%82%EF%BC%8C%E5%AF%B9%E5%BA%94%E4%BB%A3%E7%A0%81%E6%98%AFEncoder%E4%B8%AD%E5%AE%9E%E7%8E%B0%E5%87%BD%E6%95%B0forward%E4%B8%AD%E7%9A%84%E8%BF%99%E4%B8%80%E8%A1%8C%EF%BC%9A%0A%0A%60%60%60python%0Aenc_outputs%20%3D%20self.pos_ffn(enc_outputs)%20%20%23%20enc_outputs%3A%20%5Bbatch_size%20x%20len_q%20x%20d_model%5D%0A%60%60%60%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%23%23%23%204.4.Decoder%E5%B1%82%E7%9A%84%E5%AE%9A%E4%B9%89%0A%0ADecoder%E5%B1%82%E7%9A%84%E5%AE%9E%E7%8E%B0%E4%BB%A3%E7%A0%81%EF%BC%9A%0A%0A%60%60%60python%0A%23%23%209.%20Decoder%0A%0Aclass%20Decoder(nn.Module)%3A%0A%20%20%20%20def%20__init(self)%3A%0A%20%20%20%20%20%20%20%20super(Decoder%2C%20self).__init()%0A%20%20%20%20%20%20%20%20self.tgt_emb%20%3D%20nn.Embedding(tgt_vocab_size%2C%20d_model)%20%20%23%20%E5%AD%97%E7%AC%A6%E8%BD%AC%E4%B8%BA%E8%AF%8D%E5%90%91%E9%87%8F%0A%20%20%20%20%20%20%20%20self.pos_emb%20%3D%20PositionalEncoding(d_model)%20%20%23%20%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81%0A%20%20%20%20%20%20%20%20self.layers%20%3D%20nn.ModuleList(%5BDecoderLayer()%20for%20%20in%20range(nlayers)%5D)%20%20%23%20%E8%A7%A3%E7%A0%81%E5%B1%82%E5%A0%86%E5%8F%A0n%E4%B8%AA%0A%0A%20%20%20%20%23%20decinputs%20%E8%A7%A3%E7%A0%81%E7%AB%AF%E8%BE%93%E5%85%A5%EF%BC%8Cencoutputs%20%E7%BC%96%E7%A0%81%E7%AB%AF%E8%BE%93%E5%87%BA%0A%20%20%20%20def%20forward(self%2C%20decinputs%2C%20encinputs%2C%20encoutputs)%3A%20%23%20decinputs%20%3A%20%5Bbatch_size%20x%20target_len%5D%0A%20%20%20%20%20%20%20%20dec_outputs%20%3D%20self.tgt_emb(dec_inputs)%20%20%23%20embedding%EF%BC%8C
%5Bbatch_size%2C%20tgt_len%2C%20d_model%5D%0A%20%20%20%20%20%20%20%20dec_outputs%20%3D%20self.pos_emb(dec_outputs.transpose(0%2C%201)).transpose(0%2C%201)%20%23%20position%20encoding%EF%BC%8C%5Bbatch_size%2C%20tgt_len%2C%20d_model%5D%0A%20%20%20%20%20%20%20%20%23%20%E4%B8%8A%E9%9D%A2%E7%9A%84%E4%B8%A4%E8%A1%8C%E4%BB%A3%E7%A0%81%E5%89%8D%E9%9D%A2%E9%83%BD%E8%AE%B2%E8%BF%87%0A%0A%20%20%20%20%20%20%20%20%23%23%20Decoder%E6%A0%B8%E5%BF%83%E9%83%A8%E5%88%86%EF%BC%8Cget_attn_pad_mask%20%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E7%9A%84pad%20%E9%83%A8%E5%88%86%0A%20%20%20%20%20%20%20%20dec_self_attn_pad_mask%20%3D%20get_attn_pad_mask(dec_inputs%2C%20dec_inputs)%0A%0A%20%20%20%20%20%20%20%20%23%23%20get_attn_subsequent_mask%20%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E7%9A%84mask%E9%83%A8%E5%88%86%0A%20%20%20%20%20%20%20%20dec_self_attn_subsequent_mask%20%3D%20get_attn_subsequent_mask(dec_inputs)%0A%0A%20%20%20%20%20%20%20%20%23%23%20%E5%89%8D%E9%9D%A2%E4%B8%A4%E4%B8%AA%E7%9F%A9%E9%98%B5%E7%9B%B8%E5%8A%A0%EF%BC%8C%E5%A4%A7%E4%BA%8E0%E7%9A%84%E4%B8%BA1%EF%BC%8C%E4%B8%8D%E5%A4%A7%E4%BA%8E0%E7%9A%84%E4%B8%BA0%EF%BC%8C%E4%B8%BA1%E7%9A%84%E5%9C%A8%E4%B9%8B%E5%90%8E%E5%B0%B1%E4%BC%9A%E8%A2%ABfill%E5%88%B0%E6%97%A0%E9%99%90%E5%B0%8F%0A%20%20%20%20%20%20%20%20dec_self_attn_mask%20%3D%20torch.gt((dec_self_attn_pad_mask%20%2B%20dec_self_attn_subsequent_mask)%2C%200)%0A%0A%20%20%20%20%20%20%20%20%23%23%20%E8%BF%99%E4%B8%AA%E5%81%9A%E7%9A%84%E6%98%AF%E4%BA%A4%E4%BA%92%E6%B3%A8%E6%84%8F%E5%8A%9B%E6%9C%BA%E5%88%B6%E4%B8%AD%E7%9A%84mask%E7%9F%A9%E9%98%B5%EF%BC%8Cenc%E7%9A%84%E8%BE%93%E5%85%A5%E6%98%AFk%EF%BC%8C%E6%88%91%E5%8E%BB%E7%9C%8B%E8%BF%99%E4%B8%AAk%E9%87%8C%E9%9D%A2%E5%93%AA%E4%BA%9B%E6%98%AFpad%E7%AC%A6%E5%8F%B7%EF%BC%8C%E7%BB%99%E5%88%B0%E5%90%8E%E9%9D%A2%E7%9A%84%E6%A8%A1%E5%9E%8B%EF%BC%9B%E6%B3%A8%E6%84%8Fq%E8%82%AF%E5%AE%9A%E4%B9%9F%E6%9C%89pad%E7%AC%A6%E5%8F%B7%EF%BC%8C%E4%BD%86%E8%BF%99%E9%87%8C%E6%98%AF%E4%B8%8D%E5%9C%A8%E6%84%8F%E7%9A%84%EF%BC%8C%E4%B9%8B%E5%89%8D
%E8%AF%B4%E4%BA%86%E5%A5%BD%E5%A4%9A%E6%AC%A1%E4%BA%86%E5%93%88%0A%20%20%20%20%20%20%20%20dec_enc_attn_mask%20%3D%20get_attn_pad_mask(dec_inputs%2C%20enc_inputs)%0A%0A%20%20%20%20%20%20%20%20dec_self_attns%2C%20dec_enc_attns%20%3D%20%5B%5D%2C%20%5B%5D%0A%20%20%20%20%20%20%20%20for%20layer%20in%20self.layers%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20dec_outputs%2C%20dec_self_attn%2C%20dec_enc_attn%20%3D%20layer(dec_outputs%2C%20enc_outputs%2C%20dec_self_attn_mask%2C%20dec_enc_attn_mask)%0A%20%20%20%20%20%20%20%20%20%20%20%20dec_self_attns.append(dec_self_attn)%0A%20%20%20%20%20%20%20%20%20%20%20%20dec_enc_attns.append(dec_enc_attn)%0A%20%20%20%20%20%20%20%20return%20dec_outputs%2C%20dec_self_attns%2C%20dec_enc_attns%0A%60%60%60%0A%0A%E5%88%9D%E5%A7%8B%E5%8C%96%E5%87%BD%E6%95%B0%60def%20__init(self)%60%E7%9A%84%E4%BB%8B%E7%BB%8D%E5%86%99%E5%9C%A8%E4%BA%86%E4%BB%A3%E7%A0%81%E4%B8%AD%EF%BC%8C%E4%B8%8B%E9%9D%A2%E7%9B%B4%E6%8E%A5%E4%BB%8B%E7%BB%8D%E5%AE%9E%E7%8E%B0%E5%87%BD%E6%95%B0forward%EF%BC%9A%0A%0ADecoder%E5%B1%82%E5%9C%A8%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E5%81%9A2%E4%B8%AAmask%EF%BC%8C%E4%B8%80%E4%B8%AA%E6%98%AF%E5%AF%B9%E8%87%AA%E8%BA%ABpad%E7%AC%A6%E5%8F%B7%E7%9A%84mask%EF%BC%8C%E4%B8%80%E4%B8%AA%E6%98%AF%E5%AF%B9%E5%BD%93%E5%89%8D%E5%8D%95%E8%AF%8D%E5%90%8E%E7%9C%8B%E4%B8%8D%E5%88%B0%E7%9A%84%E5%8D%95%E8%AF%8D%E7%9A%84mask%EF%BC%9B%E5%9C%A8%E4%BA%A4%E4%BA%92%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%EF%BC%8C%E5%8F%AA%E5%AF%B9%E7%BC%96%E7%A0%81%E5%B1%82%E9%82%A3%E4%BA%9B%E6%98%AFpad%E9%83%A8%E5%88%86%E7%9A%84%E5%8D%95%E8%AF%8D%E5%81%9Amask%E3%80%82%0A%0A%E9%A6%96%E5%85%88%E6%98%AF%E5%B0%86Decoder%E7%9A%84inputs%E4%B8%AD%20pad%E7%AC%A6%E5%8F%B7%E7%9A%84%E9%83%A8%E5%88%86%E7%BD%AE%E4%B8%BA1%EF%BC%8C%E6%9C%80%E5%90%8E%E5%BE%97%E5%88%B0%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%EF%BC%9A%0A%0A%60%60%60python%0Adec_self_attn_pad_mask%20%3D%20get_attn_pad_mask(dec_inputs%2C%20dec_inputs)%0A%60%60%60%0A%0A%E7%84%B6%E5%90%8E%E6%98%AF%E5%AF%B9%E5%BD%93%E5%89%8D%E5%8D%95%E8%
AF%8D%E4%B9%8B%E5%90%8E%E7%9C%8B%E4%B8%8D%E5%88%B0%E7%9A%84%E5%8D%95%E8%AF%8D%E5%81%9Amask%EF%BC%9A%0A%0A%60%60%60python%0Adec_self_attn_subsequent_mask%20%3D%20get_attn_subsequent_mask(dec_inputs)%0A%60%60%60%0A%0A%E8%BF%99%E8%A1%8C%E4%BB%A3%E7%A0%81%E5%BE%97%E5%88%B0%E4%B8%80%E4%B8%AA%E4%B8%8A%E4%B8%89%E8%A7%92%E4%B8%BA1%E7%9A%84%E7%9F%A9%E9%98%B5%EF%BC%8C%E5%A6%82%E5%9B%BE%E6%89%80%E7%A4%BA%EF%BC%9A%0A%0A%3Cimg%20src%3D%22Transformer%E4%BB%A3%E7%A0%81%E8%AE%B2%E8%A7%A3.assets%2F%E4%B8%8A%E4%B8%89%E8%A7%92%E4%B8%BA1.png%22%20alt%3D%22%E4%B8%8A%E4%B8%89%E8%A7%92%E4%B8%BA1%22%20style%3D%22zoom%3A%2033%25%3B%22%20%2F%3E%0A%0A%E8%BE%93%E5%85%A5%E4%B8%BAS%E6%97%B6%E5%8F%AA%E8%83%BD%E7%9C%8B%E5%88%B0S%EF%BC%8C%E7%9C%8B%E4%B8%8D%E5%88%B0%E5%8D%B7%E3%80%81%E8%B5%B7%E3%80%81%E6%9D%A5%EF%BC%9B%E8%BE%93%E5%85%A5%E4%B8%BA%E5%8D%B7%E6%97%B6%EF%BC%8C%E5%8F%AA%E8%83%BD%E7%9C%8B%E5%88%B0S%E5%92%8C%E5%8D%B7%EF%BC%8C%E7%9C%8B%E4%B8%8D%E5%88%B0%E8%B5%B7%E3%80%81%E6%9D%A5%EF%BC%8C%E5%90%8E%E9%9D%A2%E7%9A%84%E8%BE%93%E5%85%A5%E4%B9%9F%E4%BB%A5%E6%AD%A4%E7%B1%BB%E6%8E%A8%E3%80%82%0A%0A%E4%B9%8B%E5%90%8E%E5%B0%86%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%E5%92%8C%E4%B8%8A%E4%B8%89%E8%A7%92%E7%9F%A9%E9%98%B5%E7%9B%B8%E5%8A%A0%EF%BC%9A%0A%0A%60%60%60python%0Adec_self_attn_mask%20%3D%20torch.gt((dec_self_attn_pad_mask%20%2B%20dec_self_attn_subsequent_mask)%2C%200)%0A%60%60%60%0A%0A%E7%9B%B8%E5%8A%A0%E5%90%8E%EF%BC%8C%E7%9F%A9%E9%98%B5%E4%B8%AD%E5%A4%A7%E4%BA%8E0%E7%9A%84%E5%85%83%E7%B4%A0%E4%B8%BA1%EF%BC%8C%E4%B8%8D%E5%A4%A7%E4%BA%8E0%E7%9A%84%E4%B8%BA0%EF%BC%8C%E4%B8%BA1%E7%9A%84%E5%9C%A8%E4%B9%8B%E5%90%8E%E5%B0%B1%E4%BC%9A%E8%A2%ABfill%E5%88%B0%E6%97%A0%E9%99%90%E5%B0%8F%E3%80%82%E8%BF%99%E6%A0%B7%E5%BE%97%E5%88%B0%E7%9A%84%E4%B9%9F%E6%98%AF%E4%B8%AA%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%EF%BC%8C%E7%9F%A9%E9%98%B5%E4%B8%AD%E4%B8%BA1%E7%9A%84%E9%83%A8%E5%88%86%E5%B0%B1%E6%98%AF%E8%A2%ABmask%E7%9A%84%E9%83%A8%E5%88%86%EF%BC%8C%E4%B9%9F%E5%B0%B1%E6%98%AF%E8%A6%81%E5%BF%BD%E7%95%A5%E7%9A%84%E9%83%A8
%E5%88%86%E3%80%82%0A%0A%E6%8E%A5%E7%9D%80%E6%98%AF%E5%81%9A%E4%BA%A4%E4%BA%92%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E7%9A%84mask%E7%9F%A9%E9%98%B5%EF%BC%8C%E5%89%8D%E9%9D%A2%E8%AF%B4%E4%BA%A4%E4%BA%92%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E5%8F%AA%E5%AF%B9%E7%BC%96%E7%A0%81%E5%B1%82%E9%82%A3%E4%BA%9B%E6%98%AFpad%E9%83%A8%E5%88%86%E7%9A%84%E5%8D%95%E8%AF%8D%E5%81%9Amask%EF%BC%8C%E8%BF%99%E9%87%8C%E7%9A%84%E8%BE%93%E5%85%A5enc_inputs%E5%B0%B1%E6%98%AF%E5%91%8A%E8%AF%89%E8%A7%A3%E7%A0%81%E7%AB%AF%E5%93%AA%E4%BA%9B%E9%83%A8%E5%88%86%E6%98%AFpad%E7%AC%A6%E5%8F%B7%EF%BC%9A%0A%0A%60%60%60python%0Adec_enc_attn_mask%20%3D%20get_attn_pad_mask(dec_inputs%2C%20enc_inputs)%0A%60%60%60%0A%0A%E7%9F%A5%E9%81%93%E5%93%AA%E4%BA%9B%E4%BD%8D%E7%BD%AE%E6%98%AFpad%E7%AC%A6%E5%8F%B7%E5%90%8E%EF%BC%8C%E6%8A%8A%E8%BF%99%E4%BA%9B%E4%BD%8D%E7%BD%AE%E7%9A%84%E5%85%83%E7%B4%A0%E7%BD%AE%E4%B8%BA1%EF%BC%8C%E5%B0%B1%E5%BE%97%E5%88%B0%E4%BA%86mask%E7%9F%A9%E9%98%B5%E3%80%82%0A%0A%E6%9C%80%E5%90%8E%E6%98%AFfor%E5%BE%AA%E7%8E%AF%EF%BC%8C%E5%85%B6%E4%B8%AD%E4%B8%BB%E8%A6%81%E6%98%AFDecoderLayer%E5%B1%82%EF%BC%8C%E4%BB%A3%E7%A0%81%E5%AE%9E%E7%8E%B0%E5%A6%82%E4%B8%8B%EF%BC%9A%0A%0A%60%60%60python%0A%23%23%2010.%0Aclass%20DecoderLayer(nn.Module)%3A%0A%20%20%20%20def%20__init(self)%3A%0A%20%20%20%20%20%20%20%20super(DecoderLayer%2C%20self).__init()%0A%20%20%20%20%20%20%20%20self.dec_self_attn%20%3D%20MultiHeadAttention()%20%23%20%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%0A%20%20%20%20%20%20%20%20self.dec_enc_attn%20%3D%20MultiHeadAttention()%20%23%20%E4%BA%A4%E4%BA%92%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%0A%20%20%20%20%20%20%20%20self.pos_ffn%20%3D%20PoswiseFeedForwardNet()%20%23%20%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%0A%0A%20%20%20%20def%20forward(self%2C%20dec_inputs%2C%20enc_outputs%2C%20dec_self_attn_mask%2C%20dec_enc_attn_mask)%3A%0A%20%20%20%20%20%20%20%20dec_outputs%2C%20dec_self_attn%20%3D%20self.dec_self_attn(dec_inputs%2C%20dec_inputs%2C%20dec_inputs%2C%20dec_self_attn_mask)%0A
%20%20%20%20%20%20%20%20dec_outputs%2C%20dec_enc_attn%20%3D%20self.dec_enc_attn(dec_outputs%2C%20enc_outputs%2C%20enc_outputs%2C%20dec_enc_attn_mask)%0A%20%20%20%20%20%20%20%20dec_outputs%20%3D%20self.pos_ffn(dec_outputs)%0A%20%20%20%20%20%20%20%20return%20dec_outputs%2C%20dec_self_attn%2C%20dec_enc_attn%0A%60%60%60%0A%0A%E6%95%B4%E4%B8%AA%E8%BF%87%E7%A8%8B%E4%B9%9F%E5%92%8CEncoder%E4%B8%AD%E7%9A%84%E5%B7%AE%E4%B8%8D%E5%A4%9A%E3%80%82%0A#card=math&code=%E5%9C%A8%E4%BB%A3%E7%A0%81%E4%B8%AD%E9%80%9A%E5%B8%B8%E5%B0%862i%2Fdmodel%E8%BF%99%E6%A0%B7%E7%9A%84%E6%AC%A1%E6%96%B9%E4%BD%BF%E7%94%A8log%E5%87%BD%E6%95%B0%E6%8B%BF%E4%B8%8B%E6%9D%A5%EF%BC%8C%E7%94%A8e%E6%8C%87%E6%95%B0%E6%9D%A5%E4%BB%A3%E6%9B%BF%EF%BC%8C%E4%B8%8A%E9%9D%A2%E5%85%AC%E5%BC%8F%E4%B8%AD%E7%9A%84%E6%9C%80%E5%90%8E%E4%B8%80%E4%B8%AA%E5%BC%8F%E5%AD%90%E5%B0%B1%E6%98%AF%E6%88%91%E4%BB%AC%E8%A6%81%E5%86%99%E5%88%B0%E4%BB%A3%E7%A0%81%E4%B8%AD%E7%9A%84%E5%BC%8F%E5%AD%90%EF%BC%88python%E4%B8%ADlog%E5%87%BD%E6%95%B0%E7%9A%84%E5%BA%95%E6%95%B0%E9%BB%98%E8%AE%A4%E4%B8%BAe%EF%BC%8C%E6%89%80%E4%BB%A5%E6%9C%80%E5%90%8E%E5%B0%86log_e%2010000%E7%AE%80%E5%86%99%E4%B8%BAlog10000%EF%BC%89%E3%80%82e%E6%8C%87%E6%95%B0%E9%83%A8%E5%88%86%E5%AF%B9%E5%BA%94%E7%9A%84%E4%BB%A3%E7%A0%81%E4%B8%BA%EF%BC%9A%0A%0A%60%60%60py%0Adiv_term%20%3D%20torch.exp%28torch.arange%280%2C%20d_model%2C%202%29.float%28%29%20%2A%20%28-math.log%2810000.0%29%20%2F%20d_model%29%29%0A%60%60%60%0A%0A%E5%85%B6%E4%B8%AD%60torch.arange%280%2C%20d_model%2C%202%29.float%28%29%60%E5%AF%B9%E5%BA%94%E7%9A%84%E6%98%AF%E5%85%83%E7%B4%A0%E7%B4%A2%E5%BC%95%EF%BC%882i%E6%88%962i%2B1%EF%BC%89%EF%BC%8C%60torch.arange%280%2C%20d_model%2C%202%29%60%E8%BF%94%E5%9B%9E%E4%B8%80%E4%B8%AA%E4%B8%80%E7%BB%B4%E7%9A%84%E5%BC%A0%E9%87%8F%EF%BC%8C%E5%BC%A0%E9%87%8F%E4%B8%AD%E7%9A%84%E5%85%83%E7%B4%A0%E5%8F%96%E5%80%BC%E8%8C%83%E5%9B%B4%E6%98%AF%5B0%2C%20d_model%29%EF%BC%8C%E6%AD%A5%E9%95%BF%E4%B8%BA2%E3%80%82%0A%0A%E4%BB%A5%E4%B8%8B%E4%BB%A3%E7%A0%81%E5%88%86%E5%88%AB%E5%AF%B9%E5%BA
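
To make the decoder-side masking concrete, here is a minimal sketch of how a pad mask and a look-ahead mask can be combined and applied before softmax. It assumes PyTorch, a padding id of 0, and made-up token ids. The helper names `pad_mask` and `subsequent_mask` are hypothetical stand-ins for illustration, not the tutorial's own `get_attn_pad_mask` and `get_attn_subsequent_mask`.

```python
import torch

PAD_ID = 0  # assumption: the padding token maps to id 0


def pad_mask(seq_q, seq_k):
    """True where the key position is padding; shape [batch, len_q, len_k]."""
    batch_size, len_q = seq_q.size()
    _, len_k = seq_k.size()
    return seq_k.eq(PAD_ID).unsqueeze(1).expand(batch_size, len_q, len_k)


def subsequent_mask(seq):
    """True strictly above the diagonal (future positions); shape [batch, len, len]."""
    batch_size, length = seq.size()
    upper = torch.triu(torch.ones(length, length), diagonal=1).bool()
    return upper.unsqueeze(0).expand(batch_size, length, length)


# Hypothetical decoder input: four real tokens followed by one padding token.
dec_inputs = torch.tensor([[5, 1, 2, 3, 0]])

# A position is masked if it is padding OR lies in the future.
dec_self_attn_mask = pad_mask(dec_inputs, dec_inputs) | subsequent_mask(dec_inputs)

# Uniform dummy scores, so the effect of the mask is easy to read off.
scores = torch.zeros(1, 5, 5)
scores = scores.masked_fill(dec_self_attn_mask, -1e9)
attn = torch.softmax(scores, dim=-1)

print(dec_self_attn_mask[0].int())
print(attn[0])
```

In this sketch the first query row can attend only to itself, and no row puts weight on the padding column. The logical OR used here plays the same role as adding the two masks and testing `> 0`.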
7%9A%84%E7%BB%B4%E5%BA%A6%EF%BC%8C%E5%AF%B9%E5%BA%94%E4%BD%8D%E7%BD%AE%E4%B8%8A%E5%8F%AF%E4%BB%A5%E5%86%99%E4%B8%8A%E5%8E%9F%E5%A7%8B%E7%BB%B4%E5%BA%A6%E7%9A%84%E5%A4%A7%E5%B0%8F%E6%88%96%E8%80%85%E7%9B%B4%E6%8E%A5%E5%86%99%20-1%E3%80%82%0A%0Apad_attn_mask%E7%9A%84%E9%80%9A%E8%BF%87%E4%B8%8A%E9%9D%A2%E7%9A%84%E6%93%8D%E4%BD%9C%E5%90%8E%E5%BE%97%E5%88%B0%E7%9A%84%E5%BD%A2%E7%8A%B6%E6%98%AF%20batch_size%20%C3%97%201%20%C3%97%20len_k%20%EF%BC%8C%60pad_attn_mask.expand%28batch_size%2C%20len_q%2C%20len_k%29%60%E5%88%99%E8%A1%A8%E7%A4%BA%E5%AF%B9pad_attn_mask%E5%9C%A8%E5%8E%9F%E6%9D%A5%E6%98%AF1%E7%9A%84%E7%BB%B4%E5%BA%A6%E4%B8%8A%E5%A4%8D%E5%88%B6len_q%E6%AC%A1%E6%95%B0%E6%8D%AE%EF%BC%8C%E6%9C%80%E7%BB%88%E5%BE%97%E5%88%B0%E7%9A%84%E5%BD%A2%E7%8A%B6%E6%98%AFbatch_size%20%C3%97%20len_q%20%C3%97%20len_k%EF%BC%8C%E8%BF%99%E4%B8%80%E5%BD%A2%E7%8A%B6%E5%92%8C%24%5Cfrac%7BQK%5E%7BT%7D%20%7D%7B%5Csqrt%7Bd%7Bk%7D%20%7D%20%7D%20%24%E8%AE%A1%E7%AE%97%E5%90%8E%E5%BE%97%E5%88%B0%E7%9A%84%E7%9F%A9%E9%98%B5%E5%BD%A2%E7%8A%B6%E6%98%AF%E7%9B%B8%E5%90%8C%E7%9A%84%EF%BC%8C%E5%9B%A0%E4%B8%BA%E6%88%91%E4%BB%AC%E8%A6%81%E7%9F%A5%E9%81%93%E7%9F%A9%E9%98%B5%E4%B8%AD%E5%93%AA%E4%B8%AA%E4%BD%8D%E7%BD%AE%E5%AF%B9%E5%BA%94%E5%AD%97%E7%AC%A6%E4%B8%8Epad%E7%AC%A6%E5%8F%B7%E7%9A%84%E7%9B%B8%E4%BC%BC%E5%BA%A6%EF%BC%8C%E6%89%80%E4%BB%A5%E9%9C%80%E8%A6%81%E7%9B%B8%E5%90%8C%E5%BD%A2%E7%8A%B6%E7%9A%84%E7%9F%A9%E9%98%B5%E3%80%82%E8%BF%99%E4%B9%9F%E6%98%AF%E4%B8%BA%E4%BB%80%E4%B9%88%E5%89%8D%E9%9D%A2%E8%A6%81%E7%94%A8%60.unsqueeze%281%29%60%E5%A2%9E%E5%8A%A01%E4%B8%AA%E7%BB%B4%E5%BA%A6%E7%9A%84%E5%8E%9F%E5%9B%A0%E3%80%82%0A%0A%E4%BB%A5%E4%B8%8A%E5%B0%B1%E6%98%AF%20getattnpadmask%E5%87%BD%E6%95%B0%E7%9A%84%E5%AE%9E%E7%8E%B0%E3%80%82%0A%0A%23%23%23%23%23%20%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%92%8C%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%0A%0A%E4%B9%8B%E5%90%8E%E7%BB%A7%E7%BB%ADEncoder%E5%B1%82%E7%9A%84forward%E5%87%BD%E6%95%B0%EF%BC%8C%E6%9D%A5%E5%88%86%E6%9E%90getattnpadmask%E5%87%
BD%E6%95%B0%E5%90%8E%E9%9D%A2%E7%9A%84%E4%BB%A3%E7%A0%81%EF%BC%9A%0A%0A%60%60%60python%0Aencselfattns%20%3D%20%5B%5D%0Afor%20layer%20in%20self.layers%3A%23%20%E5%8E%BB%E7%9C%8BEncoderLayer%E5%B1%82%E5%87%BD%E6%95%B0%3A5.%0A%20%20%20%20encoutputs%2C%20encselfattn%20%3D%20layer%28enc_outputs%2C%20enc_self_attn_mask%29%0A%20%20%20%20enc_self_attns.append%28enc_self_attn%29%0A%20%20%20%20return%20enc_outputs%2C%20enc_self_attns%0A%60%60%60%0A%0A%E8%BF%99%E6%98%AF%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%92%8C%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E7%9A%84%E7%BB%84%E6%88%90%E9%83%A8%E5%88%86%E3%80%82%E5%85%B6%E4%B8%AD%E7%9A%84for%E5%BE%AA%E7%8E%AF%E6%98%AF%E6%8A%8A%E6%AF%8F%E4%B8%80%E5%B1%82%E7%9A%84%E8%BE%93%E5%87%BA%E4%BD%9C%E4%B8%BA%E4%B8%8B%E4%B8%80%E5%B1%82%E7%9A%84%E8%BE%93%E5%85%A5%EF%BC%8C%E5%8F%AA%E9%9C%80%E5%88%86%E6%9E%90%E4%B8%80%E5%B1%82%E7%9A%84%E4%BB%A3%E7%A0%81%E5%8D%B3%E5%8F%AF%EF%BC%9A%0A%0A%60%60%60python%0Aenc_outputs%2C%20enc_self_attn%20%3D%20layer%28enc_outputs%2C%20enc_self_attn_mask%29%0A%60%60%60%0A%0Aenc_outputs%E6%98%AF%E4%B8%8A%E4%B8%80%E5%B1%82%E7%BC%96%E7%A0%81%E5%99%A8%E7%9A%84%E8%BE%93%E5%87%BA%EF%BC%8Cenc_self_attn_mask%E6%98%AF%20get_attn_pad_mask%E5%87%BD%E6%95%B0%E5%BE%97%E5%88%B0%E7%9A%84%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%EF%BC%8C%E8%BF%99%E4%B8%A4%E4%B8%AA%E4%BD%9C%E4%B8%BA%E6%AF%8F%E4%B8%80%E5%B1%82%E7%9A%84%E8%BE%93%E5%85%A5%EF%BC%8C%E5%85%B7%E4%BD%93%E5%AE%9E%E7%8E%B0%EF%BC%88%E5%BE%97%E5%88%B0%E8%BE%93%E5%87%BA%E7%9A%84%E8%BF%87%E7%A8%8B%EF%BC%89%E7%9C%8B%E4%B8%8B%E9%9D%A2%E7%9A%84%E5%87%BD%E6%95%B0%EF%BC%9A%0A%0A%60%60%60python%0A%23%205.%20EncoderLayer%20%EF%BC%9A%E5%8C%85%E5%90%AB%E4%B8%A4%E4%B8%AA%E9%83%A8%E5%88%86%EF%BC%8C%E5%A4%9A%E5%A4%B4%E6%B3%A8%E6%84%8F%E5%8A%9B%E6%9C%BA%E5%88%B6%E5%92%8C%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%0Aclass%20EncoderLayer%28nn.Module%29%3A%0A%20%20%20%20def%20__init%28self%29%3A%20%23%20%E5%88%9D%E5%A7%8B%E5%8C%96%E5%87%BD%E6%95%B0%0A%20%20%20%20%20%20
%20%20super%28EncoderLayer%2C%20self%29.__init%28%29%0A%20%20%20%20%20%20%20%20self.enc_self_attn%20%3D%20MultiHeadAttention%28%29%20%23%20%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%EF%BC%8C%E4%BD%BF%E7%94%A8%E4%BA%86%E5%A4%9A%E5%A4%B4%E6%B3%A8%E6%84%8F%E5%8A%9B%E6%9C%BA%E5%88%B6%EF%BC%8C%E6%A0%B8%E5%BF%83%E9%83%A8%E5%88%86%0A%20%20%20%20%20%20%20%20self.pos_ffn%20%3D%20PoswiseFeedForwardNet%28%29%20%23%20%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%B1%82%EF%BC%8C%E5%B0%B1%E6%98%AFLinear%E5%B1%82%0A%0A%20%20%20%20def%20forward%28self%2C%20enc_inputs%2C%20enc_self_attn_mask%29%3A%0A%20%20%20%20%20%20%20%20enc_outputs%2C%20attn%20%3D%20self.enc_self_attn%28enc_inputs%2C%20enc_inputs%2C%20enc_inputs%2C%20enc_self_attn_mask%29%20%23%20enc_inputs%3A%20%5Bbatch_size%20x%20seq_len_q%20x%20d_model%5D%0A%20%20%20%20%20%20%20%20enc_outputs%20%3D%20self.pos_ffn%28enc_outputs%29%20%20%23%20enc_outputs%3A%20%5Bbatch_size%20x%20len_q%20x%20d_model%5D%0A%20%20%20%20%20%20%20%20return%20enc_outputs%2C%20attn%0A%60%60%60%0A%0A%E5%9C%A8%E5%88%9D%E5%A7%8B%E5%8C%96%E5%87%BD%E6%95%B0%E4%B8%AD%EF%BC%8C%E5%85%88%E6%98%AF%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82enc_self_attn%EF%BC%8C%E4%BD%BF%E7%94%A8%E4%BA%86%E5%A4%9A%E5%A4%B4%E6%B3%A8%E6%84%8F%E5%8A%9B%E6%9C%BA%E5%88%B6%EF%BC%8C%E6%98%AF%E6%95%B4%E4%B8%AA%E4%BB%A3%E7%A0%81%E7%9A%84%E6%A0%B8%E5%BF%83%E9%83%A8%E5%88%86%EF%BC%9B%E7%84%B6%E5%90%8E%E6%98%AF%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%B1%82%EF%BC%8C%E5%85%B6%E5%AE%9E%E5%B0%B1%E6%98%AFLinear%E5%B1%82%EF%BC%88%E5%85%A8%E8%BF%9E%E6%8E%A5%E5%B1%82%EF%BC%89%E3%80%82%0A%0A%E5%9C%A8%E5%AE%9E%E7%8E%B0%E5%87%BD%E6%95%B0forward%E4%B8%AD%EF%BC%8C%E9%A6%96%E5%85%88%E6%98%AF%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%EF%BC%9A%0A%0A%60%60%60python%0Aenc_outputs%2C%20attn%20%3D%20self.enc_self_attn%28enc_inputs%2C%20enc_inputs%2C%20enc_inputs%2C%20enc_self_attn_mask%29%0A%60%60%60%0A%0A%E5%AE%83%E6%9C%894%E4%B8%AA%E8%BE%93%E5%85%A5%EF%BC%8C%E5%89%8D3%E4%B8%
AAenc_inputs%E4%B8%8E%E6%9C%80%E5%8E%9F%E5%A7%8B%E7%9A%84Q%20K%20V%E7%9F%A9%E9%98%B5%E7%9A%84%E5%BD%A2%E7%8A%B6%E7%9B%B8%E5%90%8C%EF%BC%8C%E9%83%BD%E6%98%AF%5Bbatch_size%20%C3%97%20seq_len_q%20%C3%97%20d_model%5D%EF%BC%9B%E7%AC%AC4%E4%B8%AA%E8%BE%93%E5%85%A5enc_self_attn_mask%E6%98%AF%E5%89%8D%E9%9D%A2%E5%BE%97%E5%88%B0%E7%9A%84%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%E3%80%82%E4%B8%8B%E9%9D%A2%E6%9D%A5%E5%88%86%E6%9E%90%E8%BF%99%E4%B8%80%E5%B1%82%E7%9A%84%E5%AE%9E%E7%8E%B0%EF%BC%9A%0A%0A%23%23%23%23%23%23%20%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%0A%0A%E5%AE%9E%E7%8E%B0%E4%BB%A3%E7%A0%81%EF%BC%9A%0A%0A%60%60%60python%0A%23%206.%20MultiHeadAttention%0Aclass%20MultiHeadAttention%28nn.Module%29%3A%0A%20%20%20%20def%20__init%28self%29%3A%0A%20%20%20%20%20%20%20%20super%28MultiHeadAttention%2C%20self%29.__init%28%29%0A%20%20%20%20%20%20%20%20%23%20%E8%BE%93%E5%85%A5%E8%BF%9B%E6%9D%A5%E7%9A%84QKV%E6%98%AF%E7%9B%B8%E7%AD%89%E7%9A%84%EF%BC%8C%E6%88%91%E4%BB%AC%E4%BC%9A%E4%BD%BF%E7%94%A8%E6%98%A0%E5%B0%84linear%E5%81%9A%E4%B8%80%E4%B8%AA%E6%98%A0%E5%B0%84%E5%BE%97%E5%88%B0%E5%8F%82%E6%95%B0%E7%9F%A9%E9%98%B5Wq%2C%20Wk%2C%20Wv%0A%20%20%20%20%20%20%20%20self.W_Q%20%3D%20nn.Linear%28d_model%2C%20d_k%20%2A%20n_heads%29%0A%20%20%20%20%20%20%20%20self.W_K%20%3D%20nn.Linear%28d_model%2C%20d_k%20%2A%20n_heads%29%0A%20%20%20%20%20%20%20%20self.W_V%20%3D%20nn.Linear%28d_model%2C%20d_v%20%2A%20n_heads%29%0A%20%20%20%20%20%20%20%20self.linear%20%3D%20nn.Linear%28n_heads%20%2A%20d_v%2C%20d_model%29%0A%20%20%20%20%20%20%20%20self.layer_norm%20%3D%20nn.LayerNorm%28d_model%29%0A%0A%20%20%20%20def%20forward%28self%2C%20Q%2C%20K%2C%20V%2C%20attn_mask%29%3A%0A%0A%20%20%20%20%20%20%20%20%23%20%E8%BF%99%E4%B8%AA%E5%A4%9A%E5%A4%B4%E5%88%86%E4%B8%BA%E8%BF%99%E5%87%A0%E4%B8%AA%E6%AD%A5%E9%AA%A4%EF%BC%8C%E9%A6%96%E5%85%88%E6%98%A0%E5%B0%84%E5%88%86%E5%A4%B4%EF%BC%8C%E7%84%B6%E5%90%8E%E8%AE%A1%E7%AE%97atten_scores%EF%BC%8C%E7%84%B6%E5%90%8E%E8%AE%A1%E7%AE%97atten_value%3B%0A%20%20%20%20%20%20%20%2
0residual%2C%20batch_size%20%3D%20Q%2C%20Q.size%280%29%0A%20%20%20%20%20%20%20%20%23%20%28B%2C%20S%2C%20D%29%20-proj-%3E%20%28B%2C%20S%2C%20D%29%20-split-%3E%20%28B%2C%20S%2C%20H%2C%20W%29%20-trans-%3E%20%28B%2C%20H%2C%20S%2C%20W%29%0A%0A%20%20%20%20%20%20%20%20%23%E4%B8%8B%E9%9D%A2%E8%BF%99%E4%B8%AA%E5%B0%B1%E6%98%AF%E5%85%88%E6%98%A0%E5%B0%84%EF%BC%8C%E5%90%8E%E5%88%86%E5%A4%B4%EF%BC%9B%E4%B8%80%E5%AE%9A%E8%A6%81%E6%B3%A8%E6%84%8F%E7%9A%84%E6%98%AFq%E5%92%8Ck%E5%88%86%E5%A4%B4%E4%B9%8B%E5%90%8E%E7%BB%B4%E5%BA%A6%E6%98%AF%E4%B8%80%E8%87%B4%E7%9A%84%EF%BC%8C%E6%89%80%E4%BB%A5%E4%B8%80%E7%9C%8B%E8%BF%99%E9%87%8C%E9%83%BD%E6%98%AFdk%0A%20%20%20%20%20%20%20%20q_s%20%3D%20self.W_Q%28Q%29.view%28batch_size%2C%20-1%2C%20n_heads%2C%20d_k%29.transpose%281%2C2%29%20%20%23%20q_s%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_q%20x%20d_k%5D%0A%20%20%20%20%20%20%20%20k_s%20%3D%20self.W_K%28K%29.view%28batch_size%2C%20-1%2C%20n_heads%2C%20d_k%29.transpose%281%2C2%29%20%20%23%20k_s%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_k%20x%20d_k%5D%0A%20%20%20%20%20%20%20%20v_s%20%3D%20self.W_V%28V%29.view%28batch_size%2C%20-1%2C%20n_heads%2C%20d_v%29.transpose%281%2C2%29%20%20%23%20v_s%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_k%20x%20d_v%5D%0A%0A%20%20%20%20%20%20%20%20%23%20%E8%BE%93%E5%85%A5%E8%BF%9B%E6%9D%A5%E7%9A%84attn_mask%E5%BD%A2%E7%8A%B6%E6%98%AF%20batch_size%20x%20len_q%20x%20len_k%EF%BC%8C%E7%BB%8F%E8%BF%87%E4%B8%8B%E9%9D%A2%E8%BF%99%E4%B8%AA%E4%BB%A3%E7%A0%81%E5%BE%97%E5%88%B0%E6%96%B0%E7%9A%84attn_mask%20%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_q%20x%20len_k%5D%E3%80%82%E5%B0%B1%E6%98%AF%E6%8A%8Apad%E4%BF%A1%E6%81%AF%E9%87%8D%E5%A4%8D%E5%88%B0%E4%BA%86n%E4%B8%AA%E5%A4%B4%E4%B8%8A%0A%20%20%20%20%20%20%20%20attn_mask%20%3D%20attn_mask.unsqueeze%281%29.repeat%281%2C%20n_heads%2C%201%2C%201%29%0A%0A%20%20%20%20%20%20%20%20%23%20%E7%84%B6%E5%90%8E%E8%AE%A1%E7%AE%97%20ScaledDotProductAttention%20%E8%BF%99%E4%B8%AA%E5%87%BD%E6%95%B0%EF%BC%8C%E5%8E%BB7.%E7%9C%8B%E4%B8%80%E4%B8%8B%0A
%20%20%20%20%20%20%20%20%23%20%E5%BE%97%E5%88%B0%E7%9A%84%E7%BB%93%E6%9E%9C%E6%9C%89%E4%B8%A4%E4%B8%AA%EF%BC%9Acontext%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_q%20x%20d_v%5D%2C%20attn%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_q%20x%20len_k%5D%0A%20%20%20%20%20%20%20%20context%2C%20attn%20%3D%20ScaledDotProductAttention%28%29%28q_s%2C%20k_s%2C%20v_s%2C%20attn_mask%29%0A%20%20%20%20%20%20%20%20context%20%3D%20context.transpose%281%2C%202%29.contiguous%28%29.view%28batch_size%2C%20-1%2C%20n_heads%20%2A%20d_v%29%20%23%20context%3A%20%5Bbatch_size%20x%20len_q%20x%20n_heads%20%2A%20d_v%5D%0A%20%20%20%20%20%20%20%20output%20%3D%20self.linear%28context%29%0A%20%20%20%20%20%20%20%20return%20self.layer_norm%28output%20%2B%20residual%29%2C%20attn%20%23%20output%3A%20%5Bbatch_size%20x%20len_q%20x%20d_model%5D%0A%60%60%60%0A%0A-%20%E5%88%9D%E5%A7%8B%E5%8C%96%E5%87%BD%E6%95%B0%60def%20__init%28self%29%60%E4%B8%AD%EF%BC%9A%0A%0A%60%60%60python%0Aself.W_Q%20%3D%20nn.Linear%28d_model%2C%20d_k%20%2A%20n_heads%29%20%23%20d_model%20%E6%98%A0%E5%B0%84%E5%88%B0%20d_k%20%2A%20n_heads%0Aself.W_K%20%3D%20nn.Linear%28d_model%2C%20d_k%20%2A%20n_heads%29%20%23%20d_model%20%E6%98%A0%E5%B0%84%E5%88%B0%20d_k%20%2A%20n_heads%0Aself.W_V%20%3D%20nn.Linear%28d_model%2C%20d_v%20%2A%20n_heads%29%20%23%20d_model%20%E6%98%A0%E5%B0%84%E5%88%B0%20d_v%20%2A%20n_heads%0A%60%60%60%0A%0A%E5%8F%AF%E4%BB%A5%E7%9C%8B%E5%88%B0W_Q%E5%92%8CW_K%E9%83%BD%E6%98%AF%E5%B0%86d_model%20%E6%98%A0%E5%B0%84%E5%88%B0%20d_k%20%2A%20n_heads%EF%BC%8C%E8%A6%81%E4%BF%9D%E8%AF%81%E6%9C%80%E5%90%8E%E5%BE%97%E5%88%B0%E7%9A%84Q%E3%80%81K%E7%9F%A9%E9%98%B5%E7%9A%84%E7%BB%B4%E5%BA%A6%E6%98%AF%E7%9B%B8%E5%90%8C%E7%9A%84%EF%BC%88%E5%9B%A0%E4%B8%BA%E5%9C%A8%E8%AE%A1%E7%AE%97%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%80%BC%E7%9A%84%E5%85%AC%E5%BC%8F%E4%B8%AD%E6%9C%89%24QK%5E%7BT%7D%20%24%EF%BC%8C%E5%A6%82%E6%9E%9CQ%E3%80%81K%E7%9F%A9%E9%98%B5%E7%9A%84%E7%BB%B4%E5%BA%A6%E4%B8%8D%E5%90%8C%E5%B0%B1%E6%97%A0%E6%B3%95%E7%9B%B8%E4%B9%98%EF%BC%89%E3%80%82%
0A%0A-%20%E5%AE%9E%E7%8E%B0%E5%87%BD%E6%95%B0%60forward%28self%2C%20Q%2C%20K%2C%20V%2C%20attn_mask%29%60%E4%B8%AD%EF%BC%9A%0A%0A%E8%BE%93%E5%85%A5%E8%BF%9B%E6%9D%A5%E7%9A%84%E6%95%B0%E6%8D%AE%E5%BD%A2%E7%8A%B6%EF%BC%9A%20Q%3A%20%5Bbatch_size%20%C3%97%20len_q%20%C3%97%20d_model%5D%EF%BC%8CK%3A%20%5Bbatch_size%20%C3%97%20len_k%20%C3%97%20d_model%5D%EF%BC%8CV%3A%20%5Bbatch_size%20%C3%97%20len_k%20%C3%97%20d_model%5D%E3%80%82%0A%0A%E4%B8%8B%E9%9D%A2%E7%9A%84%E4%BB%A3%E7%A0%81%E9%9D%9E%E5%B8%B8%E9%87%8D%E8%A6%81%EF%BC%8C%E9%A6%96%E5%85%88%60W_Q%28Q%29%60%E6%98%AF%E5%AF%B9Q%E7%9F%A9%E9%98%B5%E8%BF%9B%E8%A1%8C%E6%98%A0%E5%B0%84%EF%BC%88%E5%8F%82%E8%80%83%E4%B8%8A%E9%9D%A2%E7%9A%84%E5%88%9D%E5%A7%8B%E5%8C%96%E5%87%BD%E6%95%B0%E5%AF%B9W_Q%E7%9A%84%E5%AE%9A%E4%B9%89%EF%BC%89%EF%BC%9B%E7%84%B6%E5%90%8E%E4%BD%BF%E7%94%A8view%E5%87%BD%E6%95%B0%E8%BF%9B%E8%A1%8C%E5%88%86%E5%A4%B4%EF%BC%8C%E5%88%86%E6%88%90%E4%BA%86n_heads%E4%B8%AA%E5%A4%B4%EF%BC%8C%E6%AF%8F%E4%B8%AA%E5%A4%B4%E6%98%AFd_k%E7%BB%B4%E5%BA%A6%EF%BC%9Btranspose%281%2C2%29%E6%98%AF%E4%BA%A4%E6%8D%A2%E7%AC%AC1%E7%BB%B4%E5%92%8C%E7%AC%AC2%E7%BB%B4%EF%BC%88%E7%BB%B4%E5%BA%A6%E4%BB%8E0%E5%BC%80%E5%A7%8B%EF%BC%89%E3%80%82%0A%0A%60%60%60python%0Aq_s%20%3D%20self.W_Q%28Q%29.view%28batch_size%2C%20-1%2C%20n_heads%2C%20d_k%29.transpose%281%2C2%29%20%20%23%20q_s%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_q%20x%20d_k%5D%0Ak_s%20%3D%20self.W_K%28K%29.view%28batch_size%2C%20-1%2C%20n_heads%2C%20d_k%29.transpose%281%2C2%29%20%20%23%20k_s%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_k%20x%20d_k%5D%0Av_s%20%3D%20self.W_V%28V%29.view%28batch_size%2C%20-1%2C%20n_heads%2C%20d_v%29.transpose%281%2C2%29%20%20%23%20v_s%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_k%20x%20d_v%5D%0A%60%60%60%0A%0A%E6%B3%A8%E6%84%8F%E8%BF%99%E9%87%8C%E5%BE%97%E5%88%B0%E7%9A%84q%E3%80%81k%E7%9F%A9%E9%98%B5%E7%9A%84%E7%BB%B4%E5%BA%A6%E6%98%AF%E7%9B%B8%E5%90%8C%E7%9A%84%EF%BC%8C%E5%AE%83%E4%BB%AC%E7%9A%84%E6%AF%8F%E4%B8%AA%E5%A4%B4%E9%83%BD%E6%98%AFd_k%E7%BB%B4%E5%BA%
A6%E3%80%82%0A%0A%3E%20%E5%9B%A0%E4%B8%BA%E5%9C%A8%E8%AE%A1%E7%AE%97%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%80%BC%E7%9A%84%E5%85%AC%E5%BC%8F%E4%B8%AD%E6%9C%89%24QK%5E%7BT%7D%20%24%EF%BC%8C%E5%A6%82%E6%9E%9CQ%E3%80%81K%E7%9F%A9%E9%98%B5%E7%9A%84%E7%BB%B4%E5%BA%A6%E4%B8%8D%E5%90%8C%E5%B0%B1%E6%97%A0%E6%B3%95%E7%9B%B8%E4%B9%98%E3%80%82%0A%3E%0A%3E%20%E5%8D%95%E4%B8%AAq%E3%80%81k%E7%9F%A9%E9%98%B5%E7%9A%84%E7%BB%B4%E5%BA%A6%E6%98%AFbatch_size%20%C3%97%20len_q%20%C3%97%20d_model%E5%92%8Cbatch_size%20%C3%97%20len_k%20%C3%97%20d_model%EF%BC%8C%E5%85%B6%E4%B8%ADlen_q%20%3D%20len_k%20%3D%20%E5%8F%A5%E5%AD%90%E9%95%BF%E5%BA%A6%20seq_len%EF%BC%8Cn_heads%E8%A1%A8%E7%A4%BA%E6%9C%89n_heads%E4%B8%AAq%E3%80%81k%E7%9F%A9%E9%98%B5%EF%BC%8C%E6%89%80%E4%BB%A5q%E3%80%81k%E7%9A%84%E7%BB%B4%E5%BA%A6%E6%98%AF%E7%9B%B8%E5%90%8C%E7%9A%84%E3%80%82%E5%8F%AF%E4%BB%A5%E9%80%9A%E8%BF%87%E4%B8%8B%E5%9B%BE%E6%9D%A5%E7%90%86%E8%A7%A3%EF%BC%9A%0A%3E%0A%3E%20%3Cimg%20src%3D%22Transformer%E4%BB%A3%E7%A0%81%E8%AE%B2%E8%A7%A3.assets%2Fn_heads%E5%9B%BE%E7%A4%BA.png%22%20alt%3D%22n_heads%E5%9B%BE%E7%A4%BA%22%20style%3D%22zoom%3A%2030%25%3B%22%20%2F%3E%0A%0A%3E%20view%28%29%E5%87%BD%E6%95%B0%E8%AE%B2%E8%A7%A3%EF%BC%9A%0A%3E%0A%3E%20view%28%29%E5%87%BD%E6%95%B0%E8%A1%A8%E7%A4%BA%E9%87%8D%E6%96%B0%E8%B0%83%E6%95%B4Tensor%E7%9A%84%E5%BD%A2%E7%8A%B6%EF%BC%8C%E4%BE%8B%E5%A6%82%EF%BC%9A%0A%3E%0A%3E%20%60%60%60python%0A%3E%20a%3Dtorch.Tensor%28%5B%5B%5B1%2C2%2C3%5D%2C%5B4%2C5%2C6%5D%5D%5D%29%0A%3E%20print%28a.view%283%2C2%29%29%0A%3E%20%27%27%27%0A%3E%20%E8%BE%93%E5%87%BA%EF%BC%9Atensor%28%5B%5B1.%2C%202.%5D%2C%0A%3E%20%09%20%20%20%20%5B3.%2C%204.%5D%2C%0A%3E%20%20%20%20%20%20%20%20%20%5B5.%2C%206.%5D%5D%29%0A%3E%20%27%27%27%0A%3E%20%60%60%60%0A%3E%0A%3E%20%E5%85%B6%E4%B8%AD%E5%8F%82%E6%95%B0-1%E8%A1%A8%E7%A4%BA%E4%BF%9D%E8%AF%81%E5%85%83%E7%B4%A0%E7%9A%84%E6%80%BB%E6%95%B0%E4%B8%8D%E5%8F%98%E7%9A%84%E5%89%8D%E6%8F%90%E4%B8%8B%EF%BC%8C%E8%87%AA%E5%8A%A8%E8%B0%83%E6%95%B4%E8%BF%99%E4%B8%AA%E7%BB%B4%E5%BA%A6%E4%B8%8A%E7%9A%8
4%E5%85%83%E7%B4%A0%E4%B8%AA%E6%95%B0%E3%80%82%E9%82%A3%E4%B9%88%60view%28batch_size%2C%20-1%2C%20n_heads%2C%20d_k%29%60%E4%B9%8B%E5%90%8Eq%E7%9F%A9%E9%98%B5%E7%9A%84%E7%BB%B4%E5%BA%A6%E5%8F%98%E4%B8%BAbatch_size%20%C3%97%20len_q%20%C3%97%20n_heads%20%C3%97%20d_k%E3%80%82%0A%3E%0A%3E%20%E4%B9%8B%E5%90%8E%E5%86%8Dtranspose%281%2C2%29%E5%B0%B1%E5%BE%97%E5%88%B0%E6%9C%80%E7%BB%88%E7%9A%84q%E7%9F%A9%E9%98%B5%EF%BC%8C%E5%BD%A2%E7%8A%B6%E6%98%AFbatch_size%20%C3%97%20n_heads%20%C3%97%20len_q%20%C3%97%20d_k%EF%BC%8Ck%E3%80%81v%E7%9F%A9%E9%98%B5%E5%90%8C%E7%90%86%E3%80%82%0A%0A%E7%94%B1%E4%BA%8E%E5%88%86%E5%A4%B4%E5%8E%9F%E6%9D%A5%E7%9A%84%E5%8D%95%E4%B8%AAq%E3%80%81k%E3%80%81v%E7%9F%A9%E9%98%B5%E5%8F%98%E6%88%90%E4%BA%86n_heads%E4%B8%AA%EF%BC%8C%E6%89%80%E4%BB%A5%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%E4%B9%9F%E8%A6%81%E8%BF%9B%E8%A1%8C%E5%88%86%E5%A4%B4%EF%BC%8C%E5%88%86%E6%88%90n_heads%E4%B8%AA%E5%A4%B4%EF%BC%9A%0A%0A%60%60%60python%0Aattn_mask%20%3D%20attn_mask.unsqueeze%281%29.repeat%281%2C%20n_heads%2C%201%2C%201%29%0A%60%60%60%0A%0A%E9%A6%96%E5%85%88%E5%AF%B9%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5attn_mask%E5%9C%A8%E7%AC%AC1%E7%BB%B4%E5%A4%84%E5%A2%9E%E5%8A%A0%E4%BA%86%E4%B8%80%E4%B8%AA%E7%BB%B4%E5%BA%A6%EF%BC%8C%E5%BD%A2%E7%8A%B6%E7%94%B1batch_size%20%C3%97%20len_q%20%C3%97%20len_k%E5%8F%98%E4%B8%BAbatch_size%20%C3%97%201%20%C3%97%20len_q%20%C3%97%20len_k%EF%BC%8C%E6%8E%A5%E7%9D%80%E7%94%A8repeat%28%29%E5%87%BD%E6%95%B0%E5%9C%A8%E7%AC%AC1%E7%BB%B4%E9%87%8D%E5%A4%8Dn_heads%E6%AC%A1%EF%BC%8C%E5%BE%97%E5%88%B0%E6%9C%80%E7%BB%88%E7%9A%84%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%EF%BC%8C%E5%85%B6%E5%BD%A2%E7%8A%B6%E4%B8%BAbatch_size%20%C3%97%20n_heads%20%C3%97%20len_q%20%C3%97%20len_k%E3%80%82%0A%0A———%0A%0A%E4%B9%8B%E5%90%8E%E6%98%AF%2A%2A%E8%AE%A1%E7%AE%97Attention%E5%80%BC%2A%2A%EF%BC%8C%E4%BC%A0%E5%85%A5%E5%89%8D%E9%9D%A2%E8%AE%A1%E7%AE%97%E7%9A%84q%E3%80%81k%E3%80%81v%E7%9F%A9%E9%98%B5%E5%92%8C%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5attn_mask%EF%BC%9A%0A%0A%60%60%60python%0Acontext
%2C%20attn%20%3D%20ScaledDotProductAttention%28%29%28q_s%2C%20k_s%2C%20v_s%2C%20attn_mask%29%0A%60%60%60%0A%0A%E8%BF%99%E8%A1%8C%E4%BB%A3%E7%A0%81%E4%B9%9F%E5%B0%B1%E6%98%AF%E5%AE%9E%E7%8E%B0%E5%85%AC%E5%BC%8F%EF%BC%9A%24Attention%28Q%2CK%2CV%29%3Dsoftmax%28%5Cfrac%7BQK%5E%7BT%7D%20%7D%7B%5Csqrt%7Bd%7Bk%7D%20%7D%20%7D%20%29V%24%EF%BC%8C%E5%AE%9E%E7%8E%B0%E4%BB%A3%E7%A0%81%E5%A6%82%E4%B8%8B%EF%BC%9A%0A%0A%60%60%60python%0A%23%207.%20ScaledDotProductAttention%0Aclass%20ScaledDotProductAttention%28nn.Module%29%3A%0A%20%20%20%20def%20init%28self%29%3A%0A%20%20%20%20%20%20%20%20super%28ScaledDotProductAttention%2C%20self%29.init%28%29%0A%0A%20%20%20%20def%20forward%28self%2C%20Q%2C%20K%2C%20V%2C%20attnmask%29%3A%0A%20%20%20%20%20%20%20%20%23%20%E8%BE%93%E5%85%A5%E8%BF%9B%E6%9D%A5%E7%9A%84%E7%BB%B4%E5%BA%A6%E5%88%86%E5%88%AB%E6%98%AF%20%5Bbatch_size%20x%20n_heads%20x%20len_q%20x%20d_k%5D%20%20K%EF%BC%9A%20%5Bbatch_size%20x%20n_heads%20x%20len_k%20x%20d_k%5D%20%20V%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_k%20x%20d_v%5D%0A%20%20%20%20%20%20%20%20%23%20%E9%A6%96%E5%85%88%E7%BB%8F%E8%BF%87matmul%E5%87%BD%E6%95%B0%E5%BE%97%E5%88%B0%E7%9A%84scores%E5%BD%A2%E7%8A%B6%E6%98%AF%20%3A%20%5Bbatch_size%20x%20n_heads%20x%20len_q%20x%20len_k%5D%0A%20%20%20%20%20%20%20%20scores%20%3D%20torch.matmul%28Q%2C%20K.transpose%28-1%2C%20-2%29%29%20%2F%20np.sqrt%28d_k%29%0A%0A%20%20%20%20%20%20%20%20%23%20Fills%20elements%20of%20self%20tensor%20with%20value%20where%20mask%20is%20one.%0A%20%20%20%20%20%20%20%20scores.masked_fill%28attnmask%2C%20-1e9%29%20%0A%20%20%20%20%20%20%20%20attn%20%3D%20nn.Softmax%28dim%3D-1%29%28scores%29%0A%20%20%20%20%20%20%20%20context%20%3D%20torch.matmul%28attn%2C%20V%29%0A%20%20%20%20%20%20%20%20return%20context%2C%20attn%0A%60%60%60%0A%0A%E9%A6%96%E5%85%88%E8%AE%A1%E7%AE%97%24%5Cfrac%7BQK%5E%7BT%7D%20%7D%7B%5Csqrt%7Bd%7Bk%7D%20%7D%20%7D%24%EF%BC%9A%0A%0A%60%60%60python%0Ascores%20%3D%20torch.matmul%28Q%2C%20K.transpose%28-1%2C%20-2%29%29%20%2F%20np.sqrt%28dk%29%0
A%23%20torch.matmul%E8%A1%A8%E7%A4%BA%E7%9B%B8%E4%B9%98%EF%BC%8CK.transpose%28-1%2C%20-2%29%E5%B0%B1%E6%98%AFK%E7%9A%84%E8%BD%AC%E7%BD%AE%EF%BC%8Ctorch.matmul%28Q%2C%20K.transpose%28-1%2C%20-2%29%29%E5%B0%B1%E6%98%AFQ%E4%B9%98%E4%BB%A5K%E7%9A%84%E8%BD%AC%E7%BD%AE%EF%BC%8Cnp.sqrt%28d_k%29%E6%98%AF%E6%A0%B9%E5%8F%B7d_k%E3%80%82%0A%60%60%60%0A%0A%E7%84%B6%E5%90%8E%E6%A0%B9%E6%8D%AE%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5attn_mask%E6%8A%8A%E8%A2%ABmask%E7%9A%84%E5%9C%B0%E6%96%B9%E7%BD%AE%E4%B8%BA%E6%97%A0%E9%99%90%E5%B0%8F%EF%BC%8Csoftmax%E4%B9%8B%E5%90%8E%E5%9F%BA%E6%9C%AC%E5%B0%B1%E6%98%AF0%EF%BC%8C%E5%AF%B9q%E7%9A%84%E5%8D%95%E8%AF%8D%E4%B8%8D%E8%B5%B7%E4%BD%9C%E7%94%A8%EF%BC%9A%0A%0A%60%60%60python%0A%23%20%E4%B9%9F%E5%B0%B1%E6%98%AF%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%E4%B8%AD%E5%93%AA%E4%B8%AA%E4%BD%8D%E7%BD%AE%E7%9A%84%E5%85%83%E7%B4%A0%E6%98%AF1%EF%BC%8C%E5%B0%B1%E6%8A%8Ascores%E4%B8%AD%E8%BF%99%E4%B8%AA%E4%BD%8D%E7%BD%AE%E7%9A%84%E5%85%83%E7%B4%A0%E8%AE%BE%E4%B8%BA%E6%97%A0%E9%99%90%E5%B0%8F%0Ascores.masked_fill%28attnmask%2C%20-1e9%29%20%0Aattn%20%3D%20nn.Softmax%28dim%3D-1%29%28scores%29%20%23%20dim%3D-1%E8%A1%A8%E7%A4%BA%E5%AF%B9%E6%AF%8F%E4%B8%80%E6%A8%AA%E8%A1%8C%E5%81%9Asoftmax%0A%60%60%60%0A%0Asoftmax%E4%B9%8B%E5%90%8E%EF%BC%8C%E5%86%8D%E4%B8%8E%E7%9F%A9%E9%98%B5v%E7%9B%B8%E4%B9%98%EF%BC%9A%0A%0A%60%60%60pythhon%0Acontext%20%3D%20torch.matmul%28attn%2C%20V%29%0A%60%60%60%0A%0A———%0A%0A%E8%AE%A1%E7%AE%97Attention%E4%B9%8B%E5%90%8E%EF%BC%8C%E7%BB%A7%E7%BB%AD%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E7%9A%84%E5%AE%9E%E7%8E%B0%E5%87%BD%E6%95%B0forward%EF%BC%8C%E5%90%8E%E9%9D%A2%E7%9A%84%E4%BB%A3%E7%A0%81%E5%B0%B1%E6%98%AF%E4%B8%80%E4%BA%9B%E5%B8%B8%E8%A7%84%E6%93%8D%E4%BD%9C%EF%BC%9A%0A%0A%60%60%60python%0A%20%20%20%20context%20%3D%20context.transpose%281%2C%202%29.contiguous%28%29.view%28batchsize%2C%20-1%2C%20nheads%20%2A%20dv%29%20%23%20context%3A%20%5Bbatchsize%20x%20len_q%20x%20n_heads%20%2A%20d_v%5D%0A%20%20%20%20output%20%3D%20self.linear%28context%29
%0Areturn%20self.layer_norm%28output%20%2B%20residual%29%2C%20attn%20%23%20output%3A%20%5Bbatch_size%20x%20len_q%20x%20d_model%5D%0A%60%60%60%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%23%23%23%23%23%23%20%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%B1%82%0A%0A%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E7%BB%93%E6%9D%9F%E5%90%8E%EF%BC%8C%E5%B0%B1%E6%98%AFEncoder%E5%B1%82%E7%9A%84%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%B1%82%EF%BC%8C%E5%AF%B9%E5%BA%94%E4%BB%A3%E7%A0%81%E6%98%AFEncoder%E4%B8%AD%E5%AE%9E%E7%8E%B0%E5%87%BD%E6%95%B0forward%E4%B8%AD%E7%9A%84%E8%BF%99%E4%B8%80%E8%A1%8C%EF%BC%9A%0A%0A%60%60%60python%0Aenc_outputs%20%3D%20self.pos_ffn%28enc_outputs%29%20%20%23%20enc_outputs%3A%20%5Bbatch_size%20x%20len_q%20x%20d_model%5D%0A%60%60%60%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%0A%23%23%23%204.4.Decoder%E5%B1%82%E7%9A%84%E5%AE%9A%E4%B9%89%0A%0ADecoder%E5%B1%82%E7%9A%84%E5%AE%9E%E7%8E%B0%E4%BB%A3%E7%A0%81%EF%BC%9A%0A%0A%60%60%60python%0A%23%23%209.%20Decoder%0A%0Aclass%20Decoder%28nn.Module%29%3A%0A%20%20%20%20def%20__init%28self%29%3A%0A%20%20%20%20%20%20%20%20super%28Decoder%2C%20self%29.__init%28%29%0A%20%20%20%20%20%20%20%20self.tgt_emb%20%3D%20nn.Embedding%28tgt_vocab_size%2C%20d_model%29%20%20%23%20%E5%AD%97%E7%AC%A6%E8%BD%AC%E4%B8%BA%E8%AF%8D%E5%90%91%E9%87%8F%0A%20%20%20%20%20%20%20%20self.pos_emb%20%3D%20PositionalEncoding%28d_model%29%20%20%23%20%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81%0A%20%20%20%20%20%20%20%20self.layers%20%3D%20nn.ModuleList%28%5BDecoderLayer%28%29%20for%20%20in%20range%28nlayers%29%5D%29%20%20%23%20%E8%A7%A3%E7%A0%81%E5%B1%82%E5%A0%86%E5%8F%A0n%E4%B8%AA%0A%0A%20%20%20%20%23%20decinputs%20%E8%A7%A3%E7%A0%81%E7%AB%AF%E8%BE%93%E5%85%A5%EF%BC%8Cencoutputs%20%E7%BC%96%E7%A0%81%E7%AB%AF%E8%BE%93%E5%87%BA%0A%20%20%20%20def%20forward%28self%2C%20decinputs%2C%20encinputs%2C%20encoutputs%29%3A%20%23%20dec_inputs%20%3A%20%5Bbatch_size%20x%20target_len%5D%0A%20%20%20%20%20%20%20%20dec_outputs%20%3D%20self.tgt_emb%28dec_inputs%29%2
0%20%23%20embedding%EF%BC%8C%5Bbatch_size%2C%20tgt_len%2C%20d_model%5D%0A%20%20%20%20%20%20%20%20dec_outputs%20%3D%20self.pos_emb%28dec_outputs.transpose%280%2C%201%29%29.transpose%280%2C%201%29%20%23%20position%20encoding%EF%BC%8C%5Bbatch_size%2C%20tgt_len%2C%20d_model%5D%0A%20%20%20%20%20%20%20%20%23%20%E4%B8%8A%E9%9D%A2%E7%9A%84%E4%B8%A4%E8%A1%8C%E4%BB%A3%E7%A0%81%E5%89%8D%E9%9D%A2%E9%83%BD%E8%AE%B2%E8%BF%87%0A%0A%20%20%20%20%20%20%20%20%23%23%20Decoder%E6%A0%B8%E5%BF%83%E9%83%A8%E5%88%86%EF%BC%8Cget_attn_pad_mask%20%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E7%9A%84pad%20%E9%83%A8%E5%88%86%0A%20%20%20%20%20%20%20%20dec_self_attn_pad_mask%20%3D%20get_attn_pad_mask%28dec_inputs%2C%20dec_inputs%29%0A%0A%20%20%20%20%20%20%20%20%23%23%20get_attn_subsequent_mask%20%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E7%9A%84mask%E9%83%A8%E5%88%86%0A%20%20%20%20%20%20%20%20dec_self_attn_subsequent_mask%20%3D%20get_attn_subsequent_mask%28dec_inputs%29%0A%0A%20%20%20%20%20%20%20%20%23%23%20%E5%89%8D%E9%9D%A2%E4%B8%A4%E4%B8%AA%E7%9F%A9%E9%98%B5%E7%9B%B8%E5%8A%A0%EF%BC%8C%E5%A4%A7%E4%BA%8E0%E7%9A%84%E4%B8%BA1%EF%BC%8C%E4%B8%8D%E5%A4%A7%E4%BA%8E0%E7%9A%84%E4%B8%BA0%EF%BC%8C%E4%B8%BA1%E7%9A%84%E5%9C%A8%E4%B9%8B%E5%90%8E%E5%B0%B1%E4%BC%9A%E8%A2%ABfill%E5%88%B0%E6%97%A0%E9%99%90%E5%B0%8F%0A%20%20%20%20%20%20%20%20dec_self_attn_mask%20%3D%20torch.gt%28%28dec_self_attn_pad_mask%20%2B%20dec_self_attn_subsequent_mask%29%2C%200%29%0A%0A%20%20%20%20%20%20%20%20%23%23%20%E8%BF%99%E4%B8%AA%E5%81%9A%E7%9A%84%E6%98%AF%E4%BA%A4%E4%BA%92%E6%B3%A8%E6%84%8F%E5%8A%9B%E6%9C%BA%E5%88%B6%E4%B8%AD%E7%9A%84mask%E7%9F%A9%E9%98%B5%EF%BC%8Cenc%E7%9A%84%E8%BE%93%E5%85%A5%E6%98%AFk%EF%BC%8C%E6%88%91%E5%8E%BB%E7%9C%8B%E8%BF%99%E4%B8%AAk%E9%87%8C%E9%9D%A2%E5%93%AA%E4%BA%9B%E6%98%AFpad%E7%AC%A6%E5%8F%B7%EF%BC%8C%E7%BB%99%E5%88%B0%E5%90%8E%E9%9D%A2%E7%9A%84%E6%A8%A1%E5%9E%8B%EF%BC%9B%E6%B3%A8%E6%84%8Fq%E8%82%AF%E5%AE%9A%E4%B9%9F%E6%9C%89pad%E7%AC%A6%E5%8F%B7%EF%BC%8C%E4%BD%86%E8%BF%99%E9%87%8C%E6%98%AF%E4%B8%
8D%E5%9C%A8%E6%84%8F%E7%9A%84%EF%BC%8C%E4%B9%8B%E5%89%8D%E8%AF%B4%E4%BA%86%E5%A5%BD%E5%A4%9A%E6%AC%A1%E4%BA%86%E5%93%88%0A%20%20%20%20%20%20%20%20dec_enc_attn_mask%20%3D%20get_attn_pad_mask%28dec_inputs%2C%20enc_inputs%29%0A%0A%20%20%20%20%20%20%20%20dec_self_attns%2C%20dec_enc_attns%20%3D%20%5B%5D%2C%20%5B%5D%0A%20%20%20%20%20%20%20%20for%20layer%20in%20self.layers%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20dec_outputs%2C%20dec_self_attn%2C%20dec_enc_attn%20%3D%20layer%28dec_outputs%2C%20enc_outputs%2C%20dec_self_attn_mask%2C%20dec_enc_attn_mask%29%0A%20%20%20%20%20%20%20%20%20%20%20%20dec_self_attns.append%28dec_self_attn%29%0A%20%20%20%20%20%20%20%20%20%20%20%20dec_enc_attns.append%28dec_enc_attn%29%0A%20%20%20%20%20%20%20%20return%20dec_outputs%2C%20dec_self_attns%2C%20dec_enc_attns%0A%60%60%60%0A%0A%E5%88%9D%E5%A7%8B%E5%8C%96%E5%87%BD%E6%95%B0%60def%20__init%28self%29%60%E7%9A%84%E4%BB%8B%E7%BB%8D%E5%86%99%E5%9C%A8%E4%BA%86%E4%BB%A3%E7%A0%81%E4%B8%AD%EF%BC%8C%E4%B8%8B%E9%9D%A2%E7%9B%B4%E6%8E%A5%E4%BB%8B%E7%BB%8D%E5%AE%9E%E7%8E%B0%E5%87%BD%E6%95%B0forward%EF%BC%9A%0A%0ADecoder%E5%B1%82%E5%9C%A8%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E5%81%9A2%E4%B8%AAmask%EF%BC%8C%E4%B8%80%E4%B8%AA%E6%98%AF%E5%AF%B9%E8%87%AA%E8%BA%ABpad%E7%AC%A6%E5%8F%B7%E7%9A%84mask%EF%BC%8C%E4%B8%80%E4%B8%AA%E6%98%AF%E5%AF%B9%E5%BD%93%E5%89%8D%E5%8D%95%E8%AF%8D%E5%90%8E%E7%9C%8B%E4%B8%8D%E5%88%B0%E7%9A%84%E5%8D%95%E8%AF%8D%E7%9A%84mask%EF%BC%9B%E5%9C%A8%E4%BA%A4%E4%BA%92%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%EF%BC%8C%E5%8F%AA%E5%AF%B9%E7%BC%96%E7%A0%81%E5%B1%82%E9%82%A3%E4%BA%9B%E6%98%AFpad%E9%83%A8%E5%88%86%E7%9A%84%E5%8D%95%E8%AF%8D%E5%81%9Amask%E3%80%82%0A%0A%E9%A6%96%E5%85%88%E6%98%AF%E5%B0%86Decoder%E7%9A%84inputs%E4%B8%AD%20pad%E7%AC%A6%E5%8F%B7%E7%9A%84%E9%83%A8%E5%88%86%E7%BD%AE%E4%B8%BA1%EF%BC%8C%E6%9C%80%E5%90%8E%E5%BE%97%E5%88%B0%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%EF%BC%9A%0A%0A%60%60%60python%0Adec_self_attn_pad_mask%20%3D%20get_attn_pad_mask%28dec_inputs%2C%20dec_inputs%29%0A%6
0%60%60%0A%0A%E7%84%B6%E5%90%8E%E6%98%AF%E5%AF%B9%E5%BD%93%E5%89%8D%E5%8D%95%E8%AF%8D%E4%B9%8B%E5%90%8E%E7%9C%8B%E4%B8%8D%E5%88%B0%E7%9A%84%E5%8D%95%E8%AF%8D%E5%81%9Amask%EF%BC%9A%0A%0A%60%60%60python%0Adec_self_attn_subsequent_mask%20%3D%20get_attn_subsequent_mask%28dec_inputs%29%0A%60%60%60%0A%0A%E8%BF%99%E8%A1%8C%E4%BB%A3%E7%A0%81%E5%BE%97%E5%88%B0%E4%B8%80%E4%B8%AA%E4%B8%8A%E4%B8%89%E8%A7%92%E4%B8%BA1%E7%9A%84%E7%9F%A9%E9%98%B5%EF%BC%8C%E5%A6%82%E5%9B%BE%E6%89%80%E7%A4%BA%EF%BC%9A%0A%0A%3Cimg%20src%3D%22Transformer%E4%BB%A3%E7%A0%81%E8%AE%B2%E8%A7%A3.assets%2F%E4%B8%8A%E4%B8%89%E8%A7%92%E4%B8%BA1.png%22%20alt%3D%22%E4%B8%8A%E4%B8%89%E8%A7%92%E4%B8%BA1%22%20style%3D%22zoom%3A%2033%25%3B%22%20%2F%3E%0A%0A%E8%BE%93%E5%85%A5%E4%B8%BAS%E6%97%B6%E5%8F%AA%E8%83%BD%E7%9C%8B%E5%88%B0S%EF%BC%8C%E7%9C%8B%E4%B8%8D%E5%88%B0%E5%8D%B7%E3%80%81%E8%B5%B7%E3%80%81%E6%9D%A5%EF%BC%9B%E8%BE%93%E5%85%A5%E4%B8%BA%E5%8D%B7%E6%97%B6%EF%BC%8C%E5%8F%AA%E8%83%BD%E7%9C%8B%E5%88%B0S%E5%92%8C%E5%8D%B7%EF%BC%8C%E7%9C%8B%E4%B8%8D%E5%88%B0%E8%B5%B7%E3%80%81%E6%9D%A5%EF%BC%8C%E5%90%8E%E9%9D%A2%E7%9A%84%E8%BE%93%E5%85%A5%E4%B9%9F%E4%BB%A5%E6%AD%A4%E7%B1%BB%E6%8E%A8%E3%80%82%0A%0A%E4%B9%8B%E5%90%8E%E5%B0%86%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%E5%92%8C%E4%B8%8A%E4%B8%89%E8%A7%92%E7%9F%A9%E9%98%B5%E7%9B%B8%E5%8A%A0%EF%BC%9A%0A%0A%60%60%60python%0Adec_self_attn_mask%20%3D%20torch.gt%28%28dec_self_attn_pad_mask%20%2B%20dec_self_attn_subsequent_mask%29%2C%200%29%0A%60%60%60%0A%0A%E7%9B%B8%E5%8A%A0%E5%90%8E%EF%BC%8C%E7%9F%A9%E9%98%B5%E4%B8%AD%E5%A4%A7%E4%BA%8E0%E7%9A%84%E5%85%83%E7%B4%A0%E4%B8%BA1%EF%BC%8C%E4%B8%8D%E5%A4%A7%E4%BA%8E0%E7%9A%84%E4%B8%BA0%EF%BC%8C%E4%B8%BA1%E7%9A%84%E5%9C%A8%E4%B9%8B%E5%90%8E%E5%B0%B1%E4%BC%9A%E8%A2%ABfill%E5%88%B0%E6%97%A0%E9%99%90%E5%B0%8F%E3%80%82%E8%BF%99%E6%A0%B7%E5%BE%97%E5%88%B0%E7%9A%84%E4%B9%9F%E6%98%AF%E4%B8%AA%E7%AC%A6%E5%8F%B7%E7%9F%A9%E9%98%B5%EF%BC%8C%E7%9F%A9%E9%98%B5%E4%B8%AD%E4%B8%BA1%E7%9A%84%E9%83%A8%E5%88%86%E5%B0%B1%E6%98%AF%E8%A2%ABmask%E7%9A%84%E9%83%
A8%E5%88%86%EF%BC%8C%E4%B9%9F%E5%B0%B1%E6%98%AF%E8%A6%81%E5%BF%BD%E7%95%A5%E7%9A%84%E9%83%A8%E5%88%86%E3%80%82%0A%0A%E6%8E%A5%E7%9D%80%E6%98%AF%E5%81%9A%E4%BA%A4%E4%BA%92%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E7%9A%84mask%E7%9F%A9%E9%98%B5%EF%BC%8C%E5%89%8D%E9%9D%A2%E8%AF%B4%E4%BA%A4%E4%BA%92%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%E5%8F%AA%E5%AF%B9%E7%BC%96%E7%A0%81%E5%B1%82%E9%82%A3%E4%BA%9B%E6%98%AFpad%E9%83%A8%E5%88%86%E7%9A%84%E5%8D%95%E8%AF%8D%E5%81%9Amask%EF%BC%8C%E8%BF%99%E9%87%8C%E7%9A%84%E8%BE%93%E5%85%A5enc_inputs%E5%B0%B1%E6%98%AF%E5%91%8A%E8%AF%89%E8%A7%A3%E7%A0%81%E7%AB%AF%E5%93%AA%E4%BA%9B%E9%83%A8%E5%88%86%E6%98%AFpad%E7%AC%A6%E5%8F%B7%EF%BC%9A%0A%0A%60%60%60python%0Adec_enc_attn_mask%20%3D%20get_attn_pad_mask%28dec_inputs%2C%20enc_inputs%29%0A%60%60%60%0A%0A%E7%9F%A5%E9%81%93%E5%93%AA%E4%BA%9B%E4%BD%8D%E7%BD%AE%E6%98%AFpad%E7%AC%A6%E5%8F%B7%E5%90%8E%EF%BC%8C%E6%8A%8A%E8%BF%99%E4%BA%9B%E4%BD%8D%E7%BD%AE%E7%9A%84%E5%85%83%E7%B4%A0%E7%BD%AE%E4%B8%BA1%EF%BC%8C%E5%B0%B1%E5%BE%97%E5%88%B0%E4%BA%86mask%E7%9F%A9%E9%98%B5%E3%80%82%0A%0A%E6%9C%80%E5%90%8E%E6%98%AFfor%E5%BE%AA%E7%8E%AF%EF%BC%8C%E5%85%B6%E4%B8%AD%E4%B8%BB%E8%A6%81%E6%98%AFDecoderLayer%E5%B1%82%EF%BC%8C%E4%BB%A3%E7%A0%81%E5%AE%9E%E7%8E%B0%E5%A6%82%E4%B8%8B%EF%BC%9A%0A%0A%60%60%60python%0A%23%23%2010.%0Aclass%20DecoderLayer%28nn.Module%29%3A%0A%20%20%20%20def%20__init%28self%29%3A%0A%20%20%20%20%20%20%20%20super%28DecoderLayer%2C%20self%29.__init%28%29%0A%20%20%20%20%20%20%20%20self.dec_self_attn%20%3D%20MultiHeadAttention%28%29%20%23%20%E8%87%AA%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%0A%20%20%20%20%20%20%20%20self.dec_enc_attn%20%3D%20MultiHeadAttention%28%29%20%23%20%E4%BA%A4%E4%BA%92%E6%B3%A8%E6%84%8F%E5%8A%9B%E5%B1%82%0A%20%20%20%20%20%20%20%20self.pos_ffn%20%3D%20PoswiseFeedForwardNet%28%29%20%23%20%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%0A%0A%20%20%20%20def%20forward%28self%2C%20dec_inputs%2C%20enc_outputs%2C%20dec_self_attn_mask%2C%20dec_enc_attn_mask%29%3A%0A%20%20%20%20%20%20%20%20
dec_outputs%2C%20dec_self_attn%20%3D%20self.dec_self_attn%28dec_inputs%2C%20dec_inputs%2C%20dec_inputs%2C%20dec_self_attn_mask%29%0A%20%20%20%20%20%20%20%20dec_outputs%2C%20dec_enc_attn%20%3D%20self.dec_enc_attn%28dec_outputs%2C%20enc_outputs%2C%20enc_outputs%2C%20dec_enc_attn_mask%29%0A%20%20%20%20%20%20%20%20dec_outputs%20%3D%20self.pos_ffn%28dec_outputs%29%0A%20%20%20%20%20%20%20%20return%20dec_outputs%2C%20dec_self_attn%2C%20dec_enc_attn%0A%60%60%60%0A%0A%E6%95%B4%E4%B8%AA%E8%BF%87%E7%A8%8B%E4%B9%9F%E5%92%8CEncoder%E4%B8%AD%E7%9A%84%E5%B7%AE%E4%B8%8D%E5%A4%9A%E3%80%82%0A)
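To make the decoder masks concrete, here is a minimal, dependency-free sketch of how a pad mask and a subsequent (upper-triangular) mask combine. Plain Python lists stand in for tensors. The helper names `pad_mask`, `subsequent_mask`, and `combine_masks` are hypothetical stand-ins rather than the tutorial's own functions, and the padded decoder input `S i want a P` is an assumed example chosen so that both kinds of masking are visible.

```python
# Plain-Python stand-ins for the tensor operations; all names here are hypothetical.
PAD = 0  # assumed pad id, matching 'P': 0 in both vocabularies


def pad_mask(seq_q, seq_k):
    """One row per seq_q position; a column is 1 where seq_k holds the pad id."""
    return [[1 if tok == PAD else 0 for tok in seq_k] for _ in seq_q]


def subsequent_mask(seq):
    """Strict upper triangle is 1: position i may not see any position j > i."""
    n = len(seq)
    return [[1 if j > i else 0 for j in range(n)] for i in range(n)]


def combine_masks(a, b):
    """Element-wise (a + b) > 0, mirroring torch.gt(a + b, 0)."""
    return [[1 if x + y > 0 else 0 for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Assumed decoder input 'S i want a P' (target vocabulary ids, one pad at the end).
dec_inputs = [5, 1, 2, 3, 0]
# Encoder input 'ich mochte ein bier P' (source vocabulary ids).
enc_inputs = [1, 2, 3, 4, 0]

# Self-attention mask: hide pad positions and every later position.
dec_self_attn_mask = combine_masks(pad_mask(dec_inputs, dec_inputs),
                                   subsequent_mask(dec_inputs))
# Cross-attention mask: hide only the encoder-side pad positions.
dec_enc_attn_mask = pad_mask(dec_inputs, enc_inputs)

for row in dec_self_attn_mask:
    print(row)
```

In the last row of `dec_self_attn_mask`, the only masked entry comes from the pad mask, since the final position has no later positions to hide. That is why the two masks are added before thresholding instead of using either one alone.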