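Chapter topics: the characteristics and main components of pretrained language models; autoregressive pretrained language models, represented by GPT; autoencoding pretrained language models, exemplified by BERT; and how pretrained language models are applied to downstream tasks, illustrated with concrete natural language tasks.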

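7.1 Overview

Pretrained models did not originate in NLP; the idea comes from transfer learning in computer vision (CV). A model is first pretrained once on large-scale image data such as ImageNet, so that it learns from a huge number of images how to extract visual features. It is then fine-tuned on domain data for the specific target task, which moves the model closer to the target scenario and provides both domain adaptation and task adaptation.
Pretrained language models in the broad sense: any language model trained in advance on large-scale data. Early examples include static word embedding models such as Word2Vec and GloVe, and context-aware dynamic word embedding models such as CoVe and ELMo.
Since 2018, when deep Transformer-based representation models such as GPT and BERT appeared, pretrained language models have become widely known. Today, the term "pretrained language model" in NLP mostly refers to this kind of model.
Pretrained language models are regarded as a milestone of recent years in natural language processing.

Compared with traditional text representation models, pretrained language models have three defining characteristics: big data, big models, and big compute.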

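7.1.1 Big data

To learn richer semantic representations of text, a model needs to see how text appears in many different contexts, so large-scale text data is indispensable.
Pretraining data must satisfy both a quality requirement and a quantity requirement:

  • Quality: the pretraining corpus should be as clean as possible and should not contain too much low-quality text. This is essentially the same standard as for training an ordinary NLP model;
  • Quantity: the pretraining corpus should be as large as possible, so that richer contextual information can be captured.

In practice, pretraining data usually comes from many different sources, and carefully preprocessing every source is very difficult. Typically only the problems common to all sources are cleaned up, without fine-grained per-source processing. Enlarging the corpus then dilutes the share of low-quality text and reduces its negative effect on pretraining.
The higher the quality of the pretraining corpus, the better the resulting model tends to be, so there is a trade-off between the effort invested in data processing and the data quality obtained.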

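7.1.2 Big models

Data scale and model scale are positively correlated to some extent.
Two considerations when designing a model with a large number of parameters:

  • The model must be highly parallelizable, to offset the slower training that comes with a larger model;
  • The model must be able to capture and build contextual information, to fully exploit the rich semantics in large-scale text.

Given these two requirements, Transformer-based neural networks are currently the preferred choice for building pretrained language models.

Why the Transformer

  • The Transformer is highly parallelizable: its core component, multi-head attention, does not rely on step-by-step sequential modeling and can therefore be computed quickly in parallel;
  • Multi-head self-attention effectively captures how strongly different words are related, and the multiple heads describe this relatedness from different perspectives, which gives the model more accurate results.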

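7.1.3 Big compute

Even with big data and big models, pretrained language models are hard to realize without matching compute.
Training is typically done on graphics processing units (GPUs) or tensor processing units (TPUs).

Why not run deep learning workloads on a central processing unit (CPU)?

CPUs and GPUs are good at different kinds of work. A CPU is good at serial computation, logic control, and branching, while a GPU is better at massively parallel computation. Deep learning involves a large amount of matrix and tensor computation that can be carried out in parallel, which makes it particularly well suited to GPUs.

Well-known GPU vendors

The most widely used GPU brand in deep learning is NVIDIA. Together with the matching Compute Unified Device Architecture (CUDA), NVIDIA GPUs handle complex computation well and provide deeply optimized implementations of many basic deep learning operations. Mainstream deep learning frameworks such as PyTorch and TensorFlow support CUDA-based GPU computation and expose it through higher-level, more abstract interfaces, which makes writing deep learning programs more convenient.

Popular deep learning hardware

A widely used family of deep learning hardware is NVIDIA's Volta series, whose best-known model is the V100. Its peak deep learning (Tensor Core) performance reaches 125 TFLOPS (NVLink version). The V100 is reported to deliver AI inference throughput more than 20 times that of a CPU, and more than 100 times CPU performance in high performance computing (HPC).

A word about TPUs

The tensor processing unit (TPU) [16] is an application-specific integrated circuit (ASIC) custom-developed by Google in recent years to accelerate machine learning training. In its early days it was not as widely known as the GPU. Deep learning models that used to take weeks to train on other hardware platforms can be trained on TPUs in hours.
At present, TPUs mainly support the TensorFlow framework, and support for PyTorch is gradually improving, which covers the needs of most practitioners.
TPUs can currently be used only through Google Cloud and cannot be purchased outright like GPUs. A TPU v2 costs 4.5 US dollars per hour and a TPU v3 costs 8 US dollars per hour, which is fairly expensive. For users who just want to try a TPU, Google's Colab online programming platform is a good option.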

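7.2 GPT

In 2018, OpenAI proposed the Generative Pre-Training (GPT) model to improve performance on natural language understanding tasks, formally bringing NLP into the "pretraining" era. [GPT paper]

What GPT brought

  • It formally brought NLP into the "pretraining" era, in which larger-scale text data and deeper neural networks are used to learn richer semantic representations of text.
  • It broke down the barriers between NLP tasks. Building a model for a particular task no longer requires extensive task-specific background knowledge; adapting a pretrained language model to the task's input and output format is enough to obtain good results.
  • A new paradigm for NLP: pretraining + fine-tuning
    • Generative pretraining: train a high-capacity language model on large-scale text data to learn richer contextual information;
    • Discriminative fine-tuning: adapt the pretrained model to a downstream task and learn the discriminative task from labeled data.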

7.2.1 Unsupervised pretraining

GPT as a whole is a Transformer-based unidirectional language model, that is, it models the input text from left to right.
[Figure: overall structure of GPT]

GPT uses standard language modeling to maximize the likelihood $\mathcal{L}^{PT}$ of a given text sequence $x = x_1 \dots x_n$:
$$\mathcal{L}^{PT}(x) = \sum_i \log P(x_i \mid x_{i-k} \dots x_{i-1}; \theta)$$
where:
$k$ is the window size of the language model, that is, the current word $x_i$ is predicted from the $k$ preceding words $x_{i-k} \dots x_{i-1}$;
$\theta$ denotes the parameters of the neural network, which can be optimized with stochastic gradient descent.
For a window of $k$ words $x' = x_{-k} \dots x_{-1}$, the probability $P$ is computed as follows:
$$h^{[0]} = e_{x'} W^{e} + W^{p}$$
$$h^{[l]} = \text{Transformer-Block}(h^{[l-1]}), \quad \forall\, l \in \{1, 2, \dots, L\}$$
$$P(x) = \mathrm{Softmax}(h^{[L]} {W^{e}}^{\top})$$
where:
$e_{x'}$ is the one-hot representation of $x'$;
$W^{e}$ is the word embedding matrix;
$W^{p}$ is the position embedding matrix;
$L$ is the total number of Transformer layers.
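
To make the pretraining objective concrete, the sketch below evaluates the windowed log-likelihood of a toy sequence. The names `window_log_likelihood` and `cond_prob`, and the uniform toy model, are illustrative assumptions rather than part of GPT; in a real model the conditional probability comes from the Transformer stack.

```python
import math

def window_log_likelihood(tokens, k, cond_prob):
    """Sum of log P(x_i | x_{i-k} ... x_{i-1}) over all positions i."""
    total = 0.0
    for i, token in enumerate(tokens):
        context = tuple(tokens[max(0, i - k):i])  # at most k history tokens
        total += math.log(cond_prob(token, context))
    return total

# Hypothetical toy model: a uniform distribution over a 4-word vocabulary.
VOCAB = ["the", "cat", "sat", "down"]

def uniform_prob(token, context):
    # A trained model would use the context; this stand-in ignores it.
    return 1.0 / len(VOCAB)

sentence = ["the", "cat", "sat", "down"]
print(window_log_likelihood(sentence, k=2, cond_prob=uniform_prob))
```

With the uniform stand-in every position contributes log(1/4); a trained model raises this sum by assigning higher probability to the words that actually occur.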

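7.2.2 Supervised fine-tuning on downstream tasks

The purpose of fine-tuning is to adapt the general-purpose semantic representations to the characteristics of a downstream task, so that the model fits the form of the task better and performs better on it.

Downstream fine-tuning is usually trained and optimized on labeled data. Suppose the labeled data of the downstream task is $\mathcal{C}$, where each example consists of an input text sequence $x = x_1 \dots x_n$ of length $n$ and a corresponding label $y$. The text sequence is first fed into the pretrained GPT to obtain the hidden output of the last layer at the last word, $h_n^{[L]}$. This hidden output is then passed through one fully connected layer to predict the final label:
$$P(y \mid x_1 \dots x_n) = \mathrm{Softmax}(h_n^{[L]} W^{y})$$
where:
$W^{y} \in \mathbb{R}^{d \times k}$ is the weight of the fully connected layer;
$k$ is the number of labels.
Finally, the downstream task is fine-tuned by optimizing the following objective:
$$\mathcal{L}^{FT}(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x_1 \dots x_n)$$
In addition, to further improve the generality and convergence speed of the fine-tuned model, a weighted pretraining loss can be added during fine-tuning. This mitigates catastrophic forgetting. During fine-tuning, GPT is optimized for performance on the downstream data, which emphasizes task-specific behavior, so it inevitably overwrites or erases part of the general knowledge learned during pretraining and loses some generality. Combining the fine-tuning loss with the pretraining loss effectively mitigates catastrophic forgetting and preserves some generality while optimizing downstream performance. In practice, the downstream task can be fine-tuned with:
$$\mathcal{L}(\mathcal{C}) = \mathcal{L}^{FT}(\mathcal{C}) + \lambda \mathcal{L}^{PT}(\mathcal{C})$$
where:
$\mathcal{L}^{FT}$ is the fine-tuning loss;
$\mathcal{L}^{PT}$ is the pretraining loss;
$\lambda$ is the weight, usually in the range $[0, 1]$.
In particular, when $\lambda = 0$ the $\mathcal{L}^{PT}$ term vanishes and only the fine-tuning loss $\mathcal{L}^{FT}$ is used to optimize the downstream task; when $\lambda = 1$, $\mathcal{L}^{PT}$ and $\mathcal{L}^{FT}$ have equal weight. In practice $\lambda = 0.5$ is commonly used, because the main goal during fine-tuning is still to optimize performance on the labeled dataset, that is, to optimize $\mathcal{L}^{FT}$.
$\mathcal{L}^{PT}$ is introduced mainly to improve the generality of the fine-tuned model and is less important than $\mathcal{L}^{FT}$, so $\lambda = 0.5$ is a reasonable value (though the best value may differ somewhat between tasks).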
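
The weighted combination of the two terms is a one-line computation. The sketch below is only an illustration; the function name `joint_finetuning_loss` and the example loss values are hypothetical.

```python
def joint_finetuning_loss(loss_ft, loss_pt, lam=0.5):
    """Combine the fine-tuning loss with a weighted pretraining loss.

    lam = 0 uses only the fine-tuning loss; lam = 1 weights both terms equally.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is expected to lie in [0, 1]")
    return loss_ft + lam * loss_pt

# Hypothetical per-batch loss values, for illustration only.
print(joint_finetuning_loss(2.0, 4.0))           # default weight of 0.5
print(joint_finetuning_loss(2.0, 4.0, lam=0.0))  # fine-tuning loss only
```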

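7.2.3 Adapting to different downstream tasks

Different tasks have different input formats, so the question is how to adapt GPT's input format to each task.
This section covers the input and output formats in GPT for several typical NLP tasks: single-sentence text classification, textual entailment, similarity computation, and multiple-choice reading comprehension.

To be completed
(1) Single-sentence text classification

(2) Textual entailment

(3) Similarity computation

(4) Multiple-choice reading comprehension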

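7.3 BERT

BERT (Bidirectional Encoder Representations from Transformers) is a pretrained language model based on deep Transformers, proposed by Devlin et al. in 2018. [BERT paper]
BERT not only makes full use of large-scale unlabeled text to mine its rich semantic information, but also further increases the depth of NLP models.

7.3.1 Overall architecture

BERT's basic model structure consists of multiple Transformer layers and includes two pretraining tasks: the masked language model (MLM) and next sentence prediction (NSP).
[Figure: overall architecture of BERT]
The model input is the concatenation of two text segments, X1 and X2. BERT encodes it into contextual semantic representations, which are then used to learn MLM and NSP.
Note: MLM places no special requirement on the input format, which can be one text segment or two; NSP requires the input to consist of two text segments.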

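7.3.2 Input representation

BERT's input representation is the sum of token embeddings, segment embeddings, and position embeddings.
[Figure: BERT input representation]
In BERT all three embeddings have dimension $e$, so the input representation $v$ of an input sequence is computed as:
$$v = v^{t} + v^{s} + v^{p}$$
where $v^{t}$, $v^{s}$, and $v^{p}$ are the token, segment, and position embeddings; each has size $N \times e$,
$N$ is the maximum sequence length, and $e$ is the embedding dimension.

(1) Token embeddings: as in conventional neural network models, BERT converts the input text into real-valued vectors through a token embedding matrix. Suppose the one-hot representation of the input sequence $x$ is $e^{t} \in \mathbb{R}^{N \times |V|}$; the corresponding token embedding $v^{t}$ is:
$$v^{t} = e^{t} W^{t}$$
where $W^{t} \in \mathbb{R}^{|V| \times e}$ is the trainable token embedding matrix, $|V|$ is the vocabulary size, and $e$ is the embedding dimension.

(2) Segment embeddings: the segment embedding encodes which segment the current token belongs to. The segment encoding of each token in the input sequence is the index of the segment that contains it (counting from 0).

  • When the input is a single segment (as in single-sentence text classification), every token has segment encoding 0;
  • When the input consists of two segments (as in sentence-pair text classification), every token in the first sentence has segment encoding 0 and every token in the second sentence has segment encoding 1.

Note that the [CLS] position (the first token of the input sequence) and the [SEP] position at the end of the first segment (the token that separates segments) both have segment encoding 0.
Next, the segment embedding matrix $W^{s}$ converts the segment encoding $e^{s} \in \mathbb{R}^{N \times |S|}$ into real-valued vectors, giving the segment embedding $v^{s}$:
$$v^{s} = e^{s} W^{s}$$
where $W^{s} \in \mathbb{R}^{|S| \times e}$ is the trainable segment embedding matrix, $|S|$ is the number of segments, and $e$ is the embedding dimension.

(3) Position embeddings
The position embedding encodes the absolute position of each token. Each token in the input sequence is converted, in index order, into a one-hot position encoding. The position embedding matrix $W^{p}$ then converts the one-hot position encoding $e^{p} \in \mathbb{R}^{N \times N}$ into real-valued vectors, giving the position embedding $v^{p}$:
$$v^{p} = e^{p} W^{p}$$
where $W^{p} \in \mathbb{R}^{N \times e}$ is the trainable position embedding matrix, $N$ is the maximum position length, and $e$ is the embedding dimension.

For convenience, the operations of the input representation layer are summarized from here on as a single formula. For a given raw input sequence $X$, BERT's input representation $v$ is obtained as:
$$v = \mathrm{InputRepresentation}(X)$$
where $v \in \mathbb{R}^{N \times e}$ is the final output of the input representation layer, that is, the sum of the token, segment, and position embeddings; $N$ is the maximum sequence length; and $e$ is the input representation dimension.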
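
The three index sequences that feed the embedding lookups can be built with a few lines of plain Python. This is a minimal sketch under stated assumptions: the helper name `build_bert_inputs` is hypothetical, tokens are kept as strings instead of vocabulary ids, and the final [SEP] is assigned to segment 1, following the convention of the original BERT implementation.

```python
def build_bert_inputs(tokens_a, tokens_b=None):
    """Return (tokens, segment_ids, position_ids) for one or two segments."""
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)       # [CLS] and the first [SEP] get 0
    if tokens_b is not None:
        tokens += list(tokens_b) + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    position_ids = list(range(len(tokens)))  # absolute positions 0, 1, 2, ...
    return tokens, segment_ids, position_ids

tokens, segment_ids, position_ids = build_bert_inputs(
    ["how", "are", "you"], ["fine", "thanks"]
)
for row in zip(tokens, segment_ids, position_ids):
    print(row)
```

Each of the three sequences indexes its own embedding matrix, and the looked-up vectors are summed position by position to form the input representation.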

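7.3.3 Basic pretraining tasks

Unlike GPT, BERT does not use the traditional autoregressive language modeling approach; it introduces auto-encoding pretraining tasks instead.

On the difference between autoregressive and autoencoding models: https://zhuanlan.zhihu.com/p/415105799

1. Masked language model
Traditional language models based on conditional probability can model a text sequence only from left to right (forward) or from right to left (backward). Modeling both directions at the same time leaks information. Forward modeling predicts "future" words from "history" words; backward modeling does the opposite and predicts "history" words from "future" words. If both are modeled jointly, the "future" words that the forward direction is supposed to predict have already been exposed by the backward direction, so the model tends to copy the word from the backward direction instead of inferring it from the history. The language modeling task becomes trivial and the model cannot learn deep semantic information. The backward direction suffers from the same problem.
To address this, the ELMo model from the previous chapter uses two independent language models, one forward and one backward.
Building on ELMo's idea, and in order to achieve truly bidirectional modeling in which the prediction at the current position depends on both "history" and "future", BERT adopts a cloze-style approach called the masked language model (MLM). The MLM pretraining task masks some of the words in the input text and uses a deep Transformer to recover the original words. This avoids the information leakage of bidirectional language models and forces the model to use the context around the masked word to recover it.

BERT uses a masking ratio of **15%**, that is, 15% of the WordPiece subwords in the input sequence are masked.

What inconsistency does masking introduce, and how did the BERT authors address it?

When masking, the original word is replaced with the [MASK] token to indicate that the position has been masked. However, this creates a mismatch between pretraining and downstream fine-tuning, because the artificially introduced [MASK] token never appears in real downstream tasks. To mitigate this, a position selected for masking is not always replaced with [MASK]; instead, one of the following three operations is chosen at random:

  • with 80% probability, replace the word with the [MASK] token;
  • with 10% probability, replace the word with a random word from the vocabulary;
  • with 10% probability, keep the original word unchanged.

As can be seen, to predict the word at a [MASK] position the model must understand both the words before the gap and the words after it, which achieves bidirectional language modeling.
[Figure: masked language model example]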

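Modeling method

(1) Input layer: since the masked language model does not require the input to be two text segments, assume for ease of description that the original input text is $x_1 x_2 \dots x_n$ and the masked input text is $x'_1 x'_2 \dots x'_n$, where $x_i$ is the $i$-th word of the input text and $x'_i$ is the $i$-th word after masking. The masked input text is processed as follows to obtain BERT's input representation $v$:
$$X = \texttt{[CLS]}\; x'_1\; x'_2 \dots x'_n\; \texttt{[SEP]}$$
$$v = \mathrm{InputRepresentation}(X)$$
where [CLS] is the special token marking the start of the text sequence and [SEP] is the separator between text sequences.
Note that if the input sequence is shorter than BERT's maximum sequence length $N$, padding tokens [PAD] are appended to the input until it reaches length $N$. For example, suppose $N = 10$ and the input sequence has length 7 (two special tokens plus $x_1$ through $x_5$); then 3 [PAD] tokens are appended:
$$\texttt{[CLS]}\; x_1\; x_2\; x_3\; x_4\; x_5\; \texttt{[SEP]}\; \texttt{[PAD]}\; \texttt{[PAD]}\; \texttt{[PAD]}$$
If the input sequence $X$ is longer than $N$, it is truncated to length $N$. For example, suppose $N = 5$ and the input sequence has length 7 (two special tokens plus $x_1$ through $x_5$); the sequence is truncated so that the effective sequence (the input without the 2 special tokens) has length 3:
$$\texttt{[CLS]}\; x_1\; x_2\; x_3\; \texttt{[SEP]}$$
For ease of description, the handling of the padding token [PAD] is ignored from here on, and $N$ denotes the maximum sequence length.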
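
The two examples above can be reproduced with a short helper. This is a minimal sketch; the function name `pad_or_truncate` is hypothetical, and it assumes the maximum length is at least 2 so that both special tokens fit.

```python
def pad_or_truncate(words, max_len):
    """Wrap words in [CLS] ... [SEP], then truncate or pad to max_len tokens."""
    body = list(words)[:max_len - 2]          # leave room for [CLS] and [SEP]
    sequence = ["[CLS]"] + body + ["[SEP]"]
    sequence += ["[PAD]"] * (max_len - len(sequence))
    return sequence

words = ["x1", "x2", "x3", "x4", "x5"]
print(pad_or_truncate(words, 10))  # three [PAD] tokens are appended
print(pad_or_truncate(words, 5))   # only x1, x2, x3 survive truncation
```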

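(2) BERT encoding layer: in the BERT encoding layer, the input representation $v$ passes through $L$ Transformer layers, which use self-attention to learn the semantic relations between every pair of words in the text.
$$h^{[l]} = \text{Transformer-Block}(h^{[l-1]}), \quad \forall\, l \in \{1, 2, \dots, L\}, \quad h^{[0]} = v$$
For ease of description, the layer-by-layer notation is abbreviated as:
$$h = \mathrm{Transformer}(v)$$
where $h$ is the output of the last Transformer layer, that is, $h^{[L]}$. This yields the contextual semantic representation of the text, $h \in \mathbb{R}^{N \times d}$, where $d$ is BERT's hidden dimension.

(3) Output layer: since the masked language model masks only some of the words in the input text, it does not need to predict every position of the input, only the masked positions.
Let the set $\mathbb{M} = \{m_1, m_2, \dots, m_k\}$ contain the indices of all masked positions, where $k$ is the total number of masked positions. If the input text has length $n$ and the masking ratio is 15%, then $k = \lfloor n \times 15\% \rfloor$. Using the elements of $\mathbb{M}$ as indices, the corresponding representations are extracted from the contextual representation $h$ of the input sequence and concatenated to form the masked representation $h^{m} \in \mathbb{R}^{k \times d}$.
In BERT the input representation dimension $e$ equals the hidden dimension $d$, so the token embedding matrix $W^{t} \in \mathbb{R}^{|V| \times e}$ can be used directly to map the masked representation into the vocabulary space. For the $i$-th component $h^{m}_i$ of the masked representation, the probability distribution $P_i$ over the vocabulary at that masked position is:
$$P_i = \mathrm{Softmax}(h^{m}_i {W^{t}}^{\top} + b^{o})$$

where $b^{o} \in \mathbb{R}^{|V|}$ is the bias of the fully connected layer. Finally, the cross-entropy loss between the distribution $P_i$ and the label $y_i$ (the one-hot representation of the original word $x_{m_i}$) is computed to learn the model parameters.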

Code implementation
How the original BERT generates MLM training data

  1. # To be added
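
Until the original code is filled in, the masking step described in this section can be sketched as follows. This is not the original BERT implementation: the function name `mask_tokens`, the toy vocabulary, and the choice to mask at least one position are assumptions made for illustration. It selects 15% of the non-special tokens and applies the 80%/10%/10% rule to each selected position.

```python
import random

SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (masked_tokens, labels); labels maps position -> original token."""
    rng = rng or random.Random()
    candidates = [i for i, tok in enumerate(tokens) if tok not in SPECIAL_TOKENS]
    # Assumption: mask at least one position whenever a candidate exists.
    num_to_mask = min(len(candidates), max(1, int(len(candidates) * mask_prob)))
    masked, labels = list(tokens), {}
    for pos in rng.sample(candidates, num_to_mask):
        labels[pos] = tokens[pos]             # the model must recover this word
        draw = rng.random()
        if draw < 0.8:
            masked[pos] = "[MASK]"            # 80%: replace with [MASK]
        elif draw < 0.9:
            masked[pos] = rng.choice(vocab)   # 10%: replace with a random word
        # remaining 10%: keep the original word unchanged
    return masked, labels

# Hypothetical toy vocabulary and sentence.
vocab = ["apple", "banana", "cat", "dog", "runs", "eats"]
tokens = ["[CLS]", "the", "cat", "eats", "an", "apple", "every", "day", "[SEP]"]
masked, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
print(masked)
print(labels)
```

Only the positions recorded in `labels` contribute to the MLM loss, including those where the original word was kept.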

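2. Next sentence prediction
With the MLM pretraining task, the model can already recover masked words from their context and thereby learn context-sensitive text representations. However, for tasks that take two input text segments, such as reading comprehension and textual entailment, MLM alone cannot explicitly learn the relation between the two segments.
BERT therefore introduces the next sentence prediction (NSP) task to model the relation between two text segments. NSP is a binary classification task that decides whether sentence B is the next sentence after sentence A. Its training examples are generated as follows.

  • Positive examples: two adjacent sentences, "sentence A" and "sentence B", taken from natural text, which form a "next sentence" relation;
  • Negative examples: "sentence B" is replaced with any other sentence from the corpus, which forms a "not next sentence" relation.

The overall ratio of positive to negative examples in NSP is kept at 1:1. Because the design of the task is simple and large numbers of training examples can be generated automatically in this way, NSP can also be regarded as an unsupervised learning task.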
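
The sampling procedure can be sketched as below. The function name `make_nsp_examples` and the toy documents are hypothetical, and drawing each negative from a different document is an assumption used to reduce the chance of accidentally picking the true next sentence. Each pair becomes a positive or a negative example with probability 0.5, so the 1:1 ratio holds in expectation rather than exactly.

```python
import random

def make_nsp_examples(documents, rng=None):
    """documents: list of documents, each a list of sentences.

    Returns (sentence_a, sentence_b, label) triples; label 1 means "is next".
    Requires at least two non-empty documents.
    """
    rng = rng or random.Random()
    examples = []
    for doc_idx, doc in enumerate(documents):
        for i in range(len(doc) - 1):
            sentence_a = doc[i]
            if rng.random() < 0.5:
                # Positive example: the sentence that really follows.
                examples.append((sentence_a, doc[i + 1], 1))
            else:
                # Negative example: a random sentence from another document.
                others = [j for j in range(len(documents)) if j != doc_idx]
                other_doc = documents[rng.choice(others)]
                examples.append((sentence_a, rng.choice(other_doc), 0))
    return examples

# Hypothetical two-document corpus.
docs = [["a1", "a2", "a3"], ["b1", "b2"]]
for example in make_nsp_examples(docs, random.Random(0)):
    print(example)
```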
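[Figure: next sentence prediction example]
The NSP task is modeled in much the same way as the MLM task; the main difference is in the output.

(1) Input layer
For the given masked input texts
$$x^{(1)} = x^{(1)}_1 x^{(1)}_2 \dots x^{(1)}_n$$
$$x^{(2)} = x^{(2)}_1 x^{(2)}_2 \dots x^{(2)}_m$$
the following processing yields BERT's input representation $v$:
$$X = \texttt{[CLS]}\; x^{(1)}_1 \dots x^{(1)}_n\; \texttt{[SEP]}\; x^{(2)}_1 \dots x^{(2)}_m\; \texttt{[SEP]}$$
$$v = \mathrm{InputRepresentation}(X)$$
where [CLS] is the special token marking the start of the text sequence and [SEP] is the separator between text sequences.
(2) BERT encoding layer
Same as for MLM:
$$h = \mathrm{Transformer}(v)$$
where $h \in \mathbb{R}^{N \times d}$, $N$ is the maximum sequence length, and $d$ is BERT's hidden dimension.

(3) Output layer
Unlike MLM, the NSP task only needs to decide whether the input text $x^{(2)}$ is the next sentence after $x^{(1)}$. BERT therefore uses the hidden representation at the [CLS] position for classification. Specifically, the [CLS] representation is the first component $h_0$ of the contextual representation $h$, because [CLS] is the first element of the input sequence. Given $h_0$, a fully connected layer predicts the classification probability $P \in \mathbb{R}^{2}$ of the input text:
$$P = \mathrm{Softmax}(h_0 W^{p} + b^{o})$$
where $W^{p} \in \mathbb{R}^{d \times 2}$ is the weight of the fully connected layer (here $W^{p}$ is the NSP classifier weight, not the position embedding matrix of Section 7.3.2) and $b^{o} \in \mathbb{R}^{2}$ is its bias.
Finally, the cross-entropy loss between the classification distribution $P$ and the true label $y$ is computed to learn the model parameters.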

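7.3.4 More pretraining tasks

Beyond the basic pretraining tasks above, the MLM task can be replaced with one of the following two advanced pretraining tasks, which make pretraining harder and thereby mine richer semantic information from the text.

1. Whole word masking
In the MLM task, the smallest masking unit is the WordPiece subword (in Chinese, the character), and this has a drawback. When only some of the WordPiece subwords of a whole word are masked, the masked subwords can be predicted fairly easily from the unmasked part, which amounts to a degree of information leakage.
[Figure 7-6]
In Figure 7-6(a), the model can easily predict the masked part (marked [M]) as "果", because the preceding character "苹" constrains it strongly (苹果 means "apple"). In Figure 7-6(b), many two-character words could fill the gap, so the task is comparatively harder.

The whole word masking (WWM) pretraining task was proposed to solve this leakage among WordPiece subwords. Whole word masking keeps the conventional MLM setup and changes only how masking is done: the smallest masking unit is the whole word instead of the WordPiece subword. In other words, when some WordPiece subwords of a whole word are masked, the other subwords of that word are masked too.
[Figure: whole word masking example]

(1) Understanding whole word masking correctly.
The masking operation in the masked language model should be understood in the broad sense, that is, replacing with [MASK], replacing with a random word, or keeping the original word, with one of the three chosen by probability. It should not be understood as only converting the text to the [MASK] token.
It is easy to assume that in whole word masking every subword of the selected word is masked in the same way, but that is not the case.
In whole word masking, when a masking operation occurs:

  • every subword of the whole word is masked;
  • the subwords are not all masked in the same way; each one gets one of the three operations;
  • the operation for each subword is chosen independently by probability.

[Figure: masking operations applied to the subwords of a whole word]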
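
The three points above can be made concrete with a small sketch. It assumes the usual WordPiece convention that a piece starting with "##" continues the previous word, and that the input contains no special tokens; the function names, the masking budget rule, and the toy vocabulary are hypothetical. Note that every subword of a selected word is masked, but each subword draws its own operation.

```python
import random

def group_whole_words(pieces):
    """Group piece indices into words; '##' pieces continue the previous word."""
    groups = []
    for i, piece in enumerate(pieces):
        if piece.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups

def whole_word_mask(pieces, vocab, mask_prob=0.15, rng=None):
    """Return (masked_pieces, labels) with masking decided per whole word."""
    rng = rng or random.Random()
    budget = max(1, int(len(pieces) * mask_prob))  # max number of masked pieces
    groups = group_whole_words(pieces)
    rng.shuffle(groups)
    masked, labels = list(pieces), {}
    for group in groups:
        if len(labels) + len(group) > budget:
            continue  # skip words that would exceed the masking budget
        for pos in group:          # every subword of the chosen word is masked
            labels[pos] = pieces[pos]
            draw = rng.random()    # each subword draws its own operation
            if draw < 0.8:
                masked[pos] = "[MASK]"
            elif draw < 0.9:
                masked[pos] = rng.choice(vocab)
            # otherwise keep the original subword
    return masked, labels

pieces = ["the", "phil", "##har", "##monic", "plays", "to", "##night"]
print(whole_word_mask(pieces, ["cat", "dog"], mask_prob=0.5, rng=random.Random(0)))
```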

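(2) Chinese whole word masking
When the WordPiece tokenizer is applied to Chinese, text is split at character granularity and there is no counterpart of the English "subword", because Chinese is not written with an alphabet. This is a major difference from languages written in the Latin alphabet, such as English.

In traditional Chinese information processing, text is usually passed through Chinese word segmentation (CWS) and converted into a sequence of words. A Chinese character can therefore be treated as the analogue of an English WordPiece subword, which makes whole word masking applicable.

In Chinese whole word masking, the smallest masking unit changes from the character to the word: when some characters of a whole word are masked, the other characters of that word are masked too.
The example below uses LTP for word segmentation.
[Figure: Chinese whole word masking example, segmented with LTP]

2. N-gram masking
To further exercise the model's ability to recover contiguous missing text, the original masked language model can be extended to an N-gram based masked language model.

The N-gram masking (NM) language model, as the name suggests, masks contiguous N-gram spans of text and asks the model to recover the missing content. Note that, like whole word masking, N-gram masking affects only the masking procedure (that is, only how masked positions are selected); the model input is still the sequence produced by WordPiece tokenization.

The whole word masking language model needs to identify word boundaries, whereas the N-gram masking language model additionally needs phrase-level boundary information. One option is to borrow the phrase table extraction method from statistical machine translation (SMT) and extract high-frequency phrases from the corpus.
Counting all phrases is very time-consuming, so Cui et al. use the following N-gram masking procedure instead.

  • First, decide from the masking probability whether the current token should be masked;
  • If the token is selected, decide the N-gram length by probability. The maximum phrase length is assumed to be a 4-gram. To avoid losing overly long stretches of text through consecutive masked N-grams, short N-grams are sampled with high probability and long N-grams with low probability; for example, 40% for a unigram and 10% for a 4-gram;
  • Mask the token and the N−1 tokens that follow it. If fewer than N−1 tokens remain, truncate at the word boundary;
  • After masking, skip past this N-gram and make the masking decision for the next candidate token.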
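
The procedure above can be sketched as a span-selection function. Several details are assumptions: the text gives only the unigram (40%) and 4-gram (10%) probabilities, so the 30% and 20% values for bigrams and trigrams are filled in for illustration; spans are truncated at the end of the sequence rather than at a word boundary; and the function name `ngram_mask_spans` is hypothetical.

```python
import random

NGRAM_LENGTHS = [1, 2, 3, 4]
# 40% and 10% come from the text; 30% and 20% are assumed for illustration.
NGRAM_WEIGHTS = [0.4, 0.3, 0.2, 0.1]

def ngram_mask_spans(num_tokens, mask_prob=0.15, rng=None):
    """Return half-open (start, end) spans of token positions to mask."""
    rng = rng or random.Random()
    spans = []
    i = 0
    while i < num_tokens:
        if rng.random() < mask_prob:          # step 1: is this token selected?
            n = rng.choices(NGRAM_LENGTHS, weights=NGRAM_WEIGHTS, k=1)[0]
            end = min(i + n, num_tokens)      # simplification: cut at sequence end
            spans.append((i, end))            # mask the token and the next n-1
            i = end                           # skip past the masked N-gram
        else:
            i += 1
    return spans

print(ngram_mask_spans(30, rng=random.Random(0)))
```

The returned spans only select positions; the replacement applied to each selected token would still follow the usual 80%/10%/10% rule.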

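3. Differences among the three masking strategies

The three masking strategies, the masked language model (MLM), whole word masking (WWM), and N-gram masking (NM), are related to one another but also differ.
[Figure: comparison of the three masking strategies]
It is worth emphasizing that the three masking strategies affect only the pretraining stage and are transparent to downstream fine-tuning. BERT models trained with any of the three strategies are therefore drop-in replacements for one another, and no downstream fine-tuning code needs to change.

7.3.5 Model comparison

[Figure: model comparison]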

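7.4 Applications of pretrained models

7.4.1 Overview

Pretrained language models are applied in one of two ways:

  • Feature extraction: BERT is used only to extract features of the input text and produce the corresponding contextual semantic representations. BERT itself does not take part in training the target task, that is, the BERT part runs only the forward pass (no gradients are propagated back into it);
  • Fine-tuning: BERT serves as the base of the downstream task model, produces the contextual semantic representations of the text, and takes part in downstream training. In other words, BERT's own parameters are updated while the downstream task is being learned.


Pros and cons of feature extraction and fine-tuning

Feature extraction resembles traditional word embedding techniques and is relatively simple to use. Because the pretrained language model does not take part in downstream training, training is also relatively efficient. The drawback is that the pretrained model cannot adapt itself to the downstream task, so results depend more heavily on the design of the downstream model, which makes modeling harder. Fine-tuning, by contrast, can use the large number of parameters of the pretrained language model to learn more about the downstream task and fit the model more closely to the downstream data. Its drawback is that the pretrained model must take part in downstream training, so more memory is needed to store the gradients for parameter updates, and training is less efficient.

Extensive experiments show that models trained with fine-tuning perform significantly better than those built on feature extraction.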

7.4.2 Single-sentence text classification

1. Modeling method
Single sentence classification (SSC) is one of the most common NLP tasks: the input text must be assigned to one of several classes. For example, in the sentiment classification task SST-2, a movie review is fed into the text classification model and assigned either the "positive" or the "negative" label.
[Figure: BERT for single-sentence text classification]
(1) Input layer
For a given sentence $x_1 x_2 \dots x_n$ that has been tokenized with WordPiece, the following processing yields BERT's raw input $X$ and input-layer representation $v$:
$$X = \texttt{[CLS]}\; x_1\; x_2 \dots x_n\; \texttt{[SEP]}$$
$$v = \mathrm{InputRepresentation}(X)$$
where [CLS] is the special token marking the start of the text sequence and [SEP] is the separator between text sequences.

(2) BERT encoding layer
In the BERT encoding layer, the input representation $v$ is encoded by multiple Transformer layers, which use self-attention to learn the semantic relations between every pair of words in the sentence. The result is the contextual semantic representation $h \in \mathbb{R}^{N \times d}$ of the sentence, where $d$ is BERT's hidden dimension.
$$h = \mathrm{BERT}(v)$$
Because the NSP task uses the [CLS] position for prediction during BERT pretraining, text classification tasks usually do the same. The model uses the hidden representation $h_0$ at the [CLS] position, which is the first component of $h$, because [CLS] is the first element of the input sequence.

(3) Classification output layer
Given the [CLS] hidden representation $h_0$, a fully connected layer predicts the class label of the input text. The probability distribution $P \in \mathbb{R}^{K}$ is computed as:
$$P = \mathrm{Softmax}(h_0 W^{o} + b^{o})$$
where $W^{o} \in \mathbb{R}^{d \times K}$ is the weight of the fully connected layer, $b^{o} \in \mathbb{R}^{K}$ is its bias, and $K$ is the number of class labels.
Finally, the cross-entropy loss between the classification distribution $P$ and the true label $y$ is computed to learn the model parameters.
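
The output layer amounts to one linear map, a softmax, and a cross-entropy term. The dependency-free sketch below illustrates this with plain Python lists; the function names and the tiny 2-dimensional example are hypothetical, and a real implementation would use a tensor library.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [value / total for value in exps]

def classify_cls(h_cls, weight, bias):
    """h_cls: d floats; weight: d x K nested list; bias: K floats."""
    num_labels = len(bias)
    logits = [
        sum(h_cls[i] * weight[i][k] for i in range(len(h_cls))) + bias[k]
        for k in range(num_labels)
    ]
    return softmax(logits)

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true label index."""
    return -math.log(probs[label])

# Hypothetical [CLS] vector with d = 2 and K = 2 labels.
h_cls = [1.0, -1.0]
weight = [[0.5, -0.5], [0.25, 0.75]]
bias = [0.0, 0.0]
probs = classify_cls(h_cls, weight, bias)
print(probs, cross_entropy(probs, label=0))
```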

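2. Code implementation
This part uses actual code to show how BERT is trained on a single-sentence text classification task, taking the English sentiment classification (binary) dataset SST-2 as the example. It relies mainly on the easy-to-use transformers package and datasets library developed by Hugging Face, which greatly simplify data processing and model building. The fine-tuning code for single-sentence text classification is given below.

  1. To be added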

7.4.3 Sentence-pair text classification

1. Modeling method
Sentence pair classification (SPC) is similar to single-sentence text classification, except that a pair of texts must be assigned to a class. For example, in the English textual entailment dataset RTE, two sentences are fed into the text classification model and assigned either the "entailment" or the "not entailment" label.
[Figure: BERT for sentence-pair text classification]
(1) Input layer
For a given pair of sentences $x^{(1)}_1 x^{(1)}_2 \dots x^{(1)}_n$ and $x^{(2)}_1 x^{(2)}_2 \dots x^{(2)}_m$ that have been tokenized with WordPiece, concatenating them yields BERT's raw input $X$ and input-layer representation $v$:
$$X = \texttt{[CLS]}\; x^{(1)}_1 \dots x^{(1)}_n\; \texttt{[SEP]}\; x^{(2)}_1 \dots x^{(2)}_m\; \texttt{[SEP]}$$
$$v = \mathrm{InputRepresentation}(X)$$
%2044T328%2078L339%2095Q341%2099%20377%20247Q407%20367%20413%20387T427%20416Q444%20431%20463%20431Q480%20431%20488%20421T496%20402L420%2084Q419%2079%20419%2068Q419%2043%20426%2035T447%2026Q469%2029%20482%2057T512%20145Q514%20153%20532%20153Q551%20153%20551%20144Q550%20139%20549%20130T540%2098T523%2055T498%2017T462%20-8Q454%20-10%20438%20-10Q372%20-10%20347%2046Q345%2045%20336%2036T318%2021T296%206T267%20-6T233%20-11Q189%20-11%20155%207Q103%2038%20103%20113Q103%20170%20138%20262T173%20379Q173%20380%20173%20381Q173%20390%20173%20393T169%20400T158%20404H154Q131%20404%20112%20385T82%20344T65%20302T57%20280Q55%20278%2041%20278H27Q21%20284%2021%20287Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-74%22%20d%3D%22M26%20385Q19%20392%2019%20395Q19%20399%2022%20411T27%20425Q29%20430%2036%20430T87%20431H140L159%20511Q162%20522%20166%20540T173%20566T179%20586T187%20603T197%20615T211%20624T229%20626Q247%20625%20254%20615T261%20596Q261%20589%20252%20549T232%20470L222%20433Q222%20431%20272%20431H323Q330%20424%20330%20420Q330%20398%20317%20385H210L174%20240Q135%2080%20135%2068Q135%2026%20162%2026Q197%2026%20230%2060T283%20144Q285%20150%20288%20151T303%20153H307Q322%20153%20322%20145Q322%20142%20319%20133Q314%20117%20301%2095T267%2048T216%206T155%20-11Q125%20-11%2098%204T59%2056Q57%2064%2057%2083V101L92%20241Q127%20382%20128%20383Q128%20385%2077%20385H26Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-52%22%20d%3D%22M230%20637Q203%20637%20198%20638T193%20649Q193%20676%20204%20682Q206%20683%20378%20683Q550%20682%20564%20680Q620%20672%20658%20652T712%20606T733%20563T739%20529Q739%20484%20710%20445T643%20385T576%20351T538%20338L545%20333Q612%20295%20612%20223Q612%20212%20607%20162T602%2080V71Q602%2053%20603%2043T614%2025T640%2016Q668%2016%20686%2038T712%2085Q717%2099%20720%20102T735%20105Q755%20105%20755%2093Q755%2075%20731%2036Q693%20-21%20641%20-21H632Q571%20-21%20531%204T487%2082Q487%20109%20502%20166T517%20239Q517%20290%20474%
20313Q459%20320%20449%20321T378%20323H309L277%20193Q244%2061%20244%2059Q244%2055%20245%2054T252%2050T269%2048T302%2046H333Q339%2038%20339%2037T336%2019Q332%206%20326%200H311Q275%202%20180%202Q146%202%20117%202T71%202T50%201Q33%201%2033%2010Q33%2012%2036%2024Q41%2043%2046%2045Q50%2046%2061%2046H67Q94%2046%20127%2049Q141%2052%20146%2061Q149%2065%20218%20339T287%20628Q287%20635%20230%20637ZM630%20554Q630%20586%20609%20608T523%20636Q521%20636%20500%20636T462%20637H440Q393%20637%20386%20627Q385%20624%20352%20494T319%20361Q319%20360%20388%20360Q466%20361%20492%20367Q556%20377%20592%20426Q608%20449%20619%20486T630%20554Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-65%22%20d%3D%22M39%20168Q39%20225%2058%20272T107%20350T174%20402T244%20433T307%20442H310Q355%20442%20388%20420T421%20355Q421%20265%20310%20237Q261%20224%20176%20223Q139%20223%20138%20221Q138%20219%20132%20186T125%20128Q125%2081%20146%2054T209%2026T302%2045T394%20111Q403%20121%20406%20121Q410%20121%20419%20112T429%2098T420%2082T390%2055T344%2024T281%20-1T205%20-11Q126%20-11%2083%2042T39%20168ZM373%20353Q367%20405%20305%20405Q272%20405%20244%20391T199%20357T170%20316T154%20280T149%20261Q149%20260%20169%20260Q282%20260%20327%20284T373%20353Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-72%22%20d%3D%22M21%20287Q22%20290%2023%20295T28%20317T38%20348T53%20381T73%20411T99%20433T132%20442Q161%20442%20183%20430T214%20408T225%20388Q227%20382%20228%20382T236%20389Q284%20441%20347%20441H350Q398%20441%20422%20400Q430%20381%20430%20363Q430%20333%20417%20315T391%20292T366%20288Q346%20288%20334%20299T322%20328Q322%20376%20378%20392Q356%20405%20342%20405Q286%20405%20239%20331Q229%20315%20224%20298T190%20165Q156%2025%20151%2016Q138%20-11%20108%20-11Q95%20-11%2087%20-5T76%207T74%2017Q74%2030%20114%20189T154%20366Q154%20405%20128%20405Q107%20405%2092%20377T68%20316T57%20280Q55%20278%2041%20278H27Q21%20284%2021%20287Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20
id%3D%22E1-MJMATHI-73%22%20d%3D%22M131%20289Q131%20321%20147%20354T203%20415T300%20442Q362%20442%20390%20415T419%20355Q419%20323%20402%20308T364%20292Q351%20292%20340%20300T328%20326Q328%20342%20337%20354T354%20372T367%20378Q368%20378%20368%20379Q368%20382%20361%20388T336%20399T297%20405Q249%20405%20227%20379T204%20326Q204%20301%20223%20291T278%20274T330%20259Q396%20230%20396%20163Q396%20135%20385%20107T352%2051T289%207T195%20-10Q118%20-10%2086%2019T53%2087Q53%20126%2074%20143T118%20160Q133%20160%20146%20151T160%20120Q160%2094%20142%2076T111%2058Q109%2057%20108%2057T107%2055Q108%2052%20115%2047T146%2034T201%2027Q237%2027%20263%2038T301%2066T318%2097T323%20122Q323%20150%20302%20164T254%20181T195%20196T148%20231Q131%20256%20131%20289Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-61%22%20d%3D%22M33%20157Q33%20258%20109%20349T280%20441Q331%20441%20370%20392Q386%20422%20416%20422Q429%20422%20439%20414T449%20394Q449%20381%20412%20234T374%2068Q374%2043%20381%2035T402%2026Q411%2027%20422%2035Q443%2055%20463%20131Q469%20151%20473%20152Q475%20153%20483%20153H487Q506%20153%20506%20144Q506%20138%20501%20117T481%2063T449%2013Q436%200%20417%20-8Q409%20-10%20393%20-10Q359%20-10%20336%205T306%2036L300%2051Q299%2052%20296%2050Q294%2048%20292%2046Q233%20-10%20172%20-10Q117%20-10%2075%2030T33%20157ZM351%20328Q351%20334%20346%20350T323%20385T277%20405Q242%20405%20210%20374T160%20293Q131%20214%20119%20129Q119%20126%20119%20118T118%20106Q118%2061%20136%2044T179%2026Q217%2026%20254%2059T298%20110Q300%20114%20325%20217T351%20328Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-69%22%20d%3D%22M184%20600Q184%20624%20203%20642T247%20661Q265%20661%20277%20649T290%20619Q290%20596%20270%20577T226%20557Q211%20557%20198%20567T184%20600ZM21%20287Q21%20295%2030%20318T54%20369T98%20420T158%20442Q197%20442%20223%20419T250%20357Q250%20340%20236%20301T196%20196T154%2083Q149%2061%20149%2051Q149%2026%20166%2026Q175%2026%20185%2029T208%2043T235%2078T
260%20137Q263%20149%20265%20151T282%20153Q302%20153%20302%20143Q302%20135%20293%20112T268%2061T223%2011T161%20-11Q129%20-11%20102%2010T74%2074Q74%2091%2079%20106T122%20220Q160%20321%20166%20341T173%20380Q173%20404%20156%20404H154Q124%20404%2099%20371T61%20287Q60%20286%2059%20284T58%20281T56%20279T53%20278T49%20278T41%20278H27Q21%20284%2021%20287Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-6F%22%20d%3D%22M201%20-11Q126%20-11%2080%2038T34%20156Q34%20221%2064%20279T146%20380Q222%20441%20301%20441Q333%20441%20341%20440Q354%20437%20367%20433T402%20417T438%20387T464%20338T476%20268Q476%20161%20390%2075T201%20-11ZM121%20120Q121%2070%20147%2048T206%2026Q250%2026%20289%2058T351%20142Q360%20163%20374%20216T388%20308Q388%20352%20370%20375Q346%20405%20306%20405Q243%20405%20195%20347Q158%20303%20140%20230T121%20120Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMAIN-28%22%20d%3D%22M94%20250Q94%20319%20104%20381T127%20488T164%20576T202%20643T244%20695T277%20729T302%20750H315H319Q333%20750%20333%20741Q333%20738%20316%20720T275%20667T226%20581T184%20443T167%20250T184%2058T225%20-81T274%20-167T316%20-220T333%20-241Q333%20-250%20318%20-250H315H302L274%20-226Q180%20-141%20137%20-14T94%20250Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-58%22%20d%3D%22M42%200H40Q26%200%2026%2011Q26%2015%2029%2027Q33%2041%2036%2043T55%2046Q141%2049%20190%2098Q200%20108%20306%20224T411%20342Q302%20620%20297%20625Q288%20636%20234%20637H206Q200%20643%20200%20645T202%20664Q206%20677%20212%20683H226Q260%20681%20347%20681Q380%20681%20408%20681T453%20682T473%20682Q490%20682%20490%20671Q490%20670%20488%20658Q484%20643%20481%20640T465%20637Q434%20634%20411%20620L488%20426L541%20485Q646%20598%20646%20610Q646%20628%20622%20635Q617%20635%20609%20637Q594%20637%20594%20648Q594%20650%20596%20664Q600%20677%20606%20683H618Q619%20683%20643%20683T697%20681T738%20680Q828%20680%20837%20683H845Q852%20676%20852%20672Q850%20647%20840%20637H824
Q790%20636%20763%20628T722%20611T698%20593L687%20584Q687%20585%20592%20480L505%20384Q505%20383%20536%20304T601%20142T638%2056Q648%2047%20699%2046Q734%2046%20734%2037Q734%2035%20732%2023Q728%207%20725%204T711%201Q708%201%20678%201T589%202Q528%202%20496%202T461%201Q444%201%20444%2010Q444%2011%20446%2025Q448%2035%20450%2039T455%2044T464%2046T480%2047T506%2054Q523%2062%20523%2064Q522%2064%20476%20181L429%20299Q241%2095%20236%2084Q232%2076%20232%2072Q232%2053%20261%2047Q262%2047%20267%2047T273%2046Q276%2046%20277%2046T280%2045T283%2042T284%2035Q284%2026%20282%2019Q279%206%20276%204T261%201Q258%201%20243%201T201%202T142%202Q64%202%2042%200Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMAIN-29%22%20d%3D%22M60%20749L64%20750Q69%20750%2074%20750H86L114%20726Q208%20641%20251%20514T294%20250Q294%20182%20284%20119T261%2012T224%20-76T186%20-143T145%20-194T113%20-227T90%20-246Q87%20-249%2086%20-250H74Q66%20-250%2063%20-250T58%20-247T55%20-238Q56%20-237%2066%20-225Q221%20-64%20221%20250T66%20725Q56%20737%2055%20738Q55%20746%2060%20749Z%22%3E%3C%2Fpath%3E%0A%3C%2Fdefs%3E%0A%3Cg%20stroke%3D%22currentColor%22%20fill%3D%22currentColor%22%20stroke-width%3D%220%22%20transform%3D%22matrix(1%200%200%20-1%200%200)%22%20aria-hidden%3D%22true%22%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-76%22%20x%3D%220%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMAIN-3D%22%20x%3D%22763%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-49%22%20x%3D%221819%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-6E%22%20x%3D%222324%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-70%22%20x%3D%222924%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-75%22%20x%3D%223428%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-74%22%20x%3D%224000%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-52%22%
20x%3D%224362%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-65%22%20x%3D%225121%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-70%22%20x%3D%225588%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-72%22%20x%3D%226091%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-65%22%20x%3D%226543%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-73%22%20x%3D%227009%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-65%22%20x%3D%227479%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-6E%22%20x%3D%227945%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-74%22%20x%3D%228546%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-61%22%20x%3D%228907%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-74%22%20x%3D%229437%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-69%22%20x%3D%229798%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-6F%22%20x%3D%2210144%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-6E%22%20x%3D%2210629%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMAIN-28%22%20x%3D%2211230%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-58%22%20x%3D%2211619%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMAIN-29%22%20x%3D%2212472%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%3C%2Fg%3E%0A%3C%2Fsvg%3E#card=math&code=v%3DInputRepresentation%28X%29&id=Cebz0)
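Here, n and m denote the lengths of the first and second sentence, respectively; [CLS] is the special token that marks the start of the text sequence; [SEP] is the separator token between text sequences.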
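The input construction described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `build_pair_input`, the `max_len` parameter, and the "trim the longer sentence first" truncation rule are hypothetical choices, not taken from the source, and the tokens are assumed to be WordPiece-tokenized already. The segment ids follow the common BERT convention of 0 for the first sentence and 1 for the second.

```python
def build_pair_input(first_tokens, second_tokens, max_len=512):
    """Build a BERT-style sentence-pair input (illustrative sketch).

    Returns (tokens, segment_ids). No tokenization is done here; both
    arguments are assumed to be lists of WordPiece tokens.
    """
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and two [SEP]")
    first, second = list(first_tokens), list(second_tokens)
    # Reserve three positions for [CLS] and the two [SEP] tokens.
    budget = max_len - 3
    # Assumed truncation rule: drop tokens from the longer sentence
    # one at a time until the pair fits into the budget.
    while len(first) + len(second) > budget:
        longer = first if len(first) >= len(second) else second
        longer.pop()
    tokens = ["[CLS]"] + first + ["[SEP]"] + second + ["[SEP]"]
    # Segment 0: [CLS], first sentence, first [SEP].
    # Segment 1: second sentence, final [SEP].
    segment_ids = [0] * (len(first) + 2) + [1] * (len(second) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input(
    ["a", "man", "is", "sleeping"], ["a", "man", "is", "awake"]
)
print(tokens)
print(segment_ids)
```

In practice a tokenizer library would normally produce these sequences; the sketch is only meant to make the layout of X concrete.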

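The BERT encoding layer, the classification output layer, and the training method for sentence-pair classification are the same as for single-sentence classification, so they are not repeated here.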

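2. Code implementation
This part walks through how BERT is trained on a sentence-pair classification task with actual code, using the English textual entailment dataset RTE as the example. The fine-tuning code for sentence-pair classification is listed below.

  1. To be added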

7.4.4 Reading Comprehension

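1. Modeling approach
This section uses span-extraction reading comprehension (Span-extraction Reading Comprehension) as the example to show how BERT is applied to reading comprehension. A span-extraction task consists of a passage (Passage), a question (Question), and an answer (Answer). The machine reads the passage and the question and must return the answer, which is required to be a text span (Span) extracted from the passage.
The task can therefore be reduced to predicting a start position and an end position in the passage; the answer is the text span between the two.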
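To make the "start position plus end position" view concrete, here is a minimal sketch of how an answer span might be read off once a model has produced a score for every passage position. The scoring rule (sum of start and end scores), the `max_answer_len` limit, and the name `best_span` are assumptions made for illustration; the scores in the example are made-up numbers, not model output.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end, score) maximizing start_scores[i] + end_scores[j]
    subject to i <= j < i + max_answer_len, or None for empty input."""
    best = None
    for i, start_score in enumerate(start_scores):
        # The end position may not precede the start position.
        last = min(i + max_answer_len, len(end_scores))
        for j in range(i, last):
            score = start_score + end_scores[j]
            if best is None or score > best[2]:
                best = (i, j, score)
    return best


passage = ["bert", "was", "released", "by", "google", "in", "2018"]
start_scores = [0.1, 0.0, 0.2, 0.1, 2.5, 0.3, 0.4]  # illustrative values
end_scores = [0.0, 0.1, 0.1, 0.2, 2.0, 0.1, 1.0]    # illustrative values
start, end, _ = best_span(start_scores, end_scores)
print(passage[start:end + 1])
```

The exhaustive double loop is fine for short passages; a real system would typically restrict the search to the top-scoring candidates.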
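The widely used English reading comprehension dataset SQuAD and the Chinese reading comprehension dataset CMRC 2018 are both span-extraction datasets. The figure below shows an example of span-extraction reading comprehension.
[Figure: an example of span-extraction reading comprehension]
The model that applies BERT to span-extraction reading comprehension is similar to the one used for sentence-pair classification.
[Figure: BERT model for span-extraction reading comprehension]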

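(1) Input layer
In the input layer, the question $Q = q_1 q_2 \cdots q_n$ and the passage $P = p_1 p_2 \cdots p_m$ (both obtained after WordPiece tokenization) are concatenated to form BERT's raw input X, from which the input-layer representation v is computed.
$$X = [\mathrm{CLS}]\ q_1 q_2 \cdots q_n\ [\mathrm{SEP}]\ p_1 p_2 \cdots p_m\ [\mathrm{SEP}]$$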
$$v = \mathrm{InputRepresentation}(X)$$
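Here, n and m denote the length of the question sequence and the length of the passage sequence, respectively; [CLS] is the special token that marks the start of the text sequence; [SEP] is the separator token between text sequences.
Note: the question is usually placed before the passage. The reason is that BERT can only process one text sequence of fixed length N at a time (for example, N=512). If the question were placed in the second half of the input, part of the question would be truncated whenever the combined length of passage and question exceeds N, so the model would not see the complete question and the overall quality of the reading comprehension system would suffer. With the passage in the second half, part or even all of the passage may still be truncated, but this can be handled by slicing the passage, running prediction on each slice, and combining the per-slice answers into the final output.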
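The passage-slicing idea in the note above could look roughly like the following. This is a sketch under assumptions: the function name `slice_passage`, the overlapping sliding window, and the `stride` parameter are illustrative choices that the source does not specify; it only states that the passage is sliced and predicted on several times.

```python
def slice_passage(question, passage, max_len=512, stride=128):
    """Split a long passage into overlapping windows so that every
    [CLS] question [SEP] window [SEP] input has at most max_len tokens.

    Returns a list of (offset, tokens) pairs, where offset is the index
    of the window's first token in the original passage.
    """
    # Room left for passage tokens after [CLS], the question, two [SEP].
    window = max_len - len(question) - 3
    if window <= 0:
        raise ValueError("question is too long for the given max_len")
    if not 0 < stride <= window:
        raise ValueError("stride must be in the range 1..window")
    inputs = []
    start = 0
    while True:
        chunk = passage[start:start + window]
        tokens = ["[CLS]"] + list(question) + ["[SEP]"] + chunk + ["[SEP]"]
        # Keep the offset so predicted positions can be mapped back.
        inputs.append((start, tokens))
        if start + window >= len(passage):
            break  # this window already reaches the end of the passage
        start += stride
    return inputs


question = ["who", "released", "bert"]
passage = [f"p{i}" for i in range(10)]
for offset, tokens in slice_passage(question, passage, max_len=10, stride=2):
    print(offset, tokens)
```

Because the question is kept whole in every slice, only passage tokens are ever dropped from an individual input, which matches the reasoning given in the note.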

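(2) BERT encoding layer
In the BERT encoding layer, the input representation $v$ is encoded by multiple Transformer layers. Self-attention lets the model learn the semantic associations between all tokens in the input sequence, which yields the contextual semantic representation $h \in \mathbb{R}^{N \times d}$, where $N$ is the input sequence length and $d$ is BERT's hidden size.
$$h = \mathrm{BERT}(v)$$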

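(3) Answer output layer
Once the contextual semantic representation $h$ of the input sequence has been obtained, a fully connected layer compresses each component (one per position in the input sequence) into a scalar, and a Softmax function then predicts, for every position, the probability $P^{S}$ of being the answer's start position and the probability $P^{E}$ of being its end position. Specifically, the start-position probability $P^{S}$ is computed as follows: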
$$P^{S} = \mathrm{Softmax}(hW^{s} + b^{s})$$
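Here, $W^{s}$ denotes the weights of the fully connected layer and $b^{s}$ its bias, which is added to the output at every time step (that is, it is replicated $N$ times and added to $hW^{s}$). Similarly, the end-position probability $P^{E}$ is computed as:
$$P^{E} = \mathrm{Softmax}(hW^{e} + b^{e})$$
Here, $W^{e}$ denotes the weights of the fully connected layer and $b^{e}$ its bias, which is added to the output at every time step.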

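After obtaining the start-position probability $P^{S}$ and the end-position probability $P^{E}$ for the input sequence, the model parameters are learned with a cross-entropy loss. Finally, the cross-entropy losses for the start position and the end position are averaged to give the model's total loss $\mathcal{L}$:
$$\mathcal{L} = \frac{1}{2}\left(\mathcal{L}^{S} + \mathcal{L}^{E}\right)$$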
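The output layer and loss described above can be sketched in plain Python. This is only an illustration under stated assumptions: the hidden vectors, weights and gold positions are made-up toy values, and the helper names (`position_probs`, `span_loss`) are hypothetical rather than taken from any library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def position_probs(h, w, b):
    """Map each hidden vector to one score (h_t . w + b), then normalize over positions."""
    scores = [sum(x * wi for x, wi in zip(h_t, w)) + b for h_t in h]
    return softmax(scores)

def span_loss(h, w_s, b_s, w_e, b_e, gold_start, gold_end):
    """Average of the start-position and end-position cross-entropy losses."""
    p_start = position_probs(h, w_s, b_s)
    p_end = position_probs(h, w_e, b_e)
    loss_start = -math.log(p_start[gold_start])
    loss_end = -math.log(p_end[gold_end])
    return 0.5 * (loss_start + loss_end), p_start, p_end

# Toy input: N = 4 positions, hidden size d = 2 (assumed values).
h = [[0.1, 0.3], [0.9, -0.2], [0.4, 0.8], [-0.5, 0.2]]
loss, p_start, p_end = span_loss(
    h, w_s=[1.0, 0.0], b_s=0.0, w_e=[0.0, 1.0], b_e=0.0,
    gold_start=1, gold_end=2,
)
print(loss, p_start, p_end)
```

The same scalar bias is added to every position's score, which matches the description of the bias being replicated across all time steps.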
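(4) Decoding method
Once the start-position and end-position probabilities are available, a simple Top-k answer extraction method yields the final answer. First, the algorithm finds the k highest-probability entries among the start positions and among the end positions, recording each index and its probability as a pair ⟨position, probability⟩. For any start-position pair with probability $P^{S}_{i}$ and any end-position pair with probability $P^{E}_{j}$, it computes the product $P_{i,j}$, which represents the probability of the text span formed by that start and end position:
$$P_{i,j} = P^{S}_{i} \cdot P^{E}_{j}$$
This yields k×k triples ⟨start position, end position, span probability⟩, which are sorted in descending order of span probability. Because an extracted answer must satisfy the precondition "start position ≤ end position", the system scans the sorted list and takes the highest-probability triple that satisfies it. The text span delimited by that triple's start and end positions is then output as the answer.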
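The Top-k extraction procedure can be sketched as follows. The token list and probabilities are invented for illustration, and `top_k` and `extract_answer` are hypothetical helper names. The toy values are chosen so that the highest-scoring candidate violates "start ≤ end" and is skipped.

```python
def top_k(probs, k):
    """Return the k (position, probability) pairs with the highest probability."""
    pairs = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)
    return pairs[:k]

def extract_answer(tokens, p_start, p_end, k=3):
    """Score all k x k (start, end) combinations and return the best valid span."""
    candidates = [
        (i, j, ps * pe)
        for i, ps in top_k(p_start, k)
        for j, pe in top_k(p_end, k)
    ]
    # Sort triples by span probability, highest first.
    candidates.sort(key=lambda c: c[2], reverse=True)
    for start, end, prob in candidates:
        if start <= end:  # precondition for a well-formed span
            return tokens[start:end + 1], prob
    return [], 0.0  # no valid span among the k x k candidates

tokens = ["Harbin", "is", "the", "capital", "of", "Heilongjiang"]
p_start = [0.05, 0.05, 0.15, 0.05, 0.10, 0.60]
p_end = [0.50, 0.05, 0.05, 0.05, 0.05, 0.30]
answer, prob = extract_answer(tokens, p_start, p_end, k=2)
print(answer, prob)
```

With k = 2, the candidate (start 5, end 0) has the largest product but is rejected, so the span starting and ending at position 5 is returned.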

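2. Code implementation
This part uses actual code to show how BERT is trained for reading comprehension, taking the classic English extractive reading comprehension dataset SQuAD [22] as the example. The fine-tuning code for the reading comprehension task is as follows.

  1. To be added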

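7.4.5 Sequence labeling

1. Modeling approach
Named entity recognition (NER), a typical sequence labeling task, is used here to show how BERT is commonly applied to sequence labeling.
Named entity recognition outputs a label for every word of the input text, thereby marking the boundaries of each named entity. Named entities usually fall into three types: person names, location names and organization names. Mainstream approaches use either the "BIO" or the "BIOES" tagging scheme, which differ in how entity boundaries are marked; the BIO scheme is used as the example here. In BIO, B marks the first word of an entity, I marks a word inside an entity, and O marks a word outside any entity.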
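Traditional neural NER methods usually model text at the word level. Pre-trained language models such as BERT, however, typically process the input with a finer-grained tokenizer (such as WordPiece), which breaks the one-to-one correspondence between words and sequence labels. The way each input word is split therefore has to be recorded so that the labels can be aligned.
To simplify this, the rule adopted here is that when a word is split into several subwords, every subword inherits the word's original label.
In the example, the last word, "Harbin", has the original label "B-LOC". After BERT's WordPiece tokenization, "Harbin" is split into two subwords, "Ha" and "##rbin". By the rule above, both "Ha" and "##rbin" are mapped to the original label "B-LOC".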
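The label-inheritance rule can be sketched as below. The splitting function is a stand-in for a real WordPiece tokenizer: `TOY_SPLITS`, `toy_split` and the example sentence are assumptions made for illustration, with only the "Harbin" split taken from the text.

```python
def align_labels(words, labels, split_word):
    """Expand word-level labels so that every subword inherits its word's label."""
    subwords, sub_labels = [], []
    for word, label in zip(words, labels):
        pieces = split_word(word)
        subwords.extend(pieces)
        sub_labels.extend([label] * len(pieces))
    return subwords, sub_labels

# Hypothetical lookup table standing in for a WordPiece tokenizer.
TOY_SPLITS = {"Harbin": ["Ha", "##rbin"]}

def toy_split(word):
    """Return the subword pieces of a word (unsplit words map to themselves)."""
    return TOY_SPLITS.get(word, [word])

words = ["I", "love", "Harbin"]
labels = ["O", "O", "B-LOC"]
subwords, sub_labels = align_labels(words, labels, toy_split)
print(list(zip(subwords, sub_labels)))
```

After alignment the subword sequence and the label sequence have the same length again, so a per-position classifier can be trained directly on them.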
A BERT model for named entity recognition consists of an input layer, a BERT encoding layer and a sequence labeling layer.
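(1) Input layer
The input layer is modeled as in single-sentence text classification: the given input text $x_{1}x_{2}\cdots x_{n}$ is processed as follows to obtain BERT's raw input X and the input-layer representation v.
$$X = [\mathrm{CLS}]\ x_{1}\ x_{2}\ \cdots\ x_{n}\ [\mathrm{SEP}]$$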
$$v = \mathrm{InputRepresentation}(X)$$
Here, [CLS] is the special token that marks the start of the text sequence, and [SEP] is the separator token between text sequences.

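(2) BERT encoding layer
The BERT encoding layer works as in the reading comprehension task: it produces a BERT hidden representation for every word of the input text. The input-layer representation v is encoded by multiple Transformer layers, which use self-attention to learn the semantic associations within the text, giving the contextual semantic representation $h \in \mathbb{R}^{N \times d}$, where d is BERT's hidden dimension.
$$h = \mathrm{BERT}(v)$$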

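(3) Output layer
In the reading comprehension task, a fully connected layer transforms the BERT hidden representation into the probability of each word being the start or end position of the answer, so each time step has a single output neuron. Named entity recognition instead needs a classification prediction under the "BIO" scheme for every word. This part therefore still applies a fully connected layer to the BERT hidden representation, but the number of output neurons becomes K, corresponding to the probabilities of the K classes in the "BIO" scheme.
Formally, given the contextual semantic representation h of the input sequence, the probability distribution $P_{t}$ over "BIO" tags is predicted for each time step t as:
$$P_{t} = \mathrm{Softmax}(h_{t}W^{o} + b^{o})$$
Here, $W^{o}$ denotes the weights of the fully connected layer, $b^{o}$ its bias, and $h_{t}$ the component of h at time step t.
Finally, with a probability distribution for every position, the model parameters are learned with a cross-entropy loss.
To further improve labeling accuracy, the probability output can also be followed by a conditional random field (CRF), as used in traditional NER models.
Interested readers can consult the relevant literature for how to make this replacement.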
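The per-position classification described in this output layer can be sketched in plain Python. The tag set, weights and hidden vectors below are toy assumptions, the helper names are hypothetical, and averaging the loss over positions is one common convention rather than something the text specifies. No CRF is included.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def tag_distribution(h_t, W, b):
    """P_t = Softmax(h_t W + b), with W of shape d x K and b of length K."""
    scores = [
        sum(h_t[i] * W[i][k] for i in range(len(h_t))) + b[k]
        for k in range(len(b))
    ]
    return softmax(scores)

def sequence_loss(h, W, b, gold):
    """Cross-entropy over all positions, averaged (an assumed convention)."""
    losses = [-math.log(tag_distribution(h_t, W, b)[y]) for h_t, y in zip(h, gold)]
    return sum(losses) / len(losses)

def predict_tags(h, W, b, tags):
    """Pick the most probable tag independently at each position."""
    result = []
    for h_t in h:
        p_t = tag_distribution(h_t, W, b)
        result.append(tags[max(range(len(tags)), key=lambda k: p_t[k])])
    return result

TAGS = ["O", "B-LOC", "I-LOC"]                 # K = 3 classes (toy tag set)
h = [[1.0, 0.0], [0.0, 1.0]]                   # two positions, d = 2
W = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]         # d x K weight matrix
b = [0.0, 0.0, 0.0]
predicted = predict_tags(h, W, b, TAGS)
loss = sequence_loss(h, W, b, gold=[0, 1])
print(predicted, loss)
```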

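2. Code implementation
The commonly used named entity recognition dataset CoNLL-2003 NER [24] serves as the example.
This part additionally requires the seqeval library, which computes the metrics for named entity recognition.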
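To make concrete what entity-level NER metrics measure, the sketch below computes precision, recall and F1 over exact-match entity spans in pure Python. It does not call seqeval and is not its implementation. The function names are hypothetical, and the handling of a stray "I-" tag without a preceding "B-" (it is ignored here) is a simplifying assumption.

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from one BIO tag sequence."""
    entities, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((etype, start, i - 1))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities

def entity_f1(gold_seqs, pred_seqs):
    """Entity-level precision, recall and F1 with exact span and type match."""
    n_correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_entities(gold), extract_entities(pred)
        n_correct += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(precision, recall, f1)
```

In the toy example the model finds the person entity but misses the location, so precision is perfect while recall is one half.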

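7.5 Understanding BERT in depth

7.5.1 Overview

Pre-training techniques represented by BERT and GPT have brought enormous change to natural language processing. BERT, for example, has more than a hundred million parameters, and GPT-3, released by OpenAI, reaches the hundred-billion scale. Although these large pre-trained models perform well on many tasks, their sheer size also makes their predictions harder to "understand" and harder to "control".
For many practical applications, model performance matters, but so does a trustworthy explanation of the model's behavior.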

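From this perspective, research on "interpretability" falls roughly into two categories:

  • building "self-explainable" models
  • providing "post-hoc explanations" of model behavior

The former requires designing the model's structure for interpretability from the outset; interpretability research on large pre-trained models such as BERT focuses mainly on the latter.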

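"Interpretability" in effect means understanding a model's behavior from a human point of view. In natural language processing, the most interpretable human conceptual system is undoubtedly linguistic features. For example:
Which linguistic features can BERT express, given that it is a general-purpose encoder shared across tasks?
Which relational features does the multi-head attention in each BERT layer capture?
Are its per-layer representations hierarchical, as ELMo's are?
The following sections analyze BERT from two angles: self-attention and representation learning.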

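7.5.2 Visual analysis of self-attention

BERT is built on the Transformer architecture, which consists mainly of a stack of self-attention layers (with residual connections). Self-attention essentially characterizes the relations between words (or tokens). Different types of relations can express rich semantics, for example dependencies inside noun phrases, syntactic dependencies and coreference. These relational features are key to most natural language processing tasks that involve semantic understanding, so analyzing self-attention helps in understanding how well BERT learns relational features.

Researchers randomly sampled 1,000 Wikipedia text segments and analyzed BERT's multi-head self-attention on them.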
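The analysis shows that different attention heads behave quite differently and can therefore encode different types of context and relational features.
The authors also analyzed the attention distributions layer by layer. Computing the information entropy of each layer's attention distributions shows that some heads have high entropy (close to a uniform distribution), especially in BERT's shallow layers. In deeper self-attention layers (such as layers 6 to 8), the distributions are more concentrated and the entropy is lower. Close to the output layer, the entropy rises again. To some extent this trend reflects the process of information aggregation (or semantic composition) inside BERT.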
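In the shallow layers, where attention is spread "broadly", the representations are close to a bag-of-words representation. As the layers get deeper, information starts to be combined in different ways, producing attention distributions concentrated on different local regions. The self-attention distributions near the output layer are tied directly to the pre-training objective.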
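The entropy measurement behind this analysis can be sketched as follows. The two attention matrices are invented toy heads, not values extracted from BERT, and averaging the entropy over query positions is an assumed aggregation choice.

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of one attention distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def head_entropy(attention):
    """Mean entropy over query positions; attention[i] is row i's distribution."""
    return sum(entropy(row) for row in attention) / len(attention)

# A head that attends almost uniformly versus one that focuses on a single token.
uniform_head = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]
focused_head = [[0.97, 0.01, 0.01, 0.01] for _ in range(4)]

print(head_entropy(uniform_head))
print(head_entropy(focused_head))
```

A uniform distribution over n tokens reaches the maximum entropy log n, which is why "broad" heads in shallow layers show high values while concentrated heads show low ones.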

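7.5.3 Probing experiments

Understanding model behavior more precisely still requires quantitative experiments.
A widely used quantitative analysis method is the probing experiment. Its core idea is to design a specific probe that analyzes a particular behavior of the object under study (such as self-attention or a hidden-layer representation).

A probe is usually a non-parametric model or a very lightweight parametric model (such as a linear classifier). It takes the object under study as input and predicts a specific behavior, and its prediction accuracy serves as a measure of whether the object exhibits that behavior. For example, to test how well a given self-attention head expresses the direct object (dobj) relation, a probe can be designed that evaluates that head on predicting the dobj syntactic relation.

Researchers ran probing experiments on the Penn dependency treebank (PTB). The results show that some self-attention heads in BERT do capture specific syntactic relations well; for example, prediction accuracy on the dobj relation reached 86.8%. For the more complex coreference relation, heads with good predictive ability can likewise be found.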
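A non-parametric attention-head probe of this kind can be sketched as below. It assumes one particular design: for each dependent word, the most-attended other token is taken as the predicted syntactic head. The attention matrix, the sentence and the function name are toy assumptions, and the cited study's exact protocol may differ.

```python
def head_probe_accuracy(attention, relations):
    """Fraction of (dependent, head) pairs where the dependent's
    most-attended other token is its gold syntactic head.

    attention[i][j] is the weight token i places on token j.
    """
    correct = 0
    for dependent, head in relations:
        row = attention[dependent]
        candidates = [j for j in range(len(row)) if j != dependent]
        predicted = max(candidates, key=lambda j: row[j])
        correct += predicted == head
    return correct / len(relations)

# Toy sentence: "She reads books"; "books" (index 2) is the dobj of "reads" (index 1).
attention = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
dobj_relations = [(2, 1)]
accuracy = head_probe_accuracy(attention, dobj_relations)
print(accuracy)
```

Because the probe has no trainable parameters, a high accuracy can be attributed to the attention head itself rather than to anything the probe learned.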
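Self-attention reflects how information is aggregated inside a pre-trained model, and the hidden-layer representations of each layer are the result of that aggregation. Probing experiments can therefore also be run directly on the hidden-layer representations of a pre-trained encoder to understand their properties better. The probe here can be a simple linear classifier trained on a target task (such as part-of-speech tagging) with the model's hidden-layer representations as features; its performance on that task is then used to assess the linguistic features contained in those representations. Figure 7-16 shows the general framework of this kind of probe. For more complex structured prediction tasks, such as syntactic parsing, dedicated structural probes can also be designed.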
[Figure 7-16: General framework of a probe on hidden-layer representations]
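A linear probe over frozen hidden representations can be sketched as follows. The two-dimensional feature vectors stand in for BERT hidden states and are invented, as are the two part-of-speech classes. A multiclass perceptron is used as the linear classifier purely for simplicity; the text does not specify which one the cited work used.

```python
def predict(weights, x):
    """Return the class whose weight vector gives the highest linear score."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=lambda k: scores[k])

def train_linear_probe(features, labels, num_classes, epochs=10):
    """Multiclass perceptron; the features are frozen, only the probe is updated."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            guess = predict(weights, x)
            if guess != y:  # move the gold class up and the wrong guess down
                for i in range(dim):
                    weights[y][i] += x[i]
                    weights[guess][i] -= x[i]
    return weights

def accuracy(weights, features, labels):
    hits = sum(predict(weights, x) == y for x, y in zip(features, labels))
    return hits / len(labels)

# Toy "hidden states": class 0 = NOUN, class 1 = VERB (assumed data).
train_x = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
train_y = [0, 0, 1, 1]
test_x = [[0.8, 0.3], [0.3, 0.9]]
test_y = [0, 1]

probe = train_linear_probe(train_x, train_y, num_classes=2)
probe_accuracy = accuracy(probe, test_x, test_y)
print(probe_accuracy)
```

Keeping the probe this simple is the point of the design: if even a linear model can read part-of-speech information from the representations, that information must already be encoded in them.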

Exercises