BERT: a Transformer Encoder

Pretraining

Masked token prediction
Next sentence prediction
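Masked token prediction can be sketched with a toy masking function. This is only an illustration of the objective's setup (not BERT's full 80/10/10 replacement scheme); `mask_tokens` and its parameters are hypothetical names for this sketch.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Randomly replace ~mask_prob of the tokens with [MASK]; return
    the masked sequence and the (position, original token) pairs the
    model is trained to recover."""
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            masked.append(mask_token)
            targets.append((i, tok))  # training target: predict tok at position i
        else:
            masked.append(tok)
    return masked, targets
```

During pretraining, the encoder sees the masked sequence and is trained to predict the original token at each masked position from the surrounding context.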

Applications

1. Input: a sequence; Output: a class (sentiment analysis)
2. Input: a sequence; Output: a sequence of the same length as the input (POS tagging)
3. Input: two sequences; Output: a class (Natural Language Inference, NLI)
4. Input: a document + a question; Output: an answer span in the document (QA)
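The first two cases above differ only in where the task head is attached. A minimal numpy sketch, with random stand-in values for the encoder output (real BERT-base uses a hidden size of 768, not the toy size here):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, seq_len = 8, 5                   # toy sizes for illustration
H = rng.normal(size=(seq_len, hidden))   # stand-in for BERT encoder output

# 1) Sequence classification (e.g. sentiment): a linear head on the
#    output at the first position, which corresponds to [CLS].
W_cls = rng.normal(size=(hidden, 2))
class_logits = H[0] @ W_cls              # shape (2,): one score per class

# 2) Token tagging (e.g. POS): the same kind of linear head applied at
#    every position, so the output length matches the input length.
W_tag = rng.normal(size=(hidden, 4))
tag_logits = H @ W_tag                   # shape (seq_len, 4)
```

NLI follows pattern 1 with two sequences packed into one input; extractive QA uses two per-token heads that score each position as the answer's start or end.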

Why does BERT work?

During pretraining, BERT uses self-attention and a large corpus to learn vector representations of words, and it can distinguish different senses of the same word from context. It can be understood as a DNN version of CBOW that produces contextualized word embeddings.
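The "contextualized" part can be shown with a toy single-head self-attention layer (identity projections, no learned weights, so this is a simplification of what BERT actually computes): the same static embedding for "bank" yields different output vectors in different contexts.

```python
import numpy as np

def self_attention(X):
    """Toy single-head self-attention with identity Q/K/V projections:
    each output row is a softmax-weighted mix of all input rows."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row
    return weights @ X

# Hand-picked static embeddings for illustration.
emb = {"bank": np.array([1.0, 0.0]),
       "river": np.array([0.0, 1.0]),
       "money": np.array([1.0, 1.0])}

# "river bank" vs "money bank": position 1 is "bank" in both.
ctx1 = self_attention(np.stack([emb["river"], emb["bank"]]))
ctx2 = self_attention(np.stack([emb["money"], emb["bank"]]))
```

The static embedding of "bank" is identical in both inputs, but its contextual vector (`ctx1[1]` vs `ctx2[1]`) differs, because attention mixes in the neighboring word.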

Related links

Link
bert_v8.pptx