BERT: a Transformer Encoder

Pretraining

Masked token prediction
Next sentence prediction
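Masked token prediction can be sketched with a toy masking function. This is only an illustration of the objective's setup (not BERT's full 80/10/10 replacement scheme); `mask_tokens` and its parameters are hypothetical names for this sketch.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Randomly replace ~mask_prob of the tokens with [MASK]; return
    the masked sequence and the (position, original token) pairs the
    model is trained to recover."""
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            masked.append(mask_token)
            targets.append((i, tok))  # training target: predict tok at position i
        else:
            masked.append(tok)
    return masked, targets
```

During pretraining, the encoder sees the masked sequence and is trained to predict the original token at each masked position from the surrounding context.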

Applications

1. Input: a sequence; Output: a class (sentiment analysis)
2. Input: a sequence; Output: a sequence of the same length as the input (POS tagging)
3. Input: two sequences; Output: a class (Natural Language Inference, NLI)
4. Input: a document + a question; Output: an answer span in the document (QA)
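The first two cases above differ only in where the task head is attached. A minimal numpy sketch, with random stand-in values for the encoder output (real BERT-base uses a hidden size of 768, not the toy size here):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, seq_len = 8, 5                   # toy sizes for illustration
H = rng.normal(size=(seq_len, hidden))   # stand-in for BERT encoder output

# 1) Sequence classification (e.g. sentiment): a linear head on the
#    output at the first position, which corresponds to [CLS].
W_cls = rng.normal(size=(hidden, 2))
class_logits = H[0] @ W_cls              # shape (2,): one score per class

# 2) Token tagging (e.g. POS): the same kind of linear head applied at
#    every position, so the output length matches the input length.
W_tag = rng.normal(size=(hidden, 4))
tag_logits = H @ W_tag                   # shape (seq_len, 4)
```

NLI follows pattern 1 with two sequences packed into one input; extractive QA uses two per-token heads that score each position as the answer's start or end.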

Why does BERT work?

During pretraining, BERT uses self-attention and a large corpus to learn vector representations of words, and it can distinguish different senses of the same word from context. It can be understood as a DNN version of CBOW that produces contextualized word embeddings.
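The "contextualized" part can be shown with a toy single-head self-attention layer (identity projections, no learned weights, so this is a simplification of what BERT actually computes): the same static embedding for "bank" yields different output vectors in different contexts.

```python
import numpy as np

def self_attention(X):
    """Toy single-head self-attention with identity Q/K/V projections:
    each output row is a softmax-weighted mix of all input rows."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row
    return weights @ X

# Hand-picked static embeddings for illustration.
emb = {"bank": np.array([1.0, 0.0]),
       "river": np.array([0.0, 1.0]),
       "money": np.array([1.0, 1.0])}

# "river bank" vs "money bank": position 1 is "bank" in both.
ctx1 = self_attention(np.stack([emb["river"], emb["bank"]]))
ctx2 = self_attention(np.stack([emb["money"], emb["bank"]]))
```

The static embedding of "bank" is identical in both inputs, but its contextual vector (`ctx1[1]` vs `ctx2[1]`) differs, because attention mixes in the neighboring word.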

Related links

Link
bert_v8.pptx