RNN
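
A recurrent network reads its input one element at a time, folding each step into a hidden state that carries context forward; in seq2seq models those encoder hidden states are what attention later scores. A minimal numpy sketch of one vanilla RNN step (the tanh cell and the toy dimensions are assumptions for illustration):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One vanilla RNN step: mix the current input with the previous
    # hidden state, then squash through tanh.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy dimensions and random weights (illustrative, not from the source).
rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5
W_xh = rng.normal(size=(input_dim, hidden_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(seq_len, input_dim)):
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # hidden state accumulates context
```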

Attention

The attention mechanism lets the model focus, as needed, on the relevant parts of the input sequence.
https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
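
Concretely, in the seq2seq setting the decoder scores every encoder hidden state against its own current state, turns the scores into a softmax distribution, and takes the weighted sum of encoder states as a context vector. A minimal numpy sketch of that score / softmax / weighted-sum pipeline (the dot-product scoring, shapes, and values here are illustrative assumptions):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def attend(decoder_state, encoder_states):
    # Score each encoder hidden state against the decoder state,
    # normalize the scores, and return the weighted sum (context vector).
    scores = encoder_states @ decoder_state   # (T,) one score per input position
    weights = softmax(scores)                 # where to "focus"
    context = weights @ encoder_states        # (d,) blended summary of the input
    return context, weights

# Toy example: 5 encoder steps, hidden size 8 (assumed dimensions).
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))
dec = rng.normal(size=8)
context, weights = attend(dec, enc)
print(weights.round(3))  # sums to 1; larger weight = more focus on that position
```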

Transformer

http://jalammar.github.io/illustrated-transformer/
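
The core building block is scaled dot-product self-attention: each position projects into query, key, and value vectors, and its output is softmax(Q K^T / sqrt(d_k)) V, mixing information from every other position in one matrix product. A minimal single-head numpy sketch (weight shapes and random values are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # pairwise position-to-position scores
    return softmax(scores, axis=-1) @ V   # each row mixes in context from all rows

# Toy shapes: 4 tokens, width 8 (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)    # shape (4, 8)
```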

Loss functions

Cross-entropy https://colah.github.io/posts/2015-09-Visual-Information/
Relative entropy (KL divergence) https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained
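
The two are linked by an identity: cross-entropy H(p, q) = H(p) + D_KL(p || q), so minimizing cross-entropy against a fixed target distribution p is the same as minimizing the KL divergence from p to the model's q. A small numpy check of the identity (the two distributions are made-up examples):

```python
import numpy as np

def entropy(p):
    # H(p) = -sum p log p : average surprise under the true distribution.
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    # H(p, q) = -sum p log q : average surprise when events follow p
    # but we model/encode them according to q.
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    # D_KL(p || q) = sum p log(p / q) : extra cost (in nats here) paid
    # for using q in place of the true distribution p.
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])  # "true" distribution (made-up numbers)
q = np.array([0.5, 0.3, 0.2])  # model's distribution (made-up numbers)

# Cross-entropy decomposes as entropy plus KL divergence.
assert np.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
```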

Follow-up works and further reading:
Attention Is All You Need
https://cs231n.github.io
Backpropagation and the brain
BERT
GPT