- 19_NIPS_Unified Language Model Pre-training for Natural Language Understanding and Generation
- 20_UNILMv2 Pseudo-Masked Language Models for Unified Language Model Pre-Training
Three works on reducing the complexity of the Transformer
+ [NeurIPS’2020 submit] TensorCoder: Dimension-Wise Attention via Tensor Representation for Natural Language Modeling
Work from Huawei Noah's Ark Lab: it replaces token-to-token attention with attention between embedding dimensions, bringing the complexity down to O(Nd^2) and making it more efficient (a minimal sketch of the idea follows after this list).
+ [ICLR'21] LambdaNetworks: Modeling Long-Range Interactions Without Attention
+ [arXiv'20] Linformer: Self-Attention with Linear Complexity
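
A minimal NumPy sketch of the dimension-wise attention idea summarized above: attention weights are computed between embedding dimensions (d x d) instead of between tokens (N x N), so the cost scales as O(Nd^2) rather than O(N^2 d). The function name, the projections `w_q`/`w_k`/`w_v`, and the scaling are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def dimension_wise_attention(x, w_q, w_k, w_v):
    """x: (N, d) token embeddings; w_q/w_k/w_v: (d, d) projection matrices."""
    q = x @ w_q                      # (N, d)
    k = x @ w_k                      # (N, d)
    v = x @ w_v                      # (N, d)
    # Attention over dimensions: (d, N) @ (N, d) -> (d, d), cost O(N * d^2),
    # versus (N, d) @ (d, N) -> (N, N), cost O(N^2 * d), for token attention.
    scores = q.T @ k / np.sqrt(x.shape[0])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over dimensions
    return v @ weights               # (N, d) @ (d, d) -> (N, d), O(N * d^2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d = 128, 64
    x = rng.normal(size=(N, d))
    w_q, w_k, w_v = (rng.normal(size=(d, d)) * d ** -0.5 for _ in range(3))
    out = dimension_wise_attention(x, w_q, w_k, w_v)
    print(out.shape)                 # (128, 64)
```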