• 19_NIPS_Unified Language Model Pre-training for Natural Language Understanding and Generation
    • 20_UNILMv2 Pseudo-Masked Language Models for Unified Language Model Pre-Training

    Three works on reducing Transformer complexity
    + [NeurIPS’2020 submission] TensorCoder: Dimension-Wise Attention via Tensor Representation for Natural Language Modeling

    Work from Huawei Noah's Ark Lab: it replaces token-level attention with attention across embedding dimensions, reducing the complexity to O(Nd^2) and improving efficiency (see the sketch after this list).

    • [ICLR] LambdaNetworks: Modeling Long-Range Interactions Without Attention

    • [Arxiv’20] Linformer: Self-Attention with Linear Complexity
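
    A minimal sketch of the dimension-wise attention idea, under the assumption that scores are taken between the d embedding dimensions rather than between the N tokens; the function and variable names below are illustrative, not TensorCoder's actual formulation or API:

    ```python
    import torch

    def dimension_wise_attention(x, wq, wk, wv):
        """Illustrative dimension-wise attention (assumed formulation).

        Scores are computed between embedding dimensions, giving a d x d
        score matrix instead of the usual N x N one, so the overall cost
        scales as O(N * d^2) rather than O(N^2 * d).

        x:           (N, d) token representations
        wq, wk, wv:  (d, d) projection matrices
        """
        q = x @ wq                                # (N, d)
        k = x @ wk                                # (N, d)
        v = x @ wv                                # (N, d)
        scores = q.T @ k / x.shape[0] ** 0.5      # (d, d): dim-to-dim scores
        weights = torch.softmax(scores, dim=-1)   # normalize over dimensions
        return v @ weights                        # (N, d), mixed along dims

    # Toy usage: a sequence of N=128 tokens with d=64 dimensions.
    N, d = 128, 64
    x = torch.randn(N, d)
    wq, wk, wv = (torch.randn(d, d) for _ in range(3))
    out = dimension_wise_attention(x, wq, wk, wv)
    print(out.shape)  # torch.Size([128, 64])
    ```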