 - S2-MLP V1 & V2: Spatial-Shift MLP Architecture for Vision
 - FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction
 - Attention Augmented Convolutional Networks
 - Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
 - LambdaNetworks: Modeling long-range Interactions without Attention
 - Involution: Inverting the Inherence of Convolution for Visual Recognition
 - ConvMLP: Hierarchical Convolutional MLPs for Vision
 - Sparse-MLP: A Fully-MLP Architecture with Conditional Computation
 - Hire-MLP: Vision MLP via Hierarchical Rearrangement
 - RaftMLP: Do MLP-based Models Dream of Winning Over Computer Vision?
 - CycleMLP: A MLP-like Architecture for Dense Prediction
 - Vision Transformers with Hierarchical Attention
 - X-volution: On the Unification of Convolution and Self-attention
 - On the Integration of Self-Attention and Convolution
 - DynaMixer: A Vision MLP Architecture with Dynamic Mixing
 - ELSA: Enhanced Local Self-Attention for Vision Transformer
 - Container: Context Aggregation Network
 - Neighborhood Attention Transformer
 - EdgeFormer: Improving Light-weight ConvNets by Learning from Vision Transformers
 - TRT-ViT: TensorRT-oriented Vision Transformer
 - Fast Vision Transformers with HiLo Attention
 - ActiveMLP: An MLP-like Architecture with Active Token Mixer