An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale
Training data-efficient image transformers & distillation through attention
DeiT III: Revenge of the ViT
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows