- 1、architectures / models
- (1)A-H
- 【A】
- 【B】
- BART (Facebook)
- BARThez (École polytechnique)
- BARTpho (VinAI Research)
- BEiT (Microsoft)
- BERT (Google)
- BERTweet (VinAI Research)
- BERT For Sequence Generation (Google)
- BigBird-RoBERTa (Google Research)
- BigBird-Pegasus (Google Research)
- Blenderbot (Facebook)
- BlenderbotSmall (Facebook)
- BORT (Alexa)
- ByT5 (Google Research)
- 【C】
- 【D】
- 【E】
- 【F】
- 【G】
- 【H】
- (2)I-T
- (3)U-Z
- 2、Summary of the models
">
All the model checkpoints provided by Transformers are seamlessly integrated from the huggingface.co model hub, where they are uploaded directly by users and organizations (a minimal loading sketch follows the links below).
- model checkpoints:https://huggingface.co/models
- users:https://huggingface.co/users
- organizations:https://huggingface.co/organizations
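For illustration, hub checkpoints are loaded by name through the from_pretrained API. A minimal sketch in Python, assuming PyTorch is installed; the checkpoint name bert-base-uncased is just one example corresponding to the BERT entry below:

```python
# Minimal sketch: pull a checkpoint and its tokenizer from the hub by name.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello from the model hub!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```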
1、architectures / models
(1)A-H
【A】
ALBERT (Google Research & Toyota Technological Institute at Chicago)
- https://huggingface.co/docs/transformers/model_doc/albert
- paper:ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
【B】
BART (Facebook)
- https://huggingface.co/docs/transformers/model_doc/bart
- paper:BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
BARThez (École polytechnique)
- https://huggingface.co/docs/transformers/model_doc/barthez
- paper:BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
BARTpho (VinAI Research)
- https://huggingface.co/docs/transformers/model_doc/bartpho
- paper:BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese
BEiT (Microsoft)
- https://huggingface.co/docs/transformers/model_doc/beit
- paper:BEiT: BERT Pre-Training of Image Transformers
BERT (Google)
- https://huggingface.co/docs/transformers/model_doc/bert
- paper:BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERTweet (VinAI Research)
- https://huggingface.co/docs/transformers/model_doc/bertweet
- paper:BERTweet: A pre-trained language model for English Tweets
BERT For Sequence Generation (Google)
- https://huggingface.co/docs/transformers/model_doc/bert-generation
- paper:Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
BigBird-RoBERTa (Google Research)
- https://huggingface.co/docs/transformers/model_doc/big_bird
- paper:Big Bird: Transformers for Longer Sequences
BigBird-Pegasus (Google Research)
- https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus
- paper:Big Bird: Transformers for Longer Sequences
Blenderbot (Facebook)
- https://huggingface.co/docs/transformers/model_doc/blenderbot
- paper:Recipes for building an open-domain chatbot
BlenderbotSmall (Facebook)
- https://huggingface.co/docs/transformers/model_doc/blenderbot-small
- paper:Recipes for building an open-domain chatbot
BORT (Alexa)
- https://huggingface.co/docs/transformers/model_doc/bort
- paper:Optimal Subarchitecture Extraction For BERT
ByT5 (Google Research)
- https://huggingface.co/docs/transformers/model_doc/byt5
- paper:ByT5: Towards a token-free future with pre-trained byte-to-byte models
【C】
CamemBERT (Inria/Facebook/Sorbonne)
- https://huggingface.co/docs/transformers/model_doc/camembert
- paper:CamemBERT: a Tasty French Language Model
CANINE (Google Research)
- https://huggingface.co/docs/transformers/model_doc/canine
- paper:CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
CLIP (OpenAI)
- https://huggingface.co/docs/transformers/model_doc/clip
- paper:Learning Transferable Visual Models From Natural Language Supervision
ConvBERT (YituTech)
- https://huggingface.co/docs/transformers/model_doc/convbert
- paper:ConvBERT: Improving BERT with Span-based Dynamic Convolution
ConvNeXT (Facebook AI)
- https://huggingface.co/docs/transformers/model_doc/convnext
- paper:A ConvNet for the 2020s
CPM (Tsinghua University)
- https://huggingface.co/docs/transformers/model_doc/cpm
- paper:CPM: A Large-scale Generative Chinese Pre-trained Language Model
CTRL (Salesforce)
- https://huggingface.co/docs/transformers/model_doc/ctrl
- paper:CTRL: A Conditional Transformer Language Model for Controllable Generation
【D】
Data2Vec (Facebook)
- https://huggingface.co/docs/transformers/main/model_doc/data2vec
- paper:Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
DeBERTa (Microsoft)
- https://huggingface.co/docs/transformers/model_doc/deberta
- paper:DeBERTa: Decoding-enhanced BERT with Disentangled Attention
DeBERTa-v2 (Microsoft)
- https://huggingface.co/docs/transformers/model_doc/deberta-v2
- paper:DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Decision Transformer (Berkeley/Facebook/Google)
- https://huggingface.co/docs/transformers/model_doc/decision_transformer
- paper:Decision Transformer: Reinforcement Learning via Sequence Modeling
DiT (Microsoft Research)
- https://huggingface.co/docs/transformers/model_doc/dit
- paper:DiT: Self-supervised Pre-training for Document Image Transformer
DeiT (Facebook)
- https://huggingface.co/docs/transformers/model_doc/deit
- paper:Training data-efficient image transformers & distillation through attention
DETR (Facebook)
- https://huggingface.co/docs/transformers/model_doc/detr
- paper:End-to-End Object Detection with Transformers
DialoGPT (Microsoft Research)
- https://huggingface.co/docs/transformers/model_doc/dialogpt
- paper:DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
DistilBERT (HuggingFace)
- https://huggingface.co/docs/transformers/model_doc/distilbert
- paper:DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
- The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
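As a hedged illustration (not from the source), the distilled checkpoints are drop-in replacements for their teachers and load through the same Auto classes; the public hub checkpoint names bert-base-uncased and distilbert-base-uncased are assumed for the sketch:

```python
# Sketch: compare parameter counts of a teacher checkpoint and its distilled version.
from transformers import AutoModel

for name in ("bert-base-uncased", "distilbert-base-uncased"):
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")  # the distilled model is roughly 40% smaller
```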
DPR (Facebook)
- https://huggingface.co/docs/transformers/model_doc/dpr
- paper:Dense Passage Retrieval for Open-Domain Question Answering
DPT (Intel Labs)
- https://huggingface.co/docs/transformers/main/model_doc/dpt
- paper:Vision Transformers for Dense Prediction
【E】
EncoderDecoder (Google Research)
- https://huggingface.co/docs/transformers/model_doc/encoder-decoder
- paper:Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
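A minimal sketch of the pattern, assuming the public bert-base-uncased checkpoint: two pre-trained encoder-only checkpoints are paired into a seq2seq model whose newly initialized cross-attention weights are meant to be fine-tuned on a generation task before use.

```python
# Sketch: build a seq2seq model from two pre-trained BERT checkpoints.
from transformers import BertTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Generation needs these set explicitly for BERT-based decoders.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```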
ELECTRA (Google Research/Stanford University)
- https://huggingface.co/docs/transformers/model_doc/electra
- paper:ELECTRA: Pre-training text encoders as discriminators rather than generators
【F】
FlauBERT (CNRS)
- https://huggingface.co/docs/transformers/model_doc/flaubert
- paper:FlauBERT: Unsupervised Language Model Pre-training for French
FNet (Google Research)
- https://huggingface.co/docs/transformers/model_doc/fnet
- paper:FNet: Mixing Tokens with Fourier Transforms
Funnel Transformer (CMU/Google Brain)
- https://huggingface.co/docs/transformers/model_doc/funnel
- paper:Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
【G】
GLPN (KAIST)
- https://huggingface.co/docs/transformers/main/model_doc/glpn
- paper:Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth
GPT (OpenAI)
- https://huggingface.co/docs/transformers/model_doc/openai-gpt
- paper:Improving Language Understanding by Generative Pre-Training
GPT-2 (OpenAI)
- https://huggingface.co/docs/transformers/model_doc/gpt2
- paper:Language Models are Unsupervised Multitask Learners
GPT-J (EleutherAI)
- https://huggingface.co/docs/transformers/model_doc/gptj
- repository:kingoflolz/mesh-transformer-jax
GPT Neo (EleutherAI)
- https://huggingface.co/docs/transformers/model_doc/gpt_neo
- repository:EleutherAI/gpt-neo
【H】
Hubert (Facebook)
- https://huggingface.co/docs/transformers/model_doc/hubert
- paper:HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
(2)I-T
【I】
I-BERT (Berkeley)
- https://huggingface.co/docs/transformers/model_doc/ibert
- paper:I-BERT: Integer-only BERT Quantization
ImageGPT (OpenAI)
- https://huggingface.co/docs/transformers/main/model_doc/imagegpt
- paper:Generative Pretraining from Pixels
【L】
LayoutLM (Microsoft Research Asia)
- https://huggingface.co/docs/transformers/model_doc/layoutlm
- paper:LayoutLM: Pre-training of Text and Layout for Document Image Understanding
LayoutLMv2 (Microsoft Research Asia)
- https://huggingface.co/docs/transformers/model_doc/layoutlmv2
- paper:LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
LayoutXLM (Microsoft Research Asia)
- https://huggingface.co/docs/transformers/model_doc/layoutlmv2
- paper:LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
LED (AllenAI)
- https://huggingface.co/docs/transformers/model_doc/led
- paper:Longformer: The Long-Document Transformer
Longformer (AllenAI)
- https://huggingface.co/docs/transformers/model_doc/longformer
- paper:Longformer: The Long-Document Transformer
LUKE (Studio Ousia)
- https://huggingface.co/docs/transformers/model_doc/luke
- paper:LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
LXMERT (UNC Chapel Hill)
- https://huggingface.co/docs/transformers/model_doc/lxmert
- paper:LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering
【M】
mLUKE (Studio Ousia)
- https://huggingface.co/docs/transformers/model_doc/mluke
- paper:mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models
M2M100 (Facebook)
- https://huggingface.co/docs/transformers/model_doc/m2m_100
- paper:Beyond English-Centric Multilingual Machine Translation
MarianMT (Microsoft Translator Team & Jörg Tiedemann)
- https://huggingface.co/docs/transformers/model_doc/marian
- Machine translation models trained using OPUS data by Jörg Tiedemann; the Marian framework is developed by the Microsoft Translator Team.
MaskFormer (Meta and UIUC)
- https://huggingface.co/docs/transformers/main/model_doc/maskformer
- paper:Per-Pixel Classification is Not All You Need for Semantic Segmentation
MBart (Facebook)
- https://huggingface.co/docs/transformers/model_doc/mbart
- paper:Multilingual Denoising Pre-training for Neural Machine Translation
MBart-50 (Facebook)
- https://huggingface.co/docs/transformers/model_doc/mbart
- paper:Multilingual Translation with Extensible Multilingual Pretraining and Finetuning
Megatron-BERT (NVIDIA)
- https://huggingface.co/docs/transformers/model_doc/megatron-bert
- paper:Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Megatron-GPT2 (NVIDIA)
- https://huggingface.co/docs/transformers/model_doc/megatron_gpt2
- paper:Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
MPNet (Microsoft Research)
- https://huggingface.co/docs/transformers/model_doc/mpnet
- paper:MPNet: Masked and Permuted Pre-training for Language Understanding
MT5 (Google AI)
- https://huggingface.co/docs/transformers/model_doc/mt5
- paper:mT5: A massively multilingual pre-trained text-to-text transformer
【N】
Nyströmformer (University of Wisconsin - Madison)
- https://huggingface.co/docs/transformers/main/model_doc/nystromformer
- paper:Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
【P】
Pegasus (Google)
- https://huggingface.co/docs/transformers/model_doc/pegasus
- paper:PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
Perceiver IO (Deepmind)
- https://huggingface.co/docs/transformers/model_doc/perceiver
- paper:Perceiver IO: A General Architecture for Structured Inputs & Outputs
PhoBERT (VinAI Research)
- https://huggingface.co/docs/transformers/model_doc/phobert
- paper:PhoBERT: Pre-trained language models for Vietnamese
PLBart (UCLA NLP)
- https://huggingface.co/docs/transformers/main/model_doc/plbart
- paper:Unified Pre-training for Program Understanding and Generation
PoolFormer (Sea AI Labs)
- https://huggingface.co/docs/transformers/main/model_doc/poolformer
- paper:MetaFormer is Actually What You Need for Vision
ProphetNet (Microsoft Research)
- https://huggingface.co/docs/transformers/model_doc/prophetnet
- paper:ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
【Q】
QDQBert (NVIDIA)
- https://huggingface.co/docs/transformers/model_doc/qdqbert
- paper:Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation
【R】
REALM (Google Research)
- https://huggingface.co/docs/transformers/model_doc/realm
- paper:REALM: Retrieval-Augmented Language Model Pre-Training
Reformer (Google Research)
- https://huggingface.co/docs/transformers/model_doc/reformer
- paper:Reformer: The Efficient Transformer
RemBERT (Google Research)
- https://huggingface.co/docs/transformers/model_doc/rembert
- paper:Rethinking embedding coupling in pre-trained language models
RegNet (Meta Platforms)
- https://huggingface.co/docs/transformers/model_doc/regnet
- paper:Designing Network Design Spaces
ResNet (Microsoft Research)
- https://huggingface.co/docs/transformers/main/model_doc/resnet
- paper:Deep Residual Learning for Image Recognition
RoBERTa (Facebook)
- https://huggingface.co/docs/transformers/model_doc/roberta
- paper:RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoFormer (ZhuiyiTechnology)
- https://huggingface.co/docs/transformers/model_doc/roformer
- paper:RoFormer: Enhanced Transformer with Rotary Position Embedding
【S】
SegFormer (NVIDIA)
- https://huggingface.co/docs/transformers/model_doc/segformer
- paper:SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
SEW (ASAPP)
- https://huggingface.co/docs/transformers/model_doc/sew
- paper:Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
SEW-D (ASAPP)
- https://huggingface.co/docs/transformers/model_doc/sew_d
- paper:Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
SpeechToTextTransformer (Facebook)
- https://huggingface.co/docs/transformers/model_doc/speech_to_text
- paper:fairseq S2T: Fast Speech-to-Text Modeling with fairseq
SpeechToTextTransformer2 (Facebook)
- https://huggingface.co/docs/transformers/model_doc/speech_to_text_2
- paper:Large-Scale Self- and Semi-Supervised Learning for Speech Translation
Splinter (Tel Aviv University)
- https://huggingface.co/docs/transformers/model_doc/splinter
- paper:Few-Shot Question Answering by Pretraining Span Selection
SqueezeBert (Berkeley)
- https://huggingface.co/docs/transformers/model_doc/squeezebert
- paper:SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
Swin Transformer (Microsoft)
- https://huggingface.co/docs/transformers/main/model_doc/swin
- paper:Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
【T】
T5 (Google AI)
- https://huggingface.co/docs/transformers/model_doc/t5
- paper:Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
T5v1.1 (Google AI)
- https://huggingface.co/docs/transformers/model_doc/t5v1.1
- repository:google-research/text-to-text-transfer-transformer
TAPAS (Google AI)
- https://huggingface.co/docs/transformers/model_doc/tapas
- paper:TAPAS: Weakly Supervised Table Parsing via Pre-training
TAPEX (Microsoft Research)
- https://huggingface.co/docs/transformers/main/model_doc/tapex
- paper:TAPEX: Table Pre-training via Learning a Neural SQL Executor
Transformer-XL (Google/CMU)
- https://huggingface.co/docs/transformers/model_doc/transfo-xl
- paper:Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
TrOCR (Microsoft)
- https://huggingface.co/docs/transformers/model_doc/trocr
- paper:TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
(3)U-Z
【U】
UniSpeech (Microsoft Research)
- https://huggingface.co/docs/transformers/model_doc/unispeech
- paper:UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
UniSpeechSat (Microsoft Research)
- https://huggingface.co/docs/transformers/model_doc/unispeech-sat
- paper:UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING
【V】
VAN (Tsinghua University and Nankai University)
- https://huggingface.co/docs/transformers/model_doc/van
- paper:Visual Attention Network
ViLT (NAVER AI Lab/Kakao Enterprise/Kakao Brain)
- https://huggingface.co/docs/transformers/main/model_doc/vilt
- paper:ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Vision Transformer (ViT) (Google AI)
- https://huggingface.co/docs/transformers/model_doc/vit
- paper:An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
ViTMAE (Meta AI)
- https://huggingface.co/docs/transformers/main/model_doc/vit_mae
- paper:Masked Autoencoders Are Scalable Vision Learners
VisualBERT (UCLA NLP)
- https://huggingface.co/docs/transformers/model_doc/visual_bert
- paper:VisualBERT: A Simple and Performant Baseline for Vision and Language
【W】
WavLM (Microsoft Research)
- https://huggingface.co/docs/transformers/main/model_doc/wavlm
- paper:WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Wav2Vec2 (Facebook AI)
- https://huggingface.co/docs/transformers/model_doc/wav2vec2
- paper:wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Wav2Vec2Phoneme (Facebook AI)
- https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme
- paper:Simple and Effective Zero-shot Cross-lingual Phoneme Recognition
【X】
XGLM (Facebook AI)
- https://huggingface.co/docs/transformers/model_doc/xglm
- paper:Few-shot Learning with Multilingual Language Models
XLM (Facebook)
- https://huggingface.co/docs/transformers/model_doc/xlm
- paper:Cross-lingual Language Model Pretraining
XLM-ProphetNet (Microsoft Research)
- https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet
- paper:ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
XLM-RoBERTa (Facebook AI)
- https://huggingface.co/docs/transformers/model_doc/xlm-roberta
- paper:Unsupervised Cross-lingual Representation Learning at Scale
XLM-RoBERTa-XL (Facebook AI)
- https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl
- paper:Larger-Scale Transformers for Multilingual Masked Language Modeling
XLNet (Google/CMU)
- https://huggingface.co/docs/transformers/model_doc/xlnet
- paper:XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLSR-Wav2Vec2 (Facebook AI)
- https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2
- paper:Unsupervised Cross-Lingual Representation Learning For Speech Recognition
XLS-R (Facebook AI)
- https://huggingface.co/docs/transformers/model_doc/xls_r
- paper:XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
【Y】
YOSO (University of Wisconsin - Madison)
- https://huggingface.co/docs/transformers/main/model_doc/yoso
- paper:You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
2、Summary of the models
- model summary:https://huggingface.co/docs/transformers/model_summary
- To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the Tokenizers library, refer to the supported frameworks table in the Transformers documentation index:https://huggingface.co/docs/transformers/index
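The source points to the documentation table for this; as a hedged alternative (not an official API), one can probe whether the installed transformers version defines the PyTorch, TensorFlow ("TF"-prefixed) and Flax ("Flax"-prefixed) model classes for a given architecture. Class names such as BertModel below are assumptions based on the library's usual naming convention:

```python
# Sketch (not an official API): probe which framework classes transformers exposes
# for an architecture, using the conventional "", "TF" and "Flax" class-name prefixes.
# Note: this checks whether the classes are defined, not whether the backend is installed.
import transformers

def framework_support(base_class_name: str) -> dict:
    return {
        "pytorch": hasattr(transformers, base_class_name),
        "tensorflow": hasattr(transformers, "TF" + base_class_name),
        "flax": hasattr(transformers, "Flax" + base_class_name),
    }

print(framework_support("BertModel"))  # typically all three are defined for BERT
```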