[toc]
Models

Dataset

Token
utterance,发音的最小单元
phoneme,音素,和lexicon一起使用
grapheme

word,
morpheme

bytes,直接用utf-8编码,language independent

Feature Extraction

Down Sampling
Pyramid RNN

TDNN
实质是CNN
限制self-attention的距离

Attention
z0是关键字,α由z0和ht得到

RNN-T with language model

