Regression
Overfitting
Local minima and saddle points
Batch and Momentum
Adaptive Learning Rate
Batch Normalization
Classification
CNN
RNN
Self-attention
Transformer
BERT