SGD

$$g^t = \nabla L(\theta^t)$$

$$\theta^{t+1} = \theta^t - \eta g^t$$
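
A minimal sketch of the update above (the function name `sgd_step` and the NumPy usage are illustrative assumptions, not from the source):

```python
import numpy as np

def sgd_step(theta: np.ndarray, grad: np.ndarray, eta: float = 0.01) -> np.ndarray:
    """Vanilla SGD: theta^{t+1} = theta^t - eta * g^t."""
    return theta - eta * grad
```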

SGDM

SGD with momentum

$$v^{t+1} = \lambda v^t - \eta g^t$$

$$\theta^{t+1} = \theta^t + v^{t+1}$$

starting at $v^0 = 0$ and $\theta^0$:

$$v^{t+1} = -\eta\sum_{n=0}^{t}\lambda^{t-n}g^n$$

Momentum is thus a decaying accumulation of the gradient at every previous time step, weighted by powers of $\lambda$.
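
A corresponding sketch with the velocity carried as explicit state (`sgdm_step` and its defaults are my own choices, not from the source):

```python
import numpy as np

def sgdm_step(theta, v, grad, eta=0.01, lam=0.9):
    """SGD with momentum.

    v^{t+1}     = lam * v^t - eta * g^t
    theta^{t+1} = theta^t + v^{t+1}
    """
    v = lam * v - eta * grad
    return theta + v, v

# usage: the velocity starts at zero (v^0 = 0)
# theta, v = sgdm_step(theta, np.zeros_like(theta), grad)
```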

Adagrad

$$\theta_i^{t+1} = \theta_i^t - \frac{\eta}{\sigma_i^t}g_i^t$$

$$\sigma_i^t = \sqrt{\frac{1}{t+1}\sum_{n=0}^{t}(g_i^n)^2}$$

The effective learning rate $\eta/\sigma_i^t$ adapts dynamically, per parameter: a history of large gradients shrinks the step size.
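
A sketch of Adagrad as written above, keeping the running sum of squared gradients as state (the `eps` guard against division by zero is my addition, not part of the formulas):

```python
import numpy as np

def adagrad_step(theta, sq_sum, grad, t, eta=0.01, eps=1e-8):
    """Adagrad; call with sq_sum initialized to zeros and t = 0, 1, 2, ..."""
    sq_sum = sq_sum + grad ** 2            # accumulate (g_i^n)^2 over n = 0..t
    sigma = np.sqrt(sq_sum / (t + 1))      # root mean square of gradients so far
    return theta - eta / (sigma + eps) * grad, sq_sum
```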

RMSProp

$$\theta_i^{t+1} = \theta_i^t - \frac{\eta}{\sigma_i^t}g_i^t$$

$$\sigma_i^t = \sqrt{\alpha(\sigma_i^{t-1})^2 + (1-\alpha)(g_i^t)^2}$$

$$\sigma_i^0 = \sqrt{(g_i^0)^2}$$
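
A sketch of RMSProp under the same conventions; unlike Adagrad, $\sigma^2$ is an exponential moving average, so old gradients decay away (`eps` is again my addition):

```python
import numpy as np

def rmsprop_step(theta, sigma_sq, grad, alpha=0.9, eta=0.001, eps=1e-8):
    """RMSProp; on the first step pass sigma_sq = grad ** 2,
    matching sigma_i^0 = sqrt((g_i^0)^2)."""
    sigma_sq = alpha * sigma_sq + (1 - alpha) * grad ** 2
    return theta - eta / (np.sqrt(sigma_sq) + eps) * grad, sigma_sq
```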

Adam

RMSProp + Momentum.
The notation below is adjusted to follow the Adam paper.

SGDM

$$\theta_t = \theta_{t-1} - \eta m_t$$

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)g_{t-1}$$

RMSProp

$$\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{v_t}}g_{t-1}$$

$$v_1 = g_0^2$$

$$v_t = \beta_2 v_{t-1} + (1-\beta_2)g_{t-1}^2$$

Adam

$$\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t}+\epsilon}\hat{m}_t$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}$$

$$\hat{v}_t = \frac{v_t}{1-\beta_2^t}$$

Here $\beta_1^t$ and $\beta_2^t$ mean $\beta_1$ and $\beta_2$ raised to the power $t$.
Good default settings (from the Adam paper):

$$\eta = 0.001,\quad \beta_1 = 0.9,\quad \beta_2 = 0.999,\quad \epsilon = 10^{-8}$$

![Adam algorithm](Adam.JPG)
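
Putting the pieces together, a sketch of one full Adam step, combining the momentum ($m_t$) and RMSProp ($v_t$) recurrences above with bias correction; the defaults follow the paper's recommended settings:

```python
import numpy as np

def adam_step(theta, m, v, grad, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam; call with m and v initialized to zeros and t = 1, 2, 3, ...

    grad plays the role of g_{t-1} in the notation above.
    """
    m = beta1 * m + (1 - beta1) * grad          # momentum (first moment)
    v = beta2 * v + (1 - beta2) * grad ** 2     # RMSProp (second moment)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    return theta - eta * m_hat / (np.sqrt(v_hat) + eps), m, v
```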