SGD
SGDM
SGD with momentum
Starting at $v^0 = 0$, the update is $v^t = \lambda v^{t-1} - \eta g^{t-1}$ and $\theta^t = \theta^{t-1} + v^t$.
Momentum amounts to a decaying weighted accumulation of the gradients from all previous time steps: the gradient from $k$ steps ago enters with weight $\lambda^k$.
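A minimal NumPy sketch of the velocity-form update above, on a toy quadratic $f(x)=x^2$; the hyperparameter values (`eta`, `lam`) are illustrative, not from these notes:

```python
import numpy as np

def sgdm_step(theta, v, grad, eta=0.01, lam=0.9):
    """One SGD-with-momentum step: v is a decayed sum of past gradient steps."""
    v = lam * v - eta * grad   # decay old velocity, subtract current gradient step
    theta = theta + v          # move by the velocity
    return theta, v

# Minimize f(x) = x^2 (gradient 2x), starting from x = 5 with v^0 = 0.
theta, v = np.array([5.0]), np.zeros(1)
for _ in range(200):
    theta, v = sgdm_step(theta, v, 2 * theta)
```

Unrolling the recursion shows `v` at step $t$ is $-\eta \sum_k \lambda^k g^{t-1-k}$, matching the decayed-accumulation view.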
Adagrad
$\sigma_i^t = \sqrt{\dfrac{1}{t+1}\sum_{k=0}^{t}\left(g_i^k\right)^2}$
The effective learning rate $\eta / \sigma_i^t$ adapts dynamically per parameter: coordinates with historically large gradients take smaller steps.
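A sketch of Adagrad in NumPy. Note that with the time-averaged $\sigma_i^t$ above and a learning rate that also decays as $\eta/\sqrt{t+1}$, the $\sqrt{t+1}$ factors cancel, giving the common form below where the step is divided by the root of the raw accumulated squared gradients (`eta`, the test function, and `eps` are illustrative assumptions):

```python
import numpy as np

def adagrad(grad_fn, theta, eta=1.0, steps=300, eps=1e-8):
    """Adagrad: scale each parameter's step by the root of its
    accumulated squared gradients."""
    sum_sq = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        sum_sq += g ** 2                                   # never decays
        theta = theta - eta / (np.sqrt(sum_sq) + eps) * g  # per-parameter step
    return theta

# Minimize f(x, y) = x^2 + 10 y^2: the two coordinates have very different
# gradient scales, which Adagrad equalizes automatically.
theta = adagrad(lambda p: np.array([2 * p[0], 20 * p[1]]), np.array([5.0, 5.0]))
```

Because `sum_sq` only grows, the effective learning rate shrinks monotonically, which is Adagrad's main weakness on long runs.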
RMSProp
$\sigma_i^0 = \sqrt{(g_i^0)^2}$, and for $t \ge 1$: $\sigma_i^t = \sqrt{\alpha\left(\sigma_i^{t-1}\right)^2 + (1-\alpha)\left(g_i^t\right)^2}$
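A sketch of the recursion above in NumPy. Unlike Adagrad, $\sigma^2$ is an exponential moving average, so old gradients fade instead of accumulating forever (`eta`, `alpha`, `steps`, and the quadratic test objective are illustrative assumptions):

```python
import numpy as np

def rmsprop(grad_fn, theta, eta=0.02, alpha=0.9, steps=1000, eps=1e-8):
    """RMSProp: divide each step by a decayed RMS of recent gradients."""
    sigma2 = None
    for _ in range(steps):
        g = grad_fn(theta)
        # sigma_i^0 = |g_i^0|; afterwards an exponential moving average.
        sigma2 = g ** 2 if sigma2 is None else alpha * sigma2 + (1 - alpha) * g ** 2
        theta = theta - eta / (np.sqrt(sigma2) + eps) * g
    return theta

# Minimize f(x) = x^2 starting from x = 5.
theta = rmsprop(lambda x: 2 * x, np.array([5.0]))
```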
Adam
RMSProp+Momentum
Some of the notation below is changed to match the Adam paper.
SGDM
$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_{t-1}$
RMSProp
$v_t = \beta_2 v_{t-1} + (1-\beta_2) g_{t-1}^2$
Adam
$\hat{m}_t = \dfrac{m_t}{1-\beta_1^t}$ and $\hat{v}_t = \dfrac{v_t}{1-\beta_2^t}$, where $\beta_1^t$ and $\beta_2^t$ mean $\beta_1$ and $\beta_2$ raised to the power $t$; this corrects the bias toward zero caused by initializing $m_0 = v_0 = 0$. The parameter update is $\theta_t = \theta_{t-1} - \dfrac{\eta\,\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$.
Good default settings, per the Adam paper: $\eta = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$.
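Putting the pieces together, a self-contained NumPy sketch of Adam with bias correction, run on a toy quadratic (the test objective and the step count are illustrative; the hyperparameter defaults follow the Adam paper):

```python
import numpy as np

def adam(grad_fn, theta, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=20000):
    """Adam: momentum-style first moment m plus RMSProp-style second moment v,
    each bias-corrected because m and v start at zero."""
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g         # decayed gradient average
        v = beta2 * v + (1 - beta2) * g ** 2    # decayed squared-gradient average
        m_hat = m / (1 - beta1 ** t)            # undo zero-initialization bias
        v_hat = v / (1 - beta2 ** t)
        theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Minimize f(x) = x^2 starting from x = 5.
theta = adam(lambda x: 2 * x, np.array([5.0]))
```

With the default $\eta = 0.001$, each step has magnitude roughly $\eta$ (since $\hat{m}_t/\sqrt{\hat{v}_t} \approx \pm 1$ for a slowly varying gradient), which is why many steps are needed even on this toy problem.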

