Hyperparameter tuning priority, from most to least important:
1. learning rate
2. beta (momentum term), commonly set to 0.9; number of hidden units; mini-batch size
3. number of layers; learning rate decay
4. Adam parameters, usually left at their defaults: beta1 = 0.9, beta2 = 0.999, epsilon = 10^-8
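As a minimal sketch of how those Adam defaults enter the update rule (the learning rate `lr=0.001` here is only an illustrative default, not from the notes above):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; beta1, beta2, and eps are the commonly fixed defaults."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (RMS) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t (1-indexed)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

In practice only the learning rate is tuned; beta1, beta2, and epsilon are almost never changed from these values.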