
- Paper: https://arxiv.org/pdf/1901.09335.pdf
- Code: https://github.com/eladhoffer/convNet.pytorch (not easy to read)
In this work, we have introduced "Batch Augmentation" (BA), a simple yet effective method to improve the generalization performance of deep networks by training with large batches composed of multiple transforms of each sample. We have demonstrated significant improvements on various datasets and models, with both faster convergence per epoch and better final validation accuracy. We suggest a theoretical analysis to explain the advantage of BA over traditional large-batch methods. We also show that batch augmentation decreases gradient variance throughout training, which is then reflected in the gradient's l2 norm used in each optimization step. This may be used in the future to search for and adapt more suitable training hyper-parameters, enabling faster convergence and even better performance.

Recent hardware developments have allowed the community to use larger batches without increasing wall-clock time, either by using data parallelism or by leveraging more advanced hardware. However, several papers claimed that working with large batches results in accuracy degradation (Revisiting small batch training for deep neural networks, 2018; On the computational inefficiency of large batch sizes for stochastic gradient descent, 2019). Here we argue that by using multiple instances of the same sample, we can leverage the larger batch capability to increase accuracy. These findings give another reason to prefer training settings utilizing significantly larger batches than those advocated in the past.
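The core idea above can be sketched in a few lines: each sample in a batch is replaced by M independently transformed copies (with the label repeated alongside), producing an effective batch of size B × M. The sketch below is illustrative only, not the implementation from the linked repository; the names `batch_augment`, `jitter`, and the parameter `m` are assumptions for this example.

```python
import random

def batch_augment(batch, transform, m):
    """Batch Augmentation sketch: replace each (sample, label) pair
    with m independently transformed copies of the sample, so a batch
    of B samples becomes an effective batch of B * m samples."""
    samples, labels = [], []
    for x, y in batch:
        for _ in range(m):
            samples.append(transform(x))  # a fresh random transform per copy
            labels.append(y)              # label is shared by all copies
    return samples, labels

# Toy "transform": additive noise standing in for random crops/flips/cutout.
_rng = random.Random(0)
def jitter(x):
    return x + _rng.uniform(-0.1, 0.1)

batch = [(1.0, "a"), (2.0, "b")]       # B = 2
xs, ys = batch_augment(batch, jitter, m=4)
print(len(xs), ys)                     # effective batch size 8
```

In a real training loop, `transform` would be the same stochastic augmentation pipeline already used for the dataset (e.g. random crop and flip); averaging the loss over the M correlated copies of each sample is what reduces the gradient variance discussed above.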
