Weekly Summary
- Reviewed and revised the MoCo code, then created a simple framework for testing different pretext tasks.
- Finished the patch-level rotation augmentation code.
- Ran some patch-level rotation self-supervised training; the results are not promising. In particular, as the number of patches increases, the loss of this pretext task gets stuck at a very early stage, in some cases even before the first epoch finishes.
- Ran MoCo pretraining on the VOC segmentation dataset and tested the pretrained ResNet50 using the previous segmentation training pipeline. Although the performance is significantly worse than ImageNet MoCo pretraining (let alone ImageNet supervised pretraining), the reported mIoU is better than a randomly initialized ResNet50 backbone in the same setting. This shows that, first, the VOC dataset is generally usable for self-supervised learning, and second, self-supervised learning on small datasets can help, though only to a limited extent.
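The patch-level rotation augmentation mentioned above can be sketched roughly as follows. This is a minimal NumPy illustration, not the actual implementation: it splits a square image into an n-by-n grid, rotates each patch independently by a random multiple of 90 degrees, and returns the per-patch rotation labels that the pretext task would predict. The function name and signature are hypothetical.

```python
import numpy as np

def patch_level_rotation(img, n, rng=None):
    """Split a square (H, W, C) image into an n x n grid of patches and
    rotate each patch independently by a random multiple of 90 degrees.

    Returns the augmented image and an (n, n) array of rotation labels,
    where each label in {0, 1, 2, 3} is the number of 90-degree turns.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = img.shape
    assert h == w and h % n == 0, "expects a square image divisible into n x n patches"
    p = h // n  # patch side length
    out = img.copy()
    labels = rng.integers(0, 4, size=(n, n))
    for i in range(n):
        for j in range(n):
            patch = img[i * p:(i + 1) * p, j * p:(j + 1) * p]
            # np.rot90 rotates counter-clockwise k times
            out[i * p:(i + 1) * p, j * p:(j + 1) * p] = np.rot90(patch, k=labels[i, j])
    return out, labels
```

With n = 1 this degenerates to whole-image rotation (the standard 4-way rotation pretext task); larger n gives the per-patch variant whose loss stalled in our runs.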
Test Results



Tests are done by running Encoder-Decoder finetuning on the VOCaug segmentation dataset with a ResNet50 backbone pretrained in different ways. MoCo-ImageNet means MoCo pretraining on ImageNet, MoCo-VOCaug means MoCo pretraining on VOCaug, and RandomInit means a randomly initialized ResNet50.
The results show that MoCo-style contrastive learning relies on a large dataset. Applying MoCo to a small dataset like VOCaug (roughly 1/100 the size of ImageNet) is not an appealing solution.
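For reference, the core mechanism MoCo adds on top of standard contrastive learning is the momentum (EMA) update of the key encoder from the query encoder, which can be sketched in a few lines (a toy NumPy sketch of the update rule only, not our training code; the function name is hypothetical):

```python
import numpy as np

def momentum_update(query_params, key_params, m=0.999):
    """MoCo-style momentum update: each key-encoder parameter tensor is an
    exponential moving average of the corresponding query-encoder tensor,
    theta_k <- m * theta_k + (1 - m) * theta_q.
    """
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]

# Example: with m = 0.9, a key parameter at 0.0 moves 10% of the way
# toward a query parameter at 1.0.
q = [np.array([1.0])]
k = [np.array([0.0])]
k = momentum_update(q, k, m=0.9)
```

The large momentum keeps the key encoder (and hence the negative queue) slowly evolving and consistent, which is part of why MoCo benefits from many update steps over a large dataset.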



As before, tests are done by running Encoder-Decoder finetuning on the VOCaug segmentation dataset with a ResNet50 backbone pretrained in different ways. MoCo-VOCaug again means MoCo pretraining on VOCaug; additionally, PLR-VOCaug means PatchLevelRotation pretraining on VOCaug. For now we rotate the whole image, since 16 patches failed; more patch counts remain to be tested to determine a good number.
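One rough (and hedged) way to see why more patches might make the pretext task harder: each patch can be rotated 0, 90, 180, or 270 degrees, so the space of joint rotation configurations grows as 4 to the power of the patch count. This is only a back-of-the-envelope measure, not a claimed explanation for the stalled loss.

```python
def num_rotation_configs(n_patches):
    """Number of distinct joint rotation configurations when each of
    n_patches patches is independently rotated by one of 4 angles."""
    return 4 ** n_patches

# Whole image (1 patch): 4 configurations.
# 4x4 grid (16 patches): 4**16 = 4294967296 configurations.
```

By this measure, going from whole-image rotation to a 4x4 grid inflates the configuration space by nine orders of magnitude, even though the network only predicts each patch's label individually.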
