Weekly Report
- Confirmed that the failure of patch level rotation with 16 patches and 4 directions is caused by insufficient image resolution. In particular, the ResNet50 encoder downsamples the original image by a factor of 32 (stride 32), which makes it hard for the decoder to classify the rotation directions correctly.
- Patch level rotation with 4 patches and 2 directions (0, 180) reaches the highest performance of all pretext tasks tried on the VOC dataset.
- The better the ResNet50 encoder performs on the patch level rotation task, the worse it performs on the segmentation task.
- An encoder-decoder model used as the base encoder for MoCo performs badly on segmentation.
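
The resolution argument in the first point can be made concrete with a little arithmetic. The sketch below is only illustrative: the input size of 224 pixels is an assumption (the report does not state the training resolution), the patches are assumed to form a square grid, and the function name is hypothetical.

```python
import math


def feature_cells_per_patch(image_size, num_patches, output_stride=32):
    """Approximate number of encoder feature cells along one side of a patch.

    image_size:    side length of the (square) input image in pixels (assumed)
    num_patches:   total number of patches, assumed to form a square grid
    output_stride: total downsampling factor of the encoder (32 for ResNet50)
    """
    grid = math.isqrt(num_patches)
    if grid * grid != num_patches:
        raise ValueError("num_patches must be a perfect square")
    # Side length of the encoder output feature map, e.g. 224 // 32 = 7.
    feature_size = image_size // output_stride
    # Feature cells available per patch side; below 2 there is little
    # spatial structure left inside a patch to tell rotations apart.
    return feature_size / grid


if __name__ == "__main__":
    for patches in (1, 4, 16):
        cells = feature_cells_per_patch(224, patches)
        print(f"{patches:2d} patches: {cells} feature cells per patch side")
```

Under these assumptions, a 16-patch layout leaves fewer than two feature cells per patch side, while a 4-patch layout leaves between three and four.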
Test Report
Patch Level Rotation Pipeline
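
As a rough illustration of the pretext task, the sketch below builds one patch level rotation sample: the image is cut into a grid of patches, each patch is rotated by a randomly chosen direction, and the index of that direction becomes the per-patch label. It assumes a square image represented as a 2D list, a square patch grid (so 4 patches means a 2x2 grid), and directions that are multiples of 90 degrees. The function names are hypothetical and not taken from the actual pipeline.

```python
import random


def rot90(patch):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*patch)][::-1]


def patch_level_rotate(image, grid, directions, rng):
    """Rotate each patch of `image` independently and return (image, labels).

    image:      square 2D list whose side is divisible by `grid`
    grid:       number of patches per side (grid=2 gives 4 patches)
    directions: allowed angles in degrees, e.g. [0, 180]
    rng:        random.Random instance, passed in for reproducibility
    """
    p = len(image) // grid  # patch side length
    out = [row[:] for row in image]
    labels = []
    for gy in range(grid):
        row_labels = []
        for gx in range(grid):
            label = rng.randrange(len(directions))
            # Cut the patch out of the original image.
            patch = [r[gx * p:(gx + 1) * p] for r in image[gy * p:(gy + 1) * p]]
            # Apply the chosen rotation as repeated quarter turns.
            for _ in range(directions[label] // 90):
                patch = rot90(patch)
            # Paste the rotated patch back at the same location.
            for dy in range(p):
                out[gy * p + dy][gx * p:(gx + 1) * p] = patch[dy]
            row_labels.append(label)
        labels.append(row_labels)
    return out, labels


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    rotated, labels = patch_level_rotate(img, 2, [0, 180], random.Random(0))
    print(labels)
```

With `grid=1` the same function covers the whole-image case, and `directions=[0, 90, 180, 270]` gives the 4-direction variant.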
Good News



Two tests were run under different settings, by accident. PLR-1p-2d stands for patch level rotation with 1 patch (the whole image) and 2 directions (0, 180). PLR-4p-2d stands for PLR with 4 patches and 2 directions. Performance seems OK.
Bad News
High resolution + longer training for the pretext task => lower performance
Encoder-Decoder MoCo Pipeline


Segmentation on VOC dataset
Encoder-Decoder model: ResNet50 + FCN8s
Optimizer: SGD, Lr = 0.01, Min_lr = 0.0001, Poly decay with power = 0.9, 20k iterations.
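
For reference, the learning rate schedule above can be sketched as follows. This uses one common formulation of poly decay, which interpolates from the base rate down to the minimum rate. The report does not say which variant was used (another common choice clamps the decayed rate at the minimum instead), so treat the exact formula as an assumption. The function name is hypothetical.

```python
def poly_lr(iteration, base_lr=0.01, min_lr=1e-4, power=0.9, max_iters=20000):
    """Poly-decayed learning rate at a given iteration.

    Assumed form: lr = (base_lr - min_lr) * (1 - t / T) ** power + min_lr
    """
    # Clamp progress to [0, 1] so iterations past max_iters stay at min_lr.
    progress = min(max(iteration, 0), max_iters) / max_iters
    return (base_lr - min_lr) * (1.0 - progress) ** power + min_lr


if __name__ == "__main__":
    for it in (0, 5000, 10000, 15000, 20000):
        print(f"iter {it:5d}: lr = {poly_lr(it):.6f}")
```

In this form the rate starts at 0.01, decreases monotonically, and ends exactly at 0.0001 after 20k iterations.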
