Weekly Report

  1. Confirmed that the failure of patch-level rotation with 16 patches and 4 directions is caused by insufficient image resolution. In particular, the ResNet50 encoder downsamples the input image by a factor of 32 (output stride 32), which leaves the decoder too little spatial detail to classify the rotation direction of each patch.
  2. Patch-level rotation with 4 patches and 2 directions (0, 180) reaches the highest performance of all pretext tasks tried on the VOC dataset.
  3. The better the ResNet50 encoder performs on the patch-level rotation task, the worse it performs on the segmentation task.
  4. Using an encoder-decoder as the base encoder for MoCo performs badly on segmentation.
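
The resolution argument in item 1 can be checked with simple arithmetic. The sketch below is only an illustration: the 224-pixel input side is an assumed value (the report does not state the input size), and it assumes that 16 patches form a 4x4 grid and 4 patches a 2x2 grid. The helper name `cells_per_patch` is hypothetical.

```python
def cells_per_patch(image_size: int, patches_per_side: int, output_stride: int = 32) -> float:
    """Feature-map cells along one side of a patch after the encoder.

    image_size       -- side length of the (square) input image, in pixels
    patches_per_side -- 4 for a 4x4 grid (16 patches), 2 for a 2x2 grid (4 patches)
    output_stride    -- total downsampling factor of the encoder (32 for ResNet50)
    """
    feature_side = image_size // output_stride  # side of the encoder output map
    return feature_side / patches_per_side


if __name__ == "__main__":
    assumed_image_size = 224  # assumption, not taken from the report
    for patches_per_side in (1, 2, 4):
        cells = cells_per_patch(assumed_image_size, patches_per_side)
        print(f"{patches_per_side * patches_per_side:2d} patches: {cells} cells per patch side")
```

Under these assumptions, a 4x4 grid leaves fewer than two feature cells along each side of a patch, which is consistent with the decoder struggling to tell four rotation directions apart.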

    Test Report

    Patch Level Rotation Pipeline

    Good News

    Two tests were run under different settings, by accident. PLR-1p-2d stands for patch-level rotation (PLR) with 1 patch (the whole image) and 2 directions (0, 180). PLR-4p-2d stands for PLR with 4 patches and 2 directions. Performance seems OK.
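
For reference, one way to generate a PLR training sample is sketched below. This is a minimal illustration and not necessarily the pipeline used for these tests: the function name `patch_level_rotate`, the square grid layout, and the label encoding (label d means a rotation of d * 360 / num_directions degrees) are all assumptions.

```python
import numpy as np


def patch_level_rotate(image, grid, num_directions, rng):
    """Rotate each patch of an H x W x C image independently.

    grid=1 with num_directions=2 corresponds to PLR-1p-2d,
    grid=2 with num_directions=2 corresponds to PLR-4p-2d.
    Returns the rotated image and a (grid, grid) array of direction labels.
    """
    if num_directions not in (2, 4):
        raise ValueError("num_directions must be 2 or 4")
    h, w = image.shape[:2]
    if h % grid or w % grid:
        raise ValueError("image sides must be divisible by grid")
    ph, pw = h // grid, w // grid
    if num_directions == 4 and ph != pw:
        raise ValueError("90-degree rotations need square patches")

    step = 4 // num_directions  # quarter turns per label: 2 for (0, 180), 1 for 4 directions
    labels = rng.integers(0, num_directions, size=(grid, grid))
    out = image.copy()
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * ph, (i + 1) * ph)
            cols = slice(j * pw, (j + 1) * pw)
            quarter_turns = int(labels[i, j]) * step
            out[rows, cols] = np.rot90(image[rows, cols], k=quarter_turns, axes=(0, 1))
    return out, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dummy = np.arange(8 * 8 * 3).reshape(8, 8, 3)  # stand-in for a real image
    rotated, labels = patch_level_rotate(dummy, grid=2, num_directions=2, rng=rng)
    print(labels)  # per-patch targets the decoder has to predict
```

The label map is what the decoder is trained to classify, so each patch needs enough feature-map cells for its direction to be recoverable.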

    Bad News

    Higher resolution + longer training on the pretext task => lower performance

    Encoder-Decoder MoCo Pipeline

    Segmentation on VOC dataset
    Encoder-Decoder model: ResNet50 + FCN8s
    Optimizer: SGD, lr = 0.01, min_lr = 0.0001, poly decay with power = 0.9, 20k iterations.
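
The learning-rate schedule above can be written out as a short function. This is a sketch under one assumption: min_lr is treated as a lower bound on the decayed rate. Some frameworks instead interpolate between lr and min_lr, and the report does not say which variant was used. The name `poly_lr` is hypothetical.

```python
def poly_lr(iteration: int,
            max_iterations: int = 20_000,
            base_lr: float = 0.01,
            min_lr: float = 0.0001,
            power: float = 0.9) -> float:
    """Poly decay: base_lr * (1 - iteration / max_iterations) ** power, floored at min_lr."""
    progress = min(iteration, max_iterations) / max_iterations
    decayed = base_lr * (1.0 - progress) ** power
    return max(decayed, min_lr)  # assumption: min_lr acts as a floor


if __name__ == "__main__":
    for it in (0, 5_000, 10_000, 15_000, 20_000):
        print(f"iteration {it:6d}: lr = {poly_lr(it):.6f}")
```

With these settings the rate starts at 0.01 and is held at 0.0001 once the decayed value falls below that floor, which only happens very close to the final iteration.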