Weekly Summary
- MMSegmentation uses its own ImageNet-pretrained ResNet as the backbone of an encoder-decoder structure, then fine-tunes on downstream segmentation tasks across different datasets. Since its training pipeline is likewise pretrain-then-finetune, and its architectures are mostly encoder-decoder style, we only need to find the best decoder head, run sample tests with an ImageNet-supervised-pretrained ResNet50 versus a MoCo-pretrained ResNet50, and compare their performance. PSPNet seems to be the front-runner. The test report is shown below.
- From my observation, simpler decoder heads seem to yield better results more easily. A possible explanation is that complex heads, although capable of higher peak performance, contain a certain obscurity that makes them harder to train within these schedules.
Test Report (mIoU)
| | moco-FCN | vanilla-FCN | PSPnet | Unet | semantic-FPN |
| --- | --- | --- | --- | --- | --- |
| mmseg-20k | 72.96% | 67.08% | 74.32% | - | 70.25% |
| moco-30k | 72.65% | 66.27% | 70.32% | 69.92% | 69.53% |
| mmseg-40k | 73.84% | 66.97% | 76.12% | 70.96% | 70.25% |
| moco-40k | 70.68% | 64.91% | 70.03% | 68.96% | 68.33% |
mmseg- means that the pretrained ResNet50 model comes from the supervised ImageNet pretraining provided by mmsegmentation; moco- means that it comes from the MoCo pretraining released by MoCo.
20k and 40k are the default schedules provided by mmsegmentation with a larger learning rate. 30k is the schedule MoCo used when fine-tuning on the VOC2012 segmentation dataset as described in their paper, except that the learning-rate scheduler is changed from step to linear.
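For reference, an iteration-based 40k schedule in mmsegmentation (0.x-style) looks roughly like the fragment below. These are the library's stock defaults, not the exact configs used in the experiments above; the 30k linear-decay variant can be expressed through the poly policy with power 1.0, since poly with power=1.0 is exactly linear decay.

```python
# Sketch of an mmsegmentation 0.x-style schedule config (illustrative,
# not the exact experiment configs).
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=40000)
evaluation = dict(interval=4000, metric='mIoU')

# moco-30k variant: swap MoCo's step scheduler for linear decay.
# Linear decay == poly policy with power=1.0:
# lr_config = dict(policy='poly', power=1.0, min_lr=0.0, by_epoch=False)
# runner = dict(type='IterBasedRunner', max_iters=30000)
```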
Ideas about VAE
- Enc -> mean vec & deviation vec
- Mean vec -> used to push instances closer
- Deviation vec -> used to predict the augmentation done to the original image
- Mean & deviation -> restore the image
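The idea above can be sketched as a toy model: the encoder emits a mean vector and a deviation vector; the mean vectors of two views are pushed closer, the deviation vector predicts the augmentation class, and both together reconstruct the image. All layer sizes, the 32x32 input, and the 4-way augmentation label are illustrative assumptions, not a fixed design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugAwareVAE(nn.Module):
    """Toy sketch of the VAE idea; a real model would use a ResNet encoder."""

    def __init__(self, dim=128, n_augs=4):
        super().__init__()
        # Tiny conv encoder for 3x32x32 inputs (illustrative only).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 64 * 8 * 8
        self.to_mean = nn.Linear(feat, dim)   # Enc -> mean vec
        self.to_dev = nn.Linear(feat, dim)    # Enc -> deviation vec
        self.aug_head = nn.Linear(dim, n_augs)        # deviation -> augmentation
        self.decoder = nn.Linear(2 * dim, 3 * 32 * 32)  # mean & dev -> image

    def forward(self, x):
        h = self.encoder(x)
        mean, dev = self.to_mean(h), self.to_dev(h)
        aug_logits = self.aug_head(dev)
        recon = self.decoder(torch.cat([mean, dev], dim=1)).view(-1, 3, 32, 32)
        return mean, dev, aug_logits, recon

def total_loss(model, x1, x2, aug_labels):
    """x1, x2: two augmented views of the same images."""
    m1, d1, logits1, rec1 = model(x1)
    m2, _, _, _ = model(x2)
    pull = 1 - F.cosine_similarity(m1, m2).mean()  # push mean vecs closer
    aug = F.cross_entropy(logits1, aug_labels)     # predict the augmentation
    recon = F.mse_loss(rec1, x1)                   # restore the image
    return pull + aug + recon
```

A contrastive loss with negatives (e.g. InfoNCE, as in MoCo) could replace the plain cosine pull on the mean vectors; the simple version is shown here only to keep the sketch short.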
