Attention Map

Cosine similarity

Global

Cosine similarity (should have high value in areas except background, as t-SNE visualization reveals that background labeled feature vectors still sparsely distributed in supervise-trained representation space) as attention heat map, calculated between

global average pooling of feature map, size [1, channel]
feature map, size [(height, width), channel]

Supervised Learning

MoCo

PLMoCo

PLMoCoPlus (MoCo+PLMoCo)

PLSiam

Local
Cosine similarity as attention heat map, calculated between
selected point on feature map, size [1, channel] (brightest dot on attention map)
feature map, size [(height, width), channel]
Supervised Learning

MoCo
非常明显的dilated convolution

PLMoCo

PLMoCoPlus

PLSiam

P2Vec 🙁

PRCL 🤷‍♂️

Recent Progress in SSL

ViT — an alternative to ResNet, a step closer to NLP

Screen Shot 2021-05-05 at 8.23.10 AM.png
Money is all you need

MoCo-v3

Screen Shot 2021-05-05 at 8.40.56 AM.png
Screen Shot 2021-05-05 at 8.42.00 AM.png
大胆猜测：gradient问题由first layer出现，逐渐引入后面层，然后instability，于是把projection固定成了random

Screen Shot 2021-05-05 at 8.44.11 AM.png
然后就好多了。。。
算是把contrastive learning与vit结合的头一篇吧

DINO

self-DIstillation with NO labels
Screen Shot 2021-05-05 at 8.44.53 AM.png
loss跟InfoNCE有些不同，key feature直接作为了label，而不是先cos sim再给定0/1的label，整体还是cross entropy
Screen Shot 2021-05-05 at 8.22.44 AM.png Screen Shot 2021-05-05 at 8.56.26 AM.png
another day, another state of the art. 😄