motivation
- SSL is common in NLP largely because NLP works with discrete signal spaces (words), whereas most CV data is high-dimensional and continuous
- Viewing contrastive learning as dictionary look-up requires building a sufficiently large dictionary
contribution
- Proposes MoCo for building large and consistent dictionaries for contrastive learning
- Purpose: pre-train representations (i.e., features) that can be transferred to downstream tasks by fine-tuning
- Applicable to real-world, billion-image scale, and relatively uncurated scenarios
method
Contrastive Learning as Dictionary Look-up
InfoNCE (the contrastive loss function); τ is the temperature hyper-parameter
It is effectively a (K+1)-way softmax-based classifier that tries to classify q as k_+
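For reference, the InfoNCE form used in the paper, where q is the query, k_+ its single positive key, the sum runs over one positive and K negative keys, and τ is the temperature:

$$
\mathcal{L}_q = -\log \frac{\exp(q \cdot k_+ / \tau)}{\sum_{i=0}^{K} \exp(q \cdot k_i / \tau)}
$$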
Momentum Contrast
- Dynamic dictionary
- The key encoder keeps evolving over the course of training
- Hypothesis: good features can be learned by a large dictionary that covers a rich set of negative samples, while the encoder for the dictionary keys is kept as consistent as possible despite its evolution
Dictionary as a queue
- At the core of our approach is maintaining the dictionary as a queue of data samples
- The dictionary can be larger than the mini-batch size; its size is a tunable hyper-parameter
- Each update enqueues the current mini-batch of encoded keys and dequeues the oldest mini-batch (FIFO); see the sketch after this list
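A minimal PyTorch-style sketch of this FIFO dictionary, assuming the keys live in a feature_dim × K tensor with a moving write pointer (the names, sizes, and pointer arithmetic here are illustrative, not taken from the released code):

```python
import torch
import torch.nn.functional as F

feature_dim, K = 128, 65536              # 128-D keys; the dictionary size K is a hyper-parameter
queue = F.normalize(torch.randn(feature_dim, K), dim=0)  # stores encoded keys, not raw images
ptr = 0                                  # position of the oldest mini-batch in the queue

def dequeue_and_enqueue(keys: torch.Tensor):
    """Replace the oldest mini-batch of keys with the newest one (FIFO)."""
    global ptr
    batch_size = keys.shape[0]
    assert K % batch_size == 0           # keeps the pointer arithmetic simple
    queue[:, ptr:ptr + batch_size] = keys.T   # enqueue the current mini-batch of keys
    ptr = (ptr + batch_size) % K              # advance; the oldest keys are overwritten next
```

Because only the encoded features are stored, the memory cost per key is small, which is what allows K to be much larger than the mini-batch size.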
Momentum update
Using a queue enlarges the dictionary, but it also makes it intractable to update the key encoder by back-propagation (the gradient would have to flow through every sample in the queue)
Naively copying the query encoder's parameters to the key encoder changes the key encoder too rapidly, which breaks the consistency of the key representations and is unstable
The queue stores encoded key features, not the raw images
Momentum update: θ_k ← m·θ_k + (1 − m)·θ_q, where m ∈ [0, 1) is the momentum coefficient and only θ_q is updated by back-propagation
In experiments, a relatively large momentum (m = 0.999) gives better results than smaller values
A slowly evolving key encoder is core to making use of a queue (see the sketch below)
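A sketch of the momentum update in PyTorch, assuming encoder_q and encoder_k have identical architectures; the function name is mine, but the update rule is the one stated above:

```python
import torch

m = 0.999  # momentum coefficient; a large value keeps the key encoder slowly evolving

@torch.no_grad()
def momentum_update(encoder_q, encoder_k):
    """theta_k <- m * theta_k + (1 - m) * theta_q; only theta_q receives gradients."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```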
Relations to previous mechanisms
- end-to-end
  - The dictionary size equals the mini-batch size and is therefore limited by GPU memory
  - Optimizing with very large mini-batches is itself a challenge
- memory bank
  - Stores the feature representations of every sample in the dataset; sampling keys from the memory bank requires no back-propagation, so it can support a large dictionary
  - A sample's representation is updated in the memory bank only when that sample is drawn, so the keys are less consistent with one another
  - The momentum update is applied to the stored representations, not to the encoder (see the sketch below)
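For contrast, a rough sketch of a memory-bank-style dictionary: one stored feature per dataset sample, refreshed only when that sample is drawn. The size, blend coefficient, and names below are assumptions; the point is that only stored features, not a key encoder, get updated:

```python
import torch
import torch.nn.functional as F

num_samples, feature_dim = 1_281_167, 128   # e.g. one entry per ImageNet image (illustrative)
memory_bank = F.normalize(torch.randn(num_samples, feature_dim), dim=1)

@torch.no_grad()
def update_bank(indices: torch.Tensor, feats: torch.Tensor, blend: float = 0.5):
    """Refresh only the entries whose samples appear in this batch, so keys age unevenly."""
    mixed = blend * memory_bank[indices] + (1.0 - blend) * feats
    memory_bank[indices] = F.normalize(mixed, dim=1)   # no encoder is momentum-updated here
```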
Pretext Task
- Two random views of the same image form a positive pair
- Queries and keys are encoded independently by their respective encoders (a full training-step sketch follows this list)
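Putting the pieces together, one training step could look like the sketch below, loosely following the pseudocode style of the paper; aug, encoder_q, encoder_k, queue, momentum_update, and dequeue_and_enqueue refer to the illustrative components sketched earlier, and the temperature value is an assumption:

```python
import torch
import torch.nn.functional as F

tau = 0.07  # temperature; the value here is an assumed setting

def moco_step(x, aug, encoder_q, encoder_k, queue, optimizer):
    x_q, x_k = aug(x), aug(x)                      # two random views -> a positive pair

    q = F.normalize(encoder_q(x_q), dim=1)         # queries: N x 128, gradients flow here
    with torch.no_grad():                          # keys are encoded independently, no gradient
        k = F.normalize(encoder_k(x_k), dim=1)

    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)   # N x 1 positive logits
    l_neg = torch.einsum("nc,ck->nk", q, queue)            # N x K negative logits from the queue
    logits = torch.cat([l_pos, l_neg], dim=1) / tau        # N x (1 + K)

    labels = torch.zeros(logits.shape[0], dtype=torch.long)  # the positive key is class 0
    loss = F.cross_entropy(logits, labels)         # the (K+1)-way softmax classification view

    optimizer.zero_grad()
    loss.backward()                                # updates only the query encoder
    optimizer.step()

    momentum_update(encoder_q, encoder_k)          # slow update of the key encoder
    dequeue_and_enqueue(k)                         # FIFO update of the dictionary
    return loss.item()
```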
Technical details
- Encoder: ResNet
- Output: a 128-D feature vector
- The output is L2-normalized
- Augmentation: random color jittering, random horizontal flip, and random grayscale conversion (a possible implementation is sketched below)
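A possible torchvision implementation of this pipeline; the initial random resized crop and the exact jitter strengths are assumptions, not listed in the notes above:

```python
from torchvision import transforms

# Sketch of a MoCo-style augmentation pipeline; crop size and jitter
# strengths are assumed values.
augmentation = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.4),   # random color jittering
    transforms.RandomHorizontalFlip(),            # random horizontal flip
    transforms.RandomGrayscale(p=0.2),            # random grayscale conversion
    transforms.ToTensor(),
])
```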
Shuffling BN
Standard BN makes it hard for the model to learn good representations
The model appears to “cheat” the pretext task and easily finds a low-loss solution. This is possibly because the intra-batch communication among samples (caused by BN) leaks information
In other words, the shared per-batch statistics let samples within the same batch leak information to each other
Shuffling BN: BN is computed separately on each GPU over its own slice of the batch
For the key encoder f_k: shuffle the sample order of the current mini-batch before distributing it across GPUs (each GPU then computes BN on its portion), and undo the shuffling after encoding;
For the query encoder f_q: the sample order is left unchanged.
This ensures that the batch statistics used to compute a query and its positive key come from two different subsets of samples (a simplified single-machine sketch follows).
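A simplified single-machine sketch of the shuffle/unshuffle idea, where equal chunks of the batch stand in for individual GPUs and each chunk gets its own BN statistics (all names are illustrative):

```python
import torch

def encode_keys_with_shuffled_bn(x_k, encoder_k, num_chunks=8):
    """Single-machine simulation: shuffle, run BN per chunk ('per GPU'), unshuffle."""
    n = x_k.shape[0]
    shuffle_idx = torch.randperm(n)                # shuffle the mini-batch order for f_k
    unshuffle_idx = torch.argsort(shuffle_idx)     # how to restore the original order

    x_shuffled = x_k[shuffle_idx]
    # each chunk stands in for one GPU computing its own BN statistics
    keys = torch.cat(
        [encoder_k(chunk) for chunk in x_shuffled.chunk(num_chunks)], dim=0)
    return keys[unshuffle_idx]                     # queries (f_q) keep their original order
```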