SimCLR v2

Method

SimCLR v2

  • deeper but less wide network:
    • 152-layer ResNet with 3× wider channels
    • selective kernels (SK), a channel-wise attention mechanism (see the SK-style sketch after this list)
    • obtain a 29% relative improvement in top-1 accuracy when fine-tuned on 1% of labeled examples
  • deeper projection head (non-linear):
    • instead of throwing away g(·) entirely after pretraining as in SimCLR, we fine-tune from a middle layer
    • use a 3-layer projection head and fine-tune from the 1st layer of the projection head (see the sketch under Fine-tuning below)
    • 14% relative improvement in top-1 accuracy when fine-tuned on 1% of labeled examples
  • memory mechanism (MoCo):
    • designates a memory network (with a moving average of weights for stabilization) whose output will be buffered as negative examples (see the momentum-encoder sketch after this list)
    • yields an improvement of ∼1% for linear evaluation as well as when fine-tuning on 1% of labeled examples
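
The selective-kernel component can be sketched in a few lines. Below is a minimal, illustrative SK-style block in PyTorch, assuming two convolution branches with different receptive fields whose outputs are mixed by a channel-wise softmax attention; the branch choices, widths, and names are placeholders, not the exact ResNet-152 (3×+SK) configuration.

```python
import torch
import torch.nn as nn

class SKBlock(nn.Module):
    """Minimal SK-style block: two branches + channel-wise attention (illustrative)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        # two branches with different receptive fields (3x3, and 3x3 dilated ~ 5x5)
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        hidden = max(channels // reduction, 8)
        # squeeze: global context -> compact descriptor
        self.fc = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(inplace=True))
        # one attention vector per branch, softmaxed across branches per channel
        self.attn = nn.Linear(hidden, channels * 2)

    def forward(self, x):
        u3, u5 = self.branch3(x), self.branch5(x)
        s = (u3 + u5).mean(dim=(2, 3))                       # global average pooling -> (B, C)
        a = self.attn(self.fc(s)).view(-1, 2, x.shape[1])    # (B, branches, C)
        a = torch.softmax(a, dim=1)                          # channel-wise selection across branches
        return a[:, 0, :, None, None] * u3 + a[:, 1, :, None, None] * u5

x = torch.randn(2, 64, 32, 32)
print(SKBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```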
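The MoCo-style memory mechanism can be sketched similarly: a momentum ("memory") encoder maintained as an exponential moving average of the online encoder's weights, plus a FIFO buffer of its outputs that is reused as negatives. The class and method names (`MomentumQueue`, `update_momentum`, `enqueue`) and the sizes below are illustrative assumptions, not the paper's implementation.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MomentumQueue:
    """Illustrative MoCo-style memory: EMA ("memory") encoder + FIFO buffer of negatives."""
    def __init__(self, encoder, dim=128, queue_size=4096, m=0.999):
        self.encoder = encoder                        # online encoder
        self.memory_encoder = copy.deepcopy(encoder)  # memory network (moving-average weights)
        for p in self.memory_encoder.parameters():
            p.requires_grad = False
        self.queue = F.normalize(torch.randn(queue_size, dim), dim=1)
        self.m = m

    @torch.no_grad()
    def update_momentum(self):
        # moving average of weights stabilizes the buffered negatives
        for p_q, p_k in zip(self.encoder.parameters(),
                            self.memory_encoder.parameters()):
            p_k.data.mul_(self.m).add_(p_q.data, alpha=1 - self.m)

    @torch.no_grad()
    def enqueue(self, keys):
        # buffer the memory network's outputs as future negative examples (FIFO)
        self.queue = torch.cat([F.normalize(keys, dim=1), self.queue])[:len(self.queue)]

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # toy encoder
moco = MomentumQueue(encoder)
x = torch.randn(8, 3, 32, 32)
with torch.no_grad():
    keys = moco.memory_encoder(x)
moco.update_momentum()
moco.enqueue(keys)
print(moco.queue.shape)  # torch.Size([4096, 128])
```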

Fine-tuning

  • incorporate part of the MLP projection head into the base encoder during fine-tuning, as in the sketch below
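
A minimal sketch of this setup, assuming a 3-layer MLP projection head as described above: the 1st layer of the head is kept as part of the base encoder and a task classifier is attached on top of it. The backbone, layer widths, and class count are placeholders, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

def mlp_layer(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, d_out), nn.BatchNorm1d(d_out), nn.ReLU(inplace=True))

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2048), nn.ReLU(inplace=True))  # toy encoder f(.)
projection_head = nn.Sequential(      # 3-layer head g(.) used during pretraining
    mlp_layer(2048, 2048),
    mlp_layer(2048, 2048),
    nn.Linear(2048, 128),
)

# SimCLR (v1): discard g(.) entirely and put the classifier on backbone features.
# SimCLR v2: keep the 1st layer of g(.) and fine-tune from there.
finetune_model = nn.Sequential(
    backbone,
    projection_head[0],        # first layer of the projection head, now part of the encoder
    nn.Linear(2048, 10),       # task classifier (e.g. 10 classes)
)

x = torch.randn(4, 3, 32, 32)
print(finetune_model(x).shape)  # torch.Size([4, 10])
```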

Self-training / knowledge distillation via unlabeled examples

  • teacher-student distillation:

    • use the fine-tuned network as a teacher to impute labels for training a student network

    • loss:

      $$\mathcal{L}^{\text{distill}} = -\sum_{x_i \in \mathcal{D}} \sum_{y} P^{T}(y \mid x_i; \tau)\, \log P^{S}(y \mid x_i; \tau), \qquad P(y \mid x_i; \tau) = \frac{\exp\left(f^{\text{task}}(x_i)[y]/\tau\right)}{\sum_{y'} \exp\left(f^{\text{task}}(x_i)[y']/\tau\right)}$$

      τ: scalar temperature parameter that balances (softens or sharpens) the imputed distribution
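
      The loss above translates directly into a few lines of PyTorch. This is only a sketch: `distill_loss`, the logit shapes, and the batch are illustrative, and the teacher logits are assumed to come from the frozen fine-tuned teacher network.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, t=1.0):
    # teacher imputes a soft label distribution P^T(y|x; t) on an unlabeled example
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # student is trained to match it via cross-entropy with log P^S(y|x; t)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

student_logits = torch.randn(8, 1000, requires_grad=True)
teacher_logits = torch.randn(8, 1000)   # from the frozen fine-tuned teacher
loss = distill_loss(student_logits, teacher_logits, t=0.1)
loss.backward()
print(loss.item())
```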

