Deep Self-Learning From Noisy Labels


Motivation

  • “ConvNets achieve good results when training from clean data, but learning from noisy labels significantly degrades performances and remains challenging.”
  • existing methods:
    • wrong assumption: that a single transition probability relates the noisy label to the ground-truth label
    • rely on extra supervision

Method

As shown in the figure above, the overall framework consists of two parts: a Training Phase and a Label Correction Phase.

First, a feature extractor and classifier are coarsely trained on the original (noisy) dataset. The Training Phase and the Label Correction Phase are then iterated to optimize the framework: a weight-sharing network extracts deep features from randomly sampled images, the features are clustered, and prototypes are selected for each class. These prototypes provide corrected (clean) labels that are used to refine the feature extraction network.

Training Phase

In the early stage, the network is trained normally on the noisy labels.

During Iterative Self-Learning, the feature extractor F is optimized using both Y (the labels in the original dataset) and Y′ (the corrected labels obtained by clustering in the Label Correction Phase).
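The joint optimization over Y and Y′ can be sketched as a convex combination of two cross-entropy terms; the mixing weight `alpha` and this NumPy formulation are illustrative assumptions, not the paper's exact loss:

```python
import numpy as np

def cross_entropy(logits, labels):
    # numerically stable softmax cross-entropy, averaged over the batch
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def self_learning_loss(logits, y_noisy, y_corrected, alpha=0.5):
    # weigh the original labels Y against the corrected labels Y'
    # (alpha is a hypothetical balance term)
    return (1 - alpha) * cross_entropy(logits, y_noisy) \
         + alpha * cross_entropy(logits, y_corrected)
```

With `alpha = 0`, this reduces to ordinary training on the noisy labels; raising `alpha` shifts supervision toward the prototype-corrected labels.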

Label Correction Phase

  1. Randomly sample the same number of images from each class (to avoid a single over-represented class dominating the batch)

  2. Compute a cosine similarity matrix over the deep features of the sampled batch to measure pairwise similarity between images (Euclidean distance performs worse here)

  3. Define a density ρ for each sample and use it to select prototypes

  4. Use multiple prototypes per class for the optimization. Define a similarity measurement η to prevent prototypes from being too similar to one another, which would otherwise yield little improvement. η_i is the maximum similarity between image i and the images whose density ρ is larger than its own

    a smaller similarity value ηi indicates that the features corresponding to the image i are not too close to the other images with density ρ larger than it

    Select as prototypes the samples with a high density value ρ (high probability of a clean label) and a low similarity value η (a clean label that is moderately far away from other clean samples)
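The steps above can be sketched as follows; the similarity threshold, the tie-breaking rule, and the function name are illustrative assumptions, not the paper's exact hyper-parameters:

```python
import numpy as np

def select_prototypes(features, num_prototypes=2, sim_threshold=0.0):
    """Density-based prototype selection sketch for one class."""
    # 1. cosine similarity matrix over L2-normalised deep features
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    S = f @ f.T
    # 2. density rho: how many other samples are similar beyond a threshold
    rho = (S > sim_threshold).sum(axis=1) - 1  # exclude self-similarity
    # 3. eta: max similarity to any sample of strictly higher density;
    #    the highest-density points fall back to their smallest similarity
    n = len(features)
    eta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        eta[i] = S[i, higher].max() if len(higher) else S[i].min()
    # 4. prototypes: high density first, ties broken by low eta
    #    (high rho -> likely clean; low eta -> not too close to other
    #    high-density samples, keeping prototypes diverse)
    order = np.lexsort((eta, -rho))
    return order[:num_prototypes]
```

Because sampling is done per class (step 1 above), this routine would run once for each class on its sampled batch.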

Iterative Self-Learning
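Putting the two phases together, the alternation can be sketched as a simple loop; the callback names here are hypothetical placeholders for the real training and clustering code:

```python
def iterative_self_learning(train_epoch, correct_labels, labels, num_iterations=5):
    """Alternate the Training Phase and the Label Correction Phase.
    `train_epoch` and `correct_labels` are hypothetical callbacks: the
    former runs one round of training with the joint loss over (Y, Y'),
    the latter re-selects prototypes and returns the new corrected Y'.
    """
    corrected = list(labels)  # start from the original (noisy) labels
    for _ in range(num_iterations):
        train_epoch(labels, corrected)   # Training Phase
        corrected = correct_labels()     # Label Correction Phase
    return corrected
```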

《Deep Self-Learning From Noisy Labels》 by Liangyu Cui is licensed under a Creative Commons Attribution 4.0 International License