PULSE
motivation
- “ConvNets achieve good results when training from clean data, but learning from noisy labels significantly degrades performances and remains challenging.”
- existing methods:
  - wrong assumption: there is a single transition probability between the noisy label and the ground-truth label
  - rely on extra supervision
method
As shown in the figure above, the overall framework consists of two parts: the Training Phase and the Label Correction Phase.
A feature extractor and a classifier are first coarsely trained on the original dataset; the Training Phase and the Label Correction Phase are then run iteratively to optimize the framework. A weight-sharing network extracts deep features from randomly sampled images, the features are clustered, and prototypes are selected for each class; these prototypes serve as clean-label references to optimize the feature extraction network.
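A minimal sketch of how the corrected labels Y' could be produced from the per-class prototypes, assuming each image is relabelled with the class whose prototypes its deep feature is most similar to; the function name `correct_labels` and the mean-over-prototypes rule are illustrative assumptions, not details taken from the paper:

```python
import torch
import torch.nn.functional as F

def correct_labels(features: torch.Tensor,
                   class_prototypes: list[torch.Tensor]) -> torch.Tensor:
    """Assign each image the label of the class whose prototypes its deep
    feature is most similar to (cosine similarity). `features` is [n, d];
    `class_prototypes[c]` holds the prototype features of class c, [k_c, d].
    The "most similar class" rule is an assumption consistent with using
    prototypes as clean-label references."""
    normed = F.normalize(features, dim=1)
    scores = []
    for protos in class_prototypes:
        protos = F.normalize(protos, dim=1)
        # Similarity of every image to every prototype of this class,
        # summarized by the mean over that class's prototypes.
        scores.append((normed @ protos.t()).mean(dim=1))
    return torch.stack(scores, dim=1).argmax(dim=1)   # corrected labels Y'

# Example: 16 images, 128-d features, 3 classes with 4 prototypes each.
feats = torch.randn(16, 128)
protos = [torch.randn(4, 128) for _ in range(3)]
y_prime = correct_labels(feats, protos)
```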
Training Phase
Conventional training on the original labels in the early stage.
During Iterative Self-Learning, the feature extractor F is optimized with both Y (the labels in the original dataset) and Y' (the labels obtained by clustering in the Label Correction Phase); a loss sketch follows below.
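One plausible form of the Iterative Self-Learning objective is a convex combination of two cross-entropy terms, one against Y and one against Y'; the mixing weight α and the exact formulation here are a hedged sketch, not necessarily the paper's verbatim loss:

```python
import torch
import torch.nn.functional as F

def self_learning_loss(logits: torch.Tensor,
                       noisy_labels: torch.Tensor,
                       corrected_labels: torch.Tensor,
                       alpha: float = 0.5) -> torch.Tensor:
    """Combine a cross-entropy term on the original labels Y with one on the
    corrected labels Y' from the Label Correction Phase. `alpha` is assumed to
    balance the two terms (alpha = 0 recovers plain training on noisy labels)."""
    loss_noisy = F.cross_entropy(logits, noisy_labels)
    loss_corrected = F.cross_entropy(logits, corrected_labels)
    return (1.0 - alpha) * loss_noisy + alpha * loss_corrected

# Example: a batch of 8 samples and 5 classes.
logits = torch.randn(8, 5)
y = torch.randint(0, 5, (8,))        # noisy labels Y
y_prime = torch.randint(0, 5, (8,))  # corrected labels Y'
loss = self_learning_loss(logits, y, y_prime, alpha=0.5)
```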
Label Correction Phase
Randomly sample images from each class (to avoid a single over-represented class dominating the batch).
First compute a cosine similarity matrix over the deep features of the sampled batch to measure how similar the images are to each other (cosine similarity works better than Euclidean distance); see the sketch at the end of this section.
Define a density ρ for each image and use it to select prototypes.
Multiple prototypes are used per class. A similarity measurement η is defined to keep the prototypes from being too similar to each other, which would limit the improvement; η_i is the maximum similarity between image i and the images whose density ρ is larger than its own.
A smaller similarity value η_i indicates that the features of image i are not too close to those of the other images with density ρ larger than it.
Prototypes are chosen from samples with a high density value ρ (probably a clean label) and a low similarity value η (a clean label that is also moderately far away from the other clean prototypes).
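A sketch of the Label Correction Phase computations described above (cosine similarity matrix, density ρ, similarity measure η, prototype selection); the thresholds and the final "low-η candidates ranked by ρ" selection rule are illustrative assumptions rather than values from the paper:

```python
import torch
import torch.nn.functional as F

def select_prototypes(features: torch.Tensor,
                      density_threshold: float = 0.6,
                      eta_threshold: float = 0.95,
                      num_prototypes: int = 4) -> torch.Tensor:
    """Select prototype images for one class from the deep features of a
    randomly sampled batch (shape [n, d]). Thresholds and the selection rule
    are illustrative assumptions. Returns the indices of the prototypes."""
    # Cosine similarity matrix S over the sampled batch (reported to work
    # better than Euclidean distance on deep features).
    normed = F.normalize(features, dim=1)
    S = normed @ normed.t()                      # S[i, j] in [-1, 1]

    # Density rho_i: how many images are "close" to image i, i.e. have
    # similarity above a cut-off. A high density suggests a clean label.
    rho = (S - density_threshold).sign().sum(dim=1)

    # Similarity measure eta_i: the maximum similarity between image i and any
    # image with a *larger* density. A small eta means image i is not too close
    # to denser images, so the selected prototypes stay diverse.
    n = features.size(0)
    eta = torch.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            eta[i] = S[i, higher].max()
        else:
            # Highest-density point: fall back to its minimum similarity so it
            # always remains eligible as a prototype.
            eta[i] = S[i].min()

    # Keep candidates that are not redundant (low eta), then take the densest.
    candidates = (eta < eta_threshold).nonzero(as_tuple=True)[0]
    order = rho[candidates].argsort(descending=True)
    return candidates[order][:num_prototypes]

# Example: 32 sampled images with 128-d features from the shared extractor.
feats = torch.randn(32, 128)
proto_idx = select_prototypes(feats)
prototypes = feats[proto_idx]                    # prototype feature vectors
```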