PULSE
motivation
- “ConvNets achieve good results when training from clean data, but learning from noisy labels significantly degrades performances and remains challenging.”
- existing methods:
  - wrong assumption: there is a single transition probability between the noisy label and the ground-truth label
  - rely on extra supervision
method
As shown in the figure above, the overall framework consists of two parts: the Training Phase and the Label Correction Phase.
A feature extractor and a classifier are first coarsely trained on the original dataset; the Training Phase and the Label Correction Phase are then run iteratively to optimize the framework. A weight-sharing network extracts deep features from randomly sampled images, the features are clustered, and prototypes are selected for each class; these prototypes serve as clean-label references to optimize the feature extraction network.
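A minimal sketch of how the corrected labels Y' could be produced from the per-class prototypes, assuming each image is relabelled with the class whose prototypes its deep feature is most similar to; the function name `correct_labels` and the mean-over-prototypes rule are illustrative assumptions, not details taken from the paper:

```python
import torch
import torch.nn.functional as F

def correct_labels(features: torch.Tensor,
                   class_prototypes: list[torch.Tensor]) -> torch.Tensor:
    """Assign each image the label of the class whose prototypes its deep
    feature is most similar to (cosine similarity). `features` is [n, d];
    `class_prototypes[c]` holds the prototype features of class c, [k_c, d].
    The "most similar class" rule is an assumption consistent with using
    prototypes as clean-label references."""
    normed = F.normalize(features, dim=1)
    scores = []
    for protos in class_prototypes:
        protos = F.normalize(protos, dim=1)
        # Similarity of every image to every prototype of this class,
        # summarized by the mean over that class's prototypes.
        scores.append((normed @ protos.t()).mean(dim=1))
    return torch.stack(scores, dim=1).argmax(dim=1)   # corrected labels Y'

# Example: 16 images, 128-d features, 3 classes with 4 prototypes each.
feats = torch.randn(16, 128)
protos = [torch.randn(4, 128) for _ in range(3)]
y_prime = correct_labels(feats, protos)
```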
Training Phase
Conventional training on the original labels in the early stage.
During Iterative Self-Learning, the feature extractor F is optimized with both Y (the labels in the original dataset) and Y' (the labels obtained by clustering in the Label Correction Phase); a loss sketch follows below.
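One plausible form of the Iterative Self-Learning objective is a convex combination of two cross-entropy terms, one against Y and one against Y'; the mixing weight α and the exact formulation here are a hedged sketch, not necessarily the paper's verbatim loss:

```python
import torch
import torch.nn.functional as F

def self_learning_loss(logits: torch.Tensor,
                       noisy_labels: torch.Tensor,
                       corrected_labels: torch.Tensor,
                       alpha: float = 0.5) -> torch.Tensor:
    """Combine a cross-entropy term on the original labels Y with one on the
    corrected labels Y' from the Label Correction Phase. `alpha` is assumed to
    balance the two terms (alpha = 0 recovers plain training on noisy labels)."""
    loss_noisy = F.cross_entropy(logits, noisy_labels)
    loss_corrected = F.cross_entropy(logits, corrected_labels)
    return (1.0 - alpha) * loss_noisy + alpha * loss_corrected

# Example: a batch of 8 samples and 5 classes.
logits = torch.randn(8, 5)
y = torch.randint(0, 5, (8,))        # noisy labels Y
y_prime = torch.randint(0, 5, (8,))  # corrected labels Y'
loss = self_learning_loss(logits, y, y_prime, alpha=0.5)
```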
Label Correction Phase
Randomly sample images from each class (to avoid a single over-represented class dominating the batch).
First compute a cosine similarity matrix over the deep features of the sampled batch to measure how similar the images are to each other (cosine similarity works better than Euclidean distance); see the sketch at the end of this section.
Define a density ρ for each image and use it to select prototypes.
Multiple prototypes are used per class. A similarity measurement η is defined to keep the prototypes from being too similar to each other, which would limit the improvement; η_i is the maximum similarity between image i and the images whose density ρ is larger than its own.
A smaller similarity value η_i indicates that the features of image i are not too close to those of the other images with density ρ larger than it.
Prototypes are chosen from samples with a high density value ρ (probably a clean label) and a low similarity value η (a clean label that is also moderately far away from the other clean prototypes).
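A sketch of the Label Correction Phase computations described above (cosine similarity matrix, density ρ, similarity measure η, prototype selection); the thresholds and the final "low-η candidates ranked by ρ" selection rule are illustrative assumptions rather than values from the paper:

```python
import torch
import torch.nn.functional as F

def select_prototypes(features: torch.Tensor,
                      density_threshold: float = 0.6,
                      eta_threshold: float = 0.95,
                      num_prototypes: int = 4) -> torch.Tensor:
    """Select prototype images for one class from the deep features of a
    randomly sampled batch (shape [n, d]). Thresholds and the selection rule
    are illustrative assumptions. Returns the indices of the prototypes."""
    # Cosine similarity matrix S over the sampled batch (reported to work
    # better than Euclidean distance on deep features).
    normed = F.normalize(features, dim=1)
    S = normed @ normed.t()                      # S[i, j] in [-1, 1]

    # Density rho_i: how many images are "close" to image i, i.e. have
    # similarity above a cut-off. A high density suggests a clean label.
    rho = (S - density_threshold).sign().sum(dim=1)

    # Similarity measure eta_i: the maximum similarity between image i and any
    # image with a *larger* density. A small eta means image i is not too close
    # to denser images, so the selected prototypes stay diverse.
    n = features.size(0)
    eta = torch.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            eta[i] = S[i, higher].max()
        else:
            # Highest-density point: fall back to its minimum similarity so it
            # always remains eligible as a prototype.
            eta[i] = S[i].min()

    # Keep candidates that are not redundant (low eta), then take the densest.
    candidates = (eta < eta_threshold).nonzero(as_tuple=True)[0]
    order = rho[candidates].argsort(descending=True)
    return candidates[order][:num_prototypes]

# Example: 32 sampled images with 128-d features from the shared extractor.
feats = torch.randn(32, 128)
proto_idx = select_prototypes(feats)
prototypes = feats[proto_idx]                    # prototype feature vectors
```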