We also show that MILe is effective at reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy data such as WebVision.
Ranked #3 on Image Classification on WebVision-1000
This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data.
At each time step, the teacher is created by copying the student agent and is then fine-tuned to maximize task completion.
The proposed model allows minor variations in content across frames while maintaining the temporal dependence through latent vectors encoding the pose or motion features.
In many environments, only a tiny subset of all states yields high reward.