Label Noise in Context

ACL 2020 · Michael Desmond, Catherine Finegan-Dollak, Jeff Boston, Matt Arnold

Label noise (incorrectly or ambiguously labeled training examples) can negatively impact model performance. Although noise detection techniques have been around for decades, practitioners rarely apply them, as manual noise remediation is a tedious process. Examples incorrectly flagged as noise waste reviewers' time, and correcting label noise without guidance can be difficult. We propose LNIC, a noise-detection method that uses an example's neighborhood within the training set to (a) reduce false positives and (b) provide an explanation as to why the example was flagged as noise. We demonstrate on several short-text classification datasets that LNIC outperforms the state of the art on measures of precision and F0.5-score. We also show how LNIC's training set context helps a reviewer to understand and correct label noise in a dataset. The LNIC tool lowers the barriers to label noise remediation, increasing its utility for NLP practitioners.
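For readers unfamiliar with the F0.5-score mentioned above, the following is a minimal sketch of the standard F-beta formula. With beta = 0.5, precision is weighted more heavily than recall, which fits a setting where false positives waste reviewers' time. The function name `f_beta` and the example values are illustrative assumptions, not taken from the paper or its results.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    beta < 1 favors precision; beta > 1 favors recall; beta = 1 gives F1.
    Illustrative helper only; not the paper's evaluation code.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical detectors (made-up numbers): one precise, one with high recall.
precise_detector = f_beta(precision=0.8, recall=0.4)
recall_heavy_detector = f_beta(precision=0.4, recall=0.8)
print(precise_detector > recall_heavy_detector)
```

Under this metric the precise detector scores higher even though the two have the same F1, which is why F0.5 suits a reviewer-in-the-loop workflow.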
