Search Results for author: Curtis G. Northcutt

Found 5 papers, 4 papers with code

Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks

2 code implementations • 26 Mar 2021 • Curtis G. Northcutt, Anish Athalye, Jonas Mueller

Errors in test sets are numerous and widespread: we estimate an average of at least 3.3% errors across the 10 datasets, where for example label errors comprise at least 6% of the ImageNet validation set.

BIG-bench Machine Learning

Rapformer: Conditional Rap Lyrics Generation with Denoising Autoencoders

no code implementations • INLG (ACL) 2020 • Nikola I. Nikolov, Eric Malmi, Curtis G. Northcutt, Loreto Parisi

The ability to combine symbols to generate language is a defining characteristic of human intelligence, particularly in the context of artistic story-telling through lyrics.

Denoising, Information Retrieval

Comment Ranking Diversification in Forum Discussions

1 code implementation • 27 Feb 2020 • Curtis G. Northcutt, Kimberly A. Leon, Naichun Chen

We conducted a double-blind, small-scale evaluation experiment requiring subjects to select between the top 5 comments of a diversified ranking and a baseline ranking ordered by score.

Re-Ranking, Semantic Similarity

Confident Learning: Estimating Uncertainty in Dataset Labels

2 code implementations • 31 Oct 2019 • Curtis G. Northcutt, Lu Jiang, Isaac L. Chuang

Confident learning (CL) is an alternative approach which focuses instead on label quality by characterizing and identifying label errors in datasets, based on the principles of pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence.

Learning with noisy labels, Sentiment Analysis

Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels

3 code implementations • 4 May 2017 • Curtis G. Northcutt, Tailin Wu, Isaac L. Chuang

To highlight, RP with a CNN classifier can predict if an MNIST digit is a "one" or "not" with only 0.25% error, and 0.46% error across all digits, even when 50% of positive examples are mislabeled and 50% of observed positive labels are mislabeled negative examples.

General Classification, Noise Estimation
