1 code implementation • 2 Sep 2023 • Ulyana Tkachenko, Aditya Thyagarajan, Jonas Mueller
Despite powering sensitive systems like autonomous vehicles, object detection remains fairly brittle in part due to annotation errors that plague most real-world training datasets.
no code implementations • 30 Aug 2023 • Jiuhai Chen, Jonas Mueller
We introduce BSDetector, a method for detecting bad and speculative answers from a pretrained Large Language Model by estimating a numeric confidence score for any output it generates.
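One ingredient of this kind of confidence estimation is observed consistency: sample several answers to the same prompt and measure how much they agree. The sketch below caricatures only that component (BSDetector as described also incorporates the model's own self-reflection); the function name and scoring rule are illustrative, not the authors' implementation.

```python
from collections import Counter

def consistency_confidence(sampled_answers):
    # Toy agreement score: the fraction of sampled answers that match the most
    # common answer. High agreement across samples suggests the model is
    # confident; disagreement flags a potentially bad or speculative answer.
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    top_answer, top_count = Counter(sampled_answers).most_common(1)[0]
    return top_answer, top_count / len(sampled_answers)
```

In practice the sampled answers would come from repeatedly querying the LLM at nonzero temperature and normalizing its outputs before comparison.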
1 code implementation • 11 Jul 2023 • Vedang Lad, Jonas Mueller
We study algorithms to automatically detect such annotation errors, in particular methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled.
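A minimal sketch of one such label-quality score, assuming we already have a trained model's predicted class probabilities: score each image by the predicted probability of its annotated label ("self-confidence"), so the lowest-scoring images are the most suspect. This illustrates the general idea of the scores studied, not the authors' exact code.

```python
def label_quality_scores(pred_probs, labels):
    # pred_probs: one list of class probabilities per example.
    # labels: the annotated class index for each example.
    # Returns the model's predicted probability of each annotated label;
    # low scores mark images least likely to be correctly labeled.
    return [row[y] for row, y in zip(pred_probs, labels)]
```

Sorting examples by these scores (ascending) gives a review queue that prioritizes likely annotation errors.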
2 code implementations • 26 May 2023 • Hang Zhou, Jonas Mueller, Mayank Kumar, Jane-Ling Wang, Jing Lei
Noise plagues many numerical datasets, where the recorded values may fail to match the true underlying values for reasons including erroneous sensors, data entry/processing mistakes, and imperfect human estimates.
1 code implementation • 25 May 2023 • Jesse Cummings, Elías Snorrason, Jonas Mueller
We present a straightforward statistical test to detect certain violations of the assumption that the data are Independent and Identically Distributed (IID).
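One simple way to see what such a test looks for: if the data are IID, the gap between consecutive datapoints should be statistically no smaller than the gap between randomly chosen pairs. The diagnostic below is a simplified stand-in for the paper's statistical test, with an arbitrary ratio statistic rather than a calibrated p-value.

```python
import random

def non_iid_ratio(values, trials=1000, seed=0):
    # Average gap between consecutive datapoints divided by the average gap
    # between randomly chosen pairs. Ratios far below 1 indicate
    # index-dependent structure (e.g. sorted or drifting data), violating IID.
    rng = random.Random(seed)
    n = len(values)
    consecutive = sum(abs(values[i + 1] - values[i]) for i in range(n - 1)) / (n - 1)
    random_gap = sum(
        abs(values[rng.randrange(n)] - values[rng.randrange(n)]) for _ in range(trials)
    ) / trials
    return consecutive / random_gap
```

A sorted sequence yields a ratio near 0, while a well-shuffled one yields a ratio near 1.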
1 code implementation • 27 Jan 2023 • Hui Wen Goh, Jonas Mueller
It is thus common to employ multiple annotators to label data with some overlap between their examples.
2 code implementations • 25 Nov 2022 • Aditya Thyagarajan, Elías Snorrason, Curtis Northcutt, Jonas Mueller
In multi-label classification, each example in a dataset may be annotated as belonging to one or more classes (or none of the classes).
2 code implementations • 13 Oct 2022 • Hui Wen Goh, Ulyana Tkachenko, Jonas Mueller
For analyzing such data, we introduce CROWDLAB, a straightforward approach to utilize any trained classifier to estimate: (1) A consensus label for each example that aggregates the available annotations; (2) A confidence score for how likely each consensus label is correct; (3) A rating for each annotator quantifying the overall correctness of their labels.
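To make the aggregation idea concrete, here is a deliberately crude caricature of combining a trained classifier with multiple annotators: the classifier's prediction is counted as one extra weighted voter, and the winning share of total weight serves as a confidence. The weights, tie-breaking, and confidence estimate here are illustrative only; CROWDLAB's actual weighting scheme is more refined.

```python
from collections import Counter

def consensus_with_confidence(annotator_labels, classifier_pred, classifier_weight=1.0):
    # Majority vote over the annotators' labels, with the classifier's
    # prediction added as one extra (weighted) vote. Returns the winning
    # label and its share of the total vote weight as a rough confidence.
    votes = Counter(annotator_labels)
    votes[classifier_pred] += classifier_weight
    label, weight = max(votes.items(), key=lambda kv: kv[1])
    confidence = weight / (len(annotator_labels) + classifier_weight)
    return label, confidence
```

Annotator-quality ratings (item 3 above) could then be estimated by how often each annotator agrees with these consensus labels.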
2 code implementations • 8 Oct 2022 • Wei-Chen Wang, Jonas Mueller
Mislabeled examples are a common issue in real-world data, particularly for tasks like token classification where many labels must be chosen on a fine-grained basis.
no code implementations • 4 Oct 2022 • Rasool Fakoor, Jonas Mueller, Zachary C. Lipton, Pratik Chaudhari, Alexander J. Smola
Real-world deployment of machine learning models is challenging because data evolves over time.
1 code implementation • 20 Jul 2022 • Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, Oana Inel, Tariq Kane, Christine R. Kirkpatrick, Tzu-Sheng Kuo, Jonas Mueller, Tristan Thrush, Joaquin Vanschoren, Margaret Warren, Adina Williams, Serena Yeung, Newsha Ardalani, Praveen Paritosh, Ce Zhang, James Zou, Carole-Jean Wu, Cody Coleman, Andrew Ng, Peter Mattson, Vijay Janapa Reddi
Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems.
2 code implementations • 7 Jul 2022 • Johnson Kuan, Jonas Mueller
We study simple methods for out-of-distribution (OOD) image detection that are compatible with any already trained classifier, relying on only its predictions or learned representations.
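The simplest such classifier-compatible score is the Maximum Softmax Probability baseline: inputs whose top predicted class probability is low are flagged as potentially out-of-distribution. Shown here for illustration of the kind of method compared in this line of work, not as the paper's code.

```python
def msp_ood_score(softmax_probs):
    # Maximum Softmax Probability baseline: a confident top prediction yields
    # a low OOD score, while a flat/uncertain prediction yields a high one.
    # Higher score => more OOD-like.
    return 1.0 - max(softmax_probs)
```

Representation-based alternatives instead score an input by its distance (e.g. to nearest neighbors) in the classifier's learned feature space.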
no code implementations • 16 Jun 2022 • Jiuhai Chen, Jonas Mueller, Vassilis N. Ioannidis, Tom Goldstein, David Wipf
Graph Neural Networks (GNNs) with numerical node features and graph structure as inputs have demonstrated superior performance on various supervised learning tasks with graph data.
2 code implementations • 28 May 2022 • Massimo Caccia, Jonas Mueller, Taesup Kim, Laurent Charlin, Rasool Fakoor
We pose two hypotheses: (1) task-agnostic methods might provide advantages in settings with limited data, computation, or high dimensionality, and (2) faster adaptation may be particularly beneficial in continual learning settings, helping to mitigate the effects of catastrophic forgetting.
1 code implementation • 4 Nov 2021 • Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alexander J. Smola
We consider the use of automated supervised learning systems for data tables that not only contain numeric/categorical columns, but one or more text fields as well.
1 code implementation • 26 Oct 2021 • Jiuhai Chen, Jonas Mueller, Vassilis N. Ioannidis, Soji Adeshina, Yangkun Wang, Tom Goldstein, David Wipf
For supervised learning with tabular data, decision tree ensembles produced via boosting techniques generally dominate real-world applications involving IID training/test sets.
no code implementations • ICLR 2022 • Jiuhai Chen, Jonas Mueller, Vassilis N. Ioannidis, Soji Adeshina, Yangkun Wang, Tom Goldstein, David Wipf
Many practical modeling tasks require making predictions using tabular data composed of heterogeneous feature types (e.g., text-based, categorical, continuous, etc.).
no code implementations • EMNLP (sustainlp) 2021 • Haoyu He, Xingjian Shi, Jonas Mueller, Sheng Zha, Mu Li, George Karypis
We aim to identify how different components in the KD pipeline affect the resulting performance and how much the optimal KD pipeline varies across different datasets/tasks, such as the data augmentation policy, the loss function, and the intermediate representation for transferring the knowledge between teacher and student.
2 code implementations • 19 Jun 2021 • Junwen Yao, Jonas Mueller, Jane-Ling Wang
Despite their widespread success, the application of deep neural networks to functional data remains scarce today.
2 code implementations • ICML Workshop AutoML 2021 • Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alex Smola
We design automated supervised learning systems for data tables that not only contain numeric/categorical columns, but text fields as well.
2 code implementations • 26 Mar 2021 • Curtis G. Northcutt, Anish Athalye, Jonas Mueller
Errors in test sets are numerous and widespread: we estimate an average of at least 3.3% errors across the 10 datasets, where for example label errors comprise at least 6% of the ImageNet validation set.
1 code implementation • 26 Feb 2021 • Rasool Fakoor, Taesup Kim, Jonas Mueller, Alexander J. Smola, Ryan J. Tibshirani
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive.
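Quantile regression is typically trained with the pinball (quantile) loss, whose asymmetric penalty makes its minimizer the desired conditional quantile. This is the standard textbook definition, included for context rather than taken from the paper.

```python
def pinball_loss(y_true, y_pred, tau):
    # Pinball loss at level tau: under-predictions cost tau per unit of error,
    # over-predictions cost (1 - tau) per unit, so minimizing the average loss
    # drives the prediction toward the tau-th conditional quantile.
    diff = y_true - y_pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff
```

At tau = 0.9, under-predicting by 2 costs 1.8 while over-predicting by 2 costs only 0.2, which pushes the fitted value up toward the 90th percentile.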
1 code implementation • NeurIPS 2021 • Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J. Smola
Because they rely on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, where exploration can be too expensive.
no code implementations • 1 Jan 2021 • Rasool Fakoor, Pratik Anil Chaudhari, Jonas Mueller, Alex Smola
We present TraDE, a self-attention-based architecture for auto-regressive density estimation with continuous and discrete valued data.
1 code implementation • NeurIPS 2020 • Rasool Fakoor, Jonas Mueller, Nick Erickson, Pratik Chaudhari, Alexander J. Smola
Automated machine learning (AutoML) can produce complex model ensembles by stacking, bagging, and boosting many individual models like trees, deep networks, and nearest neighbor estimators.
32 code implementations • 19 Apr 2020 • Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola
It is well known that feature-map attention and multi-path representation are important for visual recognition.
Ranked #8 on Instance Segmentation on COCO test-dev (APM metric)
2 code implementations • NeurIPS 2021 • Brandon Carter, Siddhartha Jain, Jonas Mueller, David Gifford
Here, we demonstrate that neural networks trained on CIFAR-10 and ImageNet suffer from overinterpretation, and we find models on CIFAR-10 make confident predictions even when 95% of input images are masked and humans cannot discern salient features in the remaining pixel-subsets.
8 code implementations • 13 Mar 2020 • Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, Alexander Smola
We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file.
no code implementations • 25 Sep 2019 • Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola
Neural language models have recently shown impressive gains in unconditional text generation, but controllable generation and manipulation of text remain challenging.
no code implementations • 11 Sep 2019 • Jonas Mueller, Alex Smola
A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable.
no code implementations • 18 Jun 2019 • Siddhartha Jain, Ge Liu, Jonas Mueller, David Gifford
The inaccuracy of neural network models on inputs that do not stem from the training data distribution is both problematic and at times unrecognized.
3 code implementations • ICML 2020 • Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola
We prove that this simple modification guides the latent space geometry of the resulting model by encouraging the encoder to map similar texts to similar latent representations.
3 code implementations • IJCNLP 2019 • Zhijing Jin, Di Jin, Jonas Mueller, Nicholas Matthews, Enrico Santus
Text attribute transfer aims to automatically rewrite sentences such that they possess certain linguistic attributes, while simultaneously preserving their semantic content.
1 code implementation • 9 Oct 2018 • Brandon Carter, Jonas Mueller, Siddhartha Jain, David Gifford
Local explanation frameworks aim to rationalize particular decisions made by a black-box prediction model.
1 code implementation • NeurIPS 2019 • Jonas Mueller, Vasilis Syrgkanis, Matt Taddy
We consider dynamic pricing with many products under an evolving but low-dimensional demand model.
no code implementations • ICML 2017 • Jonas Mueller, David Gifford, Tommi Jaakkola
Under this model, gradient methods can be used to efficiently optimize the continuous latent factors with respect to inferred outcomes.
no code implementations • 16 Jun 2016 • Jonas Mueller, David N. Reshef, George Du, Tommi Jaakkola
Assuming the underlying relationship remains invariant under intervention, we develop efficient algorithms to identify the optimal intervention policy from limited data and provide theoretical guarantees for our approach in a Gaussian Process setting.
no code implementations • NeurIPS 2015 • Jonas Mueller, Tommi Jaakkola
We introduce principal differences analysis (PDA) for analyzing differences between high-dimensional distributions.