no code implementations • 22 Dec 2024 • Yewon Byun, Sanket Vaibhav Mehta, Saurabh Garg, Emma Strubell, Michael Oberst, Bryan Wilder, Zachary C. Lipton
In this paper, we propose Generate to Discriminate (G2D), a domain-incremental continual learning method that leverages synthetic data to train a domain-discriminator that routes samples at inference time to the appropriate expert.
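A minimal sketch of the routing step described above, assuming the domain discriminator and per-domain experts are already-trained `torch` modules (all names here are ours, not the paper's):

```python
# Hypothetical sketch of G2D-style inference-time routing: the domain
# discriminator picks which per-domain expert handles each sample.
import torch

@torch.no_grad()
def route_and_predict(x, discriminator, experts):
    """discriminator: maps inputs to logits over domains;
    experts: list of per-domain classifiers, one per domain."""
    domains = discriminator(x).argmax(dim=-1)  # predicted domain per sample
    preds = torch.stack([experts[d](x[i:i + 1]).squeeze(0)
                         for i, d in enumerate(domains.tolist())])
    return domains, preds
```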
1 code implementation • 13 Nov 2024 • Daniel P. Jeong, Pranav Mani, Saurabh Garg, Zachary C. Lipton, Michael Oberst
Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora.
1 code implementation • 6 Nov 2024 • Daniel P. Jeong, Saurabh Garg, Zachary C. Lipton, Michael Oberst
For instance, across the tasks and model pairs we consider in the 3-shot setting, medical LLMs only outperform their base models in 12.1% of cases, reach a (statistical) tie in 49.8% of cases, and are significantly worse than their base models in the remaining 38.2% of cases.
1 code implementation • 9 Oct 2024 • Pravesh Agrawal, Szymon Antoniak, Emma Bou Hanna, Baptiste Bout, Devendra Chaplot, Jessica Chudnovsky, Diogo Costa, Baudouin De Monicault, Saurabh Garg, Theophile Gervet, Soham Ghosh, Amélie Héliou, Paul Jacob, Albert Q. Jiang, Kartik Khandelwal, Timothée Lacroix, Guillaume Lample, Diego Las Casas, Thibaut Lavril, Teven Le Scao, Andy Lo, William Marshall, Louis Martin, Arthur Mensch, Pavankumar Muddireddy, Valera Nemychnikova, Marie Pellat, Patrick von Platen, Nikhil Raghuraman, Baptiste Rozière, Alexandre Sablayrolles, Lucile Saulnier, Romain Sauvestre, Wendy Shang, Roman Soletskyi, Lawrence Stewart, Pierre Stock, Joachim Studnia, Sandeep Subramanian, Sagar Vaze, Thomas Wang, Sophia Yang
Unlike many open-source models, Pixtral is also a cutting-edge text model for its size, and does not compromise on natural language performance to excel in multimodal tasks.
1 code implementation • 20 Jun 2024 • Amrith Setlur, Saurabh Garg, Xinyang Geng, Naman Garg, Virginia Smith, Aviral Kumar
With this per-step scheme, we attain consistent gains over using only positive data, achieving performance comparable to amplifying the amount of synthetic data by $\mathbf{8 \times}$.
2 code implementations • 17 Jun 2024 • Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G. Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models.
1 code implementation • 11 Apr 2024 • Rishabh Ranjan, Saurabh Garg, Mrigank Raman, Carlos Guestrin, Zachary Lipton
Trained models are often composed with post-hoc transforms such as temperature scaling (TS), ensembling and stochastic weight averaging (SWA) to improve performance, robustness, uncertainty estimation, etc.
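As one concrete example of such a transform, here is a minimal sketch of temperature scaling: fit a single scalar $T$ on held-out logits by minimizing negative log-likelihood (function names are ours):

```python
# Temperature scaling sketch: learn one scalar T on a validation set,
# then divide logits by T at test time to calibrate confidences.
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, steps=200, lr=0.01):
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T > 0
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Usage: T = fit_temperature(logits, labels); calibrated = logits / T
```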
no code implementations • NeurIPS 2023 • Saurabh Garg, Amrith Setlur, Zachary Chase Lipton, Sivaraman Balakrishnan, Virginia Smith, Aditi Raghunathan
Self-training and contrastive learning have emerged as leading techniques for incorporating unlabeled data, both under distribution shift (unsupervised domain adaptation) and when it is absent (semi-supervised learning).
1 code implementation • 25 Oct 2023 • Jianping Yao, Son N. Tran, Saurabh Garg, Samantha Sawyer
Furthermore, based on the survey of existing approaches in plant pathology and the study of available approaches in machine learning, we propose a new model named Generalised Stacking Multi-output CNN (GSMo-CNN).
1 code implementation • 24 Oct 2023 • Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri
We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps.
no code implementations • 19 Oct 2023 • Jianping Yao, Son N. Tran, Samantha Sawyer, Saurabh Garg
Therefore, it is enormously beneficial for researchers, engineers, managers, and entrepreneurs to have a comprehensive view of recent developments in machine learning technologies and applications for leaf disease detection.
1 code implementation • 6 Feb 2023 • Saurabh Garg, Nick Erickson, James Sharpnack, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton
Despite the emergence of principled methods for domain adaptation under label shift, their sensitivity to shifts in class-conditional distributions remains precariously underexplored.
1 code implementation • 6 Feb 2023 • Zachary Novack, Julian McAuley, Zachary C. Lipton, Saurabh Garg
Open vocabulary models (e.g.
no code implementations • 18 Jan 2023 • Renjie Li, Chun Yu Lao, Rebecca St. George, Katherine Lawler, Saurabh Garg, Son N. Tran, Quan Bai, Jane Alty
RMT and a range of DLC models were applied to the video data with tapping frequencies up to 8 Hz to extract movement features.
1 code implementation • 29 Nov 2022 • Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary C. Lipton
A number of competing hypotheses have been proposed to explain why small-batch Stochastic Gradient Descent (SGD) leads to improved generalization over the full-batch regime, with recent work crediting the implicit regularization of various quantities throughout training.
1 code implementation • 26 Oct 2022 • Pratyush Maini, Saurabh Garg, Zachary C. Lipton, J. Zico Kolter
Popular metrics derived from these dynamics include (i) the epoch at which examples are first correctly classified; (ii) the number of times their predictions flip during training; and (iii) whether their prediction flips if they are held out.
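A short sketch of how metrics (i) and (ii) can be computed from a `(num_epochs, num_examples)` boolean array of per-epoch correctness indicators; metric (iii) additionally requires re-training with the example held out, so it is only noted in a comment (all names are ours):

```python
# Training-dynamics metrics from a (num_epochs, num_examples) 0/1 array
# recording whether each example was classified correctly at each epoch.
import numpy as np

def learning_epoch(correct):
    """(i) Epoch at which each example is first correct (-1 if never)."""
    first = correct.argmax(axis=0)       # index of first True per column
    first[~correct.any(axis=0)] = -1     # never-learned examples
    return first

def num_flips(correct):
    """(ii) Number of times each example's correctness flips over training."""
    return np.abs(np.diff(correct.astype(int), axis=0)).sum(axis=0)

# (iii) "flips if held out" compares predictions from a run trained with the
# example against a run trained without it; it needs extra training runs.
```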
1 code implementation • 28 Sep 2022 • Kundan Krishna, Saurabh Garg, Jeffrey P. Bigham, Zachary C. Lipton
In experiments addressing both ELECTRA and RoBERTa models and 10 distinct downstream classification datasets, we observe that self-pretraining rivals standard pretraining on the BookWiki corpus (despite using around $10\times$--$500\times$ less data), outperforming the latter on $7$ and $5$ datasets, respectively.
1 code implementation • 26 Jul 2022 • Saurabh Garg, Sivaraman Balakrishnan, Zachary C. Lipton
We introduce the problem of domain adaptation under Open Set Label Shift (OSLS) where the label distribution can change arbitrarily and a new class may arrive during deployment, but the class-conditional distributions p(x|y) are domain-invariant.
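In symbols, the OSLS assumptions just stated can be written as follows (a hedged paraphrase, with source classes $1,\dots,k$ and $k{+}1$ denoting the novel target class):

```latex
\begin{align*}
  p_t(x \mid y) &= p_s(x \mid y) \quad \text{for } y \in \{1,\dots,k\},\\
  p_t(y) &\ \text{arbitrary on } \{1,\dots,k{+}1\}, \qquad p_s(y = k{+}1) = 0.
\end{align*}
```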
2 code implementations • 26 Jul 2022 • Manley Roberts, Pranav Mani, Saurabh Garg, Zachary C. Lipton
Thus motivated, we introduce a practical algorithm that leverages domain-discriminative models as follows: (i) push examples through domain discriminator $p(d|\mathbf{x})$; (ii) discretize the data by clustering examples in $p(d|\mathbf{x})$ space; (iii) perform non-negative matrix factorization on the discrete data; (iv) combine the recovered $p(y|d)$ with the discriminator outputs $p(d|\mathbf{x})$ to compute $p_d(y|\mathbf{x}) \; \forall d$.
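These steps lend themselves to a short sketch. The version below covers steps (ii) and (iii) under simplifying assumptions: the discriminator outputs $p(d|\mathbf{x})$ from step (i) are taken as given, step (iv)'s recombination is omitted, and all function names and shapes are ours, not the paper's:

```python
# Loose sketch: cluster examples in p(d|x) space, then NMF the resulting
# cluster-by-domain mass matrix to recover (an estimate of) p(y|d).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF

def recover_p_y_given_d(p_d_given_x, num_clusters, num_classes, seed=0):
    """p_d_given_x: (n, num_domains) array of discriminator outputs p(d|x)."""
    # (ii) discretize: cluster examples in p(d|x) space
    clusters = KMeans(n_clusters=num_clusters,
                      random_state=seed).fit_predict(p_d_given_x)
    counts = np.zeros((num_clusters, p_d_given_x.shape[1]))
    for c, p in zip(clusters, p_d_given_x):
        counts[c] += p                   # discriminator mass per cluster/domain
    # (iii) NMF: counts ≈ W @ H, reading H as class-by-domain affinities
    nmf = NMF(n_components=num_classes, random_state=seed, max_iter=500)
    W = nmf.fit_transform(counts)        # (num_clusters, num_classes)
    H = nmf.components_                  # (num_classes, num_domains)
    return H / H.sum(axis=0, keepdims=True)   # columns normalized to p(y|d)
```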
no code implementations • 6 Jul 2022 • Renjie Li, Xinyi Wang, Guan Huang, Wenli Yang, Kaining Zhang, Xiaotong Gu, Son N. Tran, Saurabh Garg, Jane Alty, Quan Bai
Deep supervision, also known as 'intermediate supervision' or 'auxiliary supervision', adds supervision at hidden layers of a neural network.
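A minimal sketch of the idea: attach an auxiliary classification head to a hidden layer and sum its loss with the main loss (a generic illustration, not any specific architecture from the survey):

```python
# Deep supervision sketch: an auxiliary head on an intermediate layer
# contributes an extra supervised loss during training.
import torch.nn as nn

class DeeplySupervisedNet(nn.Module):
    def __init__(self, in_dim, hidden, num_classes):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, num_classes)      # main head
        self.aux_head = nn.Linear(hidden, num_classes)  # auxiliary head

    def forward(self, x):
        h1 = self.block1(x)
        return self.head(self.block2(h1)), self.aux_head(h1)

# training: loss = ce(main_logits, y) + aux_weight * ce(aux_logits, y)
```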
1 code implementation • 20 Feb 2022 • Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran
In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a $\textit{single input point}$.
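A hedged sketch of this evaluation: given a collection of trained models, score them all on one input point to obtain a per-example profile (function name is ours):

```python
# Evaluate many models on a single (x, y) pair: the fraction of models
# that get the point right is a per-example "difficulty" score.
import torch

@torch.no_grad()
def pointwise_accuracy(models, x, y):
    """models: iterable of classifiers; x: (1, ...) input; y: int label."""
    hits = [int(m(x).argmax(dim=-1).item() == y) for m in models]
    return sum(hits) / len(hits)
```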
1 code implementation • ICLR 2022 • Saurabh Garg, Sivaraman Balakrishnan, Zachary C. Lipton, Behnam Neyshabur, Hanie Sedghi
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops.
no code implementations • 19 Dec 2021 • Renjie Li, Son Tran, Saurabh Garg, Katherine Lawler, Jane Alty, Quan Bai
Keypoint detection plays an important role in a wide range of applications.
2 code implementations • NeurIPS 2021 • Saurabh Garg, Yifan Wu, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton
Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, learning the desired positive-versus-negative classifier.
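A rough sketch of subtask (i) in the spirit of top-bin estimators: compare the positive and unlabeled probability mass above a score threshold (simplified relative to the paper's estimator, which selects the bin adaptively; names are ours):

```python
# Simplified mixture proportion estimate: if a fraction q_pos of positives
# and q_unl of unlabeled points land in a high-score bin, then the positive
# fraction in the unlabeled data is roughly q_unl / q_pos.
import numpy as np

def mpe_top_bin(scores_pos, scores_unl, threshold=0.8):
    """scores_*: held-out classifier scores p(positive|x) for positive
    and unlabeled examples; returns an estimated positive fraction."""
    q_pos = np.mean(scores_pos >= threshold)   # positive mass in top bin
    q_unl = np.mean(scores_unl >= threshold)   # unlabeled mass in top bin
    return float(np.clip(q_unl / max(q_pos, 1e-12), 0.0, 1.0))
```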
1 code implementation • 1 May 2021 • Saurabh Garg, Sivaraman Balakrishnan, J. Zico Kolter, Zachary C. Lipton
To assess generalization, machine learning scientists typically either (i) bound the generalization gap and then (after training) plug in the empirical risk to obtain a bound on the true risk; or (ii) validate empirically on holdout data.
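The two routes, written out for a loss bounded in $[0,1]$ (a standard rendering of each bound, not notation taken from the paper):

```latex
\begin{align*}
  \text{(i)}\quad & R(f) \;\le\; \hat{R}_{\text{train}}(f) + B(\mathcal{F}, n, \delta)
    && \text{w.p. } 1-\delta \ \text{(plug in empirical risk after training)},\\
  \text{(ii)}\quad & R(f) \;\le\; \hat{R}_{\text{holdout}}(f)
    + \sqrt{\tfrac{\log(1/\delta)}{2m}}
    && \text{w.p. } 1-\delta \ \text{(Hoeffding, } m \text{ holdout points)}.
\end{align*}
```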
no code implementations • 29 Apr 2021 • Renjie Li, Xinyi Wang, Katherine Lawler, Saurabh Garg, Quan Bai, Jane Alty
With populations ageing, the number of people with dementia worldwide is expected to triple to 152 million by 2050.
no code implementations • 20 Feb 2021 • Saurabh Garg, Joshua Zhanson, Emilio Parisotto, Adarsh Prasad, J. Zico Kolter, Zachary C. Lipton, Sivaraman Balakrishnan, Ruslan Salakhutdinov, Pradeep Ravikumar
In this paper, we present a detailed empirical study to characterize the heavy-tailed nature of the gradients of the PPO surrogate reward function.
no code implementations • NeurIPS 2020 • Saurabh Garg, Yifan Wu, Sivaraman Balakrishnan, Zachary C. Lipton
Our contributions include (i) consistency conditions for MLLS, which include calibration of the classifier and a confusion matrix invertibility condition that BBSE also requires; (ii) a unified framework, casting BBSE as roughly equivalent to MLLS for a particular choice of calibration method; and (iii) a decomposition of MLLS's finite-sample error into terms reflecting miscalibration and estimation error.
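For reference, a minimal sketch of the BBSE-style estimate that underlies this comparison: invert the source confusion matrix against the target prediction marginals to obtain importance weights (a textbook rendering, and it relies on the same invertibility condition noted above; names are ours):

```python
# BBSE sketch: solve C w = mu, where C is the source joint confusion
# matrix over (prediction, label) and mu is the target prediction marginal.
import numpy as np

def bbse_weights(val_preds, val_labels, target_preds, num_classes):
    """Returns w with w[y] ≈ p_target(y) / p_source(y)."""
    C = np.zeros((num_classes, num_classes))
    for p, y in zip(val_preds, val_labels):
        C[p, y] += 1.0
    C /= len(val_labels)                 # C[i, j] = p(pred=i, label=j) on source
    mu = np.bincount(target_preds, minlength=num_classes) / len(target_preds)
    w = np.linalg.solve(C, mu)           # requires C invertible
    return np.clip(w, 0.0, None)         # valid weights are non-negative
```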
no code implementations • EMNLP 2018 • Saurabh Garg, Tanmay Parekh, Preethi Jyothi
This work focuses on building language models (LMs) for code-switched text.
no code implementations • 3 Nov 2017 • Saurabh Garg, Tanmay Parekh, Preethi Jyothi
Since code-switching is a blend of two or more languages, a standard bilingual language model can be improved by exploiting the structure of the constituent monolingual language models.
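One generic way to use that monolingual structure, sketched under an assumed `lm.prob(word, context)` interface (a simple interpolation baseline, not the paper's exact model):

```python
# Interpolate two monolingual LMs for code-switched text: each model
# exposes a hypothetical prob(word, context) method; lam mixes them.
def interpolated_prob(word, context, lm_a, lm_b, lam=0.5):
    """Returns a mixture of the two monolingual LM probabilities."""
    return lam * lm_a.prob(word, context) + (1 - lam) * lm_b.prob(word, context)
```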