Search Results for author: Saurabh Garg

Found 31 papers, 20 papers with code

Generate to Discriminate: Expert Routing for Continual Learning

no code implementations22 Dec 2024 Yewon Byun, Sanket Vaibhav Mehta, Saurabh Garg, Emma Strubell, Michael Oberst, Bryan Wilder, Zachary C. Lipton

In this paper, we propose Generate to Discriminate (G2D), a domain-incremental continual learning method that leverages synthetic data to train a domain-discriminator that routes samples at inference time to the appropriate expert.

Continual Learning Incremental Learning
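
As a rough, illustrative sketch of the routing idea in the G2D entry above (the discriminator, experts, and data below are toy placeholders, not the paper's implementation):

```python
# Minimal sketch of discriminator-based expert routing (illustrative only;
# the experts, features, and data are placeholders, not G2D itself).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-in for "synthetic" data from two domains with different feature means.
X_dom0 = rng.normal(loc=0.0, size=(200, 5))
X_dom1 = rng.normal(loc=2.0, size=(200, 5))
X = np.vstack([X_dom0, X_dom1])
d = np.array([0] * 200 + [1] * 200)          # domain labels

discriminator = LogisticRegression().fit(X, d)

# Placeholder "experts": one callable per domain.
experts = {
    0: lambda x: "prediction from expert 0",
    1: lambda x: "prediction from expert 1",
}

def route_and_predict(x):
    """Send a single example to the expert for its predicted domain."""
    domain = int(discriminator.predict(x.reshape(1, -1))[0])
    return experts[domain](x)

print(route_and_predict(rng.normal(loc=2.0, size=5)))
```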

The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models

1 code implementation13 Nov 2024 Daniel P. Jeong, Pranav Mani, Saurabh Garg, Zachary C. Lipton, Michael Oberst

Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora.

Question Answering

Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?

1 code implementation6 Nov 2024 Daniel P. Jeong, Saurabh Garg, Zachary C. Lipton, Michael Oberst

For instance, across the tasks and model pairs we consider in the 3-shot setting, medical LLMs only outperform their base models in 12.1% of cases, reach a (statistical) tie in 49.8% of cases, and are significantly worse than their base models in the remaining 38.2% of cases.

Question Answering

RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

1 code implementation20 Jun 2024 Amrith Setlur, Saurabh Garg, Xinyang Geng, Naman Garg, Virginia Smith, Aviral Kumar

With this per-step scheme, we are able to attain consistent gains over only positive data, attaining performance similar to amplifying the amount of synthetic data by $\mathbf{8 \times}$.

Math Reinforcement Learning (RL)

Post-Hoc Reversal: Are We Selecting Models Prematurely?

1 code implementation11 Apr 2024 Rishabh Ranjan, Saurabh Garg, Mrigank Raman, Carlos Guestrin, Zachary Lipton

Trained models are often composed with post-hoc transforms such as temperature scaling (TS), ensembling and stochastic weight averaging (SWA) to improve performance, robustness, uncertainty estimation, etc.

Language Modelling MMLU
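
For reference, a minimal sketch of one of the post-hoc transforms named above, temperature scaling, fit by minimizing validation negative log-likelihood (synthetic logits and labels; not the paper's code):

```python
# Minimal temperature-scaling sketch on synthetic held-out logits and labels.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax

rng = np.random.default_rng(0)
val_logits = rng.normal(size=(500, 10))            # held-out logits
val_labels = rng.integers(0, 10, size=500)         # held-out labels
val_logits[np.arange(500), val_labels] += 2.0      # make logits informative (and overconfident)

def nll(temperature):
    """Negative log-likelihood of validation labels at a given temperature."""
    log_probs = log_softmax(val_logits / temperature, axis=1)
    return -log_probs[np.arange(len(val_labels)), val_labels].mean()

# Pick the temperature that minimizes validation NLL.
result = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded")
print(f"fitted temperature: {result.x:.2f}")
```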

Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift

no code implementations NeurIPS 2023 Saurabh Garg, Amrith Setlur, Zachary Chase Lipton, Sivaraman Balakrishnan, Virginia Smith, Aditi Raghunathan

Self-training and contrastive learning have emerged as leading techniques for incorporating unlabeled data, both under distribution shift (unsupervised domain adaptation) and when it is absent (semi-supervised learning).

Contrastive Learning Unsupervised Domain Adaptation

Deep Learning for Plant Identification and Disease Classification from Leaf Images: Multi-prediction Approaches

1 code implementation25 Oct 2023 Jianping Yao, Son N. Tran, Saurabh Garg, Samantha Sawyer

Furthermore, based on the survey of existing approaches in plant pathology and the study of available approaches in machine learning, we propose a new model named Generalised Stacking Multi-output CNN (GSMo-CNN).

TiC-CLIP: Continual Training of CLIP Models

1 code implementation24 Oct 2023 Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri

We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps.

Continual Learning Retrieval

Machine Learning for Leaf Disease Classification: Data, Techniques and Applications

no code implementations19 Oct 2023 Jianping Yao, Son N. Tran, Samantha Sawyer, Saurabh Garg

Therefore, it is enormously beneficial for researchers, engineers, managers, and entrepreneurs to have a comprehensive view of recent developments in machine learning technologies and applications for leaf disease detection.

RLSbench: Domain Adaptation Under Relaxed Label Shift

1 code implementation6 Feb 2023 Saurabh Garg, Nick Erickson, James Sharpnack, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton

Despite the emergence of principled methods for domain adaptation under label shift, their sensitivity to shifts in class-conditional distributions is precariously underexplored.

Domain Adaptation

Rapid-Motion-Track: Markerless Tracking of Fast Human Motion with Deeper Learning

no code implementations18 Jan 2023 Renjie Li, Chun Yu Lao, Rebecca St. George, Katherine Lawler, Saurabh Garg, Son N. Tran, Quan Bai, Jane Alty

RMT and a range of DLC models were applied to the video data with tapping frequencies up to 8 Hz to extract movement features.

Disentangling the Mechanisms Behind Implicit Regularization in SGD

1 code implementation29 Nov 2022 Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary C. Lipton

A number of competing hypotheses have been proposed to explain why small-batch Stochastic Gradient Descent (SGD) leads to improved generalization over the full-batch regime, with recent work crediting the implicit regularization of various quantities throughout training.

Characterizing Datapoints via Second-Split Forgetting

1 code implementation26 Oct 2022 Pratyush Maini, Saurabh Garg, Zachary C. Lipton, J. Zico Kolter

Popular metrics derived from these dynamics include (i) the epoch at which examples are first correctly classified; (ii) the number of times their predictions flip during training; and (iii) whether their prediction flips if they are held out.
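
A small sketch of how metrics (i) and (ii) above can be computed from a per-epoch correctness matrix (synthetic data; metric (iii) requires a second, held-out split and is omitted here):

```python
# Sketch of metrics (i) and (ii), computed from a boolean matrix of per-epoch
# correctness (epochs x examples); the matrix here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
correct = rng.random((20, 100)) < 0.7      # correct[e, i]: example i right at epoch e

# (i) first epoch at which each example is classified correctly (-1 if never).
ever_correct = correct.any(axis=0)
first_learned = np.where(ever_correct, correct.argmax(axis=0), -1)

# (ii) number of times each example's correctness flips across epochs.
flips = np.abs(np.diff(correct.astype(int), axis=0)).sum(axis=0)

print(first_learned[:10], flips[:10])
```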

Downstream Datasets Make Surprisingly Good Pretraining Corpora

1 code implementation28 Sep 2022 Kundan Krishna, Saurabh Garg, Jeffrey P. Bigham, Zachary C. Lipton

In experiments addressing both ELECTRA and RoBERTa models and 10 distinct downstream classification datasets, we observe that self-pretraining rivals standard pretraining on the BookWiki corpus (despite using around $10\times$--$500\times$ less data), outperforming the latter on $7$ and $5$ datasets, respectively.

Question Answering

Domain Adaptation under Open Set Label Shift

1 code implementation26 Jul 2022 Saurabh Garg, Sivaraman Balakrishnan, Zachary C. Lipton

We introduce the problem of domain adaptation under Open Set Label Shift (OSLS) where the label distribution can change arbitrarily and a new class may arrive during deployment, but the class-conditional distributions p(x|y) are domain-invariant.

Domain Adaptation

Unsupervised Learning under Latent Label Shift

2 code implementations26 Jul 2022 Manley Roberts, Pranav Mani, Saurabh Garg, Zachary C. Lipton

Thus motivated, we introduce a practical algorithm that leverages domain-discriminative models as follows: (i) push examples through domain discriminator $p(d|\mathbf{x})$; (ii) discretize the data by clustering examples in $p(d|\mathbf{x})$ space; (iii) perform non-negative matrix factorization on the discrete data; (iv) combine the recovered $p(y|d)$ with the discriminator outputs $p(d|\mathbf{x})$ to compute $p_d(y|\mathbf{x}) \; \forall d$.
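
A rough sketch of steps (i)-(iv) on toy data (placeholder dimensions and models; the final combination step is shown only schematically and is not the authors' exact procedure):

```python
# Rough sketch of steps (i)-(iv) on toy data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n, n_domains, n_classes = 600, 3, 2

X = rng.normal(size=(n, 5))
d = rng.integers(0, n_domains, size=n)            # observed domain labels

# (i) domain discriminator p(d|x)
disc = LogisticRegression(max_iter=1000).fit(X, d)
p_d_given_x = disc.predict_proba(X)               # shape (n, n_domains)

# (ii) discretize by clustering in p(d|x) space
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(p_d_given_x)

# (iii) NMF on the cluster-by-domain count matrix
counts = np.zeros((10, n_domains))
np.add.at(counts, (clusters, d), 1.0)
nmf = NMF(n_components=n_classes, init="nndsvda", random_state=0, max_iter=1000)
W = nmf.fit_transform(counts)                     # cluster-by-class factor
H = nmf.components_                               # class-by-domain factor

# (iv) normalize H column-wise as a rough stand-in for p(y|d), then combine
# (this marginalization is schematic, not the paper's exact combination step).
p_y_given_d = H / (H.sum(axis=0, keepdims=True) + 1e-12)
p_y_given_x = p_d_given_x @ p_y_given_d.T         # shape (n, n_classes)
```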

A Comprehensive Review on Deep Supervision: Theories and Applications

no code implementations6 Jul 2022 Renjie Li, Xinyi Wang, Guan Huang, Wenli Yang, Kaining Zhang, Xiaotong Gu, Son N. Tran, Saurabh Garg, Jane Alty, Quan Bai

Deep supervision, also known as 'intermediate supervision' or 'auxiliary supervision', adds supervision at the hidden layers of a neural network.
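
A minimal sketch of the idea, with auxiliary classification heads attached to hidden layers and their losses added to the main loss (illustrative architecture and loss weights only):

```python
# Minimal deep-supervision sketch: auxiliary heads on hidden layers whose
# losses are added to the main loss (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeeplySupervisedMLP(nn.Module):
    def __init__(self, in_dim=32, hidden=64, n_classes=10):
        super().__init__()
        self.layer1 = nn.Linear(in_dim, hidden)
        self.layer2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, n_classes)       # main head
        self.aux1 = nn.Linear(hidden, n_classes)       # auxiliary head on layer1
        self.aux2 = nn.Linear(hidden, n_classes)       # auxiliary head on layer2

    def forward(self, x):
        h1 = F.relu(self.layer1(x))
        h2 = F.relu(self.layer2(h1))
        return self.head(h2), self.aux1(h1), self.aux2(h2)

model = DeeplySupervisedMLP()
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
main_out, aux1_out, aux2_out = model(x)
loss = (F.cross_entropy(main_out, y)
        + 0.3 * F.cross_entropy(aux1_out, y)           # auxiliary supervision
        + 0.3 * F.cross_entropy(aux2_out, y))
loss.backward()
```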

Deconstructing Distributions: A Pointwise Framework of Learning

1 code implementation20 Feb 2022 Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran

In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a $\textit{single input point}$.
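
A toy illustration of the pointwise idea: evaluate a small collection of models on one fixed input and compare against their overall test accuracy (models and data are placeholders):

```python
# Tiny sketch: evaluate many models on one fixed input point and compare to
# their overall test accuracy (toy models and data only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, y_tr, X_te, y_te = X[:1000], y[:1000], X[1000:], y[1000:]
x_single, y_single = X_te[0], y_te[0]

rows = []
for seed in range(10):
    # A "collection of models": here, one family trained on different subsets.
    idx = np.random.default_rng(seed).choice(1000, size=500, replace=False)
    model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    overall_acc = model.score(X_te, y_te)
    prob_on_point = model.predict_proba(x_single.reshape(1, -1))[0, y_single]
    rows.append((overall_acc, prob_on_point))

for acc, p in rows:
    print(f"test accuracy {acc:.3f} | p(correct class) on the single point {p:.3f}")
```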

Leveraging Unlabeled Data to Predict Out-of-Distribution Performance

1 code implementation ICLR 2022 Saurabh Garg, Sivaraman Balakrishnan, Zachary C. Lipton, Behnam Neyshabur, Hanie Sedghi

Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops.

Mixture Proportion Estimation and PU Learning: A Modern Approach

2 code implementations NeurIPS 2021 Saurabh Garg, Yifan Wu, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton

Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, learning the desired positive-versus-negative classifier.
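
An illustrative sketch of the two subtasks on toy data, using a simple Elkan-Noto-style plug-in estimate for the mixture proportion (not the estimator proposed in this paper):

```python
# Sketch of the two subtasks on toy data: (i) a plug-in mixture-proportion
# estimate (Elkan-Noto style, NOT this paper's estimator) and (ii) using it
# to threshold a positive-vs-unlabeled classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
alpha_true = 0.4                                   # true positive fraction in unlabeled data

pos = rng.normal(loc=1.5, size=(1000, 2))          # labeled positives
unl_pos = rng.normal(loc=1.5, size=(int(1000 * alpha_true), 2))
unl_neg = rng.normal(loc=-1.5, size=(int(1000 * (1 - alpha_true)), 2))
unl = np.vstack([unl_pos, unl_neg])                # unlabeled mixture

# Train a positive (1) vs unlabeled (0) classifier.
X = np.vstack([pos, unl])
s = np.array([1] * len(pos) + [0] * len(unl))
clf = LogisticRegression(max_iter=1000).fit(X, s)

# (i) plug-in MPE: average score on unlabeled / average score on positives.
alpha_hat = clf.predict_proba(unl)[:, 1].mean() / clf.predict_proba(pos)[:, 1].mean()
alpha_hat = float(np.clip(alpha_hat, 0.0, 1.0))
print(f"true alpha {alpha_true:.2f}, estimated alpha {alpha_hat:.2f}")

# (ii) PU learning: flag the top alpha_hat fraction of unlabeled points
# (by classifier score) as predicted positives.
scores = clf.predict_proba(unl)[:, 1]
threshold = np.quantile(scores, 1 - alpha_hat)
pred_positive = scores >= threshold
```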

RATT: Leveraging Unlabeled Data to Guarantee Generalization

1 code implementation1 May 2021 Saurabh Garg, Sivaraman Balakrishnan, J. Zico Kolter, Zachary C. Lipton

To assess generalization, machine learning scientists typically either (i) bound the generalization gap and then (after training) plug in the empirical risk to obtain a bound on the true risk; or (ii) validate empirically on holdout data.

Generalization Bounds Holdout Set +1

A Unified View of Label Shift Estimation

no code implementations NeurIPS 2020 Saurabh Garg, Yifan Wu, Sivaraman Balakrishnan, Zachary C. Lipton

Our contributions include (i) consistency conditions for MLLS, which include calibration of the classifier and a confusion matrix invertibility condition that BBSE also requires; (ii) a unified framework, casting BBSE as roughly equivalent to MLLS for a particular choice of calibration method; and (iii) a decomposition of MLLS's finite-sample error into terms reflecting miscalibration and estimation error.
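
A minimal BBSE-style sketch on toy data: estimate the label-shift importance weights by solving the linear system given by the source confusion matrix (the "target" split here has no actual shift, so the weights should come out near 1):

```python
# Minimal BBSE-style sketch: estimate label-shift weights by inverting the
# source confusion matrix (toy data only; no real shift in this split).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=4000, n_classes=3, n_informative=6, random_state=0)
X_tr, y_tr = X[:2000], y[:2000]
X_val, y_val = X[2000:3000], y[2000:3000]          # held-out source data
X_tgt = X[3000:]                                   # stand-in "target" data

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
k = 3

# Joint confusion matrix C[i, j] = p_source(y_hat = i, y = j) on held-out data.
C = np.zeros((k, k))
np.add.at(C, (clf.predict(X_val), y_val), 1.0)
C /= len(y_val)

# Distribution of hard predictions on the target sample.
mu = np.bincount(clf.predict(X_tgt), minlength=k) / len(X_tgt)

# BBSE: solve C w = mu for the importance weights w(y) = q(y) / p(y).
w = np.linalg.solve(C, mu)
print("estimated weights:", np.round(w, 2))
```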

Dual Language Models for Code Switched Speech Recognition

no code implementations3 Nov 2017 Saurabh Garg, Tanmay Parekh, Preethi Jyothi

Since code-switching is a blend of two or more different languages, a standard bilingual language model can be improved upon by using structures of the monolingual language models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
