Search Results for author: Suchin Gururangan

Found 26 papers, 17 papers with code

Expected Validation Performance and Estimation of a Random Variable’s Maximum

no code implementations • Findings (EMNLP) 2021 • Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith

We find that the two biased estimators lead to the fewest incorrect conclusions, which hints at the importance of minimizing variance and MSE.

Language models scale reliably with over-training and on downstream tasks

1 code implementation • 13 Mar 2024 • Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Alexandros G. Dimakis, Gabriel Ilharco, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt

We fit scaling laws that extrapolate in both the number of model parameters and the ratio of training tokens to parameters.

Language Modelling

LESS: Selecting Influential Data for Targeted Instruction Tuning

1 code implementation • 6 Feb 2024 • Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen

Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots.

Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models

no code implementations • 19 Jan 2024 • Terra Blevins, Tomasz Limisiewicz, Suchin Gururangan, Margaret Li, Hila Gonen, Noah A. Smith, Luke Zettlemoyer

Despite their popularity in non-English NLP, multilingual language models often underperform monolingual ones due to inter-language competition for model parameters.

AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters

1 code implementation • 12 Jan 2024 • Li Lucy, Suchin Gururangan, Luca Soldaini, Emma Strubell, David Bamman, Lauren Klein, Jesse Dodge

Large language models' (LLMs) abilities are drawn from their pretraining data, and model development begins with data curation.

Language Identification

Time is Encoded in the Weights of Finetuned Language Models

1 code implementation • 20 Dec 2023 • Kai Nylund, Suchin Gururangan, Noah A. Smith

We present time vectors, a simple tool to customize language models to new time periods.

Language Modelling

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

1 code implementation • 8 Aug 2023 • Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer

SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference.

Language Modelling Sentence

Information Flow Control in Machine Learning through Modular Model Architecture

no code implementations • 5 Jun 2023 • Trishita Tiwari, Suchin Gururangan, Chuan Guo, Weizhe Hua, Sanjay Kariyappa, Udit Gupta, Wenjie Xiong, Kiwan Maeng, Hsien-Hsin S. Lee, G. Edward Suh

In today's machine learning (ML) models, any part of the training data can affect the model's output.

Language Modelling

Scaling Expert Language Models with Unsupervised Domain Discovery

1 code implementation • 24 Mar 2023 • Suchin Gururangan, Margaret Li, Mike Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer

Large language models are typically trained densely: all parameters are updated with respect to all inputs.

Language Modelling

Editing Models with Task Arithmetic

3 code implementations • 8 Dec 2022 • Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi

Changing how pre-trained models behave -- e.g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems.

Negation

lo-fi: distributed fine-tuning without communication

no code implementations • 19 Oct 2022 • Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos

When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step.

M2D2: A Massively Multi-domain Language Modeling Dataset

1 code implementation • 13 Oct 2022 • Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer

We present M2D2, a fine-grained, massively multi-domain corpus for studying domain adaptation in language models (LMs).

Domain Generalization Language Modelling

Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

2 code implementations • 5 Aug 2022 • Margaret Li, Suchin Gururangan, Tim Dettmers, Mike Lewis, Tim Althoff, Noah A. Smith, Luke Zettlemoyer

New ELMs are learned by branching from (mixtures of) ELMs in the current set, further training the parameters on data for the new domain, and then merging the resulting model back into the set for future use.

kNN-Prompt: Nearest Neighbor Zero-Shot Inference

1 code implementation • 27 May 2022 • Weijia Shi, Julian Michael, Suchin Gururangan, Luke Zettlemoyer

Retrieval-augmented language models (LMs) use non-parametric memory to substantially outperform their non-retrieval counterparts on perplexity-based evaluations, but it is an open question whether they achieve similar gains in few- and zero-shot end-task accuracy.

Domain Adaptation Language Modelling +6

Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection

no code implementations • 25 Jan 2022 • Suchin Gururangan, Dallas Card, Sarah K. Dreier, Emily K. Gade, Leroy Z. Wang, Zeyu Wang, Luke Zettlemoyer, Noah A. Smith

Language models increasingly rely on massive web dumps for diverse text data.

Language Modelling

Time Waits for No One! Analysis and Challenges of Temporal Misalignment

1 code implementation • NAACL 2022 • Kelvin Luu, Daniel Khashabi, Suchin Gururangan, Karishma Mandyam, Noah A. Smith

When an NLP model is trained on text data from one time period and tested or deployed on data from another, the resulting temporal misalignment can degrade end-task performance.

Expected Validation Performance and Estimation of a Random Variable's Maximum

no code implementations • 1 Oct 2021 • Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith

We find that the two biased estimators lead to the fewest incorrect conclusions, which hints at the importance of minimizing variance and MSE.

DEMix Layers: Disentangling Domains for Modular Language Modeling

2 code implementations • NAACL 2022 • Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer

We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text.

Language Modelling

All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

no code implementations • ACL 2021 • Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, Noah A. Smith

Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge machine-generated text?

nlg evaluation Text Generation

All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

no code implementations • 30 Jun 2021 • Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, Noah A. Smith

Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge machine-generated text?

nlg evaluation Text Generation

Detoxifying Language Models Risks Marginalizing Minority Voices

1 code implementation • NAACL 2021 • Albert Xu, Eshaan Pathak, Eric Wallace, Suchin Gururangan, Maarten Sap, Dan Klein

Language models (LMs) must be both safe and equitable to be responsibly deployed in practice.

Text Generation

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

2 code implementations • Findings of the Association for Computational Linguistics 2020 • Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith

We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.

Sentence Text Generation

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

6 code implementations • ACL 2020 • Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith

Language models pretrained on text from a wide variety of sources form the foundation of today's NLP.

Citation Intent Classification

Show Your Work: Improved Reporting of Experimental Results

4 code implementations • IJCNLP 2019 • Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith

Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data, compared to previous results.

Variational Pretraining for Semi-supervised Text Classification

1 code implementation • ACL 2019 • Suchin Gururangan, Tam Dang, Dallas Card, Noah A. Smith

We accompany this paper with code to pretrain and use VAMPIRE embeddings in downstream tasks.

General Classification Semi-Supervised Text Classification

Annotation Artifacts in Natural Language Inference Data

no code implementations • NAACL 2018 • Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, Noah A. Smith

Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to.

Natural Language Inference Negation +2
