no code implementations • 13 Apr 2025 • Joseph Liu, Yoonsoo Nam, Xinyue Cui, Swabha Swayamdipta
Second, existing human ratings associated with the benchmarks often contain a high degree of disagreement, resulting in inconsistent ratings; nevertheless, existing metrics are still expected to correlate highly with these imperfect ratings.
1 code implementation • 6 Mar 2025 • Xinyue Cui, Johnny Tian-Zheng Wei, Swabha Swayamdipta, Robin Jia
Our watermarks are designed to be memorized by the LLM through seamlessly integrating in its training data, making them harder to detect lexically during preprocessing.
no code implementations • 9 Dec 2024 • Lincan Li, Jiaqi Li, Catherine Chen, Fred Gui, Hongjia Yang, Chenxiao Yu, Zhengguang Wang, Jianing Cai, Junlong Aaron Zhou, Bolin Shen, Alex Qian, Weixin Chen, Zhongkai Xue, Lichao Sun, Lifang He, Hanjie Chen, Kaize Ding, Zijian Du, Fangzhou Mu, Jiaxin Pei, Jieyu Zhao, Swabha Swayamdipta, Willie Neiswanger, Hua Wei, Xiyang Hu, Shixiang Zhu, Tianlong Chen, Yingzhou Lu, Yang Shi, Lianhui Qin, Tianfan Fu, Zhengzhong Tu, Yuzhe Yang, Jaemin Yoo, Jiaheng Zhang, Ryan Rossi, Liang Zhan, Liang Zhao, Emilio Ferrara, Yan Liu, Furong Huang, Xiangliang Zhang, Lawrence Rothenberg, Shuiwang Ji, Philip S. Yu, Yue Zhao, Yushun Dong
In recent years, large language models (LLMs) have been widely adopted in political science tasks such as election prediction, sentiment analysis, policy impact assessment, and misinformation detection.
no code implementations • 26 Aug 2024 • Urja Khurana, Eric Nalisnick, Antske Fokkens, Swabha Swayamdipta
Subjective tasks in NLP have been mostly relegated to objective standards, where the gold label is decided by taking the majority vote.
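For concreteness, here is a minimal sketch (not from the paper) of what majority-vote aggregation discards compared to keeping the full annotator label distribution:

```python
from collections import Counter

def majority_vote(labels):
    """Collapse per-annotator labels into a single 'gold' label.
    Any disagreement among annotators is discarded (ties broken
    arbitrarily by Counter ordering)."""
    return Counter(labels).most_common(1)[0][0]

def label_distribution(labels):
    """Keep the full empirical label distribution instead."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# 3 of 5 annotators say "offensive"; majority voting hides the 40%
# of annotators who disagreed.
ratings = ["offensive", "offensive", "offensive", "benign", "benign"]
print(majority_vote(ratings))       # offensive
print(label_distribution(ratings))  # {'offensive': 0.6, 'benign': 0.4}
```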
no code implementations • 18 Jul 2024 • Aryan Gulati, Xingjian Dong, Carlos Hurtado, Sarath Shekkizhar, Swabha Swayamdipta, Antonio Ortega
Our soft clustering approach to OOD detection demonstrates its potential for detecting tail-end phenomena in extreme-scale data settings.
1 code implementation • 2 Jul 2024 • Sayan Ghosh, Tejas Srinivasan, Swabha Swayamdipta
We address these challenges by introducing a meta-evaluation measure, separability, which estimates how suitable a test instance is for pairwise preference evaluation.
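The paper's exact formulation is not reproduced here; as an illustrative proxy only, separability could be estimated from how stably repeated samples from two models rank against each other on an instance (the scorer and sampling interfaces below are assumptions):

```python
import random

def estimate_separability(score_fn, model_a, model_b, instance, n=20):
    """Illustrative proxy: how consistently do repeated generations
    from two models produce the same pairwise ranking?
    score_fn(text) -> float is an assumed quality scorer;
    each model is an assumed callable: instance -> generated text."""
    wins_a = sum(
        score_fn(model_a(instance)) > score_fn(model_b(instance))
        for _ in range(n)
    )
    # Near 0.5 win rate: rankings flip across samples (low separability);
    # near 0 or 1: a stable preference (high separability).
    return abs(wins_a / n - 0.5) * 2

# Toy demo with string-returning "models" and length as the scorer.
model_a = lambda x: x + " " + "word " * random.randint(1, 5)
model_b = lambda x: x + " " + "word " * random.randint(1, 5)
print(estimate_separability(len, model_a, model_b, "prompt"))
```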
no code implementations • 21 Jun 2024 • Jaspreet Ranjit, Brihi Joshi, Rebecca Dorn, Laura Petry, Olga Koumoundouros, Jayne Bottarini, Peichen Liu, Eric Rice, Swabha Swayamdipta
We release annotations with varying degrees of assistance from language models, with immense benefits in scaling: a 6.5x speedup in annotation time while incurring only a 3-point F1 reduction in performance relative to the domain experts.
1 code implementation • 7 Jun 2024 • Xinyue Cui, Swabha Swayamdipta
Despite the remarkable generative capabilities of language models in producing naturalistic language, their effectiveness at explicitly manipulating and generating linguistic structures remains understudied.
no code implementations • 14 Mar 2024 • Matthew Finlayson, Xiang Ren, Swabha Swayamdipta
Large language model (LLM) providers often hide the architectural details and parameters of their proprietary models by restricting public access to a limited API.
1 code implementation • 2 Oct 2023 • Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal
We provide a theoretical explanation for the effectiveness of truncation sampling by proving that truncation methods that discard tokens below some probability threshold (the most common type of truncation) can guarantee that all sampled tokens have nonzero true probability.
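A minimal sketch of the threshold-style truncation this result covers (the threshold value and interface are illustrative, not the paper's implementation):

```python
import numpy as np

def truncation_sample(probs, threshold=1e-4, rng=None):
    """Sample a next token after discarding tokens whose model
    probability falls below `threshold`, then renormalizing.
    Per the paper's guarantee, under a bounded-error assumption
    every token surviving such a threshold rule has nonzero
    probability under the true distribution."""
    rng = rng or np.random.default_rng()
    truncated = np.where(probs >= threshold, probs, 0.0)
    truncated /= truncated.sum()
    return rng.choice(len(probs), p=truncated)

# Toy 5-token vocabulary: the two low-probability tokens are never sampled.
p = np.array([0.5, 0.3, 0.199, 0.0005, 0.0005])
print(truncation_sample(p, threshold=1e-3))
```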
no code implementations • 18 Sep 2023 • Yoonsoo Nam, Adam Lehavi, Daniel Yang, Digbalay Bose, Swabha Swayamdipta, Shrikanth Narayanan
Video summarization remains a significant challenge in computer vision due to the sheer size of the input videos to be summarized.
no code implementations • 3 Jun 2023 • Xuhui Zhou, Hao Zhu, Akhila Yerukola, Thomas Davidson, Jena D. Hwang, Swabha Swayamdipta, Maarten Sap
To study the contextual dynamics of offensiveness, we train models to generate COBRA explanations, with and without access to the context.
1 code implementation • 8 May 2023 • Phillip Howard, Junlin Wang, Vasudev Lal, Gadi Singer, Yejin Choi, Swabha Swayamdipta
We introduce NeuroComparatives, a novel framework for comparative knowledge distillation that overgenerates candidates from language models such as GPT variants and LLaMA, followed by stringent filtering of the generated knowledge.
1 code implementation • 27 Apr 2023 • Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
We find that the task remains extremely challenging, including for GPT-4, whose generated disambiguations are considered correct only 32% of the time in human evaluation, compared to 90% for disambiguations in our dataset.
1 code implementation • 30 Dec 2022 • Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui
We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images.
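A rough sketch of the underlying computation, assuming both text collections have already been embedded and quantized into histograms (the real pipeline embeds texts with a language model, clusters the embeddings, and summarizes the divergence frontier differently):

```python
import numpy as np

def kl(p, q):
    """KL divergence between two histograms, with smoothing."""
    p = (p + 1e-12) / (p + 1e-12).sum()
    q = (q + 1e-12) / (q + 1e-12).sum()
    return float(np.sum(p * np.log(p / q)))

def mauve_style_score(p_hist, q_hist, n_lambda=50):
    """MAUVE-style comparison of two quantized distributions P and Q
    via a frontier of KL divergences to mixtures R = lam*P + (1-lam)*Q,
    capturing both type-I and type-II errors of the model distribution
    against the human one. The final summary here (a crude mean) is a
    stand-in for the paper's area-under-the-frontier statistic."""
    frontier = []
    for lam in np.linspace(1e-3, 1 - 1e-3, n_lambda):
        r = lam * p_hist + (1 - lam) * q_hist
        frontier.append((kl(q_hist, r), kl(p_hist, r)))
    return 1.0 / (1.0 + float(np.mean(frontier)))

p = np.array([0.4, 0.4, 0.2])   # e.g., human-text cluster histogram
q = np.array([0.3, 0.3, 0.4])   # e.g., model-text cluster histogram
print(mauve_style_score(p, q))
```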
no code implementations • 19 Dec 2022 • Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Lianhui Qin, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi
Here, we investigate an alternative that a priori seems impossible: can smaller language models (e.g., GPT-2) win over models that are orders of magnitude larger and better (e.g., GPT-3), if powered with novel commonsense distillation algorithms?
1 code implementation • 22 Oct 2022 • Phillip Howard, Gadi Singer, Vasudev Lal, Yejin Choi, Swabha Swayamdipta
While counterfactual data augmentation offers a promising step towards robust generalization in natural language processing, producing a set of counterfactuals that offer valuable inductive bias for models remains a challenge.
1 code implementation • 10 Oct 2022 • Hanjie Chen, Faeze Brahman, Xiang Ren, Yangfeng Ji, Yejin Choi, Swabha Swayamdipta
More concretely, we propose a metric called REV (Rationale Evaluation with conditional V-information), to quantify the amount of new, label-relevant information in a rationale beyond the information already available in the input or the label.
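A heavily simplified pointwise sketch of the REV idea, assuming two evaluator models have already been trained (how those evaluators are constructed is the substance of the paper):

```python
import math

def rev_score(p_label_with_rationale, p_label_baseline):
    """Pointwise REV-style score: the reduction in surprisal about
    the gold label when an evaluator sees the rationale, relative
    to a baseline (e.g., a vacuous rationale). Both arguments are
    probabilities two trained evaluators assign to the gold label."""
    return math.log(p_label_with_rationale) - math.log(p_label_baseline)

# A rationale that lifts the evaluator's confidence in the gold label
# from 0.55 to 0.9 contributes positive new information (~0.49 nats);
# one that merely restates the label or input contributes ~0.
print(rev_score(0.9, 0.55))
```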
no code implementations • 25 May 2022 • Jiao Sun, Swabha Swayamdipta, Jonathan May, Xuezhe Ma
After controlling for instances where rationales leak the correct answer while not providing additional background knowledge, we find that incorporating only 5% of rationales during training can boost model performance by 47.22% for CoS-E and 57.14% for ECQA during inference.
1 code implementation • 16 Jan 2022 • Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
Starting with an existing dataset, MultiNLI for natural language inference (NLI), our approach uses dataset cartography to automatically identify examples that demonstrate challenging reasoning patterns, and instructs GPT-3 to compose new examples with similar patterns.
1 code implementation • NAACL 2022 • Sarah Wiegreffe, Jack Hessel, Swabha Swayamdipta, Mark Riedl, Yejin Choi
We create a pipeline that combines GPT-3 with a supervised filter that incorporates binary acceptability judgments from humans in the loop.
no code implementations • NAACL 2022 • Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, Noah A. Smith
The perceived toxicity of language can vary based on someone's identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in dataset and model biases.
1 code implementation • 16 Oct 2021 • Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta
However, this comparison provides little understanding of how difficult each instance in a given distribution is, or what attributes make the dataset difficult for a given model.
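The paper answers this with V-usable information; a sketch of its pointwise variant (PVI) follows, under the assumption that the two models, one trained with real inputs and one with null inputs, are already available:

```python
import math

def pvi(p_gold_with_input, p_gold_null_input):
    """Pointwise V-information: log p_g(y|x) - log p_g'(y|null),
    where g is trained with inputs and g' with null inputs.
    Lower PVI ~ a harder instance for this model family."""
    return math.log2(p_gold_with_input) - math.log2(p_gold_null_input)

# Seeing the input barely helps on the first instance (hard),
# and helps a lot on the second (easy).
print(pvi(0.40, 0.35))  # ~0.19 bits
print(pvi(0.95, 0.35))  # ~1.44 bits
```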
1 code implementation • EMNLP (LAW, DMR) 2021 • Ayush Pancholy, Miriam R. L. Petruck, Swabha Swayamdipta
While FrameNet is widely regarded as a rich resource of semantics in natural language processing, a major criticism concerns its lack of coverage and the relative paucity of its labeled data compared to other commonly used lexical resources such as PropBank and VerbNet.
1 code implementation • ACL 2021 • Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, Yejin Choi
Despite recent advances in natural language generation, it remains challenging to control attributes of generated text.
1 code implementation • EMNLP 2021 • Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg
Our method is based on projecting model representations to a latent space that captures only the features that are useful (to the model) to differentiate two potential decisions.
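As a simplified illustration of such a contrastive projection (using classifier weight vectors as a stand-in for the paper's learned latent space):

```python
import numpy as np

def contrastive_projection(h, w_fact, w_foil):
    """Keep only the component of a hidden representation h along
    the direction separating two candidate decisions, given the
    classifier weight vectors for the predicted label (fact) and
    the contrast label (foil). A simplification, not the paper's
    exact projection."""
    u = w_fact - w_foil
    u = u / np.linalg.norm(u)
    return np.dot(h, u) * u

h = np.array([1.0, 2.0, 3.0])
w_yes = np.array([0.5, 0.1, 0.0])
w_no = np.array([0.0, 0.1, 0.4])
print(contrastive_projection(h, w_yes, w_no))
```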
5 code implementations • NeurIPS 2021 • Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, Zaid Harchaoui
As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem.
2 code implementations • EACL 2021 • Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
Overall, our findings show that debiasing a model trained on biased toxic language data is not as effective as simply relabeling the data to remove existing biases.
6 code implementations • EMNLP 2020 • Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, Yejin Choi
Experiments across four datasets show that these model-dependent measures reveal three distinct regions in the data map, each with pronounced characteristics.
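The two data-map coordinates are straightforward to compute once per-epoch training dynamics have been logged; a minimal sketch:

```python
import numpy as np

def data_map_coordinates(gold_probs_per_epoch):
    """gold_probs_per_epoch: shape (n_epochs, n_examples); entry (e, i)
    is the model's probability on example i's gold label at epoch e.
    Returns the Data Maps coordinates: confidence (mean over epochs)
    and variability (std over epochs). High confidence / low
    variability ~ easy-to-learn; low/low ~ hard-to-learn; high
    variability ~ ambiguous."""
    return gold_probs_per_epoch.mean(axis=0), gold_probs_per_epoch.std(axis=0)

# Toy run: 4 epochs, 3 examples (easy, hard, ambiguous).
probs = np.array([[0.90, 0.20, 0.30],
                  [0.95, 0.25, 0.70],
                  [0.97, 0.20, 0.40],
                  [0.98, 0.30, 0.80]])
confidence, variability = data_map_coordinates(probs)
print(confidence.round(2), variability.round(2))
```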
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yiben Yang, Chaitanya Malaviya, Jared Fernandez, Swabha Swayamdipta, Ronan Le Bras, Ji-Ping Wang, Chandra Bhagavatula, Yejin Choi, Doug Downey
Recent advances in commonsense reasoning depend on large-scale human-annotated training data to achieve peak performance.
Ranked #1 on Question Answering on CODAH
6 code implementations • ACL 2020 • Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP.
1 code implementation • ACL 2020 • Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith
Our method presents a favorable speed/accuracy tradeoff in almost all cases, producing models which are up to five times faster than the state of the art, while preserving their accuracy.
1 code implementation • ICML 2020 • Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E. Peters, Ashish Sabharwal, Yejin Choi
Large neural models have demonstrated human-level performance on language and vision benchmarks, while their performance degrades considerably on adversarial or out-of-distribution samples.
no code implementations • 29 Aug 2019 • Swabha Swayamdipta, Matthew Peters, Brendan Roof, Chris Dyer, Noah A. Smith
Shallow syntax provides an approximation of phrase-syntactic structure of sentences; it can be produced with high accuracy, and is computationally cheap to obtain.
no code implementations • NAACL 2019 • Sebastian Ruder, Matthew E. Peters, Swabha Swayamdipta, Thomas Wolf
The classic supervised machine learning paradigm is based on learning in isolation, a single predictive model for a task using a single dataset.
1 code implementation • EMNLP 2018 • Swabha Swayamdipta, Sam Thomson, Kenton Lee, Luke Zettlemoyer, Chris Dyer, Noah A. Smith
We introduce the syntactic scaffold, an approach to incorporating syntactic information into semantic tasks.
no code implementations • ACL 2018 • Phoebe Mulcaire, Swabha Swayamdipta, Noah Smith
Previous approaches to multilingual semantic dependency parsing treat languages independently, without exploiting the similarities between semantic structures across languages.
2 code implementations • NAACL 2018 • Hao Peng, Sam Thomson, Swabha Swayamdipta, Noah A. Smith
We present a new approach to learning semantic parsers from multiple datasets, even when the target semantic formalisms are drastically different, and the underlying corpora do not overlap.
no code implementations • NAACL 2018 • Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, Noah A. Smith
Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to.
no code implementations • ICLR 2018 • Swabha Swayamdipta, Ankur P. Parikh, Tom Kwiatkowski
Reading comprehension is a challenging task, especially when executed over longer documents or across multiple evidence documents, where the answer is likely to reoccur.
10 code implementations • 29 Jun 2017 • Swabha Swayamdipta, Sam Thomson, Chris Dyer, Noah A. Smith
We present a new, efficient frame-semantic parser that labels semantic arguments to FrameNet predicates.
4 code implementations • 15 Jan 2017 • Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin
In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.
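To illustrate the dynamic-declaration alternative that DyNet advocates, here is a define-by-run sketch in PyTorch (a stand-in; DyNet's own API differs), where the graph is rebuilt to follow each input's structure:

```python
import torch

def encode(tokens, embed, cell):
    """Dynamic declaration: the computation graph is built anew for
    each input, so it can directly mirror the input's structure
    (here, a variable-length token sequence). Under static
    declaration, this loop would be expressed once, symbolically,
    before any data is seen."""
    h = torch.zeros(1, cell.hidden_size)
    for tok in tokens:  # the graph grows with the input length
        h = cell(embed(tok).unsqueeze(0), h)
    return h

embed = torch.nn.Embedding(100, 16)
cell = torch.nn.RNNCell(16, 32)
print(encode(torch.tensor([3, 14, 15, 9]), embed, cell).shape)  # (1, 32)
```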
1 code implementation • CONLL 2016 • Swabha Swayamdipta, Miguel Ballesteros, Chris Dyer, Noah A. Smith
We present a transition-based parser that jointly produces syntactic and semantic dependencies.