no code implementations • 6 Feb 2025 • Shangbin Feng, Wenxuan Ding, Alisa Liu, Zifeng Wang, Weijia Shi, Yike Wang, Zejiang Shen, Xiaochuang Han, Hunter Lang, Chen-Yu Lee, Tomas Pfister, Yejin Choi, Yulia Tsvetkov
This position paper argues that in many realistic (i.e., complex, contextualized, subjective) scenarios, one LLM is not enough to produce a reliable output.
1 code implementation • 22 Nov 2024 • Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Chris Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, Hannaneh Hajishirzi
Language model post-training is applied to refine behaviors and unlock new skills across a wide range of recent language models, but open recipes for applying these techniques lag behind proprietary ones.
1 code implementation • 12 Aug 2024 • Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, Noah A. Smith
Despite their wide adoption, the biases and unintended behaviors of language models remain poorly understood.
1 code implementation • 23 Jul 2024 • Jonathan Hayase, Alisa Liu, Yejin Choi, Sewoong Oh, Noah A. Smith
Our key insight is that the ordered list of merge rules learned by a BPE tokenizer naturally reveals information about the token frequencies in its training data.
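The insight above can be illustrated with a toy BPE trainer: at every step, standard BPE training merges the currently most frequent symbol pair, so the resulting ordered merge list is itself a record of pair frequencies in the training corpus. This is a minimal sketch for intuition, not the paper's inference procedure:

```python
from collections import Counter

def train_bpe(word_counts, num_merges):
    """Toy BPE trainer. word_counts maps word -> corpus count.
    Returns the ordered list of merge rules."""
    vocab = {tuple(w): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair is merged first
        merges.append(best)
        # Apply the merge everywhere in the vocabulary.
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = count
        vocab = new_vocab
    return merges
```

Because each merge is chosen greedily by frequency, reading the merge list back recovers which pairs (and hence which kinds of text) dominated the tokenizer's training data — the signal the paper exploits.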
1 code implementation • 27 Jun 2024 • Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon S. Du
Unlike traditional methods that require careful curation of a mixture of datasets to achieve comprehensive improvement, we can quickly experiment with preference weightings using MOD to find the best combination of models.
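The preference-weighting idea can be sketched as a weighted log-linear mixture of each model's next-token distribution — an illustrative simplification, not MOD's exact derivation (function and weight names here are hypothetical):

```python
import numpy as np

def combine_next_token(logps, weights):
    """Mix next-token log-prob vectors from several models.

    logps: list of 1-D arrays, each a model's log-probabilities over the vocab.
    weights: preference weights, one per model.
    Returns a normalized probability distribution over the vocabulary.
    """
    z = sum(w * lp for w, lp in zip(weights, logps))
    z = z - z.max()          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()       # renormalize the log-linear mixture
```

Sweeping the weights trades off the models' objectives at decoding time, which is what makes it cheap to search for a good combination without retraining on a curated data mixture.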
no code implementations • 21 Mar 2024 • Margaret Y. Li, Alisa Liu, Zhaofeng Wu, Noah A. Smith
Ambiguity is a critical component of language that allows for more effective communication between speakers, but it is often ignored in NLP.
2 code implementations • 16 Jan 2024 • Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith
Despite the general capabilities of large pretrained language models, they consistently benefit from further adaptation to better achieve desired behaviors.
1 code implementation • 23 Oct 2023 • Jaechan Lee, Alisa Liu, Orevaoghene Ahia, Hila Gonen, Noah A. Smith
In experiments, we compare MT-specific models and language models for (i) their preference when given an ambiguous subsentence, (ii) their sensitivity to disambiguating context, and (iii) the performance disparity between figurative and literal source sentences.
no code implementations • 15 Jun 2023 • Ian R. McKenzie, Alexander Lyzhov, Michael Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Aaron Kirtland, Alexis Ross, Alisa Liu, Andrew Gritsevskiy, Daniel Wurgaft, Derik Kauffman, Gabriel Recchia, Jiacheng Liu, Joe Cavanagh, Max Weiss, Sicong Huang, The Floating Droid, Tom Tseng, Tomasz Korbak, Xudong Shen, Yuhui Zhang, Zhengping Zhou, Najoung Kim, Samuel R. Bowman, Ethan Perez
Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data.
1 code implementation • 22 May 2023 • Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith
A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements.
1 code implementation • 27 Apr 2023 • Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
We find that the task remains extremely challenging, including for GPT-4, whose generated disambiguations are considered correct only 32% of the time in human evaluation, compared to 90% for disambiguations in our dataset.
1 code implementation • 20 Dec 2022 • Skyler Hallinan, Alisa Liu, Yejin Choi, Maarten Sap
Text detoxification has the potential to mitigate the harms of toxicity by rephrasing text to remove offensive meaning, but subtle toxicity remains challenging to tackle.
17 code implementations • 20 Dec 2022 • Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations.
1 code implementation • 16 Jan 2022 • Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
Starting with an existing dataset, MultiNLI for natural language inference (NLI), our approach uses dataset cartography to automatically identify examples that demonstrate challenging reasoning patterns, and instructs GPT-3 to compose new examples with similar patterns.
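Dataset cartography characterizes each training example by the model's confidence in the gold label across training epochs: high-confidence/low-variability examples are "easy," while low-confidence or high-variability examples exhibit the challenging patterns targeted above. A minimal sketch of those statistics (the selection thresholds here are assumptions, not the paper's):

```python
import numpy as np

def cartography_stats(gold_probs):
    """gold_probs: array of shape [epochs, examples], the probability the
    model assigns to each example's gold label at each training epoch.
    Returns per-example confidence (mean) and variability (std)."""
    confidence = gold_probs.mean(axis=0)
    variability = gold_probs.std(axis=0)
    return confidence, variability
```

Examples with low confidence or high variability are the ones selected as seeds for generating new, similarly challenging instances.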
1 code implementation • ACL 2022 • Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi
It remains an open question whether incorporating external knowledge benefits commonsense reasoning while maintaining the flexibility of pretrained sequence models.
1 code implementation • ACL 2021 • Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, Yejin Choi
Despite recent advances in natural language generation, it remains challenging to control attributes of generated text.
1 code implementation • 23 Jun 2020 • Alexander Fang, Alisa Liu, Prem Seetharaman, Bryan Pardo
Unlike traditional rule-based systems, deep generative systems that learn probabilistic models from a corpus of existing music do not explicitly encode knowledge of a musical style.
1 code implementation • 23 Jun 2020 • Alisa Liu, Alexander Fang, Gaëtan Hadjeres, Prem Seetharaman, Bryan Pardo
In this paper, we present augmentative generation (Aug-Gen), a method of dataset augmentation for any music generation system trained on a resource-constrained domain.
no code implementations • 23 Oct 2019 • Alisa Liu, Prem Seetharaman, Bryan Pardo
We compare our confidence-based ensemble approach to using individual models with no selection, to an oracle that always selects the best model, and to a random model selector.
no code implementations • 19 Sep 2019 • Ruimin Zhu, Thanapon Noraset, Alisa Liu, Wenxin Jiang, Doug Downey
Word embeddings capture syntactic and semantic information about words.
1 code implementation • WS 2019 • Michael Chen, Mike D'Arcy, Alisa Liu, Jared Fernandez, Doug Downey
To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems.
2 code implementations • 8 Apr 2019 • Michael Chen, Mike D'Arcy, Alisa Liu, Jared Fernandez, Doug Downey
To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems.
Ranked #1 on Common Sense Reasoning on CODAH (using extra training data)