3 code implementations • 31 Dec 2024 • Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, Taira Anderson, David Atkinson, Faeze Brahman, Christopher Clark, Pradeep Dasigi, Nouha Dziri, Michal Guerquin, Hamish Ivison, Pang Wei Koh, Jiacheng Liu, Saumya Malik, William Merrill, Lester James V. Miranda, Jacob Morrison, Tyler Murray, Crystal Nam, Valentina Pyatkin, Aman Rangapur, Michael Schmitz, Sam Skjonsberg, David Wadden, Christopher Wilhelm, Michael Wilson, Luke Zettlemoyer, Ali Farhadi, Noah A. Smith, Hannaneh Hajishirzi
Our modified model architecture and training recipe achieve both better training stability and improved per-token efficiency.
1 code implementation • 22 Nov 2024 • Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Chris Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, Hannaneh Hajishirzi
Language model post-training is applied to refine behaviors and unlock new skills across a wide range of recent language models, but open recipes for applying these techniques lag behind proprietary ones.
no code implementations • 17 Oct 2024 • Yuling Gu, Oyvind Tafjord, Hyunwoo Kim, Jared Moore, Ronan Le Bras, Peter Clark, Yejin Choi
"), and (c) judgment ("Mary paid for the chips. ...").
2 code implementations • 3 Sep 2024 • Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi
We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE).
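As context for the sparse Mixture-of-Experts design mentioned above: in an MoE layer, a learned router sends each token to only a few "expert" sub-networks, so per-token compute stays small while total parameter count grows. This is not the OLMoE implementation, just a minimal illustrative sketch with NumPy; the function name, shapes, and single-matrix experts are assumptions for the example.

```python
import numpy as np

def sparse_moe_layer(x, gate_w, expert_ws, top_k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:         (n_tokens, d_model) token representations
    gate_w:    (d_model, n_experts) router weights
    expert_ws: list of (d_model, d_model) per-expert weights (toy experts)
    """
    logits = x @ gate_w                                  # (n_tokens, n_experts)
    # Softmax over experts gives each token a routing distribution.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-top_k:]              # top-k expert indices
        weights = probs[t, top] / probs[t, top].sum()    # renormalize over top-k
        for w, e in zip(weights, top):
            # Only k experts run per token -- the source of the sparsity.
            out[t] += w * (x[t] @ expert_ws[e])
    return out
```

Real MoE layers add load-balancing losses and batched expert dispatch, but the routing logic above is the core idea.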
no code implementations • 12 Jun 2024 • Yuling Gu, Oyvind Tafjord, Bailey Kuehl, Dany Haddad, Jesse Dodge, Hannaneh Hajishirzi
Evaluating language models in particular is challenging, as small changes to how a model is evaluated on a task can lead to large changes in measured performance.
1 code implementation • 25 Apr 2024 • Wenlong Zhao, Debanjan Mondal, Niket Tandon, Danica Dillion, Kurt Gray, Yuling Gu
The awareness of multi-cultural human values is critical to the ability of language models (LMs) to generate safe and personalized responses.
no code implementations • 29 Feb 2024 • Tianyi Zhang, Li Zhang, Zhaoyi Hou, Ziyu Wang, Yuling Gu, Peter Clark, Chris Callison-Burch, Niket Tandon
Planning in a text-based environment continues to be a major challenge for AI systems.
3 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.
no code implementations • 16 Nov 2023 • Yuling Gu, Oyvind Tafjord, Peter Clark
While LLMs can provide reasoned explanations along with their answers, the nature and quality of those explanations are still poorly understood.
no code implementations • 24 Oct 2023 • Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi
From this model we distill a high-quality dataset, δ-Rules-of-Thumb, of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions, rated highly by human annotators 85.9% to 99.8% of the time.
1 code implementation • 20 Dec 2022 • Yuling Gu, Bhavana Dalvi Mishra, Peter Clark
Using these questions as probes, we observe that state-of-the-art pre-trained language models (LMs) like GPT-3 and Macaw have fragments of knowledge about these everyday things, but do not have fully coherent "parts mental models" (54-59% accurate, 19-43% conditional constraint violation).
no code implementations • 20 Dec 2022 • Yuling Gu
Transformer-based language models have shown strong performance on an array of natural language understanding tasks.
no code implementations • 17 Nov 2022 • Bingchen Zhao, Yuling Gu, Jessica Zosa Forde, Naomi Saphra
At NeurIPS, American and Chinese institutions cite papers from each other's regions substantially less than they cite endogamously.
1 code implementation • 28 Oct 2022 • Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra, Peter Clark
We hypothesize that to perform this task well, the reader needs to mentally elaborate the scene being described to identify a sensible meaning of the language.
no code implementations • 18 Feb 2022 • Yuling Gu, Nancy F. Chen
In tense and lax vowel pairs, we also consistently observe that the distinction is less conspicuous for Singaporean children compared to the other speaker groups.
1 code implementation • NAACL 2022 • Yuling Gu, Bhavana Dalvi Mishra, Peter Clark
To test this conjecture, we train a new model, DREAM, to answer questions that elaborate the scenes that situated questions are about, and then provide those elaborations as additional context to a question-answering (QA) model.
no code implementations • WS 2019 • Yuling Gu, Nancy Chen
We investigate English pronunciation patterns in Singaporean children in relation to their American and British counterparts by conducting archetypal analysis on selected vowel pairs.