no code implementations • 21 Mar 2025 • Luxi He, Xiangyu Qi, Michel Liao, Inyoung Cheong, Prateek Mittal, Danqi Chen, Peter Henderson
We are at a turning point for language models that accept audio input.
1 code implementation • 11 Feb 2025 • Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin
On the miniF2F benchmark, it achieves a 57.6% success rate (Pass@32), exceeding the previous best open-source model by 7.6%.
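For context on the Pass@32 number, Pass@k is usually reported with the unbiased estimator below (n samples per problem, c of them correct); whether this paper uses this exact estimator or simply a single batch of 32 samples per problem is not stated here, so treat the formula as standard background rather than the paper's protocol.

```latex
% Standard unbiased Pass@k estimator; n = samples drawn per problem,
% c = number of correct samples. Applying it to this paper is an assumption.
\text{Pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\, 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \,\right]
```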
no code implementations • 9 Jan 2025 • Xi Ye, Fangcong Yin, Yinghui He, Joie Zhang, Howard Yen, Tianyu Gao, Greg Durrett, Danqi Chen
LongProc consists of six diverse procedural generation tasks, such as extracting structured information from HTML pages into a TSV format and executing complex search procedures to create travel plans.
1 code implementation • 3 Jan 2025 • Tianyu Gao, Alexander Wettig, Luxi He, Yihe Dong, Sadhika Malladi, Danqi Chen
The vast diversity of styles, domains, and quality levels in language model pre-training corpora is essential for developing general model capabilities, but efficiently learning and deploying the correct behaviors exemplified in each of these heterogeneous data sources is challenging.
1 code implementation • 11 Nov 2024 • Howard Chen, Jiayi Geng, Adithya Bhaskar, Dan Friedman, Danqi Chen
REMIX prevents forgetting by mixing in generic data, sampled from pretraining corpora or even randomly generated word sequences, during each stage, even though this data is unrelated to the factoids memorized in the first stage.
1 code implementation • 11 Oct 2024 • Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin
Direct Preference Optimization (DPO) and its variants are increasingly used for aligning language models with human preferences.
1 code implementation • 3 Oct 2024 • Howard Yen, Tianyu Gao, Minmin Hou, Ke Ding, Daniel Fleischer, Peter Izsak, Moshe Wasserblat, Danqi Chen
There have been many benchmarks for evaluating long-context language models (LCLMs), but developers often rely on synthetic tasks like needle-in-a-haystack (NIAH) or arbitrary subsets of tasks.
1 code implementation • 3 Oct 2024 • Tianyu Gao, Alexander Wettig, Howard Yen, Danqi Chen
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information.
no code implementations • 16 Jul 2024 • Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu
To better benchmark retrieval on such challenging queries, we introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents.
1 code implementation • 15 Jul 2024 • Dan Friedman, Abhishek Panigrahi, Danqi Chen
Next, we train Transformers on a dataset of synthetically generated ELIZA conversations and investigate the mechanisms the models learn.
1 code implementation • 10 Jul 2024 • Anirudh Ajith, Mengzhou Xia, Alexis Chevalier, Tanya Goyal, Danqi Chen, Tianyu Gao
LitSearch is constructed using a combination of (1) questions generated by GPT-4 based on paragraphs containing inline citations from research papers and (2) questions manually written by authors about their recently published papers.
1 code implementation • 26 Jun 2024 • ZiRui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen
All models lag far behind human performance of 80.5%, underscoring weaknesses in the chart understanding capabilities of existing MLLMs.
1 code implementation • 24 Jun 2024 • Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen
Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods while being equally faithful to the full model predictions on standard circuit-finding tasks.
1 code implementation • 20 Jun 2024 • Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal
First, existing methods often use coarse-grained taxonomies of unsafe topics and over-represent some fine-grained topics.
1 code implementation • 20 Jun 2024 • Luxi He, Yangsibo Huang, Weijia Shi, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, Peter Henderson
Our evaluation systematically shows that both image and video generation models can still generate characters even if characters' names are not explicitly mentioned in the prompt, sometimes with only two generic keywords (e.g., prompting with "videogame, plumber" consistently generates Nintendo's Mario character).
no code implementations • 29 May 2024 • Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal
The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security.
1 code implementation • 24 May 2024 • Chong Xiang, Tong Wu, Zexuan Zhong, David Wagner, Danqi Chen, Prateek Mittal
Retrieval-augmented generation (RAG) has been shown vulnerable to retrieval corruption attacks: an attacker can inject malicious passages into retrieval results to induce inaccurate responses.
2 code implementations • 23 May 2024 • Yu Meng, Mengzhou Xia, Danqi Chen
Direct Preference Optimization (DPO) is a widely used offline preference optimization algorithm that reparameterizes reward functions in reinforcement learning from human feedback (RLHF) to enhance simplicity and training stability.
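For background, the standard DPO objective that this line refers to can be written as follows; notation assumed: prompt x, preferred/dispreferred responses y_w and y_l, frozen reference policy π_ref, and temperature β. This is the original DPO loss (Rafailov et al., 2023), not necessarily the variant this paper proposes.

```latex
% Standard DPO loss over a preference dataset D of (x, y_w, y_l) triples.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}
    \left[\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```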
no code implementations • 6 May 2024 • Zexuan Zhong, Mengzhou Xia, Danqi Chen, Mike Lewis
Mixture-of-experts (MoE) models facilitate efficient scaling; however, training the router network introduces the challenge of optimizing a non-differentiable, discrete objective.
1 code implementation • 15 Apr 2024 • Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Aleksandar Petrov, Christian Schroeder de Witt, Sumeet Ramesh Motwan, Yoshua Bengio, Danqi Chen, Philip H. S. Torr, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs).
1 code implementation • 6 Mar 2024 • Adithya Bhaskar, Dan Friedman, Danqi Chen
Instead of finding competing subnetworks, we find that all subnetworks -- whether they generalize or not -- share a set of attention heads, which we refer to as the heuristic core.
no code implementations • 5 Mar 2024 • Akari Asai, Zexuan Zhong, Danqi Chen, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi, Wen-tau Yih
Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability.
1 code implementation • 26 Feb 2024 • Howard Yen, Tianyu Gao, Danqi Chen
However, the substantial computational cost of transformers and limited generalization of positional encoding restrict the size of their context window.
1 code implementation • 21 Feb 2024 • Tianyu Gao, ZiRui Wang, Adithya Bhaskar, Danqi Chen
An emerging family of language models (LMs), capable of processing both text and images within a single visual view, has the promise to unlock complex tasks such as chart understanding and UI navigation.
1 code implementation • 16 Feb 2024 • Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Toni Annala, Max Jameson Aragon, Arturo Rodríguez Fanlo, Simon Frieder, Simon Machado, Akshara Prabhakar, Ellie Thieu, Jiachen T. Wang, ZiRui Wang, Xindi Wu, Mengzhou Xia, Wenhan Xia, Jiatong Yu, Jun-Jie Zhu, Zhiyong Jason Ren, Sanjeev Arora, Danqi Chen
We use TutorChat to fine-tune Llemma models with 7B and 34B parameters.
1 code implementation • 15 Feb 2024 • Alexander Wettig, Aatmik Gupta, Saumya Malik, Danqi Chen
We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B training corpus with quality ratings for each of the four criteria.
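"Scalar ratings from pairwise judgments" suggests a Bradley-Terry-style objective; the exact loss used to train the QuRater model is an assumption here, but the standard form of such an objective is:

```latex
% Bradley-Terry-style rating model: r_\phi(a) is the rater's scalar score
% for text a, and (a \succ b) denotes a pairwise judgment preferring a.
P(a \succ b) = \sigma\!\big(r_\phi(a) - r_\phi(b)\big), \qquad
\mathcal{L}(\phi) = -\,\mathbb{E}_{(a \succ b)}\big[\log \sigma\big(r_\phi(a) - r_\phi(b)\big)\big]
```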
3 code implementations • 6 Feb 2024 • Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen
Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots.
no code implementations • 6 Dec 2023 • Dan Friedman, Andrew Lampinen, Lucas Dixon, Danqi Chen, Asma Ghandeharioun
We find consistent generalization gaps: cases in which the simplified proxies are more faithful to the original model on the in-distribution evaluations and less faithful on various tests of systematic generalization.
1 code implementation • 29 Oct 2023 • Zexuan Zhong, Ziqing Huang, Alexander Wettig, Danqi Chen
Dense retrievers have achieved state-of-the-art performance in various information retrieval tasks, but to what extent can they be safely deployed in real-world applications?
1 code implementation • 25 Oct 2023 • Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer
Min-K% Prob can be applied without any knowledge about the pretraining corpus or any additional training, departing from previous detection methods that require training a reference model on data that is similar to the pretraining data.
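A minimal sketch of the Min-K% Prob score, assuming per-token log-probabilities from the target model are already available; function and variable names are illustrative.

```python
import numpy as np

def min_k_percent_prob(token_logprobs: np.ndarray, k: float = 20.0) -> float:
    """Average log-probability of the k% lowest-probability tokens.

    A higher score suggests the text is more likely to have appeared in the
    model's pretraining data. `token_logprobs` holds the per-token
    log-probabilities the target model assigns to the candidate text.
    """
    n = max(1, int(len(token_logprobs) * k / 100))
    lowest = np.sort(token_logprobs)[:n]   # bottom-k% tokens by probability
    return float(lowest.mean())

# Hypothetical log-probs: text seen during pretraining tends to have fewer
# extremely low-probability tokens, yielding a higher (less negative) score.
score = min_k_percent_prob(np.array([-0.2, -1.5, -0.1, -6.3, -0.4]), k=40)
```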
1 code implementation • 11 Oct 2023 • Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen
As research in large language models (LLMs) continues to accelerate, LLM-based evaluation has emerged as a scalable and cost-effective alternative to human evaluations for comparing the ever-increasing list of models.
2 code implementations • 10 Oct 2023 • Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, Danqi Chen
Finally, we propose an effective alignment method that explores diverse generation strategies, which can reasonably reduce the misalignment rate under our attack.
2 code implementations • 10 Oct 2023 • Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen
In this work, we study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models.
Ranked #45 on Question Answering on PIQA
1 code implementation • NeurIPS 2023 • Dan Friedman, Alexander Wettig, Danqi Chen
Recent research in mechanistic interpretability has attempted to reverse-engineer Transformer models by carefully inspecting network weights and activations.
3 code implementations • NeurIPS 2023 • Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory.
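This entry refers to fine-tuning with forward passes only; below is a minimal SPSA-style zeroth-order update, with `params` and `loss_fn` as illustrative placeholders. The paper's method additionally keeps memory at the inference level by resampling the perturbation from a saved random seed rather than storing it, which this sketch does not show.

```python
import torch

def spsa_step(params: torch.Tensor, loss_fn, lr: float = 1e-6, eps: float = 1e-3):
    """One zeroth-order (SPSA-style) update: estimate the gradient direction
    from two forward passes, with no backpropagation.

    `params` is a flat parameter vector and `loss_fn` a closure returning the
    scalar loss; both are placeholders for illustration.
    """
    z = torch.randn_like(params)              # random perturbation direction
    loss_plus = loss_fn(params + eps * z)     # forward pass 1
    loss_minus = loss_fn(params - eps * z)    # forward pass 2
    grad_est = (loss_plus - loss_minus) / (2 * eps)
    return params - lr * grad_est * z         # move along the sampled direction
```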
1 code implementation • 24 May 2023 • Ameet Deshpande, Carlos E. Jimenez, Howard Chen, Vishvak Murahari, Victoria Graf, Tanmay Rajpurohit, Ashwin Kalyan, Danqi Chen, Karthik Narasimhan
Semantic textual similarity (STS), a cornerstone task in NLP, measures the degree of similarity between a pair of sentences, and has broad application in fields such as information retrieval and natural language understanding.
1 code implementation • 24 May 2023 • Yangsibo Huang, Samyak Gupta, Zexuan Zhong, Kai Li, Danqi Chen
Crucially, we find that $k$NN-LMs are more susceptible to leaking private information from their private datastore than parametric models.
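For background on why the datastore matters for privacy: a kNN-LM interpolates the parametric model's next-token distribution with a nearest-neighbor distribution computed directly over a datastore of (context representation, next token) pairs, so the private data participates in every prediction. The standard interpolation (Khandelwal et al., 2020) is:

```latex
% \lambda is the interpolation weight; p_{kNN} is induced by nearest
% neighbors retrieved from the (here, private) datastore.
p(y \mid x) \;=\; \lambda\, p_{k\mathrm{NN}}(y \mid x) \;+\; (1 - \lambda)\, p_{\mathrm{LM}}(y \mid x)
```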
1 code implementation • 24 May 2023 • Tianyu Gao, Howard Yen, Jiatong Yu, Danqi Chen
We propose ALCE, the first benchmark for Automatic LLMs' Citation Evaluation.
2 code implementations • 24 May 2023 • Zexuan Zhong, Zhengxuan Wu, Christopher D. Manning, Christopher Potts, Danqi Chen
The information stored in large language models (LLMs) falls out of date quickly, and retraining from scratch is often not an option.
1 code implementation • 24 May 2023 • Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen
Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long text documents.
1 code implementation • 22 May 2023 • Chenglei Si, Dan Friedman, Nitish Joshi, Shi Feng, Danqi Chen, He He
We investigate the inductive biases of ICL from the perspective of feature bias: which feature ICL is more likely to use given a set of underspecified demonstrations in which two features are equally predictive of the labels.
1 code implementation • 16 May 2023 • Jane Pan, Tianyu Gao, Howard Chen, Danqi Chen
Large language models (LLMs) exploit in-context learning (ICL) to solve tasks with only a few demonstrations, but its mechanisms are not yet well-understood.
no code implementations • 20 Dec 2022 • Howard Chen, Huihan Li, Danqi Chen, Karthik Narasimhan
We consider the task of text generation in language models with constraints specified in natural language.
1 code implementation • 19 Dec 2022 • Mengzhou Xia, Mikel Artetxe, Chunting Zhou, Xi Victoria Lin, Ramakanth Pasunuru, Danqi Chen, Luke Zettlemoyer, Ves Stoyanov
Why do larger language models demonstrate more desirable behaviors?
2 code implementations • 26 Oct 2022 • Jacqueline He, Mengzhou Xia, Christiane Fellbaum, Danqi Chen
To this end, we propose MABEL (a Method for Attenuating Gender Bias using Entailment Labels), an intermediate pre-training approach for mitigating gender bias in contextualized representations.
no code implementations • 26 Oct 2022 • Mozes van de Kar, Mengzhou Xia, Danqi Chen, Mikel Artetxe
Our results suggest that the success of prompting can partly be explained by the model being exposed to similar examples during pretraining, which can be directly retrieved through regular expressions.
1 code implementation • 20 Oct 2022 • Dan Friedman, Alexander Wettig, Danqi Chen
Many NLP datasets have been found to contain shortcuts: simple decision rules that achieve surprisingly high accuracy.
1 code implementation • 11 Oct 2022 • Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora
It has become standard to solve NLP tasks by fine-tuning pre-trained language models (LMs), especially in low-data settings.
5 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. 
Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. 
Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu
BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.
1 code implementation • 30 May 2022 • Mengzhou Xia, Mikel Artetxe, Jingfei Du, Danqi Chen, Ves Stoyanov
In this work, we adapt prompt-based few-shot learning to ELECTRA and show that it outperforms masked language models in a wide range of tasks.
1 code implementation • 25 May 2022 • Mujeen Sung, Jungsoo Park, Jaewoo Kang, Danqi Chen, Jinhyuk Lee
In this paper, we introduce TOUR (Test-Time Optimization of Query Representations), which further optimizes instance-level query representations guided by signals from test-time retrieval results.
1 code implementation • 25 May 2022 • Zexuan Zhong, Tao Lei, Danqi Chen
Recent work has improved language models (LMs) remarkably by equipping them with a non-parametric memory component.
1 code implementation • 25 May 2022 • Kaiyu Yang, Jia Deng, Danqi Chen
In this paper, we present a novel stepwise method, NLProofS (Natural Language Proof Search), which learns to generate relevant steps conditioning on the hypothesis.
1 code implementation • 17 May 2022 • Samyak Gupta, Yangsibo Huang, Zexuan Zhong, Tianyu Gao, Kai Li, Danqi Chen
For the first time, we show the feasibility of recovering text from large batch sizes of up to 128 sentences.
1 code implementation • NAACL 2022 • Howard Chen, Jacqueline He, Karthik Narasimhan, Danqi Chen
Our experiments reveal that rationale models show promise for improving robustness, but they struggle in certain scenarios, such as when the rationalizer is sensitive to positional bias or to lexical choices in the attack text.
2 code implementations • ACL 2022 • Mengzhou Xia, Zexuan Zhong, Danqi Chen
The growing size of neural language models has led to increased attention in model compression.
1 code implementation • 16 Feb 2022 • Alexander Wettig, Tianyu Gao, Zexuan Zhong, Danqi Chen
In this work, we revisit this important choice of MLM pre-training.
2 code implementations • ACL 2022 • Huihan Li, Tianyu Gao, Manan Goenka, Danqi Chen
In this work, we conduct the first large-scale human evaluation of state-of-the-art conversational QA systems, where human evaluators converse with models and judge the correctness of their answers.
1 code implementation • EMNLP 2021 • Dan Friedman, Ben Dodge, Danqi Chen
Many datasets have been created for training reading comprehension models, and a natural question is whether we can combine them to build models that (1) perform better on all of the training datasets and (2) generalize and transfer better to new datasets.
1 code implementation • EMNLP 2021 • Christopher Sciavolino, Zexuan Zhong, Jinhyuk Lee, Danqi Chen
Open-domain question answering has exploded in popularity recently due to the success of dense retrieval models, which have surpassed sparse models using only a few supervised training examples.
Ranked #3 on Passage Retrieval on EntityQuestions
1 code implementation • EMNLP 2021 • Jinhyuk Lee, Alexander Wettig, Danqi Chen
Dense retrieval methods have shown great promise over sparse retrieval methods in a range of NLP problems.
1 code implementation • NAACL 2021 • Howard Chen, Mengzhou Xia, Danqi Chen
One significant challenge in supervised all-words WSD is to classify among senses for a majority of words that lie in the long-tail distribution.
23 code implementations • EMNLP 2021 • Tianyu Gao, Xingcheng Yao, Danqi Chen
This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings.
Ranked #3 on Linear-Probe Classification on SentEval
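A minimal sketch of the unsupervised SimCSE objective: each sentence is encoded twice with independent dropout masks, the two encodings of the same sentence form a positive pair, and the other sentences in the batch serve as negatives. Tensor names and the temperature value are illustrative.

```python
import torch
import torch.nn.functional as F

def simcse_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05):
    """Unsupervised SimCSE-style contrastive loss.

    z1, z2: [batch, dim] embeddings of the *same* sentences encoded twice
    with different dropout masks; matching rows are positives, all other
    rows in the batch act as in-batch negatives.
    """
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature                      # [batch, batch] cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, labels)
```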
2 code implementations • NAACL 2021 • Zexuan Zhong, Dan Friedman, Danqi Chen
Petroni et al. (2019) demonstrated that it is possible to retrieve world facts from a pre-trained language model by expressing them as cloze-style prompts and interpret the model's prediction accuracy as a lower bound on the amount of factual information it encodes.
no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih
We review the EfficientQA competition from NeurIPS 2020.
9 code implementations • ACL 2021 • Tianyu Gao, Adam Fisch, Danqi Chen
We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples.
Ranked #6 on Zero-Shot Text Classification on AG News
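A toy illustration of the prompt-based fine-tuning setup this entry describes: a cloze template plus a label-word verbalizer turns classification into masked-token prediction. The specific template and label words below are illustrative, not necessarily the paper's.

```python
# Toy prompt-based fine-tuning setup (template + verbalizer); the template
# string and label words here are assumptions for illustration only.
TEMPLATE = "{sentence} It was [MASK]."
VERBALIZER = {"positive": "great", "negative": "terrible"}

def build_prompt(sentence: str) -> str:
    """Wrap an input in a cloze template; the model is fine-tuned to fill
    [MASK] with one of the verbalizer's label words."""
    return TEMPLATE.format(sentence=sentence)

print(build_prompt("The movie was a delight."))
# -> "The movie was a delight. It was [MASK]."
```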
4 code implementations • ACL 2021 • Jinhyuk Lee, Mujeen Sung, Jaewoo Kang, Danqi Chen
Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019).
Ranked #1 on Question Answering on Natural Questions (long)
2 code implementations • NAACL 2021 • Zexuan Zhong, Danqi Chen
Our approach essentially builds on two independent encoders and merely uses the entity model to construct the input for the relation model.
Ranked #1 on Named Entity Recognition (NER) on ACE 2005
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yangsibo Huang, Zhao Song, Danqi Chen, Kai Li, Sanjeev Arora
In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e.g., BERT) for any sentence or sentence-pair task.
no code implementations • ACL 2020 • Danqi Chen, Wen-tau Yih
This tutorial provides a comprehensive and coherent overview of cutting-edge research in open-domain question answering (QA), the task of answering questions using a large collection of documents of diversified topics.
18 code implementations • EMNLP 2020 • Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih
Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method.
Ranked #1 on Question Answering on NaturalQA
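A minimal sketch of the dual-encoder training objective with in-batch negatives, as commonly used for dense passage retrieval; tensor names are illustrative, and hard negatives (e.g., BM25-mined passages) are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def in_batch_retrieval_loss(q_emb: torch.Tensor, p_emb: torch.Tensor):
    """Dense retrieval training loss with in-batch negatives.

    q_emb: [batch, dim] question embeddings; p_emb: [batch, dim] embeddings
    of each question's gold passage. Passages paired with other questions in
    the batch act as negatives, so the loss is the negative log-likelihood
    of the gold passage under a softmax over dot-product scores.
    """
    scores = q_emb @ p_emb.T                              # dot-product similarities
    targets = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(scores, targets)
```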
7 code implementations • 10 Nov 2019 • Sewon Min, Danqi Chen, Luke Zettlemoyer, Hannaneh Hajishirzi
We introduce an approach for open-domain question answering (QA) that retrieves and reads a passage graph, where vertices are passages of text and edges represent relationships that are derived from an external knowledge base or co-occurrence in the same article.
1 code implementation • WS 2019 • Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, Danqi Chen
We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems.
1 code implementation • IJCNLP 2019 • Sewon Min, Danqi Chen, Hannaneh Hajishirzi, Luke Zettlemoyer
Many question answering (QA) tasks only provide weak supervision for how the answer should be computed.
Ranked #2 on Question Answering on NarrativeQA
67 code implementations • 26 Jul 2019 • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.
Ranked #1 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (Wasserstein Distance (WD) metric, using extra training data)
6 code implementations • TACL 2020 • Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy
We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text.
Ranked #1 on Question Answering on TriviaQA (F1 metric)
4 code implementations • TACL 2019 • Siva Reddy, Danqi Chen, Christopher D. Manning
Humans gather information by engaging in conversations involving a series of interconnected questions and answers.
Ranked #3 on Generative Question Answering on CoQA
2 code implementations • EMNLP 2017 • Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, Christopher D. Manning
The combination of better supervised data and a more appropriate high-capacity model enables much better relation extraction performance.
Ranked #7 on Relation Extraction on Re-TACRED
10 code implementations • ACL 2017 • Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes
This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article.
Ranked #1 on Open-Domain Question Answering on SQuAD1.1
3 code implementations • ACL 2016 • Danqi Chen, Jason Bolton, Christopher D. Manning
Enabling a computer to understand a document so that it can answer comprehension questions is a central, yet unsolved goal of NLP.
Ranked #3 on Question Answering on CNN / Daily Mail
no code implementations • NeurIPS 2013 • Richard Socher, Danqi Chen, Christopher D. Manning, Andrew Ng
We assess the model by considering the problem of predicting additional true relations between entities given a partial knowledge base.