1 code implementation • 28 Oct 2024 • Julie Kallini, Shikhar Murty, Christopher D. Manning, Christopher Potts, Róbert Csordás
Models that rely on subword tokenization have significant drawbacks, such as sensitivity to character-level noise like spelling errors and inconsistent compression rates across different languages and scripts.
no code implementations • 22 Oct 2024 • Dante Everaert, Rohit Patki, Tianqi Zheng, Christopher Potts
Query Autocomplete (QAC) is a critical feature in modern search engines, facilitating user interaction by predicting search queries based on input prefixes.
1 code implementation • 21 Oct 2024 • Aryaman Arora, Dan Jurafsky, Christopher Potts, Noah D. Goodman
In all cases, Bayesian scaling laws accurately predict the conditions under which ICL will cause the suppressed behavior to reemerge, which sheds light on the ineffectiveness of post-training at increasing LLM safety.
no code implementations • 15 Oct 2024 • Xuan Guo, Rohit Patki, Dante Everaert, Christopher Potts
The rapid introduction of new brand names into everyday language poses a unique challenge for e-commerce spelling correction services, which must distinguish genuine misspellings from novel brand names that use unconventional spelling.
no code implementations • 9 Sep 2024 • Tristan Thrush, Christopher Potts, Tatsunori Hashimoto
Quality pretraining data is often seen as the key to high-performance language models.
1 code implementation • 20 Aug 2024 • Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger
In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction.
1 code implementation • 12 Aug 2024 • Karel D'Oosterlinck, Winnie Xu, Chris Develder, Thomas Demeester, Amanpreet Singh, Christopher Potts, Douwe Kiela, Shikib Mehri
We study this and find that (i) preference data gives a better learning signal when the underlying responses are contrastive, and (ii) alignment objectives lead to better performance when they specify more control over the model during training.
1 code implementation • 25 Jul 2024 • Jing Huang, Diyi Yang, Christopher Potts
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications.
no code implementations • 15 Jul 2024 • Dilara Soylu, Christopher Potts, Omar Khattab
Natural Language Processing (NLP) systems are increasingly taking the form of sophisticated modular pipelines, e. g., Retrieval Augmented Generation (RAG), where each module may involve a distinct Language Model (LM) and an associated prompt template.
1 code implementation • 17 Jun 2024 • Krista Opsahl-Ong, Michael J Ryan, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, Omar Khattab
To make this tractable, we factorize our problem into optimizing the free-form instructions and few-shot demonstrations of every module and introduce several strategies to craft task-grounded instructions and navigate credit assignment across modules.
no code implementations • 17 Jun 2024 • Jasper Xian, Saron Samuel, Faraz Khoubsirat, Ronak Pradeep, Md Arafat Sultan, Radu Florian, Salim Roukos, Avirup Sil, Christopher Potts, Omar Khattab
We develop a method for training small-scale (under 100M parameter) neural information retrieval models with as few as 10 gold relevance labels.
1 code implementation • 12 Jun 2024 • Amir Zur, Elisa Kreiss, Karel D'Oosterlinck, Christopher Potts, Atticus Geiger
This model correlates with the judgements of blind and low-vision people while preserving transfer capabilities and has interpretable structure that sheds light on the caption--description distinction.
1 code implementation • 25 May 2024 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber, Christopher Potts, Christopher D. Manning
The resulting UT model, for the first time, slightly outperforms standard Transformers on language modeling tasks such as BLiMP and PIQA, while using significantly less compute and memory.
2 code implementations • 4 Apr 2024 • Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts
We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT), and we identify an ablation of this method that trades some performance for increased efficiency.
1 code implementation • 1 Apr 2024 • Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou
To address this gap, we conduct the first systematic, large-scale analysis across 950, 965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time.
3 code implementations • 12 Mar 2024 • Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts
Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability.
1 code implementation • 27 Feb 2024 • Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, Atticus Geiger
Individual neurons participate in the representation of multiple high-level concepts.
1 code implementation • 22 Feb 2024 • Nandita Shankar Naik, Christopher Potts, Elisa Kreiss
To evaluate how situating images within naturalistic contexts shapes visual questions, we introduce CommVQA, a VQA dataset consisting of images, image descriptions, real-world communicative scenarios where the image might appear (e. g., a travel website), and follow-up questions and answers conditioned on the scenario and description.
1 code implementation • 19 Feb 2024 • Aryaman Arora, Dan Jurafsky, Christopher Potts
Language models (LMs) have proven to be powerful tools for psycholinguistic research, but most prior work has focused on purely behavioural measures (e. g., surprisal comparisons).
1 code implementation • 23 Jan 2024 • Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman
We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability illusions".
2 code implementations • 22 Jan 2024 • Karel D'Oosterlinck, Omar Khattab, François Remy, Thomas Demeester, Chris Develder, Christopher Potts
Multi-label classification problems with thousands of classes are hard to solve with in-context learning alone, as language models (LMs) might lack prior knowledge about the precise classes or how to assign them, and it is generally infeasible to demonstrate every class in a prompt.
1 code implementation • 12 Jan 2024 • Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts
Chomsky and others have very directly claimed that large language models (LLMs) are equally capable of learning languages that are possible and impossible for humans to learn.
1 code implementation • 10 Jan 2024 • Tristan Thrush, Jared Moore, Miguel Monares, Christopher Potts, Douwe Kiela
We also provide minimally different metalinguistic non-self-reference examples to complement the main dataset by probing for whether models can handle metalinguistic language at all.
1 code implementation • 7 Jan 2024 • Emrah Budur, Rıza Özçelik, Dilara Soylu, Omar Khattab, Tunga Güngör, Christopher Potts
Our results show that SQuAD-TR makes OpenQA feasible for Turkish, which we hope encourages researchers to build OpenQA systems in other low-resource languages.
1 code implementation • 20 Dec 2023 • Arnav Singhvi, Manish Shetty, Shangyin Tan, Christopher Potts, Koushik Sen, Matei Zaharia, Omar Khattab
We integrate our constructs into the recent DSPy programming model for LMs, and present new strategies that allow DSPy to compile programs with LM Assertions into more reliable and accurate systems.
no code implementations • 20 Nov 2023 • Jingfen Zhang, Xuan Guo, Sravan Bodapati, Christopher Potts
Accurate spelling correction is a critical step in modern search interfaces, especially in an era of mobile devices and speech-to-text interfaces.
no code implementations • 17 Nov 2023 • Karel D'Oosterlinck, Thomas Demeester, Chris Develder, Christopher Potts
Model interpretability and model editing are crucial goals in the age of large language models.
1 code implementation • 16 Nov 2023 • Jon Saad-Falcon, Omar Khattab, Christopher Potts, Matei Zaharia
Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand annotations for input queries, passages to retrieve, and responses to generate.
1 code implementation • 9 Oct 2023 • Karel D'Oosterlinck, Semere Kiros Bitew, Brandon Papineau, Christopher Potts, Thomas Demeester, Chris Develder
State-of-the-art coreference resolutions systems depend on multiple LLM calls per document and are thus prohibitively expensive for many use cases (e. g., information extraction with large corpora).
Ranked #4 on Coreference Resolution on OntoNotes
2 code implementations • 5 Oct 2023 • Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts
The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks.
1 code implementation • 21 Sep 2023 • Elisa Kreiss, Eric Zelikman, Christopher Potts, Nick Haber
None of the methods is successful with ContextRef, but we show that careful fine-tuning yields substantial improvements.
no code implementations • 19 Sep 2023 • Jing Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts
Natural language is an appealing medium for explaining how large language models process and store information, but evaluating the faithfulness of such explanations is challenging.
1 code implementation • 28 Jul 2023 • Nandita Naik, Christopher Potts, Elisa Kreiss
We also find that context effects are especially important when participants can't see the image.
1 code implementation • 20 Jun 2023 • Dante Everaert, Christopher Potts
Our contribution is in showing that it can be made highly scalable through a simple relaxation of the objective and a highly efficient implementation.
1 code implementation • 30 May 2023 • Jingyuan Selena She, Christopher Potts, Samuel R. Bowman, Atticus Geiger
For in-context learning, we test InstructGPT models and find that most prompt strategies are not successful, including those using step-by-step reasoning.
2 code implementations • 24 May 2023 • Zexuan Zhong, Zhengxuan Wu, Christopher D. Manning, Christopher Potts, Danqi Chen
The information stored in large language models (LLMs) falls out of date quickly, and retraining from scratch is often not an option.
1 code implementation • 22 May 2023 • Karel D'Oosterlinck, François Remy, Johannes Deleu, Thomas Demeester, Chris Develder, Klim Zaporojets, Aneiss Ghodsi, Simon Ellershaw, Jack Collins, Christopher Potts
We introduce BioDEX, a large-scale resource for Biomedical adverse Drug Event Extraction, rooted in the historical output of drug safety reporting in the U. S. BioDEX consists of 65k abstracts and 19k full-text biomedical papers with 256k associated document-level safety reports created by medical experts.
1 code implementation • NeurIPS 2023 • Zhengxuan Wu, Atticus Geiger, Thomas Icard, Christopher Potts, Noah D. Goodman
With Boundless DAS, we discover that Alpaca does this by implementing a causal model with two interpretable boolean variables.
1 code implementation • 24 Mar 2023 • Zhengxuan Wu, Christopher D. Manning, Christopher Potts
We argue that this concern is realized for the COGS benchmark.
1 code implementation • 5 Mar 2023 • Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, Noah D. Goodman
In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases-distributed representations.
1 code implementation • 1 Mar 2023 • Jon Saad-Falcon, Omar Khattab, Keshav Santhanam, Radu Florian, Martin Franz, Salim Roukos, Avirup Sil, Md Arafat Sultan, Christopher Potts
Many information retrieval tasks require large labeled datasets for fine-tuning.
no code implementations • 11 Jan 2023 • Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah Goodman, Christopher Potts, Thomas Icard
Causal abstraction provides a theoretical foundation for mechanistic interpretability, the field concerned with providing intelligible algorithms that are faithful simplifications of the known, but opaque low-level details of black box AI models.
2 code implementations • 28 Dec 2022 • Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, Matei Zaharia
Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LM) and retrieval models (RM).
1 code implementation • 19 Dec 2022 • Jing Huang, Zhengxuan Wu, Kyle Mahowald, Christopher Potts
Language tasks involving character-level manipulations (e. g., spelling corrections, arithmetic operations, word games) are challenging for models operating on subword units.
no code implementations • 19 Dec 2022 • Daniel N. Sosa, Malavika Suresh, Christopher Potts, Russ B. Altman
Our task is to automatically identify contradictory claims about COVID-19 drug efficacy.
no code implementations • 2 Dec 2022 • Keshav Santhanam, Jon Saad-Falcon, Martin Franz, Omar Khattab, Avirup Sil, Radu Florian, Md Arafat Sultan, Salim Roukos, Matei Zaharia, Christopher Potts
Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks.
no code implementations • 18 Oct 2022 • Siyan Li, Riley Carlson, Christopher Potts
However, this evidence is consistent with GPT3 reasoning only about specific lexical items rather than the more abstract conceptual categories of Levin et al.'s theory.
1 code implementation • 28 Sep 2022 • Zhengxuan Wu, Karel D'Oosterlinck, Atticus Geiger, Amir Zur, Christopher Potts
The core of our proposal is the Causal Proxy Model (CPM).
1 code implementation • 16 Sep 2022 • Ben Prystawski, Paul Thibodeau, Christopher Potts, Noah D. Goodman
Probabilistic models of language understanding are valuable tools for investigating human language use.
4 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu
BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.
1 code implementation • 27 May 2022 • Eldar David Abraham, Karel D'Oosterlinck, Amir Feder, Yair Ori Gat, Atticus Geiger, Christopher Potts, Roi Reichart, Zhengxuan Wu
We introduce CEBaB, a new benchmark dataset for assessing concept-based explanation methods in Natural Language Processing (NLP).
1 code implementation • 21 May 2022 • Elisa Kreiss, Cynthia Bennett, Shayan Hooshmand, Eric Zelikman, Meredith Ringel Morris, Christopher Potts
Few images on the Web receive alt-text descriptions that would make them accessible to blind and low vision (BLV) users.
1 code implementation • 19 May 2022 • Keshav Santhanam, Omar Khattab, Christopher Potts, Matei Zaharia
PLAID uses centroid interaction as well as centroid pruning, a mechanism for sparsifying the bag of centroids, within a highly-optimized engine to reduce late interaction search latency by up to 7$\times$ on a GPU and 45$\times$ on a CPU against vanilla ColBERTv2, while continuing to deliver state-of-the-art retrieval quality.
1 code implementation • 18 May 2022 • Fei Fang, Kunal Sinha, Noah D. Goodman, Christopher Potts, Elisa Kreiss
It seems likely that these patterns are shaped by the environment a speaker is exposed to in complex ways.
1 code implementation • NAACL 2022 • Zhengxuan Wu, Atticus Geiger, Josh Rozner, Elisa Kreiss, Hanson Lu, Thomas Icard, Christopher Potts, Noah D. Goodman
Distillation efforts have led to language models that are more compact and efficient without serious drops in performance.
3 code implementations • NAACL 2022 • Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, Matei Zaharia
Neural information retrieval (IR) has greatly advanced search and other knowledge-intensive language tasks.
Ranked #6 on Zero-shot Text Search on BEIR
2 code implementations • 1 Dec 2021 • Atticus Geiger, Zhengxuan Wu, Hanson Lu, Josh Rozner, Elisa Kreiss, Thomas Icard, Noah D. Goodman, Christopher Potts
In IIT, we (1) align variables in a causal model (e. g., a deterministic program or Bayesian network) with representations in a neural model and (2) train the neural model to match the counterfactual behavior of the causal model on a base input when aligned representations in both models are set to be the value they would be for a source input.
no code implementations • ICLR 2022 • Ashwin Paranjape, Omar Khattab, Christopher Potts, Matei Zaharia, Christopher D. Manning
Many text generation systems benefit from using a retriever to retrieve passages from a textual knowledge corpus (e. g., Wikipedia) which are then provided as additional context to the generator.
3 code implementations • 18 Sep 2021 • Zhengxuan Wu, Elisa Kreiss, Desmond C. Ong, Christopher Potts
The ability to compositionally map language to referents, relations, and actions is an essential component of language understanding.
2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e. g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
1 code implementation • NeurIPS 2021 • Atticus Geiger, Hanson Lu, Thomas Icard, Christopher Potts
Structural analysis methods (e. g., probing and feature attribution) are increasingly important tools for neural network analysis.
no code implementations • NeurIPS 2021 • Zhiyi Ma, Kawin Ethayarajh, Tristan Thrush, Somya Jain, Ledell Wu, Robin Jia, Christopher Potts, Adina Williams, Douwe Kiela
We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison, integrated with the Dynabench platform.
1 code implementation • RepL4NLP (ACL) 2022 • Zhengxuan Wu, Nelson F. Liu, Christopher Potts
There is growing evidence that pretrained language models improve task-specific fine-tuning not just for the languages seen in pretraining, but also for new languages and even non-linguistic data.
2 code implementations • NeurIPS 2021 • Joshua Rozner, Christopher Potts, Kyle Mahowald
Cryptic crosswords, the dominant crossword variety in the UK, are a promising target for advancing NLP systems that seek to process semantically complex, highly compositional language.
1 code implementation • 16 Apr 2021 • Elisa Kreiss, Fei Fang, Noah D. Goodman, Christopher Potts
Current deep learning models often achieve excellent results on benchmark image-to-text datasets but fail to generate texts that are useful in practice.
no code implementations • NAACL 2021 • Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, Zhiyi Ma, Tristan Thrush, Sebastian Riedel, Zeerak Waseem, Pontus Stenetorp, Robin Jia, Mohit Bansal, Christopher Potts, Adina Williams
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking.
2 code implementations • NeurIPS 2021 • Omar Khattab, Christopher Potts, Matei Zaharia
Multi-hop reasoning (i. e., reasoning across two or more documents) is a key ingredient for NLP models that leverage large corpora to exhibit broad knowledge.
1 code implementation • ACL 2021 • Christopher Potts, Zhengxuan Wu, Atticus Geiger, Douwe Kiela
We introduce DynaSent ('Dynamic Sentiment'), a new English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis.
5 code implementations • 1 Jul 2020 • Omar Khattab, Christopher Potts, Matei Zaharia
In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages.
1 code implementation • CONLL 2020 • Elisa Kreiss, Zijian Wang, Christopher Potts
Crime reporting is a prevalent form of journalism with the power to shape public perceptions and social policies.
1 code implementation • 14 Jun 2020 • Atticus Geiger, Alexandra Carstensen, Michael C. Frank, Christopher Potts
In the two latter cases, our models perform tasks proposed in previous work to demarcate human-unique symbolic abilities.
1 code implementation • EMNLP 2020 • Emrah Budur, Rıza Özçelik, Tunga Güngör, Christopher Potts
In this paper, we offer a positive response for natural language inference (NLI) in Turkish.
1 code implementation • EMNLP (BlackboxNLP) 2020 • Atticus Geiger, Kyle Richardson, Christopher Potts
We address whether neural models for Natural Language Inference (NLI) can learn the compositional interactions between lexical entailment and negation, using four methods: the behavioral evaluation methods of (1) challenge test sets and (2) systematic generalization tasks, and the structural evaluation methods of (3) probes and (4) interventions.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Allen Nie, Reuben Cohn-Gordon, Christopher Potts
Image captioning systems have recently improved dramatically, but they still tend to produce captions that are insensitive to the communicative goals that captions should meet.
1 code implementation • IJCNLP 2019 • Zijian Wang, Christopher Potts
Condescending language use is caustic; it can bring dialogues to an end and bifurcate communities.
1 code implementation • SCiL 2020 • Benjamin Newman, Reuben Cohn-Gordon, Christopher Potts
Natural language generation (NLG) systems are commonly evaluated using n-gram overlap measures (e. g. BLEU, ROUGE).
no code implementations • NAACL 2019 • Ignacio Cases, Clemens Rosenbaum, Matthew Riemer, Atticus Geiger, Tim Klinger, Alex Tamkin, Olivia Li, S Agarwal, hini, Joshua D. Greene, Dan Jurafsky, Christopher Potts, Lauri Karttunen
The model jointly optimizes the parameters of the functions and the meta-learner{'}s policy for routing inputs through those functions.
no code implementations • 31 Mar 2019 • Bruno Godefroy, Christopher Potts
FDA drug labels are rich sources of information about drugs and drug-disease relations, but their complexity makes them challenging texts to analyze in isolation.
no code implementations • WS 2019 • Yifeng Tao, Bruno Godefroy, Guillaume Genthial, Christopher Potts
Crucial information about the practice of healthcare is recorded only in free-form text, which creates an enormous opportunity for high-impact NLP.
no code implementations • 30 Oct 2018 • Atticus Geiger, Ignacio Cases, Lauri Karttunen, Christopher Potts
Standard evaluations of deep learning models for semantics using naturalistic corpora are limited in what they can tell us about the fidelity of the learned representations, because the corpora rarely come with good measures of semantic complexity.
no code implementations • WS 2019 • Reuben Cohn-Gordon, Noah D. Goodman, Christopher Potts
Recent Iterated Response (IR) models of pragmatics conceptualize language use as a recursive process in which agents reason about each other to increase communicative efficiency.
no code implementations • 10 Sep 2018 • Christopher Potts
Pater's target article builds a persuasive case for establishing stronger ties between theoretical linguistics and connectionism (deep learning).
1 code implementation • EMNLP 2018 • Y. Alex Kolchinski, Christopher Potts
We explore two methods for representing authors in the context of textual sarcasm detection: a Bayesian approach that directly represents authors' propensities to be sarcastic, and a dense embedding approach that can learn interactions between the author and the text.
no code implementations • NAACL 2018 • Reuben Cohn-Gordon, Noah Goodman, Christopher Potts
We combine a neural image captioner with a Rational Speech Acts (RSA) model to make a system that is pragmatically informative: its objective is to produce captions that are not merely true but also distinguish their inputs from similar images.
1 code implementation • NAACL 2018 • Nicholas Dingwall, Christopher Potts
We present a simple extension of the GloVe representation learning model that begins with general-purpose representations and updates them based on data from a specialized domain.
1 code implementation • NAACL 2018 • Will Monroe, Jennifer Hu, Andrew Jong, Christopher Potts
Contextual influences on language often exhibit substantial cross-lingual regularities; for example, we are more verbose in situations that require finer distinctions.
no code implementations • 5 Oct 2017 • Ignacio Cases, Minh-Thang Luong, Christopher Potts
Neural networks have excelled at many NLP tasks, but there remain open questions about the performance of pretrained distributed word representations and their interaction with weight initialization and other hyperparameters.
1 code implementation • COLING 2018 • Benjamin J. Lengerich, Andrew L. Maas, Christopher Potts
Knowledge graphs are a versatile framework to encode richly structured data relationships, but it can be challenging to combine these graphs with unstructured data.
1 code implementation • TACL 2017 • Will Monroe, Robert X. D. Hawkins, Noah D. Goodman, Christopher Potts
We present a model of pragmatic referring expression interpretation in a grounded communication task (identifying colors from descriptions) that draws upon predictions from two recurrent neural network classifiers, a speaker and a listener, unified by a recursive pragmatic reasoning framework.
1 code implementation • EMNLP 2016 • Will Monroe, Noah D. Goodman, Christopher Potts
The production of color language is essential for grounded language generation.
3 code implementations • ACL 2016 • Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning, Christopher Potts
Tree-structured neural networks exploit valuable syntactic parse information as they interpret the meanings of sentences.
Ranked #86 on Natural Language Inference on SNLI
no code implementations • 23 Oct 2015 • Will Monroe, Christopher Potts
The Rational Speech Acts (RSA) model treats language use as a recursive process in which probabilistic speaker and listener agents reason about each other's intentions to enrich the literal semantics of their language along broadly Gricean lines.
3 code implementations • EMNLP 2015 • Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning
Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations.
Ranked #91 on Natural Language Inference on SNLI
1 code implementation • 16 Jun 2015 • Samuel R. Bowman, Christopher D. Manning, Christopher Potts
We hypothesize that neural sequence models like LSTMs are in fact able to discover and implicitly use recursive compositional structure, at least for tasks with clear cues to that structure in the data.
no code implementations • IJCNLP 2015 • Angel Chang, Will Monroe, Manolis Savva, Christopher Potts, Christopher D. Manning
The ability to map descriptions of scenes to 3D geometric representations has many applications in areas such as art, education, and robotics.
no code implementations • 15 Oct 2014 • Samuel R. Bowman, Christopher Potts, Christopher D. Manning
Natural logic offers a powerful relational conception of meaning that is a natural counterpart to distributed semantic representations, which have proven valuable in a wide range of sophisticated language tasks.
no code implementations • TACL 2014 • Robert West, Hristo S. Paskov, Jure Leskovec, Christopher Potts
Person-to-person evaluations are prevalent in all kinds of discourse and important for establishing reputations, building social bonds, and shaping public opinion.
no code implementations • WS 2015 • Samuel R. Bowman, Christopher Potts, Christopher D. Manning
Tree-structured recursive neural networks (TreeRNNs) for sentence meaning have been successful for many applications, but it remains an open question whether the fixed-length representations that they learn can support tasks as demanding as logical deduction.
no code implementations • ACL 2013 • Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, Christopher Potts
We propose a computational framework for identifying linguistic aspects of politeness.