no code implementations • 6 Sep 2024 • Adir Rahamim, Naomi Saphra, Sara Kangaslahti, Yonatan Belinkov
Parameter efficient finetuning methods like low-rank adaptation (LoRA) aim to reduce the computational costs of finetuning pretrained Language Models (LMs).
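The low-rank adaptation idea mentioned above can be sketched in a few lines of NumPy: the pretrained weight stays frozen, and only a low-rank update is trained. This is a minimal illustration, not the paper's implementation; all sizes and names here are hypothetical.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass through a frozen weight W plus a low-rank update B @ A.

    Only A (r x d_in) and B (d_out x r) are trained, so trainable
    parameters drop from d_out * d_in to r * (d_in + d_out).
    """
    return x @ (W + alpha * (B @ A)).T

# Illustrative sizes only.
d_in, d_out, r = 8, 8, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight
A = rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-initialized
x = rng.normal(size=(1, d_in))

# With B initialized to zero, the adapted model matches the pretrained one.
out = lora_forward(x, W, A, B)
```

Zero-initializing `B` is the standard LoRA trick: training starts exactly at the pretrained model and only gradually departs from it.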
no code implementations • 22 Jul 2024 • Michael Saxon, Ari Holtzman, Peter West, William Yang Wang, Naomi Saphra
Modern language models (LMs) pose a new challenge in capability assessment.
1 code implementation • 9 Jul 2024 • Victoria R. Li, Yida Chen, Naomi Saphra
By generating user biographies that offer ideological and demographic information, we find a number of biases in guardrail sensitivity on GPT-3.5.

1 code implementation • 25 Jun 2024 • USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien, Jyothir S V, Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne, Stella Biderman, Tracy Ke, Katherine Lee, Naomi Saphra
Memorization in language models is typically treated as a homogenous phenomenon, neglecting the specifics of the memorized data.
no code implementations • 17 Jun 2024 • Edwin Zhang, Vincent Zhu, Naomi Saphra, Anat Kleiman, Benjamin L. Edelman, Milind Tambe, Sham M. Kakade, Eran Malach
Generative models are trained with the simple objective of imitating the conditional probability distribution induced by the data they are trained on.
no code implementations • 19 Mar 2024 • Divyansh Singhvi, Andrej Erkelens, Raghav Jain, Diganta Misra, Naomi Saphra
Measuring nonlinear feature interaction is an established approach to understanding complex patterns of attribution in many models.
no code implementations • 29 Nov 2023 • Yash Gondhalekar, Sultan Hassan, Naomi Saphra, Sambatra Andrianomena
The generalization of machine learning (ML) models to out-of-distribution (OOD) examples remains a key challenge in extracting information from upcoming astronomical surveys.
no code implementations • 15 Nov 2023 • Ian Berlot-Attwell, Kumar Krishna Agrawal, A. Michael Carrell, Yash Sharma, Naomi Saphra
Although modern neural networks often generalize to new combinations of familiar concepts, the conditions that enable such compositionality have long been an open question.
no code implementations • 8 Nov 2023 • Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez
Many NLP researchers are experiencing an existential crisis triggered by the astonishing success of ChatGPT and other systems based on large language models (LLMs).
1 code implementation • 5 Oct 2023 • Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng
We propose Trust Region Aware Minimization (TRAM), a SAM algorithm that fine-tunes for low parameter sharpness and for smooth, informative representations that preserve pre-trained structure.
no code implementations • 13 Sep 2023 • Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra
Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model.
1 code implementation • 18 Aug 2023 • Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho
We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence.
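To give a flavor of the HMM machinery involved, here is a minimal log-space Viterbi decoder for a discrete-emission HMM, a standard component of such analyses. The toy parameters are purely illustrative (hidden states standing in for training phases, observations for binned metrics); this is not the paper's model.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state sequence for a discrete-emission HMM."""
    logd = np.log(pi) + np.log(B[:, obs[0]])        # initial log scores
    backptrs = []
    for o in obs[1:]:
        trans = logd[:, None] + np.log(A)           # trans[i, j]: score of i -> j
        backptrs.append(np.argmax(trans, axis=0))   # best predecessor per state
        logd = trans.max(axis=0) + np.log(B[:, o])
    path = [int(np.argmax(logd))]                   # backtrack from the best end state
    for bp in reversed(backptrs):
        path.append(int(bp[path[-1]]))
    return path[::-1]

# Toy 2-state HMM with sticky transitions and near-faithful emissions.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])   # transition probabilities
B = np.array([[0.9, 0.1], [0.1, 0.9]])   # emission probabilities
path = viterbi([0, 0, 1, 1], pi, A, B)
```

With these sticky transitions, the decoded path follows the observations, switching state once in the middle of the sequence.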
no code implementations • 24 May 2023 • Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt
Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%.
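The masking procedure that the 15% rate controls can be sketched as follows; this simplified version masks each token independently and omits BERT's additional 80/10/10 replacement scheme, as noted in the comment.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Replace each token with [MASK] independently with probability mask_rate.

    (BERT additionally leaves 10% of selected tokens unchanged and swaps 10%
    for random tokens; that refinement is omitted here for brevity.)
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets.append((i, tok))   # the model is trained to predict these
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_tokens("the cat sat on the mat".split())
```

Varying `mask_rate` away from the fixed 0.15 is exactly the knob this paper's experiments turn.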
no code implementations • 17 Nov 2022 • Bingchen Zhao, Yuling Gu, Jessica Zosa Forde, Naomi Saphra
At NeurIPS, American and Chinese institutions cite papers from each other's regions substantially less than they cite endogamously.
no code implementations • 6 Oct 2022 • Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin
We present a taxonomy for characterising and understanding generalisation research in NLP.
1 code implementation • COLING 2022 • Josef Valvoda, Naomi Saphra, Jonathan Rawski, Adina Williams, Ryan Cotterell
Recombining known primitive concepts into larger novel combinations is a quintessentially human cognitive capability.
1 code implementation • 24 May 2022 • Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra
It is widely accepted in the mode connectivity literature that when two neural networks are trained similarly on the same data, they are connected by a path through parameter space over which test set accuracy is maintained.
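The basic experiment behind that claim is simple: linearly interpolate between two trained parameter vectors and evaluate the loss along the path, looking for a "barrier" where the path is worse than either endpoint. A toy sketch, with a quadratic stand-in for test loss (illustrative only):

```python
import numpy as np

def interpolate_params(theta_a, theta_b, n_points=5):
    """Yield parameter vectors along the straight line between two trained models."""
    for t in np.linspace(0.0, 1.0, n_points):
        yield (1 - t) * theta_a + t * theta_b

# Toy quadratic "loss" standing in for test error.
loss = lambda theta: float(np.sum(theta ** 2))

theta_a = np.array([1.0, 0.0])
theta_b = np.array([0.0, 1.0])
path_losses = [loss(th) for th in interpolate_params(theta_a, theta_b)]

# Barrier height: how much worse the path gets than the worse endpoint.
barrier = max(path_losses) - max(loss(theta_a), loss(theta_b))
```

Two minima are called (linearly) mode-connected when the barrier is near zero, the situation this paper's counterexamples complicate.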
3 code implementations • ICLR 2022 • Thibault Sellam, Steve Yadlowsky, Jason Wei, Naomi Saphra, Alexander D'Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ian Tenney, Ellie Pavlick
Experiments with pre-trained models such as BERT are often based on a single checkpoint.
no code implementations • NAACL 2021 • Jennifer C. White, Tiago Pimentel, Naomi Saphra, Ryan Cotterell
Probes are models devised to investigate the encoding of knowledge -- e.g. syntactic structure -- in contextual representations.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Naomi Saphra, Adam Lopez
To explore the inductive biases that cause these compositional representations to arise during training, we conduct simple experiments on synthetic data.
1 code implementation • EMNLP 2020 • Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell
In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume.
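For a two-objective trade-off like complexity versus accuracy, the hypervolume reduces to the area dominated by the Pareto front relative to a reference point. A minimal 2-D sketch (the axes and numbers are hypothetical, not from the paper):

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by a set of 2-D points (both axes to be maximized),
    relative to a reference point; a scalar score for a trade-off curve."""
    # Sweep from largest x down; points whose y fails to improve on an
    # already-seen higher-x point are dominated and contribute nothing.
    pts = sorted(points, key=lambda p: p[0], reverse=True)
    area, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area

# Two probes: one simple but less accurate, one complex but more accurate.
# Axes are (1 - normalized complexity, accuracy), both to be maximized.
front = [(0.9, 0.6), (0.4, 0.8)]
hv = hypervolume_2d(front)
```

A larger hypervolume means the family of probes offers strictly better complexity-accuracy trade-offs, which is the comparison the metric enables.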
no code implementations • 27 Apr 2020 • Naomi Saphra, Adam Lopez
Recent work in NLP shows that LSTM language models capture compositional structure in language data.
1 code implementation • 12 Nov 2019 • Yekun Chai, Naomi Saphra, Adam Lopez
Diverse word representations now appear in most state-of-the-art natural language processing (NLP) applications.
no code implementations • ICML Workshop Deep_Phenomen 2019 • Naomi Saphra, Adam Lopez
Concerns about interpretability, computational resources, and principled inductive priors have motivated efforts to engineer sparse neural models for NLP tasks.
no code implementations • 28 May 2019 • Naomi Saphra, Adam Lopez
LSTM-based language models exhibit compositionality in their representations, but how this behavior emerges over the course of training has not been explored.
no code implementations • NAACL 2019 • Naomi Saphra, Adam Lopez
Research has shown that neural models implicitly encode linguistic features, but there has been no research showing *how* these encodings arise as the models are trained.
no code implementations • WS 2018 • Naomi Saphra, Adam Lopez
A glut of recent research shows that language models capture linguistic structure.
4 code implementations • 15 Jan 2017 • Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin
In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.
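The static-declaration strategy described above can be caricatured in a few lines: a symbolic graph is built once, then many examples are fed through it. (DyNet's dynamic declaration instead rebuilds the graph per example.) This toy interpreter is purely illustrative, not any toolkit's API.

```python
class Node:
    """A node in a symbolic computation graph: an op plus its arguments."""
    def __init__(self, op, *args):
        self.op, self.args = op, args

    def eval(self, env):
        """Execute the graph for one example, with inputs bound in env."""
        if self.op == "input":
            return env[self.args[0]]
        if self.op == "add":
            return self.args[0].eval(env) + self.args[1].eval(env)
        if self.op == "mul":
            return self.args[0].eval(env) * self.args[1].eval(env)
        raise ValueError(f"unknown op: {self.op}")

# Define the graph once, symbolically: y = x * (x + b) ...
x = Node("input", "x")
y = Node("mul", x, Node("add", x, Node("input", "b")))

# ... then feed different examples through the same fixed graph.
outs = [y.eval({"x": v, "b": 1.0}) for v in (1.0, 2.0)]
```

The separation between graph construction and execution is what makes static declaration awkward for variably-structured inputs such as parse trees, the pain point DyNet's define-by-run design addresses.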
1 code implementation • WS 2016 • Naomi Saphra, Adam Lopez
Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums.
no code implementations • CVPR 2014 • Andrea Vedaldi, Siddharth Mahendran, Stavros Tsogkas, Subhransu Maji, Ross Girshick, Juho Kannala, Esa Rahtu, Iasonas Kokkinos, Matthew B. Blaschko, David Weiss, Ben Taskar, Karen Simonyan, Naomi Saphra, Sammy Mohamed
We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object.
1 code implementation • WS 2013 • Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, Jason Baldridge
We introduce a framework for lightweight dependency syntax annotation.