no code implementations • 13 Sep 2023 • Angelica Chen, Ravid Schwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra
We further find that SAS competes with other beneficial traits and capabilities during training, and that briefly suppressing SAS can improve model quality.
no code implementations • 18 Aug 2023 • Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho
We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence.
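For illustration, a hedged sketch of fitting an HMM to sequences of training metrics using the hmmlearn package; the runs, metrics, and number of hidden states below are placeholders, not the paper's actual setup:

```python
import numpy as np
from hmmlearn import hmm   # assumes the hmmlearn package is installed

# Hypothetical input: several training runs, each a sequence of per-step
# metrics (e.g. loss, gradient norm) stacked into one observation matrix.
runs = [np.random.randn(500, 2) for _ in range(8)]     # placeholder data
X = np.concatenate(runs)
lengths = [len(r) for r in runs]

# Fit a Gaussian HMM whose hidden states play the role of training "phases".
model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=100)
model.fit(X, lengths)
states = model.predict(runs[0])       # decode one run's state sequence
```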
no code implementations • 24 May 2023 • Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt
Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%.
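For illustration, a minimal sketch of BERT-style dynamic token masking at a configurable rate (not the paper's implementation; the [MASK] id and vocabulary size are placeholders):

```python
import random

MASK_ID = 103          # hypothetical [MASK] token id
VOCAB_SIZE = 30522     # assumed vocabulary size

def mask_tokens(token_ids, mask_rate=0.15, seed=None):
    """Select positions at `mask_rate` and apply BERT-style 80/10/10
    corruption: [MASK], a random token, or the original token."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_rate:
            labels.append(tok)            # predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_ID)                     # 80%: [MASK]
            elif r < 0.9:
                inputs.append(rng.randrange(VOCAB_SIZE))   # 10%: random token
            else:
                inputs.append(tok)                         # 10%: unchanged
        else:
            inputs.append(tok)
            labels.append(-100)           # ignored by the MLM loss
    return inputs, labels
```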
no code implementations • 17 Nov 2022 • Bingchen Zhao, Yuling Gu, Jessica Zosa Forde, Naomi Saphra
At NeurIPS, American and Chinese institutions cite papers from each other's regions substantially less than they cite endogamously.
no code implementations • 6 Oct 2022 • Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin
We present a taxonomy for characterising and understanding generalisation research in NLP.
1 code implementation • COLING 2022 • Josef Valvoda, Naomi Saphra, Jonathan Rawski, Adina Williams, Ryan Cotterell
Recombining known primitive concepts into larger novel combinations is a quintessentially human cognitive capability.
1 code implementation • 24 May 2022 • Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra
It is widely accepted in the mode connectivity literature that when two neural networks are trained similarly on the same data, they are connected by a path through parameter space over which test set accuracy is maintained.
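For illustration, a minimal sketch of evaluating accuracy along a straight-line path between two checkpoints, assuming PyTorch models with identical architectures and a user-supplied evaluation function (the paper's path-finding setup is more involved):

```python
import copy
import torch

def linear_path_accuracy(model_a, model_b, eval_fn, num_points=11):
    """Evaluate accuracy along theta(t) = (1 - t) * theta_a + t * theta_b."""
    accs = []
    probe = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    for t in torch.linspace(0.0, 1.0, num_points):
        # Interpolate floating-point parameters; copy buffers like counters as-is.
        interp = {k: ((1 - t) * v + t * sd_b[k]) if v.is_floating_point() else v
                  for k, v in sd_a.items()}
        probe.load_state_dict(interp)
        accs.append(eval_fn(probe))   # user-supplied test-set evaluation
    return accs
```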
1 code implementation • ICLR 2022 • Thibault Sellam, Steve Yadlowsky, Jason Wei, Naomi Saphra, Alexander D'Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ian Tenney, Ellie Pavlick
Experiments with pre-trained models such as BERT are often based on a single checkpoint.
no code implementations • NAACL 2021 • Jennifer C. White, Tiago Pimentel, Naomi Saphra, Ryan Cotterell
Probes are models devised to investigate the encoding of knowledge -- e.g. syntactic structure -- in contextual representations.
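For illustration, a minimal sketch of a linear probe trained on frozen contextual representations, assuming precomputed per-token vectors and syntactic labels (not the paper's specific probes):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def train_probe(train_reps, train_labels, test_reps, test_labels):
    """Fit a simple linear probe on frozen representations and report
    how well the target property (e.g. POS tags) can be read out."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(train_reps, train_labels)
    preds = probe.predict(test_reps)
    return accuracy_score(test_labels, preds)
```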
no code implementations • Findings of the Association for Computational Linguistics 2020 • Naomi Saphra, Adam Lopez
To explore the inductive biases that cause these compositional representations to arise during training, we conduct simple experiments on synthetic data.
no code implementations • 6 Oct 2020 • Naomi Saphra, Adam Lopez
To explore the inductive biases that cause these compositional representations to arise during training, we conduct simple experiments on synthetic data.
1 code implementation • EMNLP 2020 • Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell
In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume.
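For illustration, a minimal sketch of a 2-D Pareto hypervolume over (complexity, accuracy) points, assuming both axes are normalised to [0, 1] with lower complexity and higher accuracy preferred (the paper's complexity measures and exact setup differ):

```python
def pareto_hypervolume(points, ref=(1.0, 0.0)):
    """Area dominated by the Pareto frontier of (complexity, accuracy)
    pairs, measured against a worst-case reference corner `ref`."""
    # Keep only non-dominated points: no other point is both simpler
    # and at least as accurate.
    frontier = sorted(
        (p for p in points
         if not any(q[0] <= p[0] and q[1] >= p[1] and q != p for q in points)),
        key=lambda p: p[0],
    )
    volume, prev_acc = 0.0, ref[1]
    # Sweep from low to high complexity, adding the rectangle each
    # frontier point contributes beyond the previous one.
    for c, a in frontier:
        volume += (ref[0] - c) * (a - prev_acc)
        prev_acc = a
    return volume
```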
no code implementations • 27 Apr 2020 • Naomi Saphra, Adam Lopez
Recent work in NLP shows that LSTM language models capture compositional structure in language data.
1 code implementation • 12 Nov 2019 • Yekun Chai, Naomi Saphra, Adam Lopez
Diverse word representations are now used in most state-of-the-art natural language processing (NLP) applications.
no code implementations • ICML Workshop Deep_Phenomen 2019 • Naomi Saphra, Adam Lopez
Concerns about interpretability, computational resources, and principled inductive priors have motivated efforts to engineer sparse neural models for NLP tasks.
no code implementations • 28 May 2019 • Naomi Saphra, Adam Lopez
LSTM-based language models exhibit compositionality in their representations, but how this behavior emerges over the course of training has not been explored.
no code implementations • WS 2018 • Naomi Saphra, Adam Lopez
A glut of recent research shows that language models capture linguistic structure.
no code implementations • NAACL 2019 • Naomi Saphra, Adam Lopez
Research has shown that neural models implicitly encode linguistic features, but there has been no research showing how these encodings arise as the models are trained.
4 code implementations • 15 Jan 2017 • Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin
In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.
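For contrast, a sketch of the dynamic-declaration (define-by-run) style, written here in PyTorch syntax rather than DyNet's own API: the graph is built anew for each example and can depend on the input's length or structure.

```python
import torch
import torch.nn as nn

class SumEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, token_ids):
        # The graph's size follows the sentence length; no fixed-shape
        # graph is compiled ahead of time.
        vecs = [self.embed(t) for t in token_ids]
        h = torch.stack(vecs).sum(dim=0)
        return self.score(h)

model = SumEncoder()
for sentence in ([1, 5, 42], [7, 7, 7, 7, 7]):   # variable-length inputs
    loss = model(torch.tensor(sentence)).squeeze()
    loss.backward()   # derivatives follow the per-example graph
```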
1 code implementation • WS 2016 • Naomi Saphra, Adam Lopez
Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums.
no code implementations • CVPR 2014 • Andrea Vedaldi, Siddharth Mahendran, Stavros Tsogkas, Subhransu Maji, Ross Girshick, Juho Kannala, Esa Rahtu, Iasonas Kokkinos, Matthew B. Blaschko, David Weiss, Ben Taskar, Karen Simonyan, Naomi Saphra, Sammy Mohamed
We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object.
1 code implementation • WS 2013 • Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, Jason Baldridge
We introduce a framework for lightweight dependency syntax annotation.