1 code implementation • Findings (EMNLP) 2021 • Prasanna Parthasarathi, Koustuv Sinha, Joelle Pineau, Adina Williams
Rapid progress in Neural Machine Translation (NMT) systems over the last few years has focused primarily on improving translation quality and, as a secondary focus, on improving robustness to perturbations (e.g. spelling).
no code implementations • EMNLP 2020 • Arya D. McCarthy, Adina Williams, Shijia Liu, David Yarowsky, Ryan Cotterell
Of particular interest, languages on the same branch of our phylogenetic tree are notably similar, whereas languages from separate branches are no more similar than chance.
1 code implementation • CoNLL (EMNLP) 2021 • Verna Dankers, Anna Langedijk, Kate McCurdy, Adina Williams, Dieuwke Hupkes
Inflectional morphology has long been a useful testing ground for broader questions about generalisation in language and the viability of neural network models as cognitive models of language.
no code implementations • NAACL 2022 • Zeerak Talat, Hagen Blix, Josef Valvoda, Maya Indira Ganesh, Ryan Cotterell, Adina Williams
Ethics is one of the longest standing intellectual endeavors of humanity.
no code implementations • 30 Nov 2023 • Karolina Stańczak, Kevin Du, Adina Williams, Isabelle Augenstein, Ryan Cotterell
However, when we control for the meaning of the noun, we find that grammatical gender has a near-zero effect on adjective choice, thereby calling the neo-Whorfian hypothesis into question.
no code implementations • 29 Nov 2023 • David Esiobu, Xiaoqing Tan, Saghar Hosseini, Megan Ung, Yuchen Zhang, Jude Fernandes, Jane Dwivedi-Yu, Eleonora Presani, Adina Williams, Eric Michael Smith
In this work, our focus is two-fold: (1) Benchmarking: a comparison of 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs.
1 code implementation • 26 Oct 2023 • Kaiser Sun, Adina Williams, Dieuwke Hupkes
NLP models have progressed drastically in recent years, according to numerous datasets proposed to evaluate performance.
1 code implementation • 31 Aug 2023 • Benjamin Muller, Belen Alastruey, Prangthip Hansanti, Elahe Kalbassi, Christophe Ropers, Eric Michael Smith, Adina Williams, Luke Zettlemoyer, Pierre Andrews, Marta R. Costa-jussà
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
no code implementations • 11 Aug 2023 • Melissa Hall, Candace Ross, Adina Williams, Nicolas Carion, Michal Drozdzal, Adriana Romero Soriano
The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases.
11 code implementations • 18 Jul 2023 • Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
1 code implementation • 11 Jul 2023 • Arjun Subramonian, Adina Williams, Maximilian Nickel, Yizhou Sun, Levent Sagun
The expressive power of graph neural networks is usually measured by comparing how many pairs of graphs or nodes an architecture can possibly distinguish as non-isomorphic to those distinguishable by the $k$-dimensional Weisfeiler-Lehman ($k$-WL) test.
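To make the $k$-WL comparison concrete, here is a minimal sketch of the simplest case, 1-WL colour refinement, on small undirected graphs. It is an illustration only, not the paper's method: the function name `wl_histogram`, the adjacency-dict input format, and the fixed number of rounds are all assumptions made for this example.

```python
from collections import Counter


def wl_histogram(adjacency, rounds=3):
    """Run 1-WL colour refinement and return the final colour histogram.

    `adjacency` maps each node to a list of its neighbours (hypothetical
    input format chosen for this sketch). Colours are kept as nested
    tuples rather than relabelled integers, so histograms from two
    different graphs can be compared directly.
    """
    colors = {node: () for node in adjacency}  # uniform initial colour
    for _ in range(rounds):
        # New colour = own colour plus the sorted multiset of neighbour colours.
        colors = {
            node: (colors[node], tuple(sorted(colors[n] for n in adjacency[node])))
            for node in adjacency
        }
    return Counter(colors.values())


def wl_distinguishes(graph_a, graph_b, rounds=3):
    """True if 1-WL assigns the two graphs different colour histograms."""
    return wl_histogram(graph_a, rounds) != wl_histogram(graph_b, rounds)


# Two disjoint triangles vs. a 6-cycle: non-isomorphic, but both are
# 2-regular, so 1-WL cannot tell them apart.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# A 3-node path vs. a triangle: degrees differ, so 1-WL separates them.
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```

Here `wl_distinguishes(two_triangles, hexagon)` is false while `wl_distinguishes(path, triangle)` is true, which is the kind of pairwise comparison that expressive-power analyses count. Nested-tuple colours grow quickly with the number of rounds, so this form is only suitable for small graphs.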
1 code implementation • 27 Jan 2023 • Alex Warstadt, Leshem Choshen, Aaron Mueller, Adina Williams, Ethan Wilcox, Chengxu Zhuang
In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size corpus sourced from data inspired by the input to children.
no code implementations • 18 Dec 2022 • Koustuv Sinha, Jon Gauthier, Aaron Mueller, Kanishka Misra, Keren Fuentes, Roger Levy, Adina Williams
In this paper, we investigate the stability of language models' performance on targeted syntactic evaluations as we vary properties of the input context: the length of the context, the types of syntactic phenomena it contains, and whether or not there are violations of grammaticality.
1 code implementation • 23 Oct 2022 • Koustuv Sinha, Amirhossein Kazemnejad, Siva Reddy, Joelle Pineau, Dieuwke Hupkes, Adina Williams
Transformer language models encode the notion of word order using positional information.
1 code implementation • COLING 2022 • Josef Valvoda, Naomi Saphra, Jonathan Rawski, Adina Williams, Ryan Cotterell
Recombining known primitive concepts into larger novel combinations is a quintessentially human cognitive capability.
1 code implementation • 25 May 2022 • Rebecca Qian, Candace Ross, Jude Fernandes, Eric Smith, Douwe Kiela, Adina Williams
Unwanted and often harmful social biases are becoming ever more salient in NLP research, affecting both models and datasets.
2 code implementations • 18 May 2022 • Eric Michael Smith, Melissa Hall, Melanie Kambadur, Eleonora Presani, Adina Williams
As language models grow in popularity, it becomes increasingly important to clearly measure all possible markers of demographic identity in order to avoid perpetuating existing societal harms.
2 code implementations • CVPR 2022 • Tristan Thrush, Ryan Jiang, Max Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela, Candace Ross
We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call Winoground.
Ranked #28 on Visual Reasoning on Winoground
1 code implementation • ACL 2022 • Tristan Thrush, Kushal Tirumala, Anmol Gupta, Max Bartolo, Pedro Rodriguez, Tariq Kane, William Gaviria Rojas, Peter Mattson, Adina Williams, Douwe Kiela
We introduce Dynatask: an open-source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required for hosting and evaluating state-of-the-art NLP models, as well as for conducting model-in-the-loop data collection with crowdworkers.
2 code implementations • 20 Jan 2022 • Karolina Stańczak, Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell, Isabelle Augenstein
The success of pre-trained contextualized representations has prompted researchers to analyze them for the presence of linguistic information.
no code implementations • 7 Nov 2021 • Zeerak Talat, Hagen Blix, Josef Valvoda, Maya Indira Ganesh, Ryan Cotterell, Adina Williams
Ethics is one of the longest standing intellectual endeavors of humanity.
1 code implementation • Findings (ACL) 2022 • Eric Wallace, Adina Williams, Robin Jia, Douwe Kiela
To create models that are robust across a wide range of test inputs, training datasets should include diverse examples that span numerous phenomena.
no code implementations • 7 Sep 2021 • Eric Michael Smith, Adina Williams
All AI models are susceptible to learning biases in data that they are trained on.
no code implementations • NeurIPS 2021 • Zhiyi Ma, Kawin Ethayarajh, Tristan Thrush, Somya Jain, Ledell Wu, Robin Jia, Christopher Potts, Adina Williams, Douwe Kiela
We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison, integrated with the Dynabench platform.
no code implementations • ACL 2022 • Adithya Renduchintala, Adina Williams
Transformer-based models are the modern workhorses of neural machine translation (NMT), reaching state-of-the-art results across several benchmarks.
no code implementations • 15 Apr 2021 • Prasanna Parthasarathi, Koustuv Sinha, Joelle Pineau, Adina Williams
Rapid progress in Neural Machine Translation (NMT) systems over the last few years has been directed primarily towards improving translation quality and, as a secondary focus, towards improving robustness to input perturbations (e.g. spelling and grammatical mistakes).
no code implementations • EMNLP 2021 • Koustuv Sinha, Robin Jia, Dieuwke Hupkes, Joelle Pineau, Adina Williams, Douwe Kiela
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in classical NLP pipelines.
no code implementations • NAACL 2021 • Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, Zhiyi Ma, Tristan Thrush, Sebastian Riedel, Zeerak Waseem, Pontus Stenetorp, Robin Jia, Mohit Bansal, Christopher Potts, Adina Williams
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking.
1 code implementation • ACL 2021 • Koustuv Sinha, Prasanna Parthasarathi, Joelle Pineau, Adina Williams
We provide novel evidence that complicates this claim: we find that state-of-the-art Natural Language Inference (NLI) models assign the same labels to permuted examples as they do to the original, i.e. they are largely invariant to random word-order permutations.
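The kind of check described above can be sketched as follows: shuffle the words of each premise and hypothesis and measure how often the predicted label stays the same. This is a hedged illustration, not the authors' protocol. The helper names, the plain random shuffle, and the toy word-overlap classifier `toy_overlap_model` are all assumptions standing in for a real NLI model.

```python
import random


def permute_words(sentence, rng):
    """Return the sentence with its whitespace-separated words shuffled."""
    words = sentence.split()
    rng.shuffle(words)
    return " ".join(words)


def permutation_agreement(predict, pairs, n_permutations=10, seed=0):
    """Fraction of shuffled inputs that receive the same label as the original.

    `predict(premise, hypothesis)` is any callable returning a label;
    it is a placeholder for a trained NLI model.
    """
    rng = random.Random(seed)
    agree = total = 0
    for premise, hypothesis in pairs:
        original_label = predict(premise, hypothesis)
        for _ in range(n_permutations):
            shuffled_label = predict(
                permute_words(premise, rng), permute_words(hypothesis, rng)
            )
            agree += shuffled_label == original_label
            total += 1
    return agree / total if total else 0.0


def toy_overlap_model(premise, hypothesis):
    """Hypothetical bag-of-words classifier used only to exercise the metric."""
    overlap = set(premise.lower().split()) & set(hypothesis.lower().split())
    return "entailment" if len(overlap) >= 2 else "neutral"


pairs = [("a dog runs fast", "a dog runs"), ("the cat sleeps", "birds fly")]
agreement = permutation_agreement(toy_overlap_model, pairs)
```

A bag-of-words model is order-invariant by construction, so its agreement is exactly 1.0. The finding quoted above is that trained NLI models also score high on this kind of measure, even though they have access to word order.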
no code implementations • EMNLP (BlackboxNLP) 2021 • Grusha Prasad, Yixin Nie, Mohit Bansal, Robin Jia, Douwe Kiela, Adina Williams
Given the increasingly prominent role NLP models (will) play in our lives, it is important for human expectations of model behavior to align with actual model behavior.
1 code implementation • SCiL 2022 • Adina Williams, Tristan Thrush, Douwe Kiela
We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected over multiple rounds.
1 code implementation • EMNLP 2020 • Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
1 code implementation • EMNLP 2020 • Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell
In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume.
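As a rough illustration of how a Pareto hypervolume summarises a complexity/performance trade-off, the sketch below computes the area dominated by a set of (complexity, accuracy) points in two dimensions. It uses one common 2D definition with a user-chosen reference point; the paper's exact complexity measure and normalisation are not given in this listing, so the function names, the reference point, and the sample numbers are all assumptions.

```python
def pareto_front(points):
    """Keep points that no other point beats on both axes.

    Each point is (complexity, accuracy); lower complexity and higher
    accuracy are better. The result is sorted by increasing complexity.
    """
    front = []
    best_accuracy = float("-inf")
    for complexity, accuracy in sorted(points, key=lambda p: (p[0], -p[1])):
        if accuracy > best_accuracy:
            front.append((complexity, accuracy))
            best_accuracy = accuracy
    return front


def hypervolume_2d(points, ref_complexity, ref_accuracy=0.0):
    """Area dominated by the Pareto front, bounded by the reference point."""
    usable = [
        (c, a) for c, a in points if c <= ref_complexity and a >= ref_accuracy
    ]
    front = pareto_front(usable)
    area = 0.0
    for i, (complexity, accuracy) in enumerate(front):
        # Each front point is the best option until the next, more complex one.
        next_complexity = front[i + 1][0] if i + 1 < len(front) else ref_complexity
        area += (next_complexity - complexity) * (accuracy - ref_accuracy)
    return area


# Made-up probe results: (complexity, accuracy). Not data from the paper.
probes = [(1, 0.5), (2, 0.7), (3, 0.6), (4, 0.9)]
front = pareto_front(probes)
volume = hypervolume_2d(probes, ref_complexity=5)
```

In this toy example the probe at complexity 3 is dominated by the cheaper, more accurate probe at complexity 2, so it contributes nothing to the area. A representation whose information is easy to extract reaches high accuracy at low complexity and therefore gets a larger hypervolume.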
1 code implementation • WS 2020 • Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden
Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages.
1 code implementation • ACL 2020 • Rowan Hall Maudslay, Josef Valvoda, Tiago Pimentel, Adina Williams, Ryan Cotterell
One such probe is the structural probe (Hewitt and Manning, 2019), designed to quantify the extent to which syntactic information is encoded in contextualised word representations.
no code implementations • 3 May 2020 • Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach
We also find that there are statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects.
1 code implementation • ACL 2020 • Adina Williams, Tiago Pimentel, Arya D. McCarthy, Hagen Blix, Eleanor Chodroff, Ryan Cotterell
We find for two Indo-European languages (Czech and German) that form and meaning respectively share significant amounts of information with class (and contribute additional information above and beyond gender).
no code implementations • EMNLP 2020 • Emily Dinan, Angela Fan, Ledell Wu, Jason Weston, Douwe Kiela, Adina Williams
We show that our classifiers prove valuable for a variety of important applications, such as controlling for gender bias in generative models, detecting gender bias in arbitrary text, and shedding light on offensive language in terms of genderedness.
1 code implementation • ACL 2020 • Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, Ryan Cotterell
The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually "know" about natural language.
1 code implementation • ACL 2020 • Paloma Jeretic, Alex Warstadt, Suvrat Bhooshan, Adina Williams
We use IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models trained on MultiNLI (Williams et al., 2018) learn to make pragmatic inferences.
no code implementations • EMNLP 2020 • Emily Dinan, Angela Fan, Adina Williams, Jack Urbanek, Douwe Kiela, Jason Weston
Models often easily learn biases present in the training data, and their predictions directly reflect this bias.
2 code implementations • ACL 2020 • Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, Douwe Kiela
We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure.
no code implementations • IJCNLP 2019 • Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach
To that end, we use canonical correlation analysis to correlate the grammatical gender of inanimate nouns with an externally grounded definition of their lexical semantics.
no code implementations • NAACL 2019 • Shijia Liu, Hongyuan Mei, Adina Williams, Ryan Cotterell
While idiosyncrasies of the Chinese classifier system have been a richly studied topic among linguists (Adams and Conklin, 1973; Erbaugh, 1986; Lakoff, 1986), not much work has been done to quantify them with statistical methods.
no code implementations • WS 2019 • Katharina Kann, Alex Warstadt, Adina Williams, Samuel R. Bowman
For converging evidence, we further construct LaVA, a corresponding word-level dataset, and investigate whether the same syntactic features can be extracted from word embeddings.
10 code implementations • EMNLP 2018 • Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk, Veselin Stoyanov
State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models.
Cross-Lingual Natural Language Inference, Machine Translation
1 code implementation • TACL 2018 • Adina Williams, Andrew Drozdov, Samuel R. Bowman
Recent work on the problem of latent tree learning has made it possible to train neural networks that learn to both parse a sentence and use the resulting parse to interpret the sentence, all without exposure to ground-truth parse trees at training time.
no code implementations • WS 2017 • Nikita Nangia, Adina Williams, Angeliki Lazaridou, Samuel R. Bowman
This paper presents the results of the RepEval 2017 Shared Task, which evaluated neural network sentence representation learning models on the Multi-Genre Natural Language Inference corpus (MultiNLI) recently introduced by Williams et al. (2017).
3 code implementations • NAACL 2018 • Adina Williams, Nikita Nangia, Samuel R. Bowman
This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding.