Search Results for author: Ryan Cotterell

Found 191 papers, 90 papers with code

Efficient Sampling of Dependency Structures

1 code implementation EMNLP 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

In this paper, we adapt two spanning tree sampling algorithms to faithfully sample dependency trees from a graph subject to the root constraint.
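
The distribution being sampled here can be made concrete with a brute-force sketch: under the root constraint, the artificial ROOT node must have exactly one child, and a tree's probability is proportional to the product of its edge weights. This is not the paper's algorithm; the function names and weight layout (`w[parent][child]`) are hypothetical, and exhaustive enumeration is exponential, which is precisely what the paper's adapted spanning-tree samplers avoid:

```python
import itertools
import random
from math import prod

def all_valid_trees(n):
    """Enumerate parent vectors for words 1..n (0 = ROOT) that form a
    dependency tree in which ROOT has exactly one child."""
    for parents in itertools.product(range(n + 1), repeat=n):
        if sum(p == 0 for p in parents) != 1:  # the root constraint
            continue
        ok = True
        for i in range(1, n + 1):  # every word must reach ROOT, cycle-free
            seen, j = set(), i
            while j != 0 and ok:
                if j in seen:
                    ok = False
                seen.add(j)
                j = parents[j - 1]
        if ok:
            yield parents

def sample_tree(w, n, rng=random):
    """Sample a tree with probability proportional to the product of its
    edge weights w[parent][child] -- by exhaustive enumeration."""
    trees = list(all_valid_trees(n))
    weights = [prod(w[p][c] for c, p in enumerate(t, start=1)) for t in trees]
    return rng.choices(trees, weights=weights, k=1)[0]
```

`sample_tree` draws from exactly the target distribution; the point of the paper's algorithms is to do so in polynomial time.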

SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection

1 code implementation NAACL (SIGMORPHON) 2022 Jordan Kodner, Salam Khalifa, Khuyagbaatar Batsuren, Hossep Dolatian, Ryan Cotterell, Faruk Akkus, Antonios Anastasopoulos, Taras Andrushko, Aryaman Arora, Nona Atanalov, Gábor Bella, Elena Budianskaya, Yustinus Ghanggo Ate, Omer Goldman, David Guriel, Simon Guriel, Silvia Guriel-Agiashvili, Witold Kieraś, Andrew Krizhanovsky, Natalia Krizhanovsky, Igor Marchenko, Magdalena Markowska, Polina Mashkovtseva, Maria Nepomniashchaya, Daria Rodionova, Karina Scheifer, Alexandra Sorova, Anastasia Yemelina, Jeremiah Young, Ekaterina Vylomova

The 2022 SIGMORPHON–UniMorph shared task on large scale morphological inflection generation included a wide range of typologically diverse languages: 33 languages from 11 top-level language families: Arabic (Modern Standard), Assamese, Braj, Chukchi, Eastern Armenian, Evenki, Georgian, Gothic, Gujarati, Hebrew, Hungarian, Itelmen, Karelian, Kazakh, Ket, Khalkha Mongolian, Kholosi, Korean, Lamahalot, Low German, Ludic, Magahi, Middle Low German, Old English, Old High German, Old Norse, Polish, Pomak, Slovak, Turkish, Upper Sorbian, Veps, and Xibe.

Morphological Inflection

Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions

no code implementations EMNLP 2020 Arya D. McCarthy, Adina Williams, Shijia Liu, David Yarowsky, Ryan Cotterell

Of particular interest, languages on the same branch of our phylogenetic tree are notably similar, whereas languages from separate branches are no more similar than chance.

Community Detection

A surprisal–duration trade-off across and within the world’s languages

1 code implementation EMNLP 2021 Tiago Pimentel, Clara Meister, Elizabeth Salesky, Simone Teufel, Damián Blasi, Ryan Cotterell

We thus conclude that there is strong evidence of a surprisal–duration trade-off in operation, both across and within the world’s languages.

Conditional Poisson Stochastic Beams

no code implementations EMNLP 2021 Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell

Beam search is the default decoding strategy for many sequence generation tasks in NLP.
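
For context on the deterministic procedure this paper turns into a stochastic process, here is a minimal sketch of plain beam search; the `next_logprobs` interface (a callable returning token continuations with log-probabilities, with `None` as end-of-sequence) is an assumption for illustration only:

```python
import math

def beam_search(next_logprobs, start, beam_size, max_len):
    """Plain deterministic beam search. next_logprobs(seq) returns
    (token, logprob) continuations; token None marks end-of-sequence."""
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_logprobs(seq):
                if tok is None:
                    finished.append((seq, score + lp))
                else:
                    candidates.append((seq + [tok], score + lp))
        if not candidates:
            break
        # keep only the beam_size highest-scoring partial sequences
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.extend(beams)  # unfinished sequences compete as-is
    return max(finished, key=lambda c: c[1])
```

At each step the search keeps only the top-scoring prefixes, so the set of returned hypotheses is deterministic given the model; the paper's contribution is a principled way to randomize this set.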

The SIGTYP 2022 Shared Task on the Prediction of Cognate Reflexes

1 code implementation NAACL (SIGTYP) 2022 Johann-Mattis List, Ekaterina Vylomova, Robert Forkel, Nathan Hill, Ryan Cotterell

This study describes the structure and the results of the SIGTYP 2022 shared task on the prediction of cognate reflexes from multilingual wordlists.

Image Restoration

High probability or low information? The probability–quality paradox in language generation

no code implementations ACL 2022 Clara Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell

When generating natural language from neural probabilistic models, high probability does not always coincide with high quality.

Text Generation

An Analysis of On-the-fly Determinization of Finite-state Automata

no code implementations 27 Aug 2023 Ivan Baburin, Ryan Cotterell

In this paper we establish an abstraction of on-the-fly determinization of finite-state automata using transition monoids and demonstrate how it can be applied to bound the asymptotics.

A Geometric Notion of Causal Probing

no code implementations 27 Jul 2023 Clément Guerner, Anej Svete, Tianyu Liu, Alexander Warstadt, Ryan Cotterell

We show that our counterfactual notion of information in a subspace is optimized by a $\textit{causal}$ concept subspace.

On the Efficacy of Sampling Adapters

1 code implementation 7 Jul 2023 Clara Meister, Tiago Pimentel, Luca Malagutti, Ethan G. Wilcox, Ryan Cotterell

While this trade-off is not reflected in standard metrics of distribution quality (such as perplexity), we find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.

Text Generation

Testing the Predictions of Surprisal Theory in 11 Languages

no code implementations 7 Jul 2023 Ethan Gotlieb Wilcox, Tiago Pimentel, Clara Meister, Ryan Cotterell, Roger P. Levy

We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families.

Generalizing Backpropagation for Gradient-Based Interpretability

1 code implementation 6 Jul 2023 Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alexander Warstadt, Ryan Cotterell

Many popular feature-attribution methods for interpreting deep neural networks rely on computing the gradients of a model's output with respect to its inputs.

Efficient Semiring-Weighted Earley Parsing

1 code implementation 6 Jul 2023 Andreas Opedal, Ran Zmigrod, Tim Vieira, Ryan Cotterell, Jason Eisner

This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups.

A Formal Perspective on Byte-Pair Encoding

1 code implementation 29 Jun 2023 Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Tim Vieira, Mrinmaya Sachan, Ryan Cotterell

Via submodular functions, we prove that the iterative greedy version is a $\frac{1}{{\sigma(\boldsymbol{\mu}^\star)}}(1-e^{-{\sigma(\boldsymbol{\mu}^\star)}})$-approximation of an optimal merge sequence, where ${\sigma(\boldsymbol{\mu}^\star)}$ is the total backward curvature with respect to the optimal merge sequence $\boldsymbol{\mu}^\star$.

Combinatorial Optimization
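
The approximation bound above concerns the iterative greedy version of BPE. A minimal, self-contained sketch of that greedy procedure (repeatedly merge the most frequent adjacent symbol pair) may help make the object of the bound concrete; the function name and return format are illustrative only:

```python
from collections import Counter

def greedy_bpe(text, num_merges):
    """Iterative greedy byte-pair encoding: at each step, merge the
    most frequent adjacent pair of symbols everywhere in the sequence."""
    seq = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        out, i = [], 0
        while i < len(seq):  # left-to-right replacement of the chosen pair
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, merges
```

Each iteration makes the locally best merge; the paper's result bounds how far this greedy sequence of merges can fall short of an optimal one.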

Hexatagging: Projective Dependency Parsing as Tagging

1 code implementation 8 Jun 2023 Afra Amini, Tianyu Liu, Ryan Cotterell

We introduce a novel dependency parser, the hexatagger, that constructs dependency trees by tagging the words in a sentence with elements from a finite set of possible tags.

Dependency Parsing Language Modelling

Convergence and Diversity in the Control Hierarchy

no code implementations 6 Jun 2023 Alexandra Butoi, Ryan Cotterell, David Chiang

Furthermore, using an even stricter notion of equivalence called d-strong equivalence, we make precise the intuition that a CFG controlling a CFG is a TAG, a PDA controlling a PDA is an embedded PDA, and a PDA controlling a CFG is a LIG.

Structured Voronoi Sampling

no code implementations 5 Jun 2023 Afra Amini, Li Du, Ryan Cotterell

In this paper, we take an important step toward building a principled approach for sampling from language models with gradient-based methods.

Text Generation

Learning the String Partial Order

no code implementations 24 May 2023 Tianyu Liu, Afra Amini, Mrinmaya Sachan, Ryan Cotterell

We show that most structured prediction problems can be solved in linear time and space by considering them as partial orderings of the tokens in the input string.

coreference-resolution Dependency Parsing +1

All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations

1 code implementation 23 May 2023 Yuxin Ren, Qipeng Guo, Zhijing Jin, Shauli Ravfogel, Mrinmaya Sachan, Bernhard Schölkopf, Ryan Cotterell

Transformer models bring propelling advances in various NLP tasks, thus inducing lots of interpretability research on the learned representations of the models.

RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text

2 code implementations 22 May 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Peng Cui, Tiannan Wang, Zhenxin Xiao, Yifan Hou, Ryan Cotterell, Mrinmaya Sachan

In addition to producing AI-generated content (AIGC), we also demonstrate the possibility of using RecurrentGPT as an interactive fiction that directly interacts with consumers.

Language Modelling Large Language Model

Efficient Prompting via Dynamic In-Context Learning

no code implementations 18 May 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan

To achieve this, we train a meta controller that predicts the number of in-context examples suitable for the generalist model to make a good prediction based on the performance-efficiency trade-off for a specific input.

Controlled Text Generation with Natural Language Instructions

1 code implementation 27 Apr 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Ethan Wilcox, Ryan Cotterell, Mrinmaya Sachan

Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training.

Language Modelling Text Generation

Discriminative Class Tokens for Text-to-Image Diffusion Models

1 code implementation 30 Mar 2023 Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

This approach has two disadvantages: (i) supervised datasets are generally small compared to large-scale scraped text-image datasets on which text-to-image models are trained, affecting the quality and diversity of the generated images, or (ii) the input is a hard-coded label, as opposed to free-form text, limiting the control over the generated images.

Algorithms for Acyclic Weighted Finite-State Automata with Failure Arcs

1 code implementation 17 Jan 2023 Anej Svete, Benjamin Dayan, Tim Vieira, Ryan Cotterell, Jason Eisner

The pathsum in ordinary acyclic WFSAs is efficiently computed by the backward algorithm in time $O(|E|)$, where $E$ is the set of transitions.
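
The backward algorithm referenced above can be sketched for the ordinary (failure-free) case: process states in reverse topological order, accumulate each state's suffix weight, and touch every transition exactly once, giving the O(|E|) total. A hedged sketch in the real semiring, with hypothetical argument names; the paper's contribution is handling failure arcs, which this sketch does not:

```python
def pathsum(states_topo, edges, init_w, final_w):
    """Backward algorithm for an acyclic WFSA over the real semiring.
    states_topo: states in topological order; edges: (src, dst, weight)
    triples; init_w / final_w: dicts of initial / final weights.
    Each transition is visited once, so this runs in O(|E|)."""
    out = {q: [] for q in states_topo}
    for q, r, w in edges:
        out[q].append((r, w))
    beta = {}
    for q in reversed(states_topo):  # suffix weights, sinks first
        beta[q] = final_w.get(q, 0.0) + sum(w * beta[r] for r, w in out[q])
    # total weight of all accepting paths
    return sum(w * beta[q] for q, w in init_w.items())
```

Swapping `+`/`*` for another semiring's operations gives the general weighted version.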

A Measure-Theoretic Characterization of Tight Language Models

no code implementations 20 Dec 2022 Li Du, Lucas Torroba Hennigen, Tiago Pimentel, Clara Meister, Jason Eisner, Ryan Cotterell

Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings.

Language Modelling

The Ordered Matrix Dirichlet for State-Space Models

1 code implementation 8 Dec 2022 Niklas Stoehr, Benjamin J. Radford, Ryan Cotterell, Aaron Schein

For discrete data, SSMs commonly do so through a state-to-action emission matrix and a state-to-state transition matrix.

On the Effect of Anticipation on Reading Times

1 code implementation 25 Nov 2022 Tiago Pimentel, Clara Meister, Ethan G. Wilcox, Roger Levy, Ryan Cotterell

We assess the effect of anticipation on reading by comparing how well surprisal and contextual entropy predict reading times on four naturalistic reading datasets: two self-paced and two eye-tracking.

Schrödinger's Bat: Diffusion Models Sometimes Generate Polysemous Words in Superposition

1 code implementation 23 Nov 2022 Jennifer C. White, Ryan Cotterell

Recent work has shown that despite their impressive capabilities, text-to-image diffusion models such as DALL-E 2 (Ramesh et al., 2022) can display strange behaviours when a prompt contains a word with multiple possible meanings, often generating images containing both senses of the word (Rassin et al., 2022).

On Parsing as Tagging

1 code implementation 14 Nov 2022 Afra Amini, Ryan Cotterell

There have been many proposals to reduce constituency parsing to tagging in the literature.

Constituency Parsing

The Architectural Bottleneck Principle

no code implementations 11 Nov 2022 Tiago Pimentel, Josef Valvoda, Niklas Stoehr, Ryan Cotterell

This shift in perspective leads us to propose a new principle for probing, the architectural bottleneck principle: In order to estimate how much information a given component could extract, a probe should look exactly like the component.

Open-Ended Question Answering

Autoregressive Structured Prediction with Language Models

1 code implementation 26 Oct 2022 Tianyu Liu, Yuchen Jiang, Nicholas Monath, Ryan Cotterell, Mrinmaya Sachan

Recent years have seen a paradigm shift in NLP towards using pretrained language models ({PLM}) for a wide range of tasks.

Ranked #1 on Relation Extraction on CoNLL04 (RE+ Micro F1 metric)

Named Entity Recognition Named Entity Recognition (NER) +2

Investigating the Role of Centering Theory in the Context of Neural Coreference Resolution Systems

no code implementations 26 Oct 2022 Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan

Our analysis further shows that contextualized embeddings contain much of the coherence information, which helps explain why CT can only provide little gains to modern neural coreference resolvers which make use of pretrained representations.

A Bilingual Parallel Corpus with Discourse Annotations

no code implementations 26 Oct 2022 Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell

The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.

Document Level Machine Translation Machine Translation +1

Mutual Information Alleviates Hallucinations in Abstractive Summarization

1 code implementation 24 Oct 2022 Liam van der Poel, Ryan Cotterell, Clara Meister

Despite significant progress in the quality of language generated from abstractive summarization models, these models still exhibit the tendency to hallucinate, i.e., output content not supported by the source document.

Abstractive Text Summarization

Log-linear Guardedness and its Implications

no code implementations 18 Oct 2022 Shauli Ravfogel, Yoav Goldberg, Ryan Cotterell

Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful.

Algorithms for Weighted Pushdown Automata

1 code implementation 13 Oct 2022 Alexandra Butoi, Brian DuSell, Tim Vieira, Ryan Cotterell, David Chiang

Weighted pushdown automata (WPDAs) are at the core of many natural language processing tasks, like syntax-based statistical machine translation and transition-based dependency parsing.

Machine Translation Transition-Based Dependency Parsing

An Ordinal Latent Variable Model of Conflict Intensity

1 code implementation 8 Oct 2022 Niklas Stoehr, Lucas Torroba Hennigen, Josef Valvoda, Robert West, Ryan Cotterell, Aaron Schein

It is based only on the action category ("what") and disregards the subject ("who") and object ("to whom") of an event, as well as contextual information, like associated casualty count, that should contribute to the perception of an event's "intensity".

Event Extraction

Equivariant Transduction through Invariant Alignment

1 code implementation COLING 2022 Jennifer C. White, Ryan Cotterell

The ability to generalize compositionally is key to understanding the potentially infinite number of sentences that can be constructed in a human language from only a finite number of words.

Inductive Bias

On the Intersection of Context-Free and Regular Languages

1 code implementation 14 Sep 2022 Clemente Pasti, Andreas Opedal, Tiago Pimentel, Tim Vieira, Jason Eisner, Ryan Cotterell

It shows, by a simple construction, that the intersection of a context-free language and a regular language is itself context-free.

On the Role of Negative Precedent in Legal Outcome Prediction

1 code implementation 17 Aug 2022 Josef Valvoda, Ryan Cotterell, Simone Teufel

In contrast, we turn our focus to negative outcomes here, and introduce a new task of negative outcome prediction.

Visual Comparison of Language Model Adaptation

no code implementations 17 Aug 2022 Rita Sevastjanova, Eren Cakmak, Shauli Ravfogel, Ryan Cotterell, Mennatallah El-Assady

The simplicity of adapter training and composition comes along with new challenges, such as maintaining an overview of adapter properties and effectively comparing their produced embedding spaces.

Language Modelling

Probing via Prompting

1 code implementation NAACL 2022 Jiaoda Li, Ryan Cotterell, Mrinmaya Sachan

We then examine the usefulness of a specific linguistic property for pre-training by removing the heads that are essential to that property and evaluating the resulting model's performance on language modeling.

Language Modelling

Naturalistic Causal Probing for Morpho-Syntax

1 code implementation 14 May 2022 Afra Amini, Tiago Pimentel, Clara Meister, Ryan Cotterell

Probing has become a go-to methodology for interpreting and analyzing deep neural models in natural language processing.

A Structured Span Selector

1 code implementation NAACL 2022 Tianyu Liu, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan

Many natural language processing tasks, e.g., coreference resolution and semantic role labeling, require selecting text spans and making decisions about them.

coreference-resolution Inductive Bias +1

UniMorph 4.0: Universal Morphology

no code implementations LREC 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models

1 code implementation NAACL 2022 Karolina Stańczak, Edoardo Ponti, Lucas Torroba Hennigen, Ryan Cotterell, Isabelle Augenstein

The success of multilingual pre-trained models is underpinned by their ability to learn representations shared by multiple languages even in absence of any explicit supervision.

Exact Paired-Permutation Testing for Structured Test Statistics

1 code implementation NAACL 2022 Ran Zmigrod, Tim Vieira, Ryan Cotterell

However, practitioners rely on Monte Carlo approximation to perform this test due to a lack of a suitable exact algorithm.

Probing for the Usage of Grammatical Number

no code implementations ACL 2022 Karim Lasri, Tiago Pimentel, Alessandro Lenci, Thierry Poibeau, Ryan Cotterell

We also find that BERT uses a separate encoding of grammatical number for nouns and verbs.

Estimating the Entropy of Linguistic Distributions

no code implementations ACL 2022 Aryaman Arora, Clara Meister, Ryan Cotterell

Shannon entropy is often a quantity of interest to linguists studying the communicative capacity of human language.
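
As a baseline for the estimators such work studies, the naive maximum-likelihood ("plug-in") entropy estimate is easy to state. This is the standard textbook estimator, not the paper's proposed method, and it is known to underestimate the true entropy on small samples:

```python
import math
from collections import Counter

def plugin_entropy(samples, base=2):
    """Maximum-likelihood ("plug-in") Shannon entropy estimate from a
    sequence of samples, in units of log `base` (bits by default)."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())
```

The downward bias of this estimator on sparse linguistic data is what motivates the bias-corrected alternatives such papers compare.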

On the probability-quality paradox in language generation

no code implementations 31 Mar 2022 Clara Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell

Specifically, we posit that human-like language should contain an amount of information (quantified as negative log-probability) that is close to the entropy of the distribution over natural strings.

Text Generation

Analyzing Wrap-Up Effects through an Information-Theoretic Lens

no code implementations ACL 2022 Clara Meister, Tiago Pimentel, Thomas Hikaru Clark, Ryan Cotterell, Roger Levy

Numerous analyses of reading time (RT) data have been implemented -- all in an effort to better understand the cognitive processes driving reading comprehension.

Reading Comprehension

On Decoding Strategies for Neural Text Generators

no code implementations 29 Mar 2022 Gian Wiher, Clara Meister, Ryan Cotterell

For example, the nature of the diversity-quality trade-off in language generation is very task-specific; the length bias often attributed to beam search is not constant across tasks.

Machine Translation Story Generation

Locally Typical Sampling

2 code implementations 1 Feb 2022 Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell

Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, locally typical sampling offers competitive performance (in both abstractive summarization and story generation) in terms of quality while consistently reducing degenerate repetitions.

Abstractive Text Summarization Story Generation
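
The locally typical sampling rule can be sketched from its published description: at each decoding step, rank tokens by how close their surprisal (negative log-probability) is to the distribution's conditional entropy, keep the smallest such set whose mass reaches a threshold tau, and renormalize before sampling. A simplified single-step sketch, with parameter names assumed:

```python
import math

def typical_filter(probs, tau=0.95):
    """One step of locally-typical truncation: keep the tokens whose
    surprisal -log p is closest to the entropy of `probs`, up to mass
    tau, then renormalize the kept probabilities."""
    items = [(t, p) for t, p in probs.items() if p > 0]
    entropy = -sum(p * math.log(p) for _, p in items)
    items.sort(key=lambda tp: abs(-math.log(tp[1]) - entropy))
    kept, mass = {}, 0.0
    for tok, p in items:
        kept[tok] = p
        mass += p
        if mass >= tau:
            break
    z = sum(kept.values())
    return {t: p / z for t, p in kept.items()}
```

Unlike nucleus sampling, which keeps the highest-probability tokens, this rule can discard a very high-probability token whose surprisal is far below the entropy.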

Linear Adversarial Concept Erasure

2 code implementations 28 Jan 2022 Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell

Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision.

Kernelized Concept Erasure

1 code implementation 28 Jan 2022 Shauli Ravfogel, Francisco Vargas, Yoav Goldberg, Ryan Cotterell

One prominent approach for the identification of concepts in neural representations is searching for a linear subspace whose erasure prevents the prediction of the concept from the representations.

A Latent-Variable Model for Intrinsic Probing

2 code implementations 20 Jan 2022 Karolina Stańczak, Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell, Isabelle Augenstein

The success of pre-trained contextualized representations has prompted researchers to analyze them for the presence of linguistic information.

Probing as Quantifying Inductive Bias

1 code implementation ACL 2022 Alexander Immer, Lucas Torroba Hennigen, Vincent Fortuin, Ryan Cotterell

Such performance improvements have motivated researchers to quantify and understand the linguistic information encoded in these representations.

Bayesian Inference Inductive Bias

A surprisal--duration trade-off across and within the world's languages

1 code implementation 30 Sep 2021 Tiago Pimentel, Clara Meister, Elizabeth Salesky, Simone Teufel, Damián Blasi, Ryan Cotterell

We thus conclude that there is strong evidence of a surprisal--duration trade-off in operation, both across and within the world's languages.

On Homophony and Rényi Entropy

1 code implementation EMNLP 2021 Tiago Pimentel, Clara Meister, Simone Teufel, Ryan Cotterell

Homophony's widespread presence in natural languages is a controversial topic.

Revisiting the Uniform Information Density Hypothesis

no code implementations EMNLP 2021 Clara Meister, Tiago Pimentel, Patrick Haller, Lena Jäger, Ryan Cotterell, Roger Levy

The uniform information density (UID) hypothesis posits a preference among language users for utterances structured such that information is distributed uniformly across a signal.

Linguistic Acceptability

Conditional Poisson Stochastic Beam Search

1 code implementation 22 Sep 2021 Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell

In this work, we propose a new method for turning beam search into a stochastic process: Conditional Poisson stochastic beam search.

Efficient Sampling of Dependency Structures

no code implementations 14 Sep 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

Colbourn (1996)'s sampling algorithm has a running time of $\mathcal{O}(N^3)$, which is often greater than the mean hitting time of a directed graph.

Searching for More Efficient Dynamic Programs

no code implementations Findings (EMNLP) 2021 Tim Vieira, Ryan Cotterell, Jason Eisner

To this end, we describe a set of program transformations, a simple metric for assessing the efficiency of a transformed program, and a heuristic search procedure to improve this metric.

A Bayesian Framework for Information-Theoretic Probing

1 code implementation EMNLP 2021 Tiago Pimentel, Ryan Cotterell

Pimentel et al. (2020) recently analysed probing from an information-theoretic perspective.

Differentiable Subset Pruning of Transformer Heads

2 code implementations 10 Aug 2021 Jiaoda Li, Ryan Cotterell, Mrinmaya Sachan

Multi-head attention, a collection of several attention mechanisms that independently attend to different parts of the input, is the key ingredient in the Transformer.

Machine Translation Natural Language Inference +1

Towards Zero-shot Language Modeling

no code implementations IJCNLP 2019 Edoardo Maria Ponti, Ivan Vulić, Ryan Cotterell, Roi Reichart, Anna Korhonen

Motivated by this question, we aim at constructing an informative prior over neural weights, in order to adapt quickly to held-out languages in the task of character-level language modeling.

Language Modelling

On Finding the K-best Non-projective Dependency Trees

1 code implementation ACL 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

Furthermore, we present a novel extension of the algorithm for decoding the K-best dependency trees of a graph which are subject to a root constraint.

Dependency Parsing

Do Syntactic Probes Probe Syntax? Experiments with Jabberwocky Probing

no code implementations NAACL 2021 Rowan Hall Maudslay, Ryan Cotterell

One method of doing so, which is frequently cited to support the claim that models like BERT encode syntax, is called probing; probes are small supervised models trained to extract linguistic information from another model's output.

Modeling the Unigram Distribution

1 code implementation Findings (ACL) 2021 Irene Nikkarinen, Tiago Pimentel, Damián E. Blasi, Ryan Cotterell

The unigram distribution is the non-contextual probability of finding a specific word form in a corpus.

Is Sparse Attention more Interpretable?

no code implementations ACL 2021 Clara Meister, Stefan Lazov, Isabelle Augenstein, Ryan Cotterell

Sparse attention has been claimed to increase model interpretability under the assumption that it highlights influential inputs.

text-classification Text Classification

Examining the Inductive Bias of Neural Language Models with Artificial Languages

1 code implementation ACL 2021 Jennifer C. White, Ryan Cotterell

Since language models are used to model a wide variety of languages, it is natural to ask whether the neural architectures used for the task have inductive biases towards modeling particular types of languages.

Inductive Bias

Higher-order Derivatives of Weighted Finite-state Machines

no code implementations ACL 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

In the case of second-order derivatives, our scheme runs in the optimal $\mathcal{O}(A^2 N^4)$ time where $A$ is the alphabet size and $N$ is the number of states.

On Finding the $K$-best Non-projective Dependency Trees

1 code implementation 1 Jun 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

Furthermore, we present a novel extension of the algorithm for decoding the $K$-best dependency trees of a graph which are subject to a root constraint.

Dependency Parsing

Language Model Evaluation Beyond Perplexity

no code implementations ACL 2021 Clara Meister, Ryan Cotterell

As concrete examples, text generated under the nucleus sampling scheme adheres more closely to the type--token relationship of natural language than text produced using standard ancestral sampling; text from LSTMs reflects the natural language distributions over length, stopwords, and symbols surprisingly well.

Language Modelling

A Non-Linear Structural Probe

no code implementations NAACL 2021 Jennifer C. White, Tiago Pimentel, Naomi Saphra, Ryan Cotterell

Probes are models devised to investigate the encoding of knowledge -- e.g. syntactic structure -- in contextual representations.

A Cognitive Regularizer for Language Modeling

no code implementations ACL 2021 Jason Wei, Clara Meister, Ryan Cotterell

The uniform information density (UID) hypothesis, which posits that speakers behaving optimally tend to distribute information uniformly across a linguistic signal, has gained traction in psycholinguistics as an explanation for certain syntactic, morphological, and prosodic choices.

Inductive Bias Language Modelling

How (Non-)Optimal is the Lexicon?

no code implementations NAACL 2021 Tiago Pimentel, Irene Nikkarinen, Kyle Mahowald, Ryan Cotterell, Damián Blasi

Examining corpora from 7 typologically diverse languages, we use those upper bounds to quantify the lexicon's optimality and to explore the relative costs of major constraints on natural codes.

Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models

no code implementations 15 Apr 2021 Karolina Stańczak, Sagnik Ray Choudhury, Tiago Pimentel, Ryan Cotterell, Isabelle Augenstein

While the prevalence of large pre-trained language models has led to significant improvements in the performance of NLP systems, recent research has demonstrated that these models inherit societal biases extant in natural language.

Language Modelling

Finding Concept-specific Biases in Form--Meaning Associations

2 code implementations NAACL 2021 Tiago Pimentel, Brian Roark, Søren Wichmann, Ryan Cotterell, Damián Blasi

It is not a new idea that there are small, cross-linguistic associations between the forms and meanings of words.

Differentiable Generative Phonology

1 code implementation 10 Feb 2021 Shijie Wu, Edoardo Maria Ponti, Ryan Cotterell

As the main contribution of our work, we implement the phonological generative system as a neural model differentiable end-to-end, rather than as a set of rules or constraints.

Disambiguatory Signals are Stronger in Word-initial Positions

1 code implementation EACL 2021 Tiago Pimentel, Ryan Cotterell, Brian Roark

Psycholinguistic studies of human word processing and lexical access provide ample evidence of the preferred nature of word-initial versus word-final segments, e.g., in terms of attention paid by listeners (greater) or the likelihood of reduction by speakers (lower).

Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs

2 code implementations 30 Nov 2020 Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott

Large-scale pretraining and task-specific fine-tuning is now the standard methodology for many tasks in computer vision and natural language processing.

Morphologically Aware Word-Level Translation

no code implementations COLING 2020 Paula Czarnowska, Sebastian Ruder, Ryan Cotterell, Ann Copestake

We propose a novel morphologically aware probability model for bilingual lexicon induction, which jointly models lexeme translation and inflectional morphology in a structured way.

Bilingual Lexicon Induction Translation

Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model

no code implementations EMNLP 2020 Jun Yen Leung, Guy Emerson, Ryan Cotterell

Across languages, multiple consecutive adjectives modifying a noun (e.g. "the big red dog") follow certain unmarked ordering rules.

If beam search is the answer, what was the question?

1 code implementation EMNLP 2020 Clara Meister, Tim Vieira, Ryan Cotterell

This implies that the MAP objective alone does not express the properties we desire in text, which merits the question: if beam search is the answer, what was the question?

Machine Translation Text Generation +1

Please Mind the Root: Decoding Arborescences for Dependency Parsing

1 code implementation EMNLP 2020 Ran Zmigrod, Tim Vieira, Ryan Cotterell

The connection between dependency trees and spanning trees is exploited by the NLP community to train and to decode graph-based dependency parsers.
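
The root constraint at issue (a well-formed dependency tree's artificial root node should emit exactly one edge) can be illustrated with a brute-force decoder over tiny score matrices. This is a hypothetical sketch for exposition only; the paper gives efficient algorithms, whereas this simply enumerates parent assignments:

```python
from itertools import product

def best_rooted_tree(W):
    """Exhaustively find the highest-scoring dependency tree in which
    the dummy root (node 0) has exactly one outgoing edge.
    W[h][d] is the score of attaching dependent d to head h."""
    n = len(W)
    best, best_parents = float("-inf"), None
    # Each non-root node picks a parent; keep only assignments that
    # use the root exactly once and contain no cycles.
    for parents in product(range(n), repeat=n - 1):
        if sum(p == 0 for p in parents) != 1:
            continue
        ok = True
        for d in range(1, n):  # walk up from every node; must reach 0
            seen, cur = set(), d
            while cur != 0:
                if cur in seen:
                    ok = False
                    break
                seen.add(cur)
                cur = parents[cur - 1]
            if not ok:
                break
        if not ok:
            continue
        score = sum(W[p][d] for d, p in enumerate(parents, start=1))
        if score > best:
            best, best_parents = score, parents
    return best, best_parents
```

Dropping the root-edge check here would admit structures in which the root attaches to several words, which is exactly what an unconstrained maximum-spanning-tree decoder can return.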

Dependency Parsing

Intrinsic Probing through Dimension Selection

1 code implementation EMNLP 2020 Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell

Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.

Word Embeddings

Speakers Fill Lexical Semantic Gaps with Context

1 code implementation EMNLP 2020 Tiago Pimentel, Rowan Hall Maudslay, Damián Blasi, Ryan Cotterell

For a language to be clear and efficiently encoded, we posit that the lexical ambiguity of a word type should correlate with how much information context provides about it, on average.

Pareto Probing: Trading Off Accuracy for Complexity

1 code implementation EMNLP 2020 Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell

In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume.
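
In two dimensions, the Pareto hypervolume is just the area dominated by the probes' (complexity, accuracy) points relative to a worst-case reference corner. A minimal sketch, assuming both axes are normalized to [0, 1] with lower complexity and higher accuracy better:

```python
def pareto_hypervolume(points, ref=(1.0, 0.0)):
    """Area dominated by a set of (complexity, accuracy) points,
    measured against a worst-case reference corner."""
    # Keep only Pareto-optimal points: no other point is at least
    # as good on both axes.
    frontier = [
        (c, a) for c, a in points
        if not any(
            (c2 <= c and a2 >= a and (c2, a2) != (c, a))
            for c2, a2 in points
        )
    ]
    frontier.sort()  # ascending complexity, hence ascending accuracy
    area, prev_a = 0.0, ref[1]
    # Sweep from cheapest to most complex probe, adding the rectangle
    # each frontier point contributes beyond the previous accuracy level.
    for c, a in frontier:
        area += (ref[0] - c) * (a - prev_a)
        prev_a = a
    return area
```

A probe family that reaches higher accuracy at lower complexity dominates more area, so the metric rewards the trade-off rather than a single operating point.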

Dependency Parsing

Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation

1 code implementation EMNLP 2020 Francisco Vargas, Ryan Cotterell

Their method takes pre-trained word embeddings as input and attempts to isolate a linear subspace that captures most of the gender bias in the embeddings.

Word Embeddings

Efficient Computation of Expectations under Spanning Tree Distributions

no code implementations 29 Aug 2020 Ran Zmigrod, Tim Vieira, Ryan Cotterell

We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models.
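
Such expectation algorithms build on the Matrix-Tree theorem (in its directed, Tutte form), which reduces the partition function of an edge-factored spanning-tree model to a determinant; expectations then follow from derivatives of log Z. A minimal numpy sketch of the partition function itself:

```python
import numpy as np

def spanning_tree_partition(W):
    """Total weight of all spanning arborescences rooted at node 0,
    via the Matrix-Tree theorem. W[i, j] is the weight of edge i -> j
    (the diagonal is ignored)."""
    A = W.copy()
    np.fill_diagonal(A, 0.0)
    # In-degree Laplacian: column sums on the diagonal minus the
    # adjacency weights.
    L = np.diag(A.sum(axis=0)) - A
    # Deleting the root's row and column gives a minor whose
    # determinant sums the weights of all arborescences rooted there.
    return np.linalg.det(L[1:, 1:])
```

With all edge weights set to 1 on a 3-node complete graph this counts the 3 arborescences rooted at node 0.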

Best-First Beam Search

1 code implementation 8 Jul 2020 Clara Meister, Tim Vieira, Ryan Cotterell

Decoding for many NLP tasks requires an effective heuristic algorithm for approximating exact search since the problem of searching the full output space is often intractable, or impractical in many settings.

Metaphor Detection using Context and Concreteness

no code implementations WS 2020 Rowan Hall Maudslay, Tiago Pimentel, Ryan Cotterell, Simone Teufel

We report the results of our system on the Metaphor Detection Shared Task at the Second Workshop on Figurative Language Processing 2020.

A Corpus for Large-Scale Phonetic Typology

no code implementations ACL 2020 Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W. Black, Jason Eisner

A major hurdle in data-driven research on typology is having sufficient data in many languages to draw meaningful conclusions.

Applying the Transformer to Character-level Transduction

2 code implementations EACL 2021 Shijie Wu, Ryan Cotterell, Mans Hulden

The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.

Morphological Inflection Transliteration

Phonotactic Complexity and its Trade-offs

1 code implementation TACL 2020 Tiago Pimentel, Brian Roark, Ryan Cotterell

We present methods for calculating a measure of phonotactic complexity (bits per phoneme) that permits a straightforward cross-linguistic comparison.
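
Bits per phoneme is the average surprisal, -log2 p(phoneme), under a phone-level language model. A toy sketch using a unigram model fit to the corpus itself (an assumption for brevity; the paper uses much stronger phone-level LMs):

```python
import math
from collections import Counter

def bits_per_phoneme(corpus):
    """Average surprisal per phoneme under a unigram phone model
    estimated from the corpus; equals the unigram entropy."""
    counts = Counter(ph for word in corpus for ph in word)
    total = sum(counts.values())
    return sum(
        c * -math.log2(c / total) for c in counts.values()
    ) / total
```

A language that uses its phonemes uniformly gets the maximum value (log2 of the inventory size); skewed phonotactics push the number down.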

A Tale of a Probe and a Parser

1 code implementation ACL 2020 Rowan Hall Maudslay, Josef Valvoda, Tiago Pimentel, Adina Williams, Ryan Cotterell

One such probe is the structural probe (Hewitt and Manning, 2019), designed to quantify the extent to which syntactic information is encoded in contextualised word representations.

Contextualised Word Representations

The Paradigm Discovery Problem

1 code implementation ACL 2020 Alexander Erdmann, Micha Elsner, Shijie Wu, Ryan Cotterell, Nizar Habash

Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm.

Clustering Word Embeddings

On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs

no code implementations 3 May 2020 Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach

We also find that there are statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects.

Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing

no code implementations ACL 2020 Clara Meister, Elizabeth Salesky, Ryan Cotterell

Prior work has explored directly regularizing the output distributions of probabilistic models to alleviate peaky (i.e., over-confident) predictions, a common sign of overfitting.
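
Label smoothing, the best-known such regularizer and the special case this paper generalizes, mixes the one-hot target with a uniform distribution before taking the cross-entropy. A sketch (this variant spreads eps over the whole vocabulary, gold label included; formulations differ on that detail):

```python
import math

def label_smoothed_nll(log_probs, gold, eps=0.1):
    """Cross-entropy against a smoothed target: weight (1 - eps) on
    the gold label, eps spread uniformly over the vocabulary.
    eps = 0 recovers ordinary negative log-likelihood."""
    V = len(log_probs)
    uniform = -sum(log_probs) / V  # expected NLL under a uniform target
    return (1.0 - eps) * -log_probs[gold] + eps * uniform
```

The uniform term penalizes putting vanishing probability on any label, which is what discourages peaky output distributions.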

Text Generation

Predicting Declension Class from Form and Meaning

1 code implementation ACL 2020 Adina Williams, Tiago Pimentel, Arya D. McCarthy, Hagen Blix, Eleanor Chodroff, Ryan Cotterell

We find for two Indo-European languages (Czech and German) that form and meaning respectively share significant amounts of information with class (and contribute additional information above and beyond gender).

Information-Theoretic Probing for Linguistic Structure

1 code implementation ACL 2020 Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, Ryan Cotterell

The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually "know" about natural language.

Word Embeddings

Morphological Segmentation Inside-Out

no code implementations EMNLP 2016 Ryan Cotterell, Arun Kumar, Hinrich Schütze

Morphological segmentation has traditionally been modeled with non-hierarchical models, which yield flat segmentations as output.

Morphological Analysis

Quantifying the Semantic Core of Gender Systems

no code implementations IJCNLP 2019 Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach

To that end, we use canonical correlation analysis to correlate the grammatical gender of inanimate nouns with an externally grounded definition of their lexical semantics.

The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

no code implementations WS 2019 Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden

The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.

Cross-Lingual Transfer Lemmatization +3

It's All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution

no code implementations IJCNLP 2019 Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, Simone Teufel

An alternative approach is Counterfactual Data Augmentation (CDA), in which a corpus is duplicated and augmented to remove bias, e.g., by swapping all inherently-gendered words in the copy.
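
In its simplest form, the counterfactual copy comes from a dictionary of gendered word pairs. A toy sketch (the tiny pair list is hypothetical, and a real system must also resolve ambiguous forms such as "her", preserve capitalization, and handle names, which is this paper's focus):

```python
# Hypothetical minimal pair list; "her" is mapped to "him" here,
# ignoring its ambiguity with possessive "his".
PAIRS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "man": "woman", "woman": "man"}

def counterfactual_copy(tokens):
    """Return the gender-swapped duplicate used to augment a corpus."""
    return [PAIRS.get(t.lower(), t) for t in tokens]
```

The original corpus and its swapped copy are then concatenated, so gendered contexts appear equally often with either form.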

Data Augmentation Word Embeddings

Rethinking Phonotactic Complexity

no code implementations WS 2019 Tiago Pimentel, Brian Roark, Ryan Cotterell

In this work, we propose the use of phone-level language models to estimate phonotactic complexity, measured in bits per phoneme, which makes cross-linguistic comparison straightforward.

On the Distribution of Deep Clausal Embeddings: A Large Cross-linguistic Study

no code implementations ACL 2019 Damian Blasi, Ryan Cotterell, Lawrence Wolf-Sonkin, Sabine Stoll, Balthasar Bickel, Marco Baroni

Embedding a clause inside another ("the girl [who likes cars [that run fast]] has arrived") is a fundamental resource that has been argued to be a key driver of linguistic expressiveness.

Uncovering Probabilistic Implications in Typological Knowledge Bases

no code implementations ACL 2019 Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein

The study of linguistic typology is rooted in the implications we find between linguistic features, such as the fact that languages with object-verb word ordering tend to have post-positions.

Knowledge Base Population

Meaning to Form: Measuring Systematicity as Information

1 code implementation ACL 2019 Tiago Pimentel, Arya D. McCarthy, Damián E. Blasi, Brian Roark, Ryan Cotterell

A longstanding debate in semiotics centers on the relationship between linguistic signs and their corresponding semantics: is there an arbitrary relationship between a word form and its meaning, or does some systematic phenomenon pervade?

What Kind of Language Is Hard to Language-Model?

no code implementations ACL 2019 Sabrina J. Mielke, Ryan Cotterell, Kyle Gorman, Brian Roark, Jason Eisner

Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.

Language Modelling

Gender Bias in Contextualized Word Embeddings

1 code implementation NAACL 2019 Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, Kai-Wei Chang

In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo's contextualized word vectors.

Word Embeddings

A Probabilistic Generative Model of Linguistic Typology

1 code implementation NAACL 2019 Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein

In the principles-and-parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features.

On the Idiosyncrasies of the Mandarin Chinese Classifier System

no code implementations NAACL 2019 Shijia Liu, Hongyuan Mei, Adina Williams, Ryan Cotterell

While idiosyncrasies of the Chinese classifier system have been a richly studied topic among linguists (Adams and Conklin, 1973; Erbaugh, 1986; Lakoff, 1986), not much work has been done to quantify them with statistical methods.

The CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

no code implementations CONLL 2018 Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden

Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a cloze task.


Marrying Universal Dependencies and Universal Morphology

no code implementations WS 2018 Arya D. McCarthy, Miikka Silfverberg, Ryan Cotterell, Mans Hulden, David Yarowsky

The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language.

Generalizing Procrustes Analysis for Better Bilingual Dictionary Induction

1 code implementation CONLL 2018 Yova Kementchedjhieva, Sebastian Ruder, Ryan Cotterell, Anders Søgaard

Most recent approaches to bilingual dictionary induction find a linear alignment between the word vector spaces of two languages.
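
The linear-alignment step referred to here is typically solved in closed form as the orthogonal Procrustes problem, via an SVD:

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Orthogonal map W minimizing ||XW - Y||_F, the standard
    closed-form alignment step between two word vector spaces.
    Rows of X and Y are translation-pair embeddings."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```

Restricting W to be orthogonal preserves distances and angles within each embedding space, which is why this step is preferred over an unconstrained least-squares fit.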

Hard Non-Monotonic Attention for Character-Level Transduction

2 code implementations EMNLP 2018 Shijie Wu, Pamela Shapiro, Ryan Cotterell

We compare soft and hard non-monotonic attention experimentally and find that the exact algorithm significantly improves performance over the stochastic approximation and outperforms soft attention.

Hard Attention Image Captioning

Recurrent Neural Networks in Linguistic Theory: Revisiting Pinker and Prince (1988) and the Past Tense Debate

3 code implementations TACL 2018 Christo Kirov, Ryan Cotterell

We suggest that the empirical performance of modern networks warrants a re-examination of their utility in linguistic and cognitive modeling.

Explaining and Generalizing Back-Translation through Wake-Sleep

no code implementations 12 Jun 2018 Ryan Cotterell, Julia Kreutzer

Back-translation has become a commonly employed heuristic for semi-supervised neural machine translation.

Machine Translation Translation

Are All Languages Equally Hard to Language-Model?

no code implementations NAACL 2018 Ryan Cotterell, Sabrina J. Mielke, Jason Eisner, Brian Roark

For general modeling methods applied to diverse languages, a natural question is: how well should we expect our models to work on languages with differing typological profiles?

Language Modelling

On the Diachronic Stability of Irregularity in Inflectional Morphology

no code implementations 23 Apr 2018 Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner

Many languages' inflectional morphological systems are replete with irregulars, i.e., words that do not seem to follow standard inflectional rules.

Cross-lingual Character-Level Neural Morphological Tagging

no code implementations EMNLP 2017 Ryan Cotterell, Georg Heigold

Even for common NLP tasks, sufficient supervision is not available in many languages -- morphological tagging is no exception.

Language Modelling Morphological Tagging +2

Paradigm Completion for Derivational Morphology

no code implementations EMNLP 2017 Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, David Yarowsky

The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task.

Cross-lingual, Character-Level Neural Morphological Tagging

no code implementations 30 Aug 2017 Ryan Cotterell, Georg Heigold

Even for common NLP tasks, sufficient supervision is not available in many languages -- morphological tagging is no exception.

Morphological Tagging Transfer Learning