Search Results for author: Ryan Cotterell

Found 242 papers, 119 papers with code

Conditional Poisson Stochastic Beams

no code implementations EMNLP 2021 Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell

Beam search is the default decoding strategy for many sequence generation tasks in NLP.
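Beam search itself is simple to state. As a point of reference, here is a minimal sketch over a hypothetical toy model (the `toy_model` vocabulary and probabilities are invented for illustration, not taken from the paper):

```python
import math

def beam_search(step_logprobs, beam_size, max_len):
    """Standard beam search over a toy autoregressive model.

    step_logprobs(prefix) -> dict mapping next token to its log-probability.
    Returns the highest-scoring sequence ending in "</s>" and its score.
    """
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    completed = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in step_logprobs(prefix).items():
                seq = prefix + (tok,)
                if tok == "</s>":
                    completed.append((seq, score + lp))
                else:
                    candidates.append((seq, score + lp))
        # Keep only the beam_size highest-scoring open hypotheses.
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_size]
        if not beams:
            break
    pool = completed if completed else beams
    return max(pool, key=lambda x: x[1])

# A hypothetical two-token model that prefers "b" and then stops.
def toy_model(prefix):
    if len(prefix) == 0:
        return {"a": math.log(0.4), "b": math.log(0.6)}
    return {"a": math.log(0.1), "b": math.log(0.2), "</s>": math.log(0.7)}

best, score = beam_search(toy_model, beam_size=2, max_len=5)
```

With this toy model the search returns the sequence ("b", "</s>"), whose log-probability is log(0.6 * 0.7).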

SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection

1 code implementation NAACL (SIGMORPHON) 2022 Jordan Kodner, Salam Khalifa, Khuyagbaatar Batsuren, Hossep Dolatian, Ryan Cotterell, Faruk Akkus, Antonios Anastasopoulos, Taras Andrushko, Aryaman Arora, Nona Atanalov, Gábor Bella, Elena Budianskaya, Yustinus Ghanggo Ate, Omer Goldman, David Guriel, Simon Guriel, Silvia Guriel-Agiashvili, Witold Kieraś, Andrew Krizhanovsky, Natalia Krizhanovsky, Igor Marchenko, Magdalena Markowska, Polina Mashkovtseva, Maria Nepomniashchaya, Daria Rodionova, Karina Scheifer, Alexandra Sorova, Anastasia Yemelina, Jeremiah Young, Ekaterina Vylomova

The 2022 SIGMORPHON–UniMorph shared task on large scale morphological inflection generation included a wide range of typologically diverse languages: 33 languages from 11 top-level language families: Arabic (Modern Standard), Assamese, Braj, Chukchi, Eastern Armenian, Evenki, Georgian, Gothic, Gujarati, Hebrew, Hungarian, Itelmen, Karelian, Kazakh, Ket, Khalkha Mongolian, Kholosi, Korean, Lamahalot, Low German, Ludic, Magahi, Middle Low German, Old English, Old High German, Old Norse, Polish, Pomak, Slovak, Turkish, Upper Sorbian, Veps, and Xibe.

Morphological Inflection

High probability or low information? The probability–quality paradox in language generation

no code implementations ACL 2022 Clara Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell

When generating natural language from neural probabilistic models, high probability does not always coincide with high quality.

Text Generation

A surprisal–duration trade-off across and within the world’s languages

1 code implementation EMNLP 2021 Tiago Pimentel, Clara Meister, Elizabeth Salesky, Simone Teufel, Damián Blasi, Ryan Cotterell

We thus conclude that there is strong evidence of a surprisal–duration trade-off in operation, both across and within the world’s languages.

Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions

no code implementations EMNLP 2020 Arya D. McCarthy, Adina Williams, Shijia Liu, David Yarowsky, Ryan Cotterell

Of particular interest, languages on the same branch of our phylogenetic tree are notably similar, whereas languages from separate branches are no more similar than chance.

Community Detection

The SIGTYP 2022 Shared Task on the Prediction of Cognate Reflexes

1 code implementation NAACL (SIGTYP) 2022 Johann-Mattis List, Ekaterina Vylomova, Robert Forkel, Nathan Hill, Ryan Cotterell

This study describes the structure and the results of the SIGTYP 2022 shared task on the prediction of cognate reflexes from multilingual wordlists.

Image Restoration

Efficient Sampling of Dependency Structure

1 code implementation EMNLP 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

In this paper, we adapt two spanning tree sampling algorithms to faithfully sample dependency trees from a graph subject to the root constraint.
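As a hedged illustration of the problem setup only (not the paper's adapted sampling algorithms), the following brute-force sketch enumerates all dependency trees of a tiny two-word sentence, enforces the single-root constraint, and samples one tree in proportion to its weight; the weight function is invented:

```python
import itertools
import random

def all_trees(n_words, weights):
    """Enumerate dependency trees over words 1..n_words with root 0.

    weights[(h, d)] is the weight of attaching dependent d to head h.
    A tree assigns each word one head; validity requires that every
    word reaches the root and that exactly one word attaches to it.
    """
    trees = []
    for heads in itertools.product(range(n_words + 1), repeat=n_words):
        assignment = {d: heads[d - 1] for d in range(1, n_words + 1)}
        if sum(1 for h in assignment.values() if h == 0) != 1:
            continue  # root constraint: exactly one root-attached word
        ok = True
        for d in assignment:  # acyclicity: follow heads up to the root
            seen, cur = set(), d
            while cur != 0:
                if cur in seen:
                    ok = False
                    break
                seen.add(cur)
                cur = assignment[cur]
            if not ok:
                break
        if ok:
            w = 1.0
            for d, h in assignment.items():
                w *= weights[(h, d)]
            trees.append((assignment, w))
    return trees

def sample_tree(trees, rng):
    """Sample a tree with probability proportional to its weight."""
    total = sum(w for _, w in trees)
    r = rng.random() * total
    for t, w in trees:
        r -= w
        if r <= 0:
            return t
    return trees[-1][0]

# Hypothetical attachment weights for a two-word sentence.
weights = {(h, d): 1.0 + 0.5 * h for h in range(3) for d in range(1, 3) if h != d}
trees = all_trees(2, weights)
tree = sample_tree(trees, random.Random(0))
```

For two words there are exactly two valid single-root trees here; the paper's contribution is doing this faithfully and efficiently without enumeration.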

Better Estimation of the KL Divergence Between Language Models

1 code implementation 14 Apr 2025 Afra Amini, Tim Vieira, Ryan Cotterell

In this paper, we introduce a Rao–Blackwellized estimator that is also unbiased and provably has variance less than or equal to that of the standard Monte Carlo estimator.

Knowledge Distillation
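For context, the standard Monte Carlo estimator that this paper improves on can be sketched on a toy categorical distribution (the distributions below are hypothetical): it is unbiased, and the paper's Rao–Blackwellized variant reduces its variance by replacing sampled terms with exact per-step expectations.

```python
import math
import random

def exact_kl(p, q):
    """KL(p || q) for categorical distributions given as dicts."""
    return sum(px * math.log(px / q[x]) for x, px in p.items())

def mc_kl(p, q, n, rng):
    """Standard Monte Carlo estimator: average log p(x)/q(x) over x ~ p."""
    xs, ws = zip(*p.items())
    total = 0.0
    for _ in range(n):
        x = rng.choices(xs, weights=ws)[0]
        total += math.log(p[x] / q[x])
    return total / n

p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.4, "b": 0.4, "c": 0.2}
rng = random.Random(0)
kl = exact_kl(p, q)
est = mc_kl(p, q, 20000, rng)
```

With enough samples the estimate concentrates around the exact value; the point of Rao-Blackwellization is to get there with provably no more variance per sample.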

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

1 code implementation 10 Apr 2025 Alex Warstadt, Aaron Mueller, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell

These intensive resource demands limit the ability of researchers to train new models and use existing models as developmentally plausible cognitive models.

Unique Hard Attention: A Tale of Two Sides

no code implementations 18 Mar 2025 Selim Jerad, Anej Svete, Jiaoda Li, Ryan Cotterell

We show that this no longer holds with only leftmost-hard attention; in that case, they correspond to a strictly weaker fragment of LTL.

Hard Attention

From Language Models over Tokens to Language Models over Characters

no code implementations 4 Dec 2024 Tim Vieira, Ben LeBrun, Mario Giulianelli, Juan Luis Gastaldi, Brian DuSell, John Terilla, Timothy J. O'Donnell, Ryan Cotterell

Modern language models are internally, and mathematically, distributions over token strings rather than character strings, posing numerous challenges for programmers building user applications on top of them.

Language Modeling

Likelihood as a Performance Gauge for Retrieval-Augmented Generation

1 code implementation 12 Nov 2024 Tianyu Liu, Jirui Qi, Paul He, Arianna Bisazza, Mrinmaya Sachan, Ryan Cotterell

Based on these findings, we propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.

Language Modeling +3

Controllable Context Sensitivity and the Knob Behind It

1 code implementation 11 Nov 2024 Julian Minder, Kevin Du, Niklas Stoehr, Giovanni Monea, Chris Wendler, Robert West, Ryan Cotterell

In this paper, we search for a knob which controls this sensitivity, determining whether language models answer from the context or their prior knowledge.

Question Answering

Training Neural Networks as Recognizers of Formal Languages

2 code implementations 11 Nov 2024 Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell

We provide results on a variety of languages across the Chomsky hierarchy for three neural architectures: a simple RNN, an LSTM, and a causally-masked transformer.

Language Modeling

Gumbel Counterfactual Generation From Language Models

1 code implementation 11 Nov 2024 Shauli Ravfogel, Anej Svete, Vésteinn Snæbjarnarson, Ryan Cotterell

Based on this observation, we propose a framework for generating true string counterfactuals by reformulating language models as structural equation models using the Gumbel-max trick, which we call Gumbel counterfactual generation.

Counterfactual Reasoning +1
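The Gumbel-max trick underlying this framework can be sketched as follows. The two toy distributions are hypothetical; the point is that reusing the same Gumbel noise under an intervened model couples the factual and counterfactual samples:

```python
import math
import random

def gumbel_max_sample(logits, gumbels):
    """Sample from softmax(logits) by taking argmax of logits + Gumbel noise."""
    scored = [lp + g for lp, g in zip(logits, gumbels)]
    return max(range(len(logits)), key=lambda i: scored[i])

rng = random.Random(0)

def draw_gumbels(k):
    """Draw k standard Gumbel variates via inverse transform sampling."""
    return [-math.log(-math.log(rng.random())) for _ in range(k)]

# A factual model vs. an intervened ("counterfactual") model over 3 tokens.
factual = [math.log(0.7), math.log(0.2), math.log(0.1)]
altered = [math.log(0.1), math.log(0.2), math.log(0.7)]

# Sharing the same noise across both models couples the two samples:
# this is the sense in which the counterfactual reuses the randomness.
pairs = [(gumbel_max_sample(factual, g), gumbel_max_sample(altered, g))
         for g in (draw_gumbels(3) for _ in range(10000))]
freq0 = sum(1 for a, _ in pairs if a == 0) / len(pairs)
```

Marginally, each coordinate still samples from its own softmax (so `freq0` is near 0.7), while the shared noise induces a deterministic coupling: whenever the factual sample is the token the intervention promotes, the counterfactual sample agrees.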

An $\mathbf{L^*}$ Algorithm for Deterministic Weighted Regular Languages

no code implementations 9 Nov 2024 Clemente Pasti, Talu Karagöz, Anej Svete, Franz Nowak, Reda Boumasmoud, Ryan Cotterell

Extracting finite state automata (FSAs) from black-box models offers a powerful approach to gaining interpretable insights into complex model behaviors.

Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse

no code implementations 21 Oct 2024 Eleftheria Tsipidi, Franz Nowak, Ryan Cotterell, Ethan Wilcox, Mario Giulianelli, Alex Warstadt

While these fluctuations can be viewed as theoretically uninteresting noise on top of a uniform target, another explanation is that UID is not the only functional pressure regulating information content in a language.

Form

Efficiently Computing Susceptibility to Context in Language Models

no code implementations 18 Oct 2024 Tianyu Liu, Kevin Du, Mrinmaya Sachan, Ryan Cotterell

However, exactly computing susceptibility is difficult and, thus, Du et al. (2024) falls back on a Monte Carlo approximation.

Reverse-Engineering the Reader

1 code implementation 16 Oct 2024 Samuel Kiegeland, Ethan Gotlieb Wilcox, Afra Amini, David Robert Reich, Ryan Cotterell

Numerous previous studies have sought to determine to what extent language models, pretrained on natural language text, can serve as useful models of human cognition.

Language Modeling

Activation Scaling for Steering and Interpreting Language Models

no code implementations 7 Oct 2024 Niklas Stoehr, Kevin Du, Vésteinn Snæbjarnarson, Robert West, Ryan Cotterell, Aaron Schein

Given the prompt "Rome is in", can we steer a language model to flip its prediction of an incorrect token "France" to a correct token "Italy" by only multiplying a few relevant activation vectors with scalars?

Language Modeling

Can Transformers Learn $n$-gram Language Models?

no code implementations 3 Oct 2024 Anej Svete, Nadav Borenstein, Mike Zhou, Isabelle Augenstein, Ryan Cotterell

Much theoretical work has described the ability of transformers to represent formal languages.

On the Proper Treatment of Tokenization in Psycholinguistics

1 code implementation 3 Oct 2024 Mario Giulianelli, Luca Malagutti, Juan Luis Gastaldi, Brian DuSell, Tim Vieira, Ryan Cotterell

The paper argues that token-level language models should be (approximately) marginalized into character-level language models before they are used in psycholinguistic studies to compute the surprisal of a region of interest; then, the marginalized character-level language model can be used to compute the surprisal of an arbitrary character substring, which we term a focal area, that the experimenter may wish to use as a predictor.

Language Modeling
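The marginalization the abstract describes can be illustrated by brute force (the paper itself is concerned with doing this properly and efficiently; the token vocabulary and probabilities below are invented):

```python
import math
from itertools import product

# A hypothetical unigram token-level LM: each step emits a token with a
# fixed probability, or ends the string with probability end_p.
tok_p = {"a": 0.3, "ab": 0.3, "b": 0.2}  # P(token)
end_p = 0.2                              # P(end of string)

def char_string_prob(s, max_toks=4):
    """Marginalize the token-level LM into P(character string = s):
    sum the probability of every token sequence whose concatenation is s."""
    total = 0.0
    for k in range(max_toks + 1):
        for seq in product(tok_p, repeat=k):
            if "".join(seq) == s:
                p = end_p
                for t in seq:
                    p *= tok_p[t]
                total += p
    return total

p_ab = char_string_prob("ab")          # covers ("ab") and ("a", "b")
surprisal_ab = -math.log2(p_ab)        # character-level surprisal in bits
```

Here the character string "ab" has two tokenizations, ("ab") and ("a", "b"), and its probability is the sum over both; computing surprisal from a single canonical tokenization instead would understate it.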

Generalized Measures of Anticipation and Responsivity in Online Language Processing

1 code implementation 16 Sep 2024 Mario Giulianelli, Andreas Opedal, Ryan Cotterell

We introduce a generalization of classic information-theoretic measures of predictive uncertainty in online language processing, based on the simulation of expected continuations of incremental linguistic contexts.

On the Role of Context in Reading Time Prediction

1 code implementation 12 Sep 2024 Andreas Opedal, Eleanor Chodroff, Ryan Cotterell, Ethan Gotlieb Wilcox

Another one is the pointwise mutual information (PMI) between a unit and its context, which turns out to yield the same predictive power as surprisal when controlling for unigram frequency.

Language Modeling +1
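The identity behind this observation, that PMI is surprisal shifted by the word's unigram surprisal, and so has the same predictive power once unigram frequency is controlled for, can be checked on hypothetical counts:

```python
import math

# Hypothetical corpus counts: (context, word) co-occurrences and unigrams.
count = {("the", "cat"): 20, ("the", "dog"): 30}
unigram = {"cat": 100, "dog": 400}
n_context = 50    # occurrences of the context "the" in this toy corpus
n_total = 10000   # toy corpus size

def surprisal(word, context):
    """Contextual surprisal: -log2 P(word | context)."""
    return -math.log2(count[(context, word)] / n_context)

def unigram_surprisal(word):
    """Context-free surprisal: -log2 P(word)."""
    return -math.log2(unigram[word] / n_total)

def pmi(word, context):
    """PMI(w; c) = log2 P(w | c) - log2 P(w)."""
    return (math.log2(count[(context, word)] / n_context)
            - math.log2(unigram[word] / n_total))

# PMI differs from (negative) surprisal only by a unigram term.
gap = pmi("cat", "the") - (unigram_surprisal("cat") - surprisal("cat", "the"))
```

The gap is exactly zero by construction, which is why PMI and surprisal yield the same predictive power as regression predictors once a unigram-frequency covariate is included.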

Investigating Critical Period Effects in Language Acquisition through Neural Language Models

1 code implementation 27 Jul 2024 Ionut Constantinescu, Tiago Pimentel, Ryan Cotterell, Alex Warstadt

We vary the age of exposure by training LMs on language pairs in various experimental conditions, and find that LMs, which lack any direct analog to innate maturational stages, do not show CP effects when the age of exposure of L2 is delayed.

Language Acquisition

The Foundations of Tokenization: Statistical and Computational Concerns

no code implementations 16 Jul 2024 Juan Luis Gastaldi, John Terilla, Luca Malagutti, Brian DuSell, Tim Vieira, Ryan Cotterell

The present paper contributes to addressing this theoretical gap by proposing a unified formal framework for representing and analyzing tokenizer models.

Language Modeling

Variational Best-of-N Alignment

no code implementations 8 Jul 2024 Afra Amini, Tim Vieira, Ryan Cotterell

To the extent this fine-tuning is successful and we end up with a good approximation, we have reduced the inference cost by a factor of N. Our experiments on a controlled generation task suggest that, while variational BoN (vBoN) is not as effective as BoN at aligning language models, it comes close: vBoN appears on the Pareto frontier of reward and KL divergence more often than models trained with the KL-constrained RL objective.

Language Modeling +1

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

no code implementations 20 Jun 2024 Franz Nowak, Anej Svete, Alexandra Butoi, Ryan Cotterell

We present several results on the representational capacity of recurrent and transformer LMs with CoT reasoning, showing that they can represent the same family of distributions over strings as probabilistic Turing machines.

A Probability--Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors

no code implementations 14 Jun 2024 Naaman Tan, Josef Valvoda, Tianyu Liu, Anej Svete, Yanxia Qin, Kan Min-Yen, Ryan Cotterell

The relationship between the quality of a string, as judged by a human reader, and its probability $p(\boldsymbol{y})$ under a language model undergirds the development of better language models.

Language Modeling +2

Correlation Does Not Imply Compensation: Complexity and Irregularity in the Lexicon

no code implementations 7 Jun 2024 Amanda Doucette, Ryan Cotterell, Morgan Sonderegger, Timothy J. O'Donnell

It has been claimed that within a language, morphologically irregular words are more likely to be phonotactically simple and morphologically regular words are more likely to be phonotactically complex.

What Do Language Models Learn in Context? The Structured Task Hypothesis

1 code implementation 6 Jun 2024 Jiaoda Li, Yifan Hou, Mrinmaya Sachan, Ryan Cotterell

Large language models (LLMs) exhibit an intriguing ability to learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL).

In-Context Learning Meta-Learning +2

What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages

no code implementations 6 Jun 2024 Nadav Borenstein, Anej Svete, Robin Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Ryan Cotterell

We find that the RLM rank, which corresponds to the size of linear space spanned by the logits of its conditional distributions, and the expected length of sampled strings are strong and significant predictors of learnability for both RNNs and Transformers.

Language Modeling

On Affine Homotopy between Language Encoders

no code implementations 4 Jun 2024 Robin SM Chan, Reda Boumasmoud, Anej Svete, Yuxin Ren, Qipeng Guo, Zhijing Jin, Shauli Ravfogel, Mrinmaya Sachan, Bernhard Schölkopf, Mennatallah El-Assady, Ryan Cotterell

In this spirit, we study the properties of affine alignment of language encoders and its implications for extrinsic similarity.

Lower Bounds on the Expressivity of Recurrent Neural Language Models

1 code implementation 29 May 2024 Anej Svete, Franz Nowak, Anisha Mohamed Sahabdeen, Ryan Cotterell

The recent successes and spread of large neural language models (LMs) call for a thorough understanding of their computational ability.

Joint Lemmatization and Morphological Tagging with LEMMING

no code implementations EMNLP 2015 Thomas Müller, Ryan Cotterell, Alexander Fraser, Hinrich Schütze

We present LEMMING, a modular log-linear model that jointly models lemmatization and tagging and supports the integration of arbitrary global features.

Lemmatization Morphological Tagging

Transformers Can Represent $n$-gram Language Models

no code implementations 23 Apr 2024 Anej Svete, Ryan Cotterell

This provides a first step towards understanding the mechanisms that transformer LMs can use to represent probability distributions over strings.

Labeled Morphological Segmentation with Semi-Markov Models

no code implementations CONLL 2015 Ryan Cotterell, Thomas Müller, Alexander Fraser, Hinrich Schütze

We present labeled morphological segmentation, an alternative view of morphological processing that unifies several tasks.

Segmentation TAG

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

1 code implementation 9 Apr 2024 Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques.

Context versus Prior Knowledge in Language Models

no code implementations 6 Apr 2024 Kevin Du, Vésteinn Snæbjarnarson, Niklas Stoehr, Jennifer C. White, Aaron Schein, Ryan Cotterell

To answer a question, language models often need to integrate prior knowledge learned during pretraining and new information presented in context.

Towards Explainability in Legal Outcome Prediction Models

1 code implementation 25 Mar 2024 Josef Valvoda, Ryan Cotterell

Current legal outcome prediction models - a staple of legal NLP - do not explain their reasoning.

Prediction

The Role of $n$-gram Smoothing in the Age of Neural Networks

no code implementations 25 Mar 2024 Luca Malagutti, Andrius Buinovskij, Anej Svete, Clara Meister, Afra Amini, Ryan Cotterell

For nearly three decades, language models derived from the $n$-gram assumption held the state of the art on the task.

Language Modeling +1
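As a reminder of the object of study, here is a minimal smoothed $n$-gram model: an add-k smoothed bigram LM (the toy corpus and the value of k are arbitrary, and this is only one of the many smoothing schemes the paper revisits):

```python
import math
from collections import Counter

corpus = "a b a b b a".split()
vocab = sorted(set(corpus))
bigrams = Counter(zip(corpus, corpus[1:]))  # counts of (prev, word)
unigrams = Counter(corpus[:-1])             # counts of histories

def p_addk(word, prev, k=0.5):
    """Add-k smoothed bigram probability P(word | prev):
    pad every count by k so unseen bigrams keep nonzero mass."""
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * len(vocab))

# Sanity check: probabilities over the vocabulary sum to one per history.
total = sum(p_addk(w, "a") for w in vocab)
```

Add-k is the simplest member of the family; the more refined schemes (Kneser-Ney and friends) redistribute mass in smarter ways, which is what makes their relation to neural regularization interesting.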

On Efficiently Representing Regular Languages as RNNs

no code implementations 24 Feb 2024 Anej Svete, Robin Shing Moon Chan, Ryan Cotterell

However, a closer inspection of Hewitt et al.'s (2020) construction shows that it is not inherently limited to hierarchical structures.

Inductive Bias

A Practical Method for Generating String Counterfactuals

1 code implementation 17 Feb 2024 Matan Avitan, Ryan Cotterell, Yoav Goldberg, Shauli Ravfogel

Interventions targeting the representation space of language models (LMs) have emerged as an effective means to influence model behavior.

Counterfactual Data Augmentation

Direct Preference Optimization with an Offset

2 code implementations 16 Feb 2024 Afra Amini, Tim Vieira, Ryan Cotterell

DPO, as originally formulated, relies on binary preference data and fine-tunes a language model to increase the likelihood of a preferred response over a dispreferred response.

Language Modeling
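The vanilla DPO objective described here, together with a margin-style offset in the spirit of the paper's variant (the exact parameterization of the offset in the paper may differ from this sketch), can be written as:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(lp_w, lp_l, ref_w, ref_l, beta=0.1, offset=0.0):
    """Per-example DPO loss on log-probs of the preferred (w) and
    dispreferred (l) responses under the policy and a reference model.

    offset=0 recovers vanilla DPO; offset>0 demands a margin between the
    two implicit rewards before the loss saturates (a sketch of the
    paper's offset idea, not its exact formulation).
    """
    margin = beta * ((lp_w - ref_w) - (lp_l - ref_l))
    return -math.log(sigmoid(margin - offset))

# The loss drops as the policy puts more mass on the preferred response.
base = dpo_loss(lp_w=-5.0, lp_l=-5.0, ref_w=-5.0, ref_l=-5.0)
better = dpo_loss(lp_w=-4.0, lp_l=-6.0, ref_w=-5.0, ref_l=-5.0)
```

At equal log-probs the loss is -log(1/2); separating the preferred from the dispreferred response lowers it, while a positive offset raises the bar for how much separation counts as enough.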

Representation Surgery: Theory and Practice of Affine Steering

1 code implementation 15 Feb 2024 Shashwat Singh, Shauli Ravfogel, Jonathan Herzig, Roee Aharoni, Ryan Cotterell, Ponnurangam Kumaraguru

In the case of neural language models, an encoding of the undesirable behavior is often present in the model's representations.

Principled Gradient-based Markov Chain Monte Carlo for Text Generation

no code implementations 29 Dec 2023 Li Du, Afra Amini, Lucas Torroba Hennigen, Xinyan Velocity Yu, Jason Eisner, Holden Lee, Ryan Cotterell

Recent papers have demonstrated the possibility of energy-based text generation by adapting gradient-based sampling algorithms, a paradigm of MCMC algorithms that promises fast convergence.

Language Modeling +1

Revisiting the Optimality of Word Lengths

1 code implementation 6 Dec 2023 Tiago Pimentel, Clara Meister, Ethan Gotlieb Wilcox, Kyle Mahowald, Ryan Cotterell

Under this method, we find that a language's word lengths should instead be proportional to the surprisal's expectation plus its variance-to-mean ratio.

The Ethics of Automating Legal Actors

no code implementations 1 Dec 2023 Josef Valvoda, Alec Thompson, Ryan Cotterell, Simone Teufel

The introduction of large public legal datasets has brought about a renaissance in legal NLP.

Ethics

The Causal Influence of Grammatical Gender on Distributional Semantics

1 code implementation 30 Nov 2023 Karolina Stańczak, Kevin Du, Adina Williams, Isabelle Augenstein, Ryan Cotterell

However, when we control for the meaning of the noun, the relationship between grammatical gender and adjective choice is near zero and insignificant.

Quantifying the redundancy between prosody and text

1 code implementation 28 Nov 2023 Lukas Wolf, Tiago Pimentel, Evelina Fedorenko, Ryan Cotterell, Alex Warstadt, Ethan Wilcox, Tamar Regev

Using a large spoken corpus of English audiobooks, we extract prosodic features aligned to individual words and test how well they can be predicted from LLM embeddings, compared to non-contextual word embeddings.

Word Embeddings

An Exploration of Left-Corner Transformations

no code implementations 27 Nov 2023 Andreas Opedal, Eleftheria Tsipidi, Tiago Pimentel, Ryan Cotterell, Tim Vieira

The left-corner transformation (Rosenkrantz and Lewis, 1970) is used to remove left recursion from context-free grammars, which is an important step towards making the grammar parsable top-down with simple techniques.

Formal Aspects of Language Modeling

no code implementations 7 Nov 2023 Ryan Cotterell, Anej Svete, Clara Meister, Tianyu Liu, Li Du

Large language models have become one of the most commonly deployed NLP inventions.

Language Modeling

Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages

no code implementations 23 Oct 2023 Alexandra Butoi, Tim Vieira, Ryan Cotterell, David Chiang

From these, we also immediately obtain stringsum and allsum algorithms for TAG, LIG, PAA, and EPDA.

TAG

On the Representational Capacity of Recurrent Neural Language Models

1 code implementation 19 Oct 2023 Franz Nowak, Anej Svete, Li Du, Ryan Cotterell

We extend the Turing completeness result to the probabilistic case, showing how a rationally weighted RLM with unbounded computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions.

Recurrent Neural Language Models as Probabilistic Finite-state Automata

1 code implementation 8 Oct 2023 Anej Svete, Ryan Cotterell

These results present a first step towards characterizing the classes of distributions RNN LMs can represent and thus help us understand their capabilities and limitations.

An Analysis of On-the-fly Determinization of Finite-state Automata

no code implementations 27 Aug 2023 Ivan Baburin, Ryan Cotterell

In this paper we establish an abstraction of on-the-fly determinization of finite-state automata using transition monoids and demonstrate how it can be applied to bound the asymptotics.

A Geometric Notion of Causal Probing

1 code implementation 27 Jul 2023 Clément Guerner, Tianyu Liu, Anej Svete, Alexander Warstadt, Ryan Cotterell

The linear subspace hypothesis (Bolukbasi et al., 2016) states that, in a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace.

Counterfactual Language Modeling +1

Testing the Predictions of Surprisal Theory in 11 Languages

no code implementations 7 Jul 2023 Ethan Gotlieb Wilcox, Tiago Pimentel, Clara Meister, Ryan Cotterell, Roger P. Levy

We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families.

On the Efficacy of Sampling Adapters

1 code implementation 7 Jul 2023 Clara Meister, Tiago Pimentel, Luca Malagutti, Ethan G. Wilcox, Ryan Cotterell

While this trade-off is not reflected in standard metrics of distribution quality (such as perplexity), we find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.

Text Generation
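Two common sampling adapters, top-k and nucleus (top-p) truncation, can be sketched as transformations of a next-token distribution (the toy probabilities are hypothetical):

```python
def top_k_adapter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    z = sum(kept.values())
    return {t: p / z for t, p in kept.items()}

def nucleus_adapter(probs, top_p):
    """Keep the smallest set of top tokens whose mass reaches top_p,
    then renormalize over that set."""
    kept, mass = {}, 0.0
    for t, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[t] = p
        mass += p
        if mass >= top_p:
            break
    z = sum(kept.values())
    return {t: p / z for t, p in kept.items()}

probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
tk = top_k_adapter(probs, 2)
nu = nucleus_adapter(probs, 0.9)
```

Both adapters zero out the low-probability tail and renormalize, trading recall of the true distribution for precision, which is the trade-off the abstract refers to.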

Efficient Semiring-Weighted Earley Parsing

1 code implementation 6 Jul 2023 Andreas Opedal, Ran Zmigrod, Tim Vieira, Ryan Cotterell, Jason Eisner

This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups.

Sentence

Generalizing Backpropagation for Gradient-Based Interpretability

1 code implementation 6 Jul 2023 Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alexander Warstadt, Ryan Cotterell

Many popular feature-attribution methods for interpreting deep neural networks rely on computing the gradients of a model's output with respect to its inputs.

A Formal Perspective on Byte-Pair Encoding

1 code implementation 29 Jun 2023 Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Tim Vieira, Mrinmaya Sachan, Ryan Cotterell

Via submodular functions, we prove that the iterative greedy version is a $\frac{1}{{\sigma(\boldsymbol{\mu}^\star)}}(1-e^{-{\sigma(\boldsymbol{\mu}^\star)}})$-approximation of an optimal merge sequence, where ${\sigma(\boldsymbol{\mu}^\star)}$ is the total backward curvature with respect to the optimal merge sequence $\boldsymbol{\mu}^\star$.

Combinatorial Optimization
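The iterative greedy BPE procedure that the approximation bound above concerns can be sketched in a few lines (a minimal reference implementation for illustration, not the paper's optimized one):

```python
from collections import Counter

def bpe_merges(text, n_merges):
    """Greedy byte-pair encoding: repeatedly merge the most frequent
    adjacent symbol pair. Returns the merge sequence and final symbols."""
    seq = list(text)
    merges = []
    for _ in range(n_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(a + b)  # apply the merge left to right
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return merges, seq

merges, seq = bpe_merges("abab abab", 2)
```

On "abab abab" the first merge joins the pair ("a", "b") and the second joins ("ab", "ab"); the paper's submodularity analysis bounds how far such greedy merge choices can fall from an optimal merge sequence.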

Hexatagging: Projective Dependency Parsing as Tagging

1 code implementation 8 Jun 2023 Afra Amini, Tianyu Liu, Ryan Cotterell

We introduce a novel dependency parser, the hexatagger, that constructs dependency trees by tagging the words in a sentence with elements from a finite set of possible tags.

Computational Efficiency Dependency Parsing +3

Convergence and Diversity in the Control Hierarchy

no code implementations 6 Jun 2023 Alexandra Butoi, Ryan Cotterell, David Chiang

Furthermore, using an even stricter notion of equivalence called d-strong equivalence, we make precise the intuition that a CFG controlling a CFG is a TAG, a PDA controlling a PDA is an embedded PDA, and a PDA controlling a CFG is a LIG.

Diversity TAG

Structured Voronoi Sampling

1 code implementation NeurIPS 2023 Afra Amini, Li Du, Ryan Cotterell

In this paper, we take an important step toward building a principled approach for sampling from language models with gradient-based methods.

Text Generation

Linear-Time Modeling of Linguistic Structure: An Order-Theoretic Perspective

no code implementations24 May 2023 Tianyu Liu, Afra Amini, Mrinmaya Sachan, Ryan Cotterell

We show that these exhaustive comparisons can be avoided, and, moreover, the complexity of such tasks can be reduced to linear by casting the relation between tokens as a partial order over the string.

coreference-resolution Dependency Parsing +1

All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations

1 code implementation 23 May 2023 Yuxin Ren, Qipeng Guo, Zhijing Jin, Shauli Ravfogel, Mrinmaya Sachan, Bernhard Schölkopf, Ryan Cotterell

Transformer models have brought rapid advances across NLP tasks, prompting a large body of interpretability research on the representations they learn.


RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text

2 code implementations 22 May 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Peng Cui, Tiannan Wang, Zhenxin Xiao, Yifan Hou, Ryan Cotterell, Mrinmaya Sachan

In addition to producing AI-generated content (AIGC), we also demonstrate the possibility of using RecurrentGPT as an interactive fiction that directly interacts with consumers.

Language Modeling Large Language Model

Efficient Prompting via Dynamic In-Context Learning

no code implementations 18 May 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan

To achieve this, we train a meta controller that predicts the number of in-context examples suitable for the generalist model to make a good prediction based on the performance-efficiency trade-off for a specific input.

In-Context Learning

Controlled Text Generation with Natural Language Instructions

no code implementations 27 Apr 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Ethan Wilcox, Ryan Cotterell, Mrinmaya Sachan

Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training.

In-Context Learning Language Modelling +1

Discriminative Class Tokens for Text-to-Image Diffusion Models

1 code implementation ICCV 2023 Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

This approach has two disadvantages: (i) supervised datasets are generally small compared to large-scale scraped text-image datasets on which text-to-image models are trained, affecting the quality and diversity of the generated images, or (ii) the input is a hard-coded label, as opposed to free-form text, limiting the control over the generated images.

Algorithms for Acyclic Weighted Finite-State Automata with Failure Arcs

1 code implementation 17 Jan 2023 Anej Svete, Benjamin Dayan, Tim Vieira, Ryan Cotterell, Jason Eisner

The pathsum in ordinary acyclic WFSAs is efficiently computed by the backward algorithm in time $O(|E|)$, where $E$ is the set of transitions.
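The ordinary backward algorithm referenced here (without failure arcs, which are the paper's contribution) can be sketched on a toy acyclic WFSA; the states, weights, and topological ordering below are assumed for illustration:

```python
# A toy acyclic WFSA over the real semiring. States are integers that
# happen to be topologically ordered; arcs[q] lists (next_state, weight),
# and final[q] is the final weight of state q.
arcs = {0: [(1, 0.5), (2, 0.5)], 1: [(2, 2.0)], 2: []}
final = {0: 0.0, 1: 0.0, 2: 1.0}

def backward_pathsum(arcs, final, initial=0):
    """O(|E|) backward algorithm: beta[q] is the total weight of all
    paths from q to a final state (sum over paths, product over arcs)."""
    beta = {}
    for q in sorted(arcs, reverse=True):  # reverse topological order
        beta[q] = final[q] + sum(w * beta[r] for r, w in arcs[q])
    return beta[initial]

total = backward_pathsum(arcs, final)
```

Here the two paths 0->2 and 0->1->2 contribute 0.5 and 0.5 * 2.0, so the pathsum is 1.5; each arc is touched exactly once, which is the O(|E|) bound the abstract cites. Failure arcs break this because one arc implicitly stands for many, which is what the paper's algorithms handle.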

A Measure-Theoretic Characterization of Tight Language Models

no code implementations 20 Dec 2022 Li Du, Lucas Torroba Hennigen, Tiago Pimentel, Clara Meister, Jason Eisner, Ryan Cotterell

Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings.

Language Modeling

The Ordered Matrix Dirichlet for State-Space Models

1 code implementation 8 Dec 2022 Niklas Stoehr, Benjamin J. Radford, Ryan Cotterell, Aaron Schein

For discrete data, SSMs commonly do so through a state-to-action emission matrix and a state-to-state transition matrix.

State Space Models

On the Effect of Anticipation on Reading Times

1 code implementation 25 Nov 2022 Tiago Pimentel, Clara Meister, Ethan G. Wilcox, Roger Levy, Ryan Cotterell

We assess the effect of anticipation on reading by comparing how well surprisal and contextual entropy predict reading times on four naturalistic reading datasets: two self-paced and two eye-tracking.

Schrödinger's Bat: Diffusion Models Sometimes Generate Polysemous Words in Superposition

1 code implementation 23 Nov 2022 Jennifer C. White, Ryan Cotterell

Recent work has shown that despite their impressive capabilities, text-to-image diffusion models such as DALL-E 2 (Ramesh et al., 2022) can display strange behaviours when a prompt contains a word with multiple possible meanings, often generating images containing both senses of the word (Rassin et al., 2022).

On Parsing as Tagging

1 code implementation 14 Nov 2022 Afra Amini, Ryan Cotterell

There have been many proposals to reduce constituency parsing to tagging in the literature.

Constituency Parsing

The Architectural Bottleneck Principle

no code implementations 11 Nov 2022 Tiago Pimentel, Josef Valvoda, Niklas Stoehr, Ryan Cotterell

This shift in perspective leads us to propose a new principle for probing, the architectural bottleneck principle: In order to estimate how much information a given component could extract, a probe should look exactly like the component.

Open-Ended Question Answering

Autoregressive Structured Prediction with Language Models

1 code implementation 26 Oct 2022 Tianyu Liu, Yuchen Jiang, Nicholas Monath, Ryan Cotterell, Mrinmaya Sachan

Recent years have seen a paradigm shift in NLP towards using pretrained language models (PLMs) for a wide range of tasks.

Ranked #1 on Relation Extraction on CoNLL04 (NER Micro F1 metric)

Named Entity Recognition Named Entity Recognition (NER) +3

A Bilingual Parallel Corpus with Discourse Annotations

1 code implementation 26 Oct 2022 Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell

The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.

Document Level Machine Translation Machine Translation +2

Investigating the Role of Centering Theory in the Context of Neural Coreference Resolution Systems

no code implementations 26 Oct 2022 Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan

Our analysis further shows that contextualized embeddings contain much of the coherence information, which helps explain why CT can only provide little gains to modern neural coreference resolvers which make use of pretrained representations.

Coreference Resolution World Knowledge

Mutual Information Alleviates Hallucinations in Abstractive Summarization

3 code implementations 24 Oct 2022 Liam van der Poel, Ryan Cotterell, Clara Meister

Despite significant progress in the quality of language generated from abstractive summarization models, these models still exhibit the tendency to hallucinate, i.e., output content not supported by the source document.

Abstractive Text Summarization

Log-linear Guardedness and its Implications

no code implementations 18 Oct 2022 Shauli Ravfogel, Yoav Goldberg, Ryan Cotterell

Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful.

Algorithms for Weighted Pushdown Automata

1 code implementation 13 Oct 2022 Alexandra Butoi, Brian DuSell, Tim Vieira, Ryan Cotterell, David Chiang

Weighted pushdown automata (WPDAs) are at the core of many natural language processing tasks, like syntax-based statistical machine translation and transition-based dependency parsing.

Machine Translation Transition-Based Dependency Parsing

An Ordinal Latent Variable Model of Conflict Intensity

1 code implementation 8 Oct 2022 Niklas Stoehr, Lucas Torroba Hennigen, Josef Valvoda, Robert West, Ryan Cotterell, Aaron Schein

It is based only on the action category ("what") and disregards the subject ("who") and object ("to whom") of an event, as well as contextual information, like associated casualty count, that should contribute to the perception of an event's "intensity".

Event Extraction

Equivariant Transduction through Invariant Alignment

1 code implementation COLING 2022 Jennifer C. White, Ryan Cotterell

The ability to generalize compositionally is key to understanding the potentially infinite number of sentences that can be constructed in a human language from only a finite number of words.

Inductive Bias

On the Intersection of Context-Free and Regular Languages

1 code implementation 14 Sep 2022 Clemente Pasti, Andreas Opedal, Tiago Pimentel, Tim Vieira, Jason Eisner, Ryan Cotterell

It shows, by a simple construction, that the intersection of a context-free language and a regular language is itself context-free.
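The result named here is classical: the Bar-Hillel triple construction intersects a CFG with a finite automaton by indexing each nonterminal with a pair of automaton states. Below is a minimal sketch of that classical construction plus a fixpoint emptiness check, for illustration only; it is not the (presumably more efficient) construction this paper contributes, and all names (`bar_hillel`, `nonempty`, the rule encoding) are the sketch's own assumptions.

```python
from itertools import product

def bar_hillel(rules, start, dfa, dfa_start, dfa_finals):
    """Classic Bar-Hillel triple construction: intersect a CFG with a DFA.
    `rules` maps a nonterminal to a list of right-hand sides (tuples mixing
    nonterminals and terminal strings); `dfa` maps (state, terminal) -> state.
    Returns the product grammar's rules and its set of start nonterminals."""
    states = {dfa_start} | set(dfa_finals) | {q for (q, _) in dfa} | set(dfa.values())
    nonterms = set(rules)
    new_rules = {}
    for lhs, rhss in rules.items():
        for rhs in rhss:
            k = len(rhs)
            # choose intermediate automaton states q0 .. qk spanning the rhs
            for qs in product(states, repeat=k + 1):
                body, ok = [], True
                for i, sym in enumerate(rhs):
                    if sym in nonterms:
                        body.append((qs[i], sym, qs[i + 1]))   # indexed nonterminal
                    elif dfa.get((qs[i], sym)) == qs[i + 1]:
                        body.append(sym)                       # terminal consistent with DFA
                    else:
                        ok = False
                        break
                if ok:
                    new_rules.setdefault((qs[0], lhs, qs[k]), []).append(tuple(body))
    return new_rules, {(dfa_start, start, f) for f in dfa_finals}

def nonempty(new_rules, starts):
    """Fixpoint check: does the product grammar derive any string?"""
    productive, changed = set(), True
    while changed:
        changed = False
        for lhs, rhss in new_rules.items():
            if lhs in productive:
                continue
            for rhs in rhss:
                if all(not isinstance(s, tuple) or s in productive for s in rhs):
                    productive.add(lhs)
                    changed = True
                    break
    return any(s in productive for s in starts)
```

For example, intersecting the grammar of a^n b^n with a DFA accepting exactly "ab" yields a non-empty product grammar, while a DFA accepting only "ba" yields an empty one.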

Visual Comparison of Language Model Adaptation

no code implementations 17 Aug 2022 Rita Sevastjanova, Eren Cakmak, Shauli Ravfogel, Ryan Cotterell, Mennatallah El-Assady

The simplicity of adapter training and composition comes along with new challenges, such as maintaining an overview of adapter properties and effectively comparing their produced embedding spaces.

Language Modeling Language Modelling +1

On the Role of Negative Precedent in Legal Outcome Prediction

1 code implementation 17 Aug 2022 Josef Valvoda, Ryan Cotterell, Simone Teufel

In contrast, we turn our focus to negative outcomes here, and introduce a new task of negative outcome prediction.

Prediction

Probing via Prompting

1 code implementation NAACL 2022 Jiaoda Li, Ryan Cotterell, Mrinmaya Sachan

We then examine the usefulness of a specific linguistic property for pre-training by removing the heads that are essential to that property and evaluating the resulting model's performance on language modeling.

Diagnostic Language Modeling +1

Naturalistic Causal Probing for Morpho-Syntax

1 code implementation 14 May 2022 Afra Amini, Tiago Pimentel, Clara Meister, Ryan Cotterell

Probing has become a go-to methodology for interpreting and analyzing deep neural models in natural language processing.

Sentence

A Structured Span Selector

1 code implementation NAACL 2022 Tianyu Liu, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan

Many natural language processing tasks, e.g., coreference resolution and semantic role labeling, require selecting text spans and making decisions about them.

coreference-resolution Inductive Bias +1

UniMorph 4.0: Universal Morphology

no code implementations LREC 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models

2 code implementations NAACL 2022 Karolina Stańczak, Edoardo Ponti, Lucas Torroba Hennigen, Ryan Cotterell, Isabelle Augenstein

The success of multilingual pre-trained models is underpinned by their ability to learn representations shared by multiple languages even in absence of any explicit supervision.

Exact Paired-Permutation Testing for Structured Test Statistics

1 code implementation NAACL 2022 Ran Zmigrod, Tim Vieira, Ryan Cotterell

However, practitioners rely on Monte Carlo approximation to perform this test due to a lack of a suitable exact algorithm.

Probing for the Usage of Grammatical Number

no code implementations ACL 2022 Karim Lasri, Tiago Pimentel, Alessandro Lenci, Thierry Poibeau, Ryan Cotterell

We also find that BERT uses a separate encoding of grammatical number for nouns and verbs.

Estimating the Entropy of Linguistic Distributions

no code implementations ACL 2022 Aryaman Arora, Clara Meister, Ryan Cotterell

Shannon entropy is often a quantity of interest to linguists studying the communicative capacity of human language.
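As background for the estimation problem, the simplest estimator is the plug-in (maximum-likelihood) estimate, which is biased low on small samples; the classical Miller-Madow correction adds (K - 1) / (2N) for K observed types and N samples. A sketch of these standard baselines (not necessarily the estimators this paper recommends):

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Maximum-likelihood (plug-in) entropy estimate, in nats."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def miller_madow_entropy(samples):
    """Plug-in estimate with the Miller-Madow bias correction:
    adds (K - 1) / (2N) for K observed types and N samples."""
    k = len(set(samples))
    return plugin_entropy(samples) + (k - 1) / (2 * len(samples))
```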

On the probability-quality paradox in language generation

no code implementations 31 Mar 2022 Clara Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell

Specifically, we posit that human-like language should contain an amount of information (quantified as negative log-probability) that is close to the entropy of the distribution over natural strings.

Text Generation

Analyzing Wrap-Up Effects through an Information-Theoretic Lens

no code implementations ACL 2022 Clara Meister, Tiago Pimentel, Thomas Hikaru Clark, Ryan Cotterell, Roger Levy

Numerous analyses of reading time (RT) data have been implemented, all in an effort to better understand the cognitive processes driving reading comprehension.

Reading Comprehension Sentence

On Decoding Strategies for Neural Text Generators

no code implementations 29 Mar 2022 Gian Wiher, Clara Meister, Ryan Cotterell

For example, the nature of the diversity-quality trade-off in language generation is very task-specific; the length bias often attributed to beam search is not constant across tasks.

Diversity Machine Translation +1

Locally Typical Sampling

3 code implementations 1 Feb 2022 Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell

Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, locally typical sampling offers competitive performance (in both abstractive summarization and story generation) in terms of quality while consistently reducing degenerate repetitions.
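The truncation rule being compared against nucleus and top-k sampling can be sketched as follows: keep the tokens whose surprisal lies closest to the distribution's conditional entropy, up to a cumulative mass threshold. This is a simplified reading of the idea, not the authors' reference implementation, and `tau` is an assumed parameter name.

```python
import numpy as np

def locally_typical_filter(probs, tau=0.95):
    """Keep the tokens whose surprisal is nearest the distribution's
    entropy, up to cumulative mass `tau`, and renormalise."""
    probs = np.asarray(probs, dtype=float)
    surprisal = -np.log(probs)
    entropy = np.sum(np.where(probs > 0, probs * surprisal, 0.0))  # H(p)
    scores = np.abs(surprisal - entropy)       # distance from typicality
    order = np.argsort(scores)                 # most "typical" tokens first
    cum = np.cumsum(probs[order])
    k = np.searchsorted(cum, tau) + 1          # smallest prefix with mass >= tau
    keep = order[:k]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()           # renormalised sampling distribution
```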

Abstractive Text Summarization Story Generation

Linear Adversarial Concept Erasure

2 code implementations 28 Jan 2022 Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell

Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision.

Kernelized Concept Erasure

1 code implementation 28 Jan 2022 Shauli Ravfogel, Francisco Vargas, Yoav Goldberg, Ryan Cotterell

One prominent approach for the identification of concepts in neural representations is searching for a linear subspace whose erasure prevents the prediction of the concept from the representations.

A Latent-Variable Model for Intrinsic Probing

2 code implementations 20 Jan 2022 Karolina Stańczak, Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell, Isabelle Augenstein

The success of pre-trained contextualized representations has prompted researchers to analyze them for the presence of linguistic information.

Attribute model

Probing as Quantifying Inductive Bias

1 code implementation ACL 2022 Alexander Immer, Lucas Torroba Hennigen, Vincent Fortuin, Ryan Cotterell

Such performance improvements have motivated researchers to quantify and understand the linguistic information encoded in these representations.

Bayesian Inference Inductive Bias

A surprisal–duration trade-off across and within the world's languages

1 code implementation 30 Sep 2021 Tiago Pimentel, Clara Meister, Elizabeth Salesky, Simone Teufel, Damián Blasi, Ryan Cotterell

We thus conclude that there is strong evidence of a surprisal–duration trade-off in operation, both across and within the world's languages.

On Homophony and Rényi Entropy

1 code implementation EMNLP 2021 Tiago Pimentel, Clara Meister, Simone Teufel, Ryan Cotterell

Homophony's widespread presence in natural languages is a controversial topic.

Revisiting the Uniform Information Density Hypothesis

no code implementations EMNLP 2021 Clara Meister, Tiago Pimentel, Patrick Haller, Lena Jäger, Ryan Cotterell, Roger Levy

The uniform information density (UID) hypothesis posits a preference among language users for utterances structured such that information is distributed uniformly across a signal.

Linguistic Acceptability Sentence

Conditional Poisson Stochastic Beam Search

1 code implementation 22 Sep 2021 Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell

In this work, we propose a new method for turning beam search into a stochastic process: Conditional Poisson stochastic beam search.

Efficient Sampling of Dependency Structures

no code implementations 14 Sep 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

Colbourn's (1996) sampling algorithm has a running time of $\mathcal{O}(N^3)$, which is often greater than the mean hitting time of a directed graph.

Searching for More Efficient Dynamic Programs

no code implementations Findings (EMNLP) 2021 Tim Vieira, Ryan Cotterell, Jason Eisner

To this end, we describe a set of program transformations, a simple metric for assessing the efficiency of a transformed program, and a heuristic search procedure to improve this metric.

A Bayesian Framework for Information-Theoretic Probing

1 code implementation EMNLP 2021 Tiago Pimentel, Ryan Cotterell

Pimentel et al. (2020) recently analysed probing from an information-theoretic perspective.

Differentiable Subset Pruning of Transformer Heads

2 code implementations 10 Aug 2021 Jiaoda Li, Ryan Cotterell, Mrinmaya Sachan

Multi-head attention, a collection of several attention mechanisms that independently attend to different parts of the input, is the key ingredient in the Transformer.

Machine Translation Natural Language Inference +1

Towards Zero-shot Language Modeling

no code implementations IJCNLP 2019 Edoardo Maria Ponti, Ivan Vulić, Ryan Cotterell, Roi Reichart, Anna Korhonen

Motivated by this question, we aim at constructing an informative prior over neural weights, in order to adapt quickly to held-out languages in the task of character-level language modeling.

Language Modeling Language Modelling

On Finding the K-best Non-projective Dependency Trees

1 code implementation ACL 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

Furthermore, we present a novel extension of the algorithm for decoding the K-best dependency trees of a graph which are subject to a root constraint.

Dependency Parsing Sentence

Modeling the Unigram Distribution

1 code implementation Findings (ACL) 2021 Irene Nikkarinen, Tiago Pimentel, Damián E. Blasi, Ryan Cotterell

The unigram distribution is the non-contextual probability of finding a specific word form in a corpus.

Form

Do Syntactic Probes Probe Syntax? Experiments with Jabberwocky Probing

no code implementations NAACL 2021 Rowan Hall Maudslay, Ryan Cotterell

One method of doing so, which is frequently cited to support the claim that models like BERT encode syntax, is called probing; probes are small supervised models trained to extract linguistic information from another model's output.

Is Sparse Attention more Interpretable?

no code implementations ACL 2021 Clara Meister, Stefan Lazov, Isabelle Augenstein, Ryan Cotterell

Sparse attention has been claimed to increase model interpretability under the assumption that it highlights influential inputs.

text-classification Text Classification

Examining the Inductive Bias of Neural Language Models with Artificial Languages

1 code implementation ACL 2021 Jennifer C. White, Ryan Cotterell

Since language models are used to model a wide variety of languages, it is natural to ask whether the neural architectures used for the task have inductive biases towards modeling particular types of languages.

Inductive Bias

On Finding the $K$-best Non-projective Dependency Trees

1 code implementation 1 Jun 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

Furthermore, we present a novel extension of the algorithm for decoding the $K$-best dependency trees of a graph which are subject to a root constraint.

Dependency Parsing Sentence

Higher-order Derivatives of Weighted Finite-state Machines

1 code implementation ACL 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

In the case of second-order derivatives, our scheme runs in the optimal $\mathcal{O}(A^2 N^4)$ time where $A$ is the alphabet size and $N$ is the number of states.

Language Model Evaluation Beyond Perplexity

no code implementations ACL 2021 Clara Meister, Ryan Cotterell

As concrete examples, text generated under the nucleus sampling scheme adheres more closely to the type–token relationship of natural language than text produced using standard ancestral sampling; text from LSTMs reflects the natural language distributions over length, stopwords, and symbols surprisingly well.

Language Modeling Language Modelling +1

A Non-Linear Structural Probe

no code implementations NAACL 2021 Jennifer C. White, Tiago Pimentel, Naomi Saphra, Ryan Cotterell

Probes are models devised to investigate the encoding of knowledge, e.g. syntactic structure, in contextual representations.

A Cognitive Regularizer for Language Modeling

no code implementations ACL 2021 Jason Wei, Clara Meister, Ryan Cotterell

The uniform information density (UID) hypothesis, which posits that speakers behaving optimally tend to distribute information uniformly across a linguistic signal, has gained traction in psycholinguistics as an explanation for certain syntactic, morphological, and prosodic choices.

Inductive Bias Language Modeling +1

How (Non-)Optimal is the Lexicon?

no code implementations NAACL 2021 Tiago Pimentel, Irene Nikkarinen, Kyle Mahowald, Ryan Cotterell, Damián Blasi

Examining corpora from 7 typologically diverse languages, we use those upper bounds to quantify the lexicon's optimality and to explore the relative costs of major constraints on natural codes.

Finding Concept-specific Biases in Form–Meaning Associations

2 code implementations NAACL 2021 Tiago Pimentel, Brian Roark, Søren Wichmann, Ryan Cotterell, Damián Blasi

It is not a new idea that there are small, cross-linguistic associations between the forms and meanings of words.

Form

Differentiable Generative Phonology

1 code implementation 10 Feb 2021 Shijie Wu, Edoardo Maria Ponti, Ryan Cotterell

As the main contribution of our work, we implement the phonological generative system as a neural model differentiable end-to-end, rather than as a set of rules or constraints.

Disambiguatory Signals are Stronger in Word-initial Positions

1 code implementation EACL 2021 Tiago Pimentel, Ryan Cotterell, Brian Roark

Psycholinguistic studies of human word processing and lexical access provide ample evidence of the preferred nature of word-initial versus word-final segments, e.g., in terms of attention paid by listeners (greater) or the likelihood of reduction by speakers (lower).

Informativeness

Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs

3 code implementations 30 Nov 2020 Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott

Large-scale pretraining and task-specific fine-tuning is now the standard methodology for many tasks in computer vision and natural language processing.

Morphologically Aware Word-Level Translation

no code implementations COLING 2020 Paula Czarnowska, Sebastian Ruder, Ryan Cotterell, Ann Copestake

We propose a novel morphologically aware probability model for bilingual lexicon induction, which jointly models lexeme translation and inflectional morphology in a structured way.

Bilingual Lexicon Induction Translation

Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model

no code implementations EMNLP 2020 Jun Yen Leung, Guy Emerson, Ryan Cotterell

Across languages, multiple consecutive adjectives modifying a noun (e.g., "the big red dog") follow certain unmarked ordering rules.

If beam search is the answer, what was the question?

1 code implementation EMNLP 2020 Clara Meister, Tim Vieira, Ryan Cotterell

This implies that the MAP objective alone does not express the properties we desire in text, which merits the question: if beam search is the answer, what was the question?
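For reference, the beam search under scrutiny approximates MAP decoding by keeping only the top-scoring prefixes at each step. A minimal sketch, where `score_next` is a hypothetical stand-in for a trained model's next-token log-probabilities:

```python
import math

def beam_search(score_next, eos, beam_size=4, max_len=10):
    """Approximate MAP decoding: keep the `beam_size` highest-scoring
    prefixes at each step. `score_next(prefix)` returns a dict
    {token: log_prob} for the next token."""
    beams = [((), 0.0)]        # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, lp in beams:
            for tok, tok_lp in score_next(prefix).items():
                cand = (prefix + (tok,), lp + tok_lp)
                (finished if tok == eos else candidates).append(cand)
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return max(finished + beams, key=lambda c: c[1])
```

Unlike greedy decoding, the beam can recover a high-probability continuation hidden behind a locally suboptimal first token.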

Machine Translation Text Generation +1

Intrinsic Probing through Dimension Selection

1 code implementation EMNLP 2020 Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell

Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.

Word Embeddings

Please Mind the Root: Decoding Arborescences for Dependency Parsing

1 code implementation EMNLP 2020 Ran Zmigrod, Tim Vieira, Ryan Cotterell

The connection between dependency trees and spanning trees is exploited by the NLP community to train and to decode graph-based dependency parsers.

Dependency Parsing

Pareto Probing: Trading Off Accuracy for Complexity

1 code implementation EMNLP 2020 Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell

In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume.

ARC Dependency Parsing

Speakers Fill Lexical Semantic Gaps with Context

1 code implementation EMNLP 2020 Tiago Pimentel, Rowan Hall Maudslay, Damián Blasi, Ryan Cotterell

For a language to be clear and efficiently encoded, we posit that the lexical ambiguity of a word type should correlate with how much information context provides about it, on average.

Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation

1 code implementation EMNLP 2020 Francisco Vargas, Ryan Cotterell

Their method takes pre-trained word representations as input and attempts to isolate a linear subspace that captures most of the gender bias in the representations.

Word Embeddings

Efficient Computation of Expectations under Spanning Tree Distributions

no code implementations 29 Aug 2020 Ran Zmigrod, Tim Vieira, Ryan Cotterell

We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models.

Sentence

Best-First Beam Search

1 code implementation 8 Jul 2020 Clara Meister, Tim Vieira, Ryan Cotterell

Decoding for many NLP tasks requires an effective heuristic algorithm for approximating exact search since the problem of searching the full output space is often intractable, or impractical in many settings.

Metaphor Detection using Context and Concreteness

no code implementations WS 2020 Rowan Hall Maudslay, Tiago Pimentel, Ryan Cotterell, Simone Teufel

We report the results of our system on the Metaphor Detection Shared Task at the Second Workshop on Figurative Language Processing 2020.

A Corpus for Large-Scale Phonetic Typology

no code implementations ACL 2020 Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W. Black, Jason Eisner

A major hurdle in data-driven research on typology is having sufficient data in many languages to draw meaningful conclusions.

Applying the Transformer to Character-level Transduction

3 code implementations EACL 2021 Shijie Wu, Ryan Cotterell, Mans Hulden

The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.

Morphological Inflection Transliteration

Phonotactic Complexity and its Trade-offs

1 code implementation TACL 2020 Tiago Pimentel, Brian Roark, Ryan Cotterell

We present methods for calculating a measure of phonotactic complexity, bits per phoneme, that permits a straightforward cross-linguistic comparison.

The Paradigm Discovery Problem

1 code implementation ACL 2020 Alexander Erdmann, Micha Elsner, Shijie Wu, Ryan Cotterell, Nizar Habash

Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm.

Clustering Word Embeddings

A Tale of a Probe and a Parser

1 code implementation ACL 2020 Rowan Hall Maudslay, Josef Valvoda, Tiago Pimentel, Adina Williams, Ryan Cotterell

One such probe is the structural probe (Hewitt and Manning, 2019), designed to quantify the extent to which syntactic information is encoded in contextualised word representations.

Contextualised Word Representations

On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs

no code implementations 3 May 2020 Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach

We also find that there are statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects.

Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing

no code implementations ACL 2020 Clara Meister, Elizabeth Salesky, Ryan Cotterell

Prior work has explored directly regularizing the output distributions of probabilistic models to alleviate peaky (i.e., over-confident) predictions, a common sign of overfitting.

Text Generation

Predicting Declension Class from Form and Meaning

1 code implementation ACL 2020 Adina Williams, Tiago Pimentel, Arya D. McCarthy, Hagen Blix, Eleanor Chodroff, Ryan Cotterell

We find for two Indo-European languages (Czech and German) that form and meaning respectively share significant amounts of information with class (and contribute additional information above and beyond gender).

Form

Information-Theoretic Probing for Linguistic Structure

1 code implementation ACL 2020 Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, Ryan Cotterell

The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually "know" about natural language.

Word Embeddings

Morphological Segmentation Inside-Out

no code implementations EMNLP 2016 Ryan Cotterell, Arun Kumar, Hinrich Schütze

Morphological segmentation has traditionally been modeled with non-hierarchical models, which yield flat segmentations as output.

Morphological Analysis Segmentation

Quantifying the Semantic Core of Gender Systems

no code implementations IJCNLP 2019 Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach

To that end, we use canonical correlation analysis to correlate the grammatical gender of inanimate nouns with an externally grounded definition of their lexical semantics.

The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

no code implementations WS 2019 Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden

The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.

Cross-Lingual Transfer Lemmatization +3

It's All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution

no code implementations IJCNLP 2019 Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, Simone Teufel

An alternative approach is Counterfactual Data Augmentation (CDA), in which a corpus is duplicated and augmented to remove bias, e.g., by swapping all inherently-gendered words in the copy.

All counterfactual +2

Rethinking Phonotactic Complexity

no code implementations WS 2019 Tiago Pimentel, Brian Roark, Ryan Cotterell

In this work, we propose the use of phone-level language models to estimate phonotactic complexity, measured in bits per phoneme, which makes cross-linguistic comparison straightforward.
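Bits per phoneme is the average negative log2-probability a phone-level model assigns to each phone in a corpus. As an illustration only, with a naive unigram phone model standing in for the stronger language models the authors use (`bits_per_phoneme` and the corpus encoding are assumptions of this sketch):

```python
import math
from collections import Counter

def bits_per_phoneme(corpus):
    """Average negative log2-probability per phone under a unigram phone
    model fit to the same corpus. `corpus` is a list of words, each a
    list of phone symbols."""
    phones = [p for word in corpus for p in word]
    n = len(phones)
    probs = {p: c / n for p, c in Counter(phones).items()}
    return -sum(math.log2(probs[p]) for p in phones) / n
```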

On the Distribution of Deep Clausal Embeddings: A Large Cross-linguistic Study

no code implementations ACL 2019 Damian Blasi, Ryan Cotterell, Lawrence Wolf-Sonkin, Sabine Stoll, Balthasar Bickel, Marco Baroni

Embedding a clause inside another ("the girl [who likes cars [that run fast]] has arrived") is a fundamental resource that has been argued to be a key driver of linguistic expressiveness.

Uncovering Probabilistic Implications in Typological Knowledge Bases

no code implementations ACL 2019 Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein

The study of linguistic typology is rooted in the implications we find between linguistic features, such as the fact that languages with object-verb word ordering tend to have post-positions.

Knowledge Base Population

Meaning to Form: Measuring Systematicity as Information

1 code implementation ACL 2019 Tiago Pimentel, Arya D. McCarthy, Damián E. Blasi, Brian Roark, Ryan Cotterell

A longstanding debate in semiotics centers on the relationship between linguistic signs and their corresponding semantics: is there an arbitrary relationship between a word form and its meaning, or does some systematic phenomenon pervade?

Form

What Kind of Language Is Hard to Language-Model?

no code implementations ACL 2019 Sabrina J. Mielke, Ryan Cotterell, Kyle Gorman, Brian Roark, Jason Eisner

Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.

Language Modeling Language Modelling +1

Gender Bias in Contextualized Word Embeddings

2 code implementations NAACL 2019 Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, Kai-Wei Chang

In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo's contextualized word vectors.

Word Embeddings

A Probabilistic Generative Model of Linguistic Typology

1 code implementation NAACL 2019 Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein

In the principles-and-parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features.

model
