Search Results for author: Tim Vieira

Found 33 papers, 20 papers with code

Conditional Poisson Stochastic Beams

no code implementations EMNLP 2021 Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell

Beam search is the default decoding strategy for many sequence generation tasks in NLP.

Efficient Sampling of Dependency Structure

1 code implementation EMNLP 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

In this paper, we adapt two spanning tree sampling algorithms to faithfully sample dependency trees from a graph subject to the root constraint.

From Language Models over Tokens to Language Models over Characters

no code implementations 4 Dec 2024 Tim Vieira, Ben LeBrun, Mario Giulianelli, Juan Luis Gastaldi, Brian DuSell, John Terilla, Timothy J. O'Donnell, Ryan Cotterell

Modern language models are internally -- and mathematically -- distributions over token strings rather than character strings, posing numerous challenges for programmers building user applications on top of them.

Language Modelling

On the Proper Treatment of Tokenization in Psycholinguistics

1 code implementation 3 Oct 2024 Mario Giulianelli, Luca Malagutti, Juan Luis Gastaldi, Brian DuSell, Tim Vieira, Ryan Cotterell

The paper argues that token-level language models should be (approximately) marginalized into character-level language models before they are used in psycholinguistic studies to compute the surprisal of a region of interest; then, the marginalized character-level language model can be used to compute the surprisal of an arbitrary character substring, which we term a focal area, that the experimenter may wish to use as a predictor.

Language Modelling

The Foundations of Tokenization: Statistical and Computational Concerns

no code implementations 16 Jul 2024 Juan Luis Gastaldi, John Terilla, Luca Malagutti, Brian DuSell, Tim Vieira, Ryan Cotterell

The present paper contributes to addressing this theoretical gap by proposing a unified formal framework for representing and analyzing tokenizer models.

Language Modelling

Variational Best-of-N Alignment

no code implementations 8 Jul 2024 Afra Amini, Tim Vieira, Ryan Cotterell

To the extent this fine-tuning is successful and we end up with a good approximation, we have reduced the inference cost by a factor of N. Our experiments on a controlled generation task suggest that while variational BoN is not as effective as BoN in aligning language models, it is close to BoN performance as vBoN appears more often on the Pareto frontier of reward and KL divergence compared to models trained with KL-constrained RL objective.

Language Modelling, Variational Inference

Direct Preference Optimization with an Offset

1 code implementation 16 Feb 2024 Afra Amini, Tim Vieira, Ryan Cotterell

DPO, as originally formulated, relies on binary preference data and fine-tunes a language model to increase the likelihood of a preferred response over a dispreferred response.

Language Modelling
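
For orientation, the original DPO objective mentioned in the excerpt above can be sketched for a single preference pair as follows. This is a minimal illustration of the standard DPO loss, not the offset variant the paper proposes; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this sketch.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) response pair.

    logp_w / logp_l: log-probabilities of the preferred and dispreferred
    responses under the model being fine-tuned.
    ref_logp_w / ref_logp_l: the same quantities under the frozen reference model.
    beta: temperature on the implicit reward (hypothetical default).
    """
    # Implicit reward margin between the preferred and dispreferred response.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), evaluated stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

Raising the model's log-probability of the preferred response relative to the reference lowers the loss, which is the "increase the likelihood of a preferred response over a dispreferred response" behaviour described above.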

An Exploration of Left-Corner Transformations

no code implementations 27 Nov 2023 Andreas Opedal, Eleftheria Tsipidi, Tiago Pimentel, Ryan Cotterell, Tim Vieira

The left-corner transformation (Rosenkrantz and Lewis, 1970) is used to remove left recursion from context-free grammars, which is an important step towards making the grammar parsable top-down with simple techniques.

Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages

no code implementations 23 Oct 2023 Alexandra Butoi, Tim Vieira, Ryan Cotterell, David Chiang

From these, we also immediately obtain stringsum and allsum algorithms for TAG, LIG, PAA, and EPDA.

TAG

Efficient Semiring-Weighted Earley Parsing

1 code implementation 6 Jul 2023 Andreas Opedal, Ran Zmigrod, Tim Vieira, Ryan Cotterell, Jason Eisner

This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups.

Sentence

A Formal Perspective on Byte-Pair Encoding

1 code implementation 29 Jun 2023 Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Tim Vieira, Mrinmaya Sachan, Ryan Cotterell

Via submodular functions, we prove that the iterative greedy version is a $\frac{1}{{\sigma(\boldsymbol{\mu}^\star)}}(1-e^{-{\sigma(\boldsymbol{\mu}^\star)}})$-approximation of an optimal merge sequence, where ${\sigma(\boldsymbol{\mu}^\star)}$ is the total backward curvature with respect to the optimal merge sequence $\boldsymbol{\mu}^\star$.

Combinatorial Optimization

Algorithms for Acyclic Weighted Finite-State Automata with Failure Arcs

1 code implementation 17 Jan 2023 Anej Svete, Benjamin Dayan, Tim Vieira, Ryan Cotterell, Jason Eisner

The pathsum in ordinary acyclic WFSAs is efficiently computed by the backward algorithm in time $O(|E|)$, where $E$ is the set of transitions.
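
As a rough illustration of the backward algorithm referred to above, the sketch below computes the pathsum of an ordinary acyclic WFSA (no failure arcs) in the real semiring. The representation is an assumption made for this example: states are integers already numbered in topological order, transitions are `(src, dst, weight)` triples, and `initial` / `final` are dictionaries of state weights. The name `backward_pathsum` is hypothetical.

```python
from collections import defaultdict


def backward_pathsum(num_states, edges, initial, final):
    """Sum of the weights of all accepting paths in an acyclic WFSA.

    Assumes states 0..num_states-1 are topologically ordered, i.e. every
    transition (src, dst, weight) satisfies src < dst.
    """
    outgoing = defaultdict(list)
    for src, dst, weight in edges:
        if not src < dst:
            raise ValueError("states must be numbered in topological order")
        outgoing[src].append((dst, weight))

    # beta[q] = total weight of all paths from q to a final state.
    beta = [0.0] * num_states
    for q in reversed(range(num_states)):
        beta[q] = final.get(q, 0.0) + sum(w * beta[dst] for dst, w in outgoing[q])

    # Combine the backward weights with the initial weights.
    return sum(w * beta[q] for q, w in initial.items())


# Three states; two paths from state 0 to state 2: 0 -> 1 -> 2 and 0 -> 2.
edges = [(0, 1, 2.0), (1, 2, 3.0), (0, 2, 4.0)]
total = backward_pathsum(3, edges, initial={0: 1.0}, final={2: 1.0})
```

Each transition is visited once when its source state is processed, which is where the linear dependence on the number of transitions comes from.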

Algorithms for Weighted Pushdown Automata

1 code implementation 13 Oct 2022 Alexandra Butoi, Brian DuSell, Tim Vieira, Ryan Cotterell, David Chiang

Weighted pushdown automata (WPDAs) are at the core of many natural language processing tasks, like syntax-based statistical machine translation and transition-based dependency parsing.

Machine Translation, Transition-Based Dependency Parsing

On the Intersection of Context-Free and Regular Languages

1 code implementation 14 Sep 2022 Clemente Pasti, Andreas Opedal, Tiago Pimentel, Tim Vieira, Jason Eisner, Ryan Cotterell

It shows, by a simple construction, that the intersection of a context-free language and a regular language is itself context-free.

Exact Paired-Permutation Testing for Structured Test Statistics

1 code implementation NAACL 2022 Ran Zmigrod, Tim Vieira, Ryan Cotterell

However, practitioners rely on Monte Carlo approximation to perform this test due to a lack of a suitable exact algorithm.
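
The Monte Carlo approximation mentioned above can be sketched as follows. This shows the sampling-based baseline, not the exact algorithm the paper contributes. It assumes the test statistic is the absolute difference in means between two systems scored on the same items; the function name, the `num_samples` default, and the add-one smoothing of the p-value are choices made for this sketch.

```python
import random


def mc_paired_permutation_test(xs, ys, num_samples=10_000, seed=0):
    """Monte Carlo estimate of the two-sided paired-permutation p-value.

    xs[i] and ys[i] are the scores of two systems on the same item i.
    Swapping a pair flips the sign of its difference, so each sample
    draws one random sign per pair.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(xs, ys)]
    observed = abs(sum(diffs))

    hits = 0
    for _ in range(num_samples):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (num_samples + 1)
```

The estimate carries sampling error that shrinks only as `num_samples` grows, which is the limitation an exact algorithm avoids.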

Conditional Poisson Stochastic Beam Search

1 code implementation 22 Sep 2021 Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell

In this work, we propose a new method for turning beam search into a stochastic process: Conditional Poisson stochastic beam search.

Searching for More Efficient Dynamic Programs

no code implementations Findings (EMNLP) 2021 Tim Vieira, Ryan Cotterell, Jason Eisner

To this end, we describe a set of program transformations, a simple metric for assessing the efficiency of a transformed program, and a heuristic search procedure to improve this metric.

Efficient Sampling of Dependency Structures

no code implementations 14 Sep 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

Colbourn's (1996) sampling algorithm has a running time of $\mathcal{O}(N^3)$, which is often greater than the mean hitting time of a directed graph.

On Finding the K-best Non-projective Dependency Trees

1 code implementation ACL 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

Furthermore, we present a novel extension of the algorithm for decoding the K-best dependency trees of a graph which are subject to a root constraint.

Dependency Parsing, Sentence

Higher-order Derivatives of Weighted Finite-state Machines

1 code implementation ACL 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

In the case of second-order derivatives, our scheme runs in the optimal $\mathcal{O}(A^2 N^4)$ time where $A$ is the alphabet size and $N$ is the number of states.

On Finding the $K$-best Non-projective Dependency Trees

1 code implementation 1 Jun 2021 Ran Zmigrod, Tim Vieira, Ryan Cotterell

Furthermore, we present a novel extension of the algorithm for decoding the $K$-best dependency trees of a graph which are subject to a root constraint.

Dependency Parsing, Sentence

Evaluation of Logic Programs with Built-Ins and Aggregation: A Calculus for Bag Relations

1 code implementation 20 Oct 2020 Matthew Francis-Landau, Tim Vieira, Jason Eisner

We present a scheme for translating logic programs, which may use aggregation and arithmetic, into algebraic expressions that denote bag relations over ground terms of the Herbrand universe.

Programming Languages, Symbolic Computation

Please Mind the Root: Decoding Arborescences for Dependency Parsing

1 code implementation EMNLP 2020 Ran Zmigrod, Tim Vieira, Ryan Cotterell

The connection between dependency trees and spanning trees is exploited by the NLP community to train and to decode graph-based dependency parsers.

Dependency Parsing

If beam search is the answer, what was the question?

1 code implementation EMNLP 2020 Clara Meister, Tim Vieira, Ryan Cotterell

This implies that the MAP objective alone does not express the properties we desire in text, which merits the question: if beam search is the answer, what was the question?

Machine Translation, Text Generation

Efficient Computation of Expectations under Spanning Tree Distributions

no code implementations 29 Aug 2020 Ran Zmigrod, Tim Vieira, Ryan Cotterell

We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models.

Sentence

Best-First Beam Search

1 code implementation 8 Jul 2020 Clara Meister, Tim Vieira, Ryan Cotterell

Decoding for many NLP tasks requires an effective heuristic algorithm for approximating exact search since the problem of searching the full output space is often intractable, or impractical in many settings.

Reasoning about Quantities in Natural Language

no code implementations TACL 2015 Subhro Roy, Tim Vieira, Dan Roth

In order to address these quantitative reasoning problems we first develop a computational approach which we show to successfully recognize and normalize textual expressions of quantities.

Math, Natural Language Inference
