Search Results for author: Tiago Pimentel

Found 50 papers, 25 papers with code

Fast Node Embeddings: Learning Ego-Centric Representations

no code implementations ICLR 2018 Tiago Pimentel, Adriano Veloso, Nivio Ziviani

Representation learning is one of the foundations of Deep Learning and has enabled important improvements in several Machine Learning tasks, such as Neural Machine Translation, Question Answering and Speech Recognition.

Link Prediction · Machine Translation +6

Deep Active Learning for Anomaly Detection

no code implementations 23 May 2018 Tiago Pimentel, Marianne Monteiro, Adriano Veloso, Nivio Ziviani

Anomalies are intuitively easy for human experts to understand, but they are hard to define mathematically.

Active Learning · Unsupervised Anomaly Detection

UaiNets: From Unsupervised to Active Deep Anomaly Detection

no code implementations ICLR 2019 Tiago Pimentel, Marianne Monteiro, Juliano Viana, Adriano Veloso, Nivio Ziviani

This work presents a method for active anomaly detection which can be built upon existing deep learning solutions for unsupervised anomaly detection.

Unsupervised Anomaly Detection

Meaning to Form: Measuring Systematicity as Information

1 code implementation ACL 2019 Tiago Pimentel, Arya D. McCarthy, Damián E. Blasi, Brian Roark, Ryan Cotterell

A longstanding debate in semiotics centers on the relationship between linguistic signs and their corresponding semantics: is there an arbitrary relationship between a word form and its meaning, or does some systematic phenomenon pervade?

Rethinking Phonotactic Complexity

no code implementations WS 2019 Tiago Pimentel, Brian Roark, Ryan Cotterell

In this work, we propose the use of phone-level language models to estimate phonotactic complexity, measured in bits per phoneme, which makes cross-linguistic comparison straightforward.
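
As a hedged illustration of the measure named here, the sketch below estimates bits per phoneme with a toy add-alpha bigram phone model; the paper itself uses stronger phone-level language models, and the data and smoothing choices are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_bigram(phone_sequences, alpha=1.0):
    """Add-alpha smoothed bigram model over phones (with boundary symbols)."""
    counts = defaultdict(Counter)
    vocab = {"</s>"}
    for seq in phone_sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1

    def prob(prev, cur):
        total = sum(counts[prev].values())
        return (counts[prev][cur] + alpha) / (total + alpha * len(vocab))

    return prob

def bits_per_phoneme(phone_sequences, prob):
    """Average surprisal, -log2 p(phone | previous phone), per phone."""
    total_bits, n_phones = 0.0, 0
    for seq in phone_sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            total_bits += -math.log2(prob(prev, cur))
            n_phones += 1
    return total_bits / n_phones

# toy phone-transcribed word list (illustrative only)
words = [["k", "a", "t"], ["d", "o", "g"], ["k", "a", "t", "s"]]
print(round(bits_per_phoneme(words, train_bigram(words)), 3))
```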

Information-Theoretic Probing for Linguistic Structure

1 code implementation ACL 2020 Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, Ryan Cotterell

The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually "know" about natural language.

Word Embeddings
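
For readers unfamiliar with the framing, information-theoretic probing of this kind is usually cast as estimating the mutual information between a linguistic property T and a representation R; the sketch below states that setup, not the paper's exact estimator.

```latex
\[
  \mathrm{I}(T; R) \;=\; \mathrm{H}(T) \;-\; \mathrm{H}(T \mid R),
\]
% where H(T | R) is typically approximated by the cross-entropy of a trained probe.
```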

Assessing the Reliability of Visual Explanations of Deep Models with Adversarial Perturbations

no code implementations 22 Apr 2020 Dan Valle, Tiago Pimentel, Adriano Veloso

Thus, in this work we propose an objective measure to evaluate the reliability of explanations of deep models.

Feature Importance

Predicting Declension Class from Form and Meaning

1 code implementation ACL 2020 Adina Williams, Tiago Pimentel, Arya D. McCarthy, Hagen Blix, Eleanor Chodroff, Ryan Cotterell

We find for two Indo-European languages (Czech and German) that form and meaning respectively share significant amounts of information with class (and contribute additional information above and beyond gender).

A Tale of a Probe and a Parser

1 code implementation ACL 2020 Rowan Hall Maudslay, Josef Valvoda, Tiago Pimentel, Adina Williams, Ryan Cotterell

One such probe is the structural probe (Hewitt and Manning, 2019), designed to quantify the extent to which syntactic information is encoded in contextualised word representations.

Contextualised Word Representations

Phonotactic Complexity and its Trade-offs

1 code implementation TACL 2020 Tiago Pimentel, Brian Roark, Ryan Cotterell

We present methods for calculating a measure of phonotactic complexity---bits per phoneme---that permits a straightforward cross-linguistic comparison.

A Corpus for Large-Scale Phonetic Typology

no code implementations ACL 2020 Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W. Black, Jason Eisner

A major hurdle in data-driven research on typology is having sufficient data in many languages to draw meaningful conclusions.

Metaphor Detection using Context and Concreteness

no code implementations WS 2020 Rowan Hall Maudslay, Tiago Pimentel, Ryan Cotterell, Simone Teufel

We report the results of our system on the Metaphor Detection Shared Task at the Second Workshop on Figurative Language Processing 2020.

Pareto Probing: Trading Off Accuracy for Complexity

1 code implementation EMNLP 2020 Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell

In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume.

Dependency Parsing
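
A minimal sketch of the 2-D Pareto hypervolume over hypothetical (complexity, accuracy) probe results, assuming complexity is normalised to [0, 1] with lower preferred; the paper's actual probes and complexity measures are not reproduced here.

```python
def pareto_hypervolume(points, ref_complexity=1.0, ref_accuracy=0.0):
    """points: (complexity, accuracy) pairs; lower complexity and higher
    accuracy are preferred. Returns the area dominated by the Pareto
    frontier relative to the reference point."""
    # keep only Pareto-optimal points, scanning in order of complexity
    frontier, best_acc = [], float("-inf")
    for c, a in sorted(points):
        if a > best_acc:
            frontier.append((c, a))
            best_acc = a
    # sweep from the most complex frontier point towards the least complex
    volume, prev_c = 0.0, ref_complexity
    for c, a in reversed(frontier):
        volume += (prev_c - c) * (a - ref_accuracy)
        prev_c = c
    return volume

# hypothetical probe results: (complexity, accuracy)
print(pareto_hypervolume([(0.1, 0.60), (0.4, 0.80), (0.9, 0.85)]))  # 0.665
```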

Speakers Fill Lexical Semantic Gaps with Context

1 code implementation EMNLP 2020 Tiago Pimentel, Rowan Hall Maudslay, Damián Blasi, Ryan Cotterell

For a language to be clear and efficiently encoded, we posit that the lexical ambiguity of a word type should correlate with how much information context provides about it, on average.

Disambiguatory Signals are Stronger in Word-initial Positions

1 code implementation EACL 2021 Tiago Pimentel, Ryan Cotterell, Brian Roark

Psycholinguistic studies of human word processing and lexical access provide ample evidence of the preferred nature of word-initial versus word-final segments, e.g., in terms of attention paid by listeners (greater) or the likelihood of reduction by speakers (lower).

Informativeness

Finding Concept-specific Biases in Form--Meaning Associations

2 code implementations NAACL 2021 Tiago Pimentel, Brian Roark, Søren Wichmann, Ryan Cotterell, Damián Blasi

It is not a new idea that there are small, cross-linguistic associations between the forms and meanings of words.

How (Non-)Optimal is the Lexicon?

no code implementations NAACL 2021 Tiago Pimentel, Irene Nikkarinen, Kyle Mahowald, Ryan Cotterell, Damián Blasi

Examining corpora from 7 typologically diverse languages, we use those upper bounds to quantify the lexicon's optimality and to explore the relative costs of major constraints on natural codes.

A Non-Linear Structural Probe

no code implementations NAACL 2021 Jennifer C. White, Tiago Pimentel, Naomi Saphra, Ryan Cotterell

Probes are models devised to investigate the encoding of knowledge -- e.g., syntactic structure -- in contextual representations.

Modeling the Unigram Distribution

1 code implementation Findings (ACL) 2021 Irene Nikkarinen, Tiago Pimentel, Damián E. Blasi, Ryan Cotterell

The unigram distribution is the non-contextual probability of finding a specific word form in a corpus.

A Bayesian Framework for Information-Theoretic Probing

1 code implementation EMNLP 2021 Tiago Pimentel, Ryan Cotterell

Pimentel et al. (2020) recently analysed probing from an information-theoretic perspective.

Revisiting the Uniform Information Density Hypothesis

no code implementations EMNLP 2021 Clara Meister, Tiago Pimentel, Patrick Haller, Lena Jäger, Ryan Cotterell, Roger Levy

The uniform information density (UID) hypothesis posits a preference among language users for utterances structured such that information is distributed uniformly across a signal.

Linguistic Acceptability · Sentence
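
As a hedged illustration only: one common operationalisation of UID is the variance of per-token surprisal within a sentence, computed below with GPT-2 via Hugging Face transformers. The paper compares several operationalisations; neither this exact measure nor the model choice is claimed to be theirs.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def token_surprisals(sentence):
    """Per-token surprisal, -log2 p(token | preceding tokens), under GPT-2."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    picked = log_probs[torch.arange(ids.size(1) - 1), ids[0, 1:]]
    return (-picked / torch.log(torch.tensor(2.0))).tolist()

def uid_variance(sentence):
    """Variance of within-sentence surprisals: one UID operationalisation."""
    s = token_surprisals(sentence)
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

print(uid_variance("Information is spread quite evenly across this sentence."))
```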

On Homophony and Rényi Entropy

1 code implementation EMNLP 2021 Tiago Pimentel, Clara Meister, Simone Teufel, Ryan Cotterell

Homophony's widespread presence in natural languages is a controversial topic.
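
For reference, the Rényi entropy in the title is the standard generalisation of Shannon entropy given below; how the paper relates it to homophony rates is not reproduced here.

```latex
\[
  \mathrm{H}_{\alpha}(p) \;=\; \frac{1}{1-\alpha}\,\log \sum_{x} p(x)^{\alpha},
  \qquad
  \lim_{\alpha \to 1} \mathrm{H}_{\alpha}(p) \;=\; -\sum_{x} p(x)\log p(x).
\]
```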

A surprisal--duration trade-off across and within the world's languages

1 code implementation 30 Sep 2021 Tiago Pimentel, Clara Meister, Elizabeth Salesky, Simone Teufel, Damián Blasi, Ryan Cotterell

We thus conclude that there is strong evidence of a surprisal--duration trade-off in operation, both across and within the world's languages.

Locally Typical Sampling

3 code implementations 1 Feb 2022 Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell

Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, locally typical sampling offers competitive performance (in both abstractive summarization and story generation) in terms of quality while consistently reducing degenerate repetitions.

Abstractive Text Summarization · Story Generation
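
The sketch below is a minimal reading of locally typical sampling as described here: truncate the next-token distribution to the tokens whose surprisal is closest to its conditional entropy, keep at least tau probability mass, renormalise, and sample. It is an illustration, not the authors' released implementation.

```python
import torch

def locally_typical_sample(logits, tau=0.95):
    """Sample a token id from 1-D next-token logits, keeping only the tokens
    whose surprisal is closest to the conditional entropy (mass >= tau)."""
    probs = torch.softmax(logits, dim=-1)
    log_probs = torch.log_softmax(logits, dim=-1)
    entropy = -(probs * log_probs).sum()           # H(p)
    scores = (entropy + log_probs).abs()           # | -log p(x) - H(p) |
    order = torch.argsort(scores)                  # most "typical" tokens first
    cum_mass = probs[order].cumsum(dim=0)
    cutoff = int((cum_mass < tau).sum().item()) + 1
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()   # renormalise the kept mass
    return kept[torch.multinomial(kept_probs, 1)].item()

# usage with hypothetical logits over a 5-token vocabulary
print(locally_typical_sample(torch.tensor([2.0, 1.5, 0.3, -1.0, -2.0])))
```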

Analyzing Wrap-Up Effects through an Information-Theoretic Lens

no code implementations ACL 2022 Clara Meister, Tiago Pimentel, Thomas Hikaru Clark, Ryan Cotterell, Roger Levy

Numerous analyses of reading time (RT) data have been implemented -- all in an effort to better understand the cognitive processes driving reading comprehension.

Reading Comprehension · Sentence

On the probability-quality paradox in language generation

no code implementations 31 Mar 2022 Clara Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell

Specifically, we posit that human-like language should contain an amount of information (quantified as negative log-probability) that is close to the entropy of the distribution over natural strings.

Text Generation
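
Rendered as a formula (the notation is an assumption), the stated hypothesis is that a human-like string y should carry information close to the entropy of the model's distribution over strings:

```latex
\[
  \bigl|\, -\log p(\boldsymbol{y}) - \mathrm{H}(p) \,\bigr| \;\le\; \varepsilon,
  \qquad
  \mathrm{H}(p) \;=\; -\sum_{\boldsymbol{y}'} p(\boldsymbol{y}') \log p(\boldsymbol{y}').
\]
```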

Probing for the Usage of Grammatical Number

no code implementations ACL 2022 Karim Lasri, Tiago Pimentel, Alessandro Lenci, Thierry Poibeau, Ryan Cotterell

We also find that BERT uses a separate encoding of grammatical number for nouns and verbs.

UniMorph 4.0: Universal Morphology

no code implementations LREC 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

Naturalistic Causal Probing for Morpho-Syntax

1 code implementation 14 May 2022 Afra Amini, Tiago Pimentel, Clara Meister, Ryan Cotterell

Probing has become a go-to methodology for interpreting and analyzing deep neural models in natural language processing.

Sentence

Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective

no code implementations 15 Jun 2022 Xin Xin, Tiago Pimentel, Alexandros Karatzoglou, Pengjie Ren, Konstantina Christakopoulou, Zhaochun Ren

As reinforcement learning (RL) naturally fits this objective -- maximizing a user's reward per session -- it has become an emerging topic in recommender systems.

Recommendation Systems · reinforcement-learning +1

On the Intersection of Context-Free and Regular Languages

1 code implementation 14 Sep 2022 Clemente Pasti, Andreas Opedal, Tiago Pimentel, Tim Vieira, Jason Eisner, Ryan Cotterell

It shows, by a simple construction, that the intersection of a context-free language and a regular language is itself context-free.
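
For context, the classical construction being referenced (Bar-Hillel et al.) builds an intersection grammar over state-decorated nonterminals; the toy sketch below follows that textbook recipe and is not the paper's more efficient construction. The grammar and automaton are assumptions.

```python
from itertools import product

def intersect(cfg_binary, cfg_lexical, start, states, delta, q0, finals):
    """cfg_binary: {(A, B, C)} for rules A -> B C; cfg_lexical: {(A, a)} for
    rules A -> a; delta: {(q, a): r} DFA transitions. Returns rules of a CFG
    generating L(G) intersected with L(D)."""
    rules = []
    # nonterminals of the intersection grammar are triples (q, A, r)
    for (A, B, C), q, s, r in product(cfg_binary, states, states, states):
        rules.append(((q, A, r), [(q, B, s), (s, C, r)]))
    for (A, a), q in product(cfg_lexical, states):
        if (q, a) in delta:
            rules.append(((q, A, delta[(q, a)]), [a]))
    for f in finals:
        rules.append(("S*", [(q0, start, f)]))
    return rules

# toy grammar S -> S S | a, and a DFA accepting strings of even length
rules = intersect({("S", "S", "S")}, {("S", "a")}, "S",
                  states={0, 1}, delta={(0, "a"): 1, (1, "a"): 0},
                  q0=0, finals={0})
print(len(rules))
```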

The Architectural Bottleneck Principle

no code implementations 11 Nov 2022 Tiago Pimentel, Josef Valvoda, Niklas Stoehr, Ryan Cotterell

This shift in perspective leads us to propose a new principle for probing, the architectural bottleneck principle: In order to estimate how much information a given component could extract, a probe should look exactly like the component.

Open-Ended Question Answering

On the Effect of Anticipation on Reading Times

1 code implementation 25 Nov 2022 Tiago Pimentel, Clara Meister, Ethan G. Wilcox, Roger Levy, Ryan Cotterell

We assess the effect of anticipation on reading by comparing how well surprisal and contextual entropy predict reading times on four naturalistic reading datasets: two self-paced and two eye-tracking.

A Natural Bias for Language Generation Models

no code implementations 19 Dec 2022 Clara Meister, Wojciech Stokowiec, Tiago Pimentel, Lei Yu, Laura Rimell, Adhiguna Kuncoro

After just a few hundred training updates, a standard probabilistic model for language generation has likely not yet learnt many semantic or syntactic rules of natural language, making it difficult to estimate the probability distribution over next tokens.

Machine Translation · Text Generation

A Measure-Theoretic Characterization of Tight Language Models

no code implementations 20 Dec 2022 Li Du, Lucas Torroba Hennigen, Tiago Pimentel, Clara Meister, Jason Eisner, Ryan Cotterell

Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings.

Language Modelling

Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation

1 code implementation 26 May 2023 Marius Mosbach, Tiago Pimentel, Shauli Ravfogel, Dietrich Klakow, Yanai Elazar

In this paper, we compare the generalization of few-shot fine-tuning and in-context learning to challenge datasets, while controlling for the models used, the number of examples, and the number of parameters, ranging from 125M to 30B.

Domain Generalization · In-Context Learning

On the Efficacy of Sampling Adapters

1 code implementation 7 Jul 2023 Clara Meister, Tiago Pimentel, Luca Malagutti, Ethan G. Wilcox, Ryan Cotterell

While this trade-off is not reflected in standard metrics of distribution quality (such as perplexity), we find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.

Text Generation

Testing the Predictions of Surprisal Theory in 11 Languages

no code implementations 7 Jul 2023 Ethan Gotlieb Wilcox, Tiago Pimentel, Clara Meister, Ryan Cotterell, Roger P. Levy

We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families.

An Exploration of Left-Corner Transformations

no code implementations 27 Nov 2023 Andreas Opedal, Eleftheria Tsipidi, Tiago Pimentel, Ryan Cotterell, Tim Vieira

The left-corner transformation (Rosenkrantz and Lewis, 1970) is used to remove left recursion from context-free grammars, which is an important step towards making the grammar parsable top-down with simple techniques.

Quantifying the redundancy between prosody and text

1 code implementation 28 Nov 2023 Lukas Wolf, Tiago Pimentel, Evelina Fedorenko, Ryan Cotterell, Alex Warstadt, Ethan Wilcox, Tamar Regev

Using a large spoken corpus of English audiobooks, we extract prosodic features aligned to individual words and test how well they can be predicted from LLM embeddings, compared to non-contextual word embeddings.

Word Embeddings
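
A deliberately simplified sketch of the kind of "how predictable is prosody from embeddings" test described above, using synthetic stand-ins; the actual prosodic features, embeddings, and estimator used in the paper are not assumed here.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))                       # stand-in word embeddings
y = 0.5 * X[:, 0] + rng.normal(scale=1.0, size=1000)   # stand-in prosodic feature

# cross-validated R^2: how much variance in the prosodic feature is explained
print(cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean())
```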

Revisiting the Optimality of Word Lengths

no code implementations 6 Dec 2023 Tiago Pimentel, Clara Meister, Ethan Gotlieb Wilcox, Kyle Mahowald, Ryan Cotterell

Under this method, we find that a language's word lengths should instead be proportional to the surprisal's expectation plus its variance-to-mean ratio.
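
Written out (the notation is an assumption), the quoted finding says a word's length should grow with its expected surprisal plus its variance-to-mean ratio:

```latex
\[
  \ell(w) \;\propto\; \mathbb{E}[s_w] \;+\; \frac{\operatorname{Var}[s_w]}{\mathbb{E}[s_w]},
  \qquad s_w \;=\; -\log p(w \mid \text{context}).
\]
```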

High probability or low information? The probability–quality paradox in language generation

no code implementations ACL 2022 Clara Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell

When generating natural language from neural probabilistic models, high probability does not always coincide with high quality.

Text Generation

A surprisal–duration trade-off across and within the world’s languages

1 code implementation EMNLP 2021 Tiago Pimentel, Clara Meister, Elizabeth Salesky, Simone Teufel, Damián Blasi, Ryan Cotterell

We thus conclude that there is strong evidence of a surprisal–duration trade-off in operation, both across and within the world’s languages.
