Search Results for author: Tiago Pimentel

Found 37 papers, 16 papers with code

High probability or low information? The probability–quality paradox in language generation

no code implementations ACL 2022 Clara Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell

When generating natural language from neural probabilistic models, high probability does not always coincide with high quality.

Text Generation

A surprisal–duration trade-off across and within the world’s languages

1 code implementation EMNLP 2021 Tiago Pimentel, Clara Meister, Elizabeth Salesky, Simone Teufel, Damián Blasi, Ryan Cotterell

We thus conclude that there is strong evidence of a surprisal–duration trade-off in operation, both across and within the world’s languages.

Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective

no code implementations 15 Jun 2022 Xin Xin, Tiago Pimentel, Alexandros Karatzoglou, Pengjie Ren, Konstantina Christakopoulou, Zhaochun Ren

As reinforcement learning (RL) naturally fits this objective -- maximizing a user's reward per session -- it has become an emerging topic in recommender systems.

Recommendation Systems · Reinforcement Learning

Cluster-based Evaluation of Automatically Generated Text

no code implementations 31 May 2022 Tiago Pimentel, Clara Meister, Ryan Cotterell

We first discuss the computational and qualitative issues with using automatic evaluation metrics that operate on probability distributions over strings, the backbone of most language generators.

Language Modelling · Text Generation

Naturalistic Causal Probing for Morpho-Syntax

no code implementations 14 May 2022 Afra Amini, Tiago Pimentel, Clara Meister, Ryan Cotterell

In this work, we suggest a naturalistic strategy for input-level intervention on real-world data in Spanish, a language with gender marking.

Natural Language Processing

UniMorph 4.0: Universal Morphology

no code implementations 7 May 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

Probing for the Usage of Grammatical Number

no code implementations ACL 2022 Karim Lasri, Tiago Pimentel, Alessandro Lenci, Thierry Poibeau, Ryan Cotterell

We also find that BERT uses a separate encoding of grammatical number for nouns and verbs.

On the probability-quality paradox in language generation

no code implementations 31 Mar 2022 Clara Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell

Specifically, we posit that human-like language should contain an amount of information (quantified as negative log-probability) that is close to the entropy of the distribution over natural strings.

Text Generation
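The entropy comparison this paper posits can be sketched with a toy calculation (the distribution below is invented for illustration; it is not from the paper):

```python
import math

# Hypothetical next-token distribution, for illustration only.
dist = {"the": 0.5, "a": 0.3, "cat": 0.15, "zyx": 0.05}

# Entropy of the distribution: H(p) = -sum p(x) log2 p(x).
entropy = -sum(p * math.log2(p) for p in dist.values())

# Information content (surprisal) of each token: -log2 p(x).
surprisal = {w: -math.log2(p) for w, p in dist.items()}

# The paradox in miniature: the highest-probability token carries the least
# information, so its surprisal falls well below the distribution's entropy --
# mode-seeking decoding can thus yield atypically low-information text.
most_probable = max(dist, key=dist.get)
assert surprisal[most_probable] < entropy
```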

Analyzing Wrap-Up Effects through an Information-Theoretic Lens

no code implementations ACL 2022 Clara Meister, Tiago Pimentel, Thomas Hikaru Clark, Ryan Cotterell, Roger Levy

Numerous analyses of reading time (RT) data have been implemented -- all in an effort to better understand the cognitive processes driving reading comprehension.

Reading Comprehension

Typical Decoding for Natural Language Generation

1 code implementation 1 Feb 2022 Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell

Despite achieving incredibly low perplexities on myriad natural language corpora, today's language models still often underperform when used to generate text.

Text Generation

On Homophony and Rényi Entropy

1 code implementation EMNLP 2021 Tiago Pimentel, Clara Meister, Simone Teufel, Ryan Cotterell

Homophony's widespread presence in natural languages is a controversial topic.

Revisiting the Uniform Information Density Hypothesis

no code implementations EMNLP 2021 Clara Meister, Tiago Pimentel, Patrick Haller, Lena Jäger, Ryan Cotterell, Roger Levy

The uniform information density (UID) hypothesis posits a preference among language users for utterances structured such that information is distributed uniformly across a signal.

Linguistic Acceptability
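One common operationalization of the UID preference scores an utterance by how evenly surprisal is spread across its tokens; a minimal sketch, with invented per-token probabilities (not data from the paper):

```python
import math
from statistics import pvariance

# Hypothetical per-token probabilities for two utterances of equal length.
uniform_probs = [0.25, 0.25, 0.25, 0.25]  # information spread evenly
peaky_probs = [0.9, 0.05, 0.03, 0.02]     # information concentrated late

def uid_variance(probs):
    """Variance of per-token surprisal; lower = more uniform density."""
    surprisals = [-math.log2(p) for p in probs]
    return pvariance(surprisals)

# Under a variance-based UID score, the evenly distributed utterance wins.
assert uid_variance(uniform_probs) < uid_variance(peaky_probs)
```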

A Bayesian Framework for Information-Theoretic Probing

1 code implementation EMNLP 2021 Tiago Pimentel, Ryan Cotterell

Pimentel et al. (2020) recently analysed probing from an information-theoretic perspective.

Modeling the Unigram Distribution

1 code implementation Findings (ACL) 2021 Irene Nikkarinen, Tiago Pimentel, Damián E. Blasi, Ryan Cotterell

The unigram distribution is the non-contextual probability of finding a specific word form in a corpus.

Natural Language Processing
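The quantity defined in the teaser above reduces, in its simplest empirical form, to relative frequency; a minimal sketch on a toy corpus (the paper itself models this distribution rather than just counting):

```python
from collections import Counter

# Toy corpus; the unigram distribution assigns each word form its
# corpus-wide relative frequency, independent of context.
corpus = "the cat sat on the mat the cat slept".split()
counts = Counter(corpus)
total = sum(counts.values())
unigram = {w: c / total for w, c in counts.items()}

# A valid probability distribution over observed word forms.
assert abs(sum(unigram.values()) - 1.0) < 1e-9
```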

A Non-Linear Structural Probe

no code implementations NAACL 2021 Jennifer C. White, Tiago Pimentel, Naomi Saphra, Ryan Cotterell

Probes are models devised to investigate the encoding of knowledge -- e.g. syntactic structure -- in contextual representations.

How (Non-)Optimal is the Lexicon?

no code implementations NAACL 2021 Tiago Pimentel, Irene Nikkarinen, Kyle Mahowald, Ryan Cotterell, Damián Blasi

Examining corpora from 7 typologically diverse languages, we use those upper bounds to quantify the lexicon's optimality and to explore the relative costs of major constraints on natural codes.

Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models

no code implementations 15 Apr 2021 Karolina Stańczak, Sagnik Ray Choudhury, Tiago Pimentel, Ryan Cotterell, Isabelle Augenstein

While the prevalence of large pre-trained language models has led to significant improvements in the performance of NLP systems, recent research has demonstrated that these models inherit societal biases extant in natural language.

Language Modelling

Finding Concept-specific Biases in Form--Meaning Associations

2 code implementations NAACL 2021 Tiago Pimentel, Brian Roark, Søren Wichmann, Ryan Cotterell, Damián Blasi

It is not a new idea that there are small, cross-linguistic associations between the forms and meanings of words.

Disambiguatory Signals are Stronger in Word-initial Positions

1 code implementation EACL 2021 Tiago Pimentel, Ryan Cotterell, Brian Roark

Psycholinguistic studies of human word processing and lexical access provide ample evidence of the preferred nature of word-initial versus word-final segments, e.g., in terms of attention paid by listeners (greater) or the likelihood of reduction by speakers (lower).

Informativeness

Speakers Fill Lexical Semantic Gaps with Context

1 code implementation EMNLP 2020 Tiago Pimentel, Rowan Hall Maudslay, Damián Blasi, Ryan Cotterell

For a language to be clear and efficiently encoded, we posit that the lexical ambiguity of a word type should correlate with how much information context provides about it, on average.

Pareto Probing: Trading Off Accuracy for Complexity

1 code implementation EMNLP 2020 Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell

In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume.

Dependency Parsing

Metaphor Detection using Context and Concreteness

no code implementations WS 2020 Rowan Hall Maudslay, Tiago Pimentel, Ryan Cotterell, Simone Teufel

We report the results of our system on the Metaphor Detection Shared Task at the Second Workshop on Figurative Language Processing 2020.

A Corpus for Large-Scale Phonetic Typology

no code implementations ACL 2020 Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W. Black, Jason Eisner

A major hurdle in data-driven research on typology is having sufficient data in many languages to draw meaningful conclusions.

Phonotactic Complexity and its Trade-offs

1 code implementation TACL 2020 Tiago Pimentel, Brian Roark, Ryan Cotterell

We present methods for calculating a measure of phonotactic complexity (bits per phoneme) that permits a straightforward cross-linguistic comparison.
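The bits-per-phoneme measure is the cross-entropy of a phone-level language model on held-out speech; a minimal sketch, using a unigram phone model on an invented toy vocabulary in place of the paper's trained models:

```python
import math
from collections import Counter

# Toy "lexicon" of phone sequences (hypothetical, for illustration only).
words = [["k", "a", "t"], ["t", "a", "k"], ["a", "k", "a"]]
phones = [p for w in words for p in w]
counts = Counter(phones)
total = len(phones)

# Cross-entropy in bits: -(1/N) * sum over phones of log2 p(phone).
# A unigram model stands in here for a proper phone-level language model.
bits_per_phoneme = -sum(math.log2(counts[p] / total) for p in phones) / total

# Bounded above by log2 of the phone inventory size (here 3 phones).
assert 0.0 < bits_per_phoneme <= math.log2(len(counts))
```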

A Tale of a Probe and a Parser

1 code implementation ACL 2020 Rowan Hall Maudslay, Josef Valvoda, Tiago Pimentel, Adina Williams, Ryan Cotterell

One such probe is the structural probe (Hewitt and Manning, 2019), designed to quantify the extent to which syntactic information is encoded in contextualised word representations.

Contextualised Word Representations

Predicting Declension Class from Form and Meaning

1 code implementation ACL 2020 Adina Williams, Tiago Pimentel, Arya D. McCarthy, Hagen Blix, Eleanor Chodroff, Ryan Cotterell

We find for two Indo-European languages (Czech and German) that form and meaning respectively share significant amounts of information with class (and contribute additional information above and beyond gender).

Assessing the Reliability of Visual Explanations of Deep Models with Adversarial Perturbations

no code implementations 22 Apr 2020 Dan Valle, Tiago Pimentel, Adriano Veloso

Thus, in this work we propose an objective measure to evaluate the reliability of explanations of deep models.

Feature Importance

Information-Theoretic Probing for Linguistic Structure

1 code implementation ACL 2020 Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, Ryan Cotterell

The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually "know" about natural language.

Word Embeddings

Rethinking Phonotactic Complexity

no code implementations WS 2019 Tiago Pimentel, Brian Roark, Ryan Cotterell

In this work, we propose the use of phone-level language models to estimate phonotactic complexity (measured in bits per phoneme), which makes cross-linguistic comparison straightforward.

Meaning to Form: Measuring Systematicity as Information

1 code implementation ACL 2019 Tiago Pimentel, Arya D. McCarthy, Damián E. Blasi, Brian Roark, Ryan Cotterell

A longstanding debate in semiotics centers on the relationship between linguistic signs and their corresponding semantics: is there an arbitrary relationship between a word form and its meaning, or does some systematic phenomenon pervade?

UaiNets: From Unsupervised to Active Deep Anomaly Detection

no code implementations ICLR 2019 Tiago Pimentel, Marianne Monteiro, Juliano Viana, Adriano Veloso, Nivio Ziviani

This work presents a method for active anomaly detection which can be built upon existing deep learning solutions for unsupervised anomaly detection.

Unsupervised Anomaly Detection

Deep Active Learning for Anomaly Detection

no code implementations 23 May 2018 Tiago Pimentel, Marianne Monteiro, Adriano Veloso, Nivio Ziviani

Anomalies are intuitively easy for human experts to understand, but they are hard to define mathematically.

Active Learning · Unsupervised Anomaly Detection

Fast Node Embeddings: Learning Ego-Centric Representations

no code implementations ICLR 2018 Tiago Pimentel, Adriano Veloso, Nivio Ziviani

Representation learning is one of the foundations of Deep Learning and has enabled important improvements on several Machine Learning tasks, such as Neural Machine Translation, Question Answering and Speech Recognition.

Link Prediction · Machine Translation · +6
