Search Results for author: Aline Villavicencio

Found 53 papers, 14 papers with code

CogNLP-Sheffield at CMCL 2021 Shared Task: Blending Cognitively Inspired Features with Transformer-based Language Models for Predicting Eye Tracking Patterns

no code implementations • NAACL (CMCL) 2021 • Peter Vickers, Rosa Wainwright, Harish Tayyar Madabushi, Aline Villavicencio

The CogNLP-Sheffield submissions to the CMCL 2021 Shared Task examine the value of a variety of cognitively and linguistically inspired features for predicting eye tracking patterns, as both standalone model inputs and as supplements to contextual word embeddings (XLNet).

Word Embeddings

Leveraging Contextual Embeddings and Idiom Principle for Detecting Idiomaticity in Potentially Idiomatic Expressions

no code implementations • COLING (CogALex) 2020 • Reyhaneh Hashempour, Aline Villavicencio

In this work, we leverage the Idiom Principle (Sinclair et al., 1991) and contextualized word embeddings (CWEs), focusing on Context2Vec (Melamud et al., 2016) and BERT (Devlin et al., 2019) to distinguish between literal and idiomatic senses of such expressions in context.

Text Simplification Word Embeddings

An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Generative LLM Inference

1 code implementation • 16 Feb 2024 • Atsuki Yamaguchi, Aline Villavicencio, Nikolaos Aletras

We also show that adapting LLMs that have been pre-trained on more balanced multilingual data results in downstream performance comparable to the original models.

Natural Language Understanding

Word Boundary Information Isn't Useful for Encoder Language Models

no code implementations • 15 Jan 2024 • Edward Gow-Smith, Dylan Phelps, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

As such, removing these symbols has been shown to have a beneficial effect on the processing of morphologically complex words for transformer encoders in the pretrain-finetune paradigm.

NER Sentence

Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information

1 code implementation • 26 May 2023 • Kun Zhao, Bohao Yang, Chenghua Lin, Wenge Rong, Aline Villavicencio, Xiaohui Cui

The long-standing one-to-many issue of the open-domain dialogues poses significant challenges for automatic evaluation methods, i.e., there may be multiple suitable responses which differ in semantics for a given conversational context.

Assessing Linguistic Generalisation in Language Models: A Dataset for Brazilian Portuguese

no code implementations • 23 May 2023 • Rodrigo Wilkens, Leonardo Zilio, Aline Villavicencio

These tasks are designed to evaluate how different language models generalise information related to grammatical structures and multiword expressions (MWEs), thus allowing for an assessment of whether the model has learned different linguistic phenomena.

Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5

1 code implementation • 31 Oct 2022 • Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Aline Villavicencio, Iryna Gurevych

We compare sequential fine-tuning with a model for multi-task learning in the context where we are interested in boosting performance on two tasks, one of which depends on the other.

Multi-Task Learning Natural Language Inference

Sample Efficient Approaches for Idiomaticity Detection

no code implementations • LREC (MWE) 2022 • Dylan Phelps, Xuan-Rui Fan, Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

In particular we study the impact of Pattern Exploit Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings, on the task of idiomaticity detection.

SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding

1 code implementation • SemEval (NAACL) 2022 • Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio

This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context.

Binary Classification Sentence +4

Improving Tokenisation by Alternative Treatment of Spaces

1 code implementation • 8 Apr 2022 • Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

We find that our modified algorithms lead to improved performance on downstream NLP tasks that involve handling complex words, whilst having no detrimental effect on performance in general natural language understanding tasks.

Natural Language Understanding

AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models

1 code implementation • Findings (EMNLP) 2021 • Harish Tayyar Madabushi, Edward Gow-Smith, Carolina Scarton, Aline Villavicencio

Despite their success in a variety of NLP tasks, pre-trained language models, due to their heavy reliance on compositionality, fail in effectively capturing the meanings of multiword expressions (MWEs), especially idioms.

Language Modelling

Assessing the Representations of Idiomaticity in Vector Models with a Noun Compound Dataset Labeled at Type and Token Levels

1 code implementation • ACL 2021 • Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio

This paper presents the Noun Compound Type and Token Idiomaticity (NCTTI) dataset, with human annotations for 280 noun compounds in English and 180 in Portuguese at both type and token level.

Vocal Bursts Type Prediction

Unsupervised Word Segmentation from Discrete Speech Units in Low-Resource Settings

no code implementations • SIGUL (LREC) 2022 • Marcely Zanon Boito, Bolaji Yusuf, Lucas Ondel, Aline Villavicencio, Laurent Besacier

Our results suggest that neural models for speech discretization are difficult to exploit in our setting, and that it might be necessary to adapt them to limit sequence length.

Probing for idiomaticity in vector space models

1 code implementation • EACL 2021 • Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio

Contextualised word representation models have been successfully used for capturing different word usages and they may be an attractive alternative for representing idiomaticity in language.

Token Level Identification of Multiword Expressions Using Contextual Information

no code implementations • WS 2020 • Reyhaneh Hashempour, Aline Villavicencio

Studies on detecting idiomatic expressions mostly focus on discovering potentially idiomatic expressions disregarding the context.

Word Embeddings

Investigating Language Impact in Bilingual Approaches for Computational Language Documentation

no code implementations • LREC 2020 • Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier

For answering this question, we use the MaSS multilingual speech corpus (Boito et al., 2020) for creating 56 bilingual pairs that we apply to the task of low-resource unsupervised word segmentation and alignment.

Segmentation Translation

How Does Language Influence Documentation Workflow? Unsupervised Word Discovery Using Translations in Multiple Languages

1 code implementation • 11 Oct 2019 • Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier

For language documentation initiatives, transcription is an expensive resource: one minute of audio is estimated to take one hour and a half on average of a linguist's work (Austin and Sallabank, 2013).

Why So Down? The Role of Negative (and Positive) Pointwise Mutual Information in Distributional Semantics

1 code implementation • 19 Aug 2019 • Alexandre Salle, Aline Villavicencio

In distributional semantics, the pointwise mutual information ($\mathit{PMI}$) weighting of the cooccurrence matrix performs far better than raw counts.
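
As a rough illustration of the weighting this abstract refers to, the sketch below computes PMI(w, c) = log(P(w, c) / (P(w) P(c))) from a toy co-occurrence table, plus the positive variant (PPMI) that clips negative values to zero. The function name `pmi_scores` and the toy counts are assumptions made for this example; they are not taken from the paper or its code.

```python
import math


def pmi_scores(counts, positive=False):
    """Compute PMI for each observed (word, context) pair.

    counts: dict mapping (word, context) -> co-occurrence count.
    positive: if True, clip negative values to zero (PPMI).
    Unobserved pairs are skipped, since log(0) is undefined.
    """
    total = sum(counts.values())
    word_totals, ctx_totals = {}, {}
    for (word, ctx), n in counts.items():
        word_totals[word] = word_totals.get(word, 0) + n
        ctx_totals[ctx] = ctx_totals.get(ctx, 0) + n

    scores = {}
    for (word, ctx), n in counts.items():
        if n == 0:
            continue
        # P(w, c) / (P(w) * P(c)) simplifies to n * total / (n_w * n_c).
        value = math.log(n * total / (word_totals[word] * ctx_totals[ctx]))
        scores[(word, ctx)] = max(value, 0.0) if positive else value
    return scores


# Hypothetical toy counts, for illustration only.
toy_counts = {
    ("cat", "purr"): 4,
    ("cat", "bark"): 1,
    ("dog", "purr"): 1,
    ("dog", "bark"): 4,
}
pmi = pmi_scores(toy_counts)
ppmi = pmi_scores(toy_counts, positive=True)
```

In this toy table the pair ("cat", "bark") occurs less often than chance would predict, so its PMI is negative and its PPMI is zero; that is the information PPMI discards.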

When the whole is greater than the sum of its parts: Multiword expressions and idiomaticity

no code implementations • WS 2019 • Aline Villavicencio

Multiword expressions (MWEs) feature prominently in the mental lexicon of native speakers (Jackendoff, 1997) in all languages and domains, from informal to technical contexts (Biber et al., 1999) with about four MWEs being produced per minute of discourse (Glucksberg, 1989).

Sentence

Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings

1 code implementation • 29 Jun 2019 • Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier

This task consists in aligning word sequences in a source language with phoneme sequences in a target language, inferring from it word segmentation on the target side [5].

Machine Translation

Unsupervised Compositionality Prediction of Nominal Compounds

no code implementations • CL 2019 • Silvio Cordeiro, Aline Villavicencio, Marco Idiart, Carlos Ramisch

General crosslingual analyses reveal the impact of morphological variation and corpus size in the ability of the model to predict compositionality, and of a uniform combination of the components for best results.

A small Griko-Italian speech translation corpus

no code implementations • 27 Jul 2018 • Marcely Zanon Boito, Antonios Anastasopoulos, Marika Lekakou, Aline Villavicencio, Laurent Besacier

This paper presents an extension to a very low-resource parallel corpus collected in an endangered language, Griko, making it useful for computational research.

Translation

Unsupervised Word Segmentation from Speech with Attention

no code implementations • 18 Jun 2018 • Pierre Godard, Marcely Zanon-Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio, Laurent Besacier

We present a first attempt to perform attentional word segmentation directly from the speech signal, with the final goal to automatically identify lexical units in a low-resource, unwritten language (UL).

Acoustic Unit Discovery Machine Translation +2

Similarity Measures for the Detection of Clinical Conditions with Verbal Fluency Tasks

no code implementations • NAACL 2018 • Felipe Paula, Rodrigo Wilkens, Marco Idiart, Aline Villavicencio

Semantic Verbal Fluency tests have been used in the detection of certain clinical conditions, like Dementia.

Incorporating Subword Information into Matrix Factorization Word Embeddings

1 code implementation • WS 2018 • Alexandre Salle, Aline Villavicencio

The positive effect of adding subword information to word embeddings has been demonstrated for predictive models.

Word Embeddings

The brWaC Corpus: A New Open Resource for Brazilian Portuguese

no code implementations • LREC 2018 • Jorge A. Wagner Filho, Rodrigo Wilkens, Marco Idiart, Aline Villavicencio

Word Sense Induction

Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models

no code implementations • 17 Sep 2017 • Marcely Zanon Boito, Alexandre Berard, Aline Villavicencio, Laurent Besacier

Word discovery is the task of extracting words from unsegmented text.

Machine Translation Translation

Restricted Recurrent Neural Tensor Networks: Exploiting Word Frequency and Compositionality

no code implementations • ACL 2018 • Alexandre Salle, Aline Villavicencio

Increasing the capacity of recurrent neural networks (RNN) usually involves augmenting the size of the hidden layer, with significant increase of computational cost.

Language Modelling Tensor Networks

LexSubNC: A Dataset of Lexical Substitution for Nominal Compounds

no code implementations • WS 2017 • Rodrigo Wilkens, Leonardo Zilio, Silvio Ricardo Cordeiro, Felipe Paula, Carlos Ramisch, Marco Idiart, Aline Villavicencio

Machine Translation Text Simplification

Automatic Construction of Large Readability Corpora

no code implementations • WS 2016 • Jorge Alberto Wagner Filho, Rodrigo Wilkens, Aline Villavicencio

In a comparison between shallow and deeper features, the former already produce F-measures of over 0.75 for Portuguese texts, but the use of additional features results in even better results, in most cases.

Text Classification Text Simplification

How Naked is the Naked Truth? A Multilingual Lexicon of Nominal Compound Compositionality

no code implementations • ACL 2016 • Carlos Ramisch, Silvio Cordeiro, Leonardo Zilio, Marco Idiart, Aline Villavicencio

Predicting the Compositionality of Nominal Compounds: Giving Word Embeddings a Hard Time

no code implementations • ACL 2016 • Silvio Cordeiro, Carlos Ramisch, Marco Idiart, Aline Villavicencio

Lemmatization Machine Translation +3

Filtering and Measuring the Intrinsic Quality of Human Compositionality Judgments

no code implementations • WS 2016 • Carlos Ramisch, Silvio Cordeiro, Aline Villavicencio

Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory

1 code implementation • 3 Jun 2016 • Alexandre Salle, Marco Idiart, Aline Villavicencio

The effectiveness of both modifications is shown using word similarity and analogy tasks.

Word Similarity

Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations

1 code implementation • ACL 2016 • Alexandre Salle, Marco Idiart, Aline Villavicencio

In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Point-wise Mutual Information matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties for errors on frequent co-occurrences while still accounting for negative co-occurrence.

Word Similarity

UFRGS&LIF at SemEval-2016 Task 10: Rule-Based MWE Identification and Predominant-Supersense Tagging

no code implementations • SEMEVAL 2016 • Silvio Cordeiro, Carlos Ramisch, Aline Villavicencio

Machine Translation

B2SG: a TOEFL-like Task for Portuguese

no code implementations • LREC 2016 • Rodrigo Wilkens, Leonardo Zilio, Eduardo Ferreira, Aline Villavicencio

They can be used as the basis for evaluating the accuracy of the similarity relations on distributional thesauri by comparing the proximity of the target word with the related and unrelated options and observing if the related word has the highest similarity value among them.
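
The evaluation described in this abstract can be sketched roughly as follows: for each test item, rank the related and unrelated options by their similarity to the target word and count the item as correct when the related option scores highest. The cosine measure, the function names, and the toy vectors below are assumptions for illustration; the paper's actual data format and similarity measure may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def toefl_accuracy(items, vectors):
    """Fraction of items where the related option is the most similar.

    items: list of (target, related, [unrelated, ...]) tuples.
    vectors: dict mapping each word to its vector.
    An item counts as correct only if the related option strictly beats
    every unrelated option, so ties are scored as errors.
    """
    correct = 0
    for target, related, unrelated in items:
        related_sim = cosine(vectors[target], vectors[related])
        if all(related_sim > cosine(vectors[target], vectors[w]) for w in unrelated):
            correct += 1
    return correct / len(items)


# Hypothetical two-dimensional vectors, for illustration only.
toy_vectors = {
    "carro": [1.0, 0.1],
    "automóvel": [0.9, 0.2],
    "banana": [0.0, 1.0],
    "fruta": [0.1, 1.0],
    "nuvem": [0.1, 0.9],
}
toy_items = [
    ("carro", "automóvel", ["banana", "nuvem"]),
    ("banana", "fruta", ["carro", "automóvel"]),
]
accuracy = toefl_accuracy(toy_items, toy_vectors)
```

Scoring ties as errors is a conservative choice made for this sketch; a thesaurus that cannot separate the related word from a distractor gets no credit for that item.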

Multiword Expressions in Child Language

no code implementations • LREC 2016 • Rodrigo Wilkens, Marco Idiart, Aline Villavicencio

Focusing on compound nouns (CN), we then verify in a longitudinal study if there are differences in the distribution and compositionality of CNs in child-directed and child-produced sentences across ages.

Language Acquisition

mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing

no code implementations • LREC 2016 • Silvio Cordeiro, Carlos Ramisch, Aline Villavicencio

This paper presents mwetoolkit+sem: an extension of the mwetoolkit that estimates semantic compositionality scores for multiword expressions (MWEs) based on word embeddings.

Word Embeddings

VerbLexPor: a lexical resource with semantic roles for Portuguese

no code implementations • LREC 2016 • Leonardo Zilio, Maria José Bocorny Finatto, Aline Villavicencio

The sentences from both corpora were annotated separately, so that it is possible to access sentences either from the Cardiology or from the newspaper corpus.

Sentence

VerbLexPor: um recurso léxico com anotação de papéis semânticos para o português (VerbLexPor: a lexical resource annotated with semantic roles for Portuguese)

no code implementations • WS 2015 • Leonardo Zilio, Maria José Bocorny Finatto, Aline Villavicencio

Semantic Role Labeling

Distributional Thesauri for Portuguese: methodology evaluation

no code implementations • WS 2015 • Rodrigo Wilkens, Leonardo Zilio, Eduardo Ferreira, Gabriel Gonçalves, Aline Villavicencio

Nothing like Good Old Frequency: Studying Context Filters for Distributional Thesauri

no code implementations • EMNLP 2014 • Muntsa Padró, Marco Idiart, Aline Villavicencio, Carlos Ramisch

Identification of Multiword Expressions in the brWaC

no code implementations • LREC 2014 • Rodrigo Boos, Kassius Prestes, Aline Villavicencio

To indirectly assess the quality of the resulting corpus we examined the impact of corpus origin in a specific task, the identification of Multiword Expressions with association measures, against a standard corpus.

Information Retrieval Machine Translation +1

Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them

no code implementations • LREC 2014 • Bruno Laranjeira, Viviane Moreira, Aline Villavicencio, Carlos Ramisch, Maria José Finatto

Comparable corpora have been used as an alternative for parallel corpora as resources for computational tasks that involve domain-specific natural language processing.

Machine Translation Translation

Comparing Similarity Measures for Distributional Thesauri

no code implementations • LREC 2014 • Muntsa Padró, Marco Idiart, Aline Villavicencio, Carlos Ramisch

Distributional thesauri have been applied for a variety of tasks involving semantic relatedness.

Dimensionality Reduction

Language Acquisition and Probabilistic Models: keeping it simple

no code implementations • ACL 2013 • Aline Villavicencio, Marco Idiart, Robert Berwick, Igor Malioutov

Language Acquisition

A Broad Evaluation of Techniques for Automatic Acquisition of Multiword Expressions

no code implementations • WS 2012 • Carlos Ramisch, Vitor De Araujo, Aline Villavicencio

Information Retrieval Word Sense Disambiguation

A Comparable Corpus Based on Aligned Multilingual Ontologies

no code implementations • WS 2012 • Roger Granada, Lucelene Lopes, Carlos Ramisch, Cassia Trojahn, Renata Vieira, Aline Villavicencio

A large scale annotated child language construction database

no code implementations • LREC 2012 • Aline Villavicencio, Beracah Yankama, Marco Idiart, Robert Berwick

This paper describes such an initiative for combining information from various sources to extend the annotation of the English CHILDES corpora with linguistic, psycholinguistic and distributional information, along with an example illustrating an application of this approach to the extraction of verb alternation information.

Language Acquisition POS +1

Get out but don't fall down: verb-particle constructions in child language

no code implementations • WS 2012 • Aline Villavicencio, Marco Idiart, Carlos Ramisch, Vítor Araújo, Beracah Yankama, Robert Berwick

Language Acquisition

An annotated English child language database

no code implementations • WS 2012 • Aline Villavicencio, Beracah Yankama, Rodrigo Wilkens, Marco Idiart, Robert Berwick

Language Acquisition Lemmatization

I say have you say tem: profiling verbs in children data in English and Portuguese

no code implementations • WS 2012 • Rodrigo Wilkens, Aline Villavicencio

Language Acquisition
