Search Results for author: Marco Idiart

Found 19 papers, 5 papers with code

SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding

1 code implementation SemEval (NAACL) 2022 Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio

This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context.

Binary Classification Sentence

Assessing the Representations of Idiomaticity in Vector Models with a Noun Compound Dataset Labeled at Type and Token Levels

1 code implementation ACL 2021 Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio

This paper presents the Noun Compound Type and Token Idiomaticity (NCTTI) dataset, with human annotations for 280 noun compounds in English and 180 in Portuguese at both type and token level.

Vocal Bursts Type Prediction

Probing for idiomaticity in vector space models

1 code implementation EACL 2021 Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio

Contextualised word representation models have been successfully used for capturing different word usages and they may be an attractive alternative for representing idiomaticity in language.

Theta, alpha and gamma traveling waves in a multi-item working memory model

no code implementations 29 Mar 2021 Gustavo Soroka, Marco Idiart

Brain oscillations are believed to be involved in the different operations necessary to manipulate information during working memory tasks.

Blocking

Unsupervised Compositionality Prediction of Nominal Compounds

no code implementations CL 2019 Silvio Cordeiro, Aline Villavicencio, Marco Idiart, Carlos Ramisch

General crosslingual analyses reveal the impact of morphological variation and corpus size on the ability of the model to predict compositionality, and of a uniform combination of the components for best results.

Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations

1 code implementation ACL 2016 Alexandre Salle, Marco Idiart, Aline Villavicencio

In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Point-wise Mutual Information matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties for errors on frequent co-occurrences while still accounting for negative co-occurrence.

Word Similarity
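
The Positive Point-wise Mutual Information (PPMI) matrix named in the LexVec abstract above can be illustrated with a minimal sketch. This is not the LexVec method itself: it only builds the PPMI values that such a factorization would take as input, and the function name `ppmi_matrix`, the symmetric `window` parameter, and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def ppmi_matrix(sentences, window=2):
    """Return {(word, context): PPMI} for pairs observed within a symmetric window.

    Hypothetical helper for illustration; unobserved pairs are simply absent
    (their PMI is minus infinity, so their PPMI would be 0).
    """
    # Count (word, context) co-occurrences inside the window.
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1

    # Marginal counts for words and contexts, taken over the pair counts.
    total = sum(pair_counts.values())
    word_counts = Counter()
    context_counts = Counter()
    for (w, c), n in pair_counts.items():
        word_counts[w] += n
        context_counts[c] += n

    # PMI = log( P(w, c) / (P(w) * P(c)) ); PPMI clips negative values to 0.
    ppmi = {}
    for (w, c), n in pair_counts.items():
        pmi = math.log((n * total) / (word_counts[w] * context_counts[c]))
        ppmi[(w, c)] = max(0.0, pmi)
    return ppmi


# Toy corpus (assumed data, for illustration only).
corpus = [["a", "b"]] * 3 + [["a", "c"]] + [["c", "d"]] * 3
scores = ppmi_matrix(corpus, window=1)
```

In this toy corpus the pair `("a", "c")` co-occurs less often than chance would predict, so its PMI is negative and its PPMI is clipped to 0, while frequent pairs such as `("a", "b")` keep a positive score.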

Multiword Expressions in Child Language

no code implementations LREC 2016 Rodrigo Wilkens, Marco Idiart, Aline Villavicencio

Focusing on compound nouns (CN), we then verify in a longitudinal study if there are differences in the distribution and compositionality of CNs in child-directed and child-produced sentences across ages.

Language Acquisition

A large scale annotated child language construction database

no code implementations LREC 2012 Aline Villavicencio, Beracah Yankama, Marco Idiart, Robert Berwick

This paper describes such an initiative for combining information from various sources to extend the annotation of the English CHILDES corpora with linguistic, psycholinguistic and distributional information, along with an example illustrating an application of this approach to the extraction of verb alternation information.

Language Acquisition POS
