Search Results for author: Massimo Poesio

Found 66 papers, 12 papers with code

Hard and Soft Evaluation of NLP models with BOOtSTrap SAmpling - BooStSa

no code implementations ACL 2022 Tommaso Fornaciari, Alexandra Uma, Massimo Poesio, Dirk Hovy

Natural Language Processing (NLP)'s applied nature makes it necessary to select the most effective and robust models.

Experimental Design

Patterns of Polysemy and Homonymy in Contextualised Language Models

no code implementations Findings (EMNLP) 2021 Janosch Haber, Massimo Poesio

One of the central aspects of contextualised language models is that they should be able to distinguish the meaning of lexically ambiguous words by their contexts.

The CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

no code implementations COLING (CODI, CRAC) 2022 Juntao Yu, Sopan Khosla, Ramesh Manuvinakurike, Lori Levin, Vincent Ng, Massimo Poesio, Michael Strube, Carolyn Rosé

The CODI-CRAC 2022 Shared Task on Anaphora Resolution in Dialogues is the second edition of an initiative focused on detecting different types of anaphoric relations in conversations of different kinds.

The Universal Anaphora Scorer

no code implementations LREC 2022 Juntao Yu, Sopan Khosla, Nafise Sadat Moosavi, Silviu Paun, Sameer Pradhan, Massimo Poesio

It also supports the evaluation of split antecedent anaphora and discourse deixis, for which no tools existed.

ArMIS - The Arabic Misogyny and Sexism Corpus with Annotator Subjective Disagreements

no code implementations LREC 2022 Dina Almanea, Massimo Poesio

The use of misogynistic and sexist language has increased in recent years in social media, and is increasing in the Arabic world in reaction to reforms attempting to remove restrictions on women’s lives.

We Need to Consider Disagreement in Evaluation

no code implementations ACL (BPPF) 2021 Valerio Basile, Michael Fell, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio, Alexandra Uma

Instead, we suggest that we need to better capture the sources of disagreement to improve today’s evaluation practice.

Anaphoric Zero Pronoun Identification: A Multilingual Approach

no code implementations COLING (CRAC) 2020 Abdulrahman Aloraini, Massimo Poesio

We propose a BERT-based multilingual model for AZP identification from predicted zero pronoun positions, and evaluate it on the Arabic and Chinese portions of OntoNotes 5.0.

Transfer Learning

Word Sense Distance in Human Similarity Judgements and Contextualised Word Embeddings

no code implementations PaM 2020 Janosch Haber, Massimo Poesio

Homonymy is often used to showcase one of the advantages of context-sensitive word embedding techniques such as ELMo and BERT.

Word Embeddings

Large Language Models as Minecraft Agents

no code implementations 13 Feb 2024 Chris Madge, Massimo Poesio

In this work we examine the use of Large Language Models (LLMs) in the challenging setting of acting as a Minecraft agent.

SemEval-2023 Task 11: Learning With Disagreements (LeWiDi)

no code implementations 28 Apr 2023 Elisa Leonardelli, Alexandra Uma, Gavin Abercrombie, Dina Almanea, Valerio Basile, Tommaso Fornaciari, Barbara Plank, Verena Rieser, Massimo Poesio

We report on the second LeWiDi shared task, which differs from the first edition in three crucial respects: (i) it focuses entirely on NLP, instead of both NLP and computer vision tasks as in its first edition; (ii) it focuses on subjective tasks, instead of covering different types of disagreements, as training with aggregated labels for subjective NLP tasks is a particularly obvious misrepresentation of the data; and (iii) for the evaluation, we concentrate on soft approaches to evaluation.

Sentiment Analysis

Joint Coreference Resolution for Zeros and non-Zeros in Arabic

no code implementations 21 Oct 2022 Abdulrahman Aloraini, Sameer Pradhan, Massimo Poesio

Most existing proposals about anaphoric zero pronoun (AZP) resolution regard full mention coreference and AZP resolution as two independent tasks, even though the two tasks are clearly related.


Aggregating Crowdsourced and Automatic Judgments to Scale Up a Corpus of Anaphoric Reference for Fiction and Wikipedia Texts

no code implementations 11 Oct 2022 Juntao Yu, Silviu Paun, Maris Camilleri, Paloma Carretero Garcia, Jon Chamberlain, Udo Kruschwitz, Massimo Poesio

Although several datasets annotated for anaphoric reference/coreference exist, even the largest such datasets have limitations in terms of size, range of domains, coverage of anaphoric phenomena, and size of documents included.

Scoring Coreference Chains with Split-Antecedent Anaphors

1 code implementation 24 May 2022 Silviu Paun, Juntao Yu, Nafise Sadat Moosavi, Massimo Poesio

Anaphoric reference is an aspect of language interpretation covering a variety of types of interpretation beyond the simple case of identity reference to entities introduced via nominal expressions covered by the traditional coreference task in its most recent incarnation in ONTONOTES and similar datasets.

Patterns of Lexical Ambiguity in Contextualised Language Models

no code implementations 27 Sep 2021 Janosch Haber, Massimo Poesio

One of the central aspects of contextualised language models is that they should be able to distinguish the meaning of lexically ambiguous words by their contexts.

Coreference Resolution for the Biomedical Domain: A Survey

no code implementations CRAC (ACL) 2021 Pengcheng Lu, Massimo Poesio

Issues with coreference resolution are one of the most frequently mentioned challenges for information extraction from the biomedical literature.


Data Augmentation Methods for Anaphoric Zero Pronouns

no code implementations CRAC (ACL) 2021 Abdulrahman Aloraini, Massimo Poesio

In pro-drop languages like Arabic, Chinese, Italian, Japanese, Spanish, and many others, unrealized (null) arguments in certain syntactic positions can refer to a previously introduced entity, and are thus called anaphoric zero pronouns.

Data Augmentation

SemEval-2021 Task 12: Learning with Disagreements

no code implementations SEMEVAL 2021 Alexandra Uma, Tommaso Fornaciari, Anca Dumitrache, Tristan Miller, Jon Chamberlain, Barbara Plank, Edwin Simpson, Massimo Poesio

Disagreement between coders is ubiquitous in virtually all datasets annotated with human judgements in both natural language processing and computer vision.

Stay Together: A System for Single and Split-antecedent Anaphora Resolution

1 code implementation NAACL 2021 Juntao Yu, Nafise Sadat Moosavi, Silviu Paun, Massimo Poesio

Split-antecedent anaphora is rarer and more complex to resolve than single-antecedent anaphora; as a result, it is not annotated in many datasets designed to test coreference, and previous work on resolving this type of anaphora was carried out in unrealistic conditions that assume gold mentions and/or gold split-antecedent anaphors are available.

Neural Coreference Resolution for Arabic

1 code implementation COLING (CRAC) 2020 Abdulrahman Aloraini, Juntao Yu, Massimo Poesio

No neural coreference resolver for Arabic exists; in fact, we are not aware of any learning-based coreference resolver for Arabic since (Bjorkelund and Kuhn, 2014).


Named Entity Recognition as Dependency Parsing

1 code implementation ACL 2020 Juntao Yu, Bernd Bohnet, Massimo Poesio

Named Entity Recognition (NER) is a fundamental task in Natural Language Processing, concerned with identifying spans of text expressing references to entities.

Dependency Parsing named-entity-recognition +4

Cross-lingual Zero Pronoun Resolution

no code implementations LREC 2020 Abdulrahman Aloraini, Massimo Poesio

In languages like Arabic, Chinese, Italian, Japanese, Korean, Portuguese, Spanish, and many others, predicate arguments in certain syntactic positions are not realized instead of being realized as overt pronouns, and are thus called zero- or null-pronouns.

Machine Translation Translation

Aggregation Driven Progression System for GWAPs

no code implementations LREC 2020 Osman Doruk Kicikoglu, Richard Bartle, Jon Chamberlain, Silviu Paun, Massimo Poesio

As the use of Games-With-A-Purpose (GWAPs) broadens, the systems that incorporate them have expanded in complexity.

A Cluster Ranking Model for Full Anaphora Resolution

1 code implementation LREC 2020 Juntao Yu, Alexandra Uma, Massimo Poesio

In this paper, we introduce an architecture to simultaneously identify non-referring expressions (including expletives, predicative s, and other types) and build coreference chains, including singletons.

Coreference Resolution

A Mention-Pair Model of Annotation with Nonparametric User Communities

no code implementations 25 Sep 2019 Silviu Paun, Juntao Yu, Jon Chamberlain, Udo Kruschwitz, Massimo Poesio

The model is also flexible enough to be used in standard annotation tasks for classification where it registers on par performance with the state of the art.

Neural Mention Detection

1 code implementation LREC 2020 Juntao Yu, Bernd Bohnet, Massimo Poesio

We then evaluate our models for coreference resolution by using mentions predicted by our best model in state-of-the-art coreference systems.

coreference-resolution NER

Crowdsourcing and Aggregating Nested Markable Annotations

1 code implementation ACL 2019 Chris Madge, Juntao Yu, Jon Chamberlain, Udo Kruschwitz, Silviu Paun, Massimo Poesio

One of the key steps in language resource creation is the identification of the text segments to be annotated, or markables, which depending on the task may vary from nominal chunks for named entity resolution to (potentially nested) noun phrases in coreference resolution (or mentions) to larger text segments in text segmentation.

coreference-resolution Entity Resolution +1

A Crowdsourced Corpus of Multiple Judgments and Disagreement on Anaphoric Interpretation

no code implementations NAACL 2019 Massimo Poesio, Jon Chamberlain, Silviu Paun, Juntao Yu, Alexandra Uma, Udo Kruschwitz

The corpus, containing annotations for about 108,000 markables, is one of the largest corpora for coreference for English, and one of the largest crowdsourced NLP corpora, but its main feature is the large number of judgments per markable: 20 on average, and over 2.2M in total.

Anaphora Resolution with the ARRAU Corpus

no code implementations WS 2018 Massimo Poesio, Yulia Grishina, Varada Kolhatkar, Nafise Moosavi, Ina Roesiger, Adam Roussel, Fabian Simonjetz, Alexandra Uma, Olga Uryupina, Juntao Yu, Heike Zinsmeister

The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference).

Comparing Bayesian Models of Annotation

no code implementations TACL 2018 Silviu Paun, Bob Carpenter, Jon Chamberlain, Dirk Hovy, Udo Kruschwitz, Massimo Poesio

We evaluate these models along four aspects: comparison to gold labels, predictive accuracy for new annotations, annotator characterization, and item difficulty, using four datasets with varying degrees of noise in the form of random (spammy) annotators.

Model Selection

Incongruent Headlines: Yet Another Way to Mislead Your Readers

no code implementations WS 2017 Sophie Chesney, Maria Liakata, Massimo Poesio, Matthew Purver

This paper discusses the problem of incongruent headlines: those which do not accurately represent the information contained in the article with which they occur.

Visually Grounded and Textual Semantic Models Differentially Decode Brain Activity Associated with Concrete and Abstract Nouns

no code implementations TACL 2017 Andrew J. Anderson, Douwe Kiela, Stephen Clark, Massimo Poesio

Dual coding theory considers concrete concepts to be encoded in the brain both linguistically and visually, and abstract concepts only linguistically.

The OnForumS corpus from the Shared Task on Online Forum Summarisation at MultiLing 2015

no code implementations LREC 2016 Mijail Kabadjov, Udo Kruschwitz, Massimo Poesio, Josef Steinberger, Jorge Valderrama, Hugo Zaragoza

In this paper we present the OnForumS corpus developed for the shared task of the same name on Online Forum Summarisation (OnForumS at MultiLing'15).

Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference.

no code implementations LREC 2016 Jon Chamberlain, Massimo Poesio, Udo Kruschwitz

Corpora are typically annotated by several experts to create a gold standard; however, there are now compelling reasons to use a non-expert crowd to annotate text, driven by cost, speed and scalability.

text annotation

ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions

no code implementations LREC 2016 Olga Uryupina, Ron Artstein, Antonella Bristot, Federica Cavicchio, Kepa Rodriguez, Massimo Poesio

This paper presents a second release of the ARRAU dataset: a multi-domain corpus with thorough linguistically motivated annotation of anaphora and related phenomena.

Combining Minimally-supervised Methods for Arabic Named Entity Recognition

no code implementations TACL 2015 Maha Althobaiti, Udo Kruschwitz, Massimo Poesio

Supervised methods can achieve high performance on NLP tasks, such as Named Entity Recognition (NER), but new annotations are required for every new domain and/or genre change.

named-entity-recognition Named Entity Recognition +2

PR2: A Language Independent Unsupervised Tool for Personality Recognition from Text

no code implementations 12 Feb 2014 Fabio Celli, Massimo Poesio

We present PR2, a personality recognition system available online, that performs instance-based classification of Big5 personality types from unstructured text, using language-independent features.

General Classification

DeCour: a corpus of DEceptive statements in Italian COURts

no code implementations LREC 2012 Tommaso Fornaciari, Massimo Poesio

In criminal proceedings, sometimes it is not easy to evaluate the sincerity of oral testimonies.

Deception Detection
