Search Results for author: Simone Paolo Ponzetto

Found 79 papers, 35 papers with code

The Robotic Surgery Procedural Framebank

1 code implementation LREC 2022 Marco Bombieri, Marco Rospocher, Simone Paolo Ponzetto, Paolo Fiorini

Robot-assisted minimally invasive surgery is the gold standard for the surgical treatment of many pathological conditions, and several manuals and academic papers describe how to perform these interventions.

Natural Language Understanding Semantic Parsing +1

FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics

1 code implementation ParlaCLARIN (LREC) 2022 Christopher Klamm, Ines Rehbein, Simone Paolo Ponzetto

In addition, we present a new annotated data set of parliamentary debates, following the coding schema of policy topics developed in the Comparative Agendas Project (CAP), and release models for topic classification in parliamentary debates.

Topic Classification

Come hither or go away? Recognising pre-electoral coalition signals in the news

no code implementations EMNLP 2021 Ines Rehbein, Simone Paolo Ponzetto, Anna Adendorf, Oke Bahnsen, Lukas Stoetzer, Heiner Stuckenschmidt

In this paper, we introduce the task of political coalition signal prediction from text, that is, the task of recognizing from the news coverage leading up to an election the (un)willingness of political parties to form a government coalition.

Multi-Task Learning

DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

1 code implementation 24 May 2024 Jonas Belouadi, Simone Paolo Ponzetto, Steffen Eger

Creating high-quality scientific figures can be time-consuming and challenging, even though sketching ideas on paper is relatively easy.

Language Modelling

ROUGE-K: Do Your Summaries Have Keywords?

1 code implementation 8 Mar 2024 Sotaro Takeshita, Simone Paolo Ponzetto, Kai Eckert

Keywords, that is, content-relevant words in summaries, play an important role in efficient information conveyance, making it critical to assess if system-generated summaries contain such informative words during evaluation.

Extreme Summarization

Do LLMs Dream of Ontologies?

1 code implementation 26 Jan 2024 Marco Bombieri, Paolo Fiorini, Simone Paolo Ponzetto, Marco Rospocher

Large language models (LLMs) have recently revolutionized automated text understanding and generation.

Memorization

Multi-Source (Pre-)Training for Cross-Domain Measurement, Unit and Context Extraction

1 code implementation 5 Aug 2023 Yueling Li, Sebastian Martschat, Simone Paolo Ponzetto

We present a cross-domain approach for automated measurement and context extraction based on pre-trained language models.

Domain Generalization

Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers

1 code implementation 13 Oct 2022 Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

Previous work showed that incorporating demographic factors can consistently improve performance for various NLP tasks with traditional NLP models.

Language Modelling Multi-Task Learning +2

Towards Automated Survey Variable Search and Summarization in Social Science Publications

no code implementations 14 Sep 2022 Yavuz Selim Kartal, Sotaro Takeshita, Tornike Tsereteli, Kai Eckert, Henning Kroll, Philipp Mayr, Simone Paolo Ponzetto, Benjamin Zapilko, Andrea Zielinski

Nowadays there is a growing trend in many scientific disciplines to support researchers by providing enhanced information access through linking of publications and underlying datasets, so as to support research with infrastructure to enhance reproducibility and reusability of research results.

Variable Detection

Massively Multilingual Lexical Specialization of Multilingual Transformers

no code implementations 1 Aug 2022 Tommaso Green, Simone Paolo Ponzetto, Goran Glavaš

While pretrained language models (PLMs) primarily serve as general-purpose text encoders that can be fine-tuned for a wide variety of downstream tasks, recent work has shown that they can also be rewired to produce high-quality word representations (i.e., static word embeddings) and yield good performance in type-level lexical tasks.

Bilingual Lexicon Induction Retrieval +5

On the Limitations of Sociodemographic Adaptation with Transformers

1 code implementation 1 Aug 2022 Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

We adapt the language representations for the sociodemographic dimensions of gender and age, using continuous language modeling and dynamic multi-task learning for adaptation, where we couple language modeling with the prediction of a sociodemographic class.

Language Modelling Multi-Task Learning

X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents

1 code implementation 30 May 2022 Sotaro Takeshita, Tommaso Green, Niklas Friedrich, Kai Eckert, Simone Paolo Ponzetto

The number of scientific publications nowadays is rapidly increasing, causing information overload for researchers and making it hard for scholars to keep up to date with current trends and lines of work.

Extreme Summarization Machine Translation +1

Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog

1 code implementation NAACL 2022 Chia-Chien Hung, Anne Lauscher, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

We then introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks.

Cross-Lingual Transfer dialog state tracking +1

Fair and Argumentative Language Modeling for Computational Argumentation

1 code implementation ACL 2022 Carolin Holtermann, Anne Lauscher, Simone Paolo Ponzetto

We employ our resource to assess the effect of argumentative fine-tuning and debiasing on the intrinsic bias found in transformer-based language models using a lightweight adapter-based approach that is more sustainable and parameter-efficient than full fine-tuning.

Language Modelling

PET: An Annotated Dataset for Process Extraction from Natural Language Text

no code implementations 9 Mar 2022 Patrizio Bellan, Han van der Aa, Mauro Dragoni, Chiara Ghidini, Simone Paolo Ponzetto

Therefore, to bridge this gap, we present the PET dataset, a first corpus of business process descriptions annotated with activities, gateways, actors, and flow information.

On Cross-Lingual Retrieval with Multilingual Text Encoders

1 code implementation 21 Dec 2021 Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

In this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a number of diverse language pairs.

Re-Ranking Retrieval +3

DS-TOD: Efficient Domain Specialization for Task Oriented Dialog

1 code implementation 15 Oct 2021 Chia-Chien Hung, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš

Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD).

dialog state tracking Language Modelling +2

Process Extraction from Text: Benchmarking the State of the Art and Paving the Way for Future Challenges

2 code implementations 7 Oct 2021 Patrizio Bellan, Mauro Dragoni, Chiara Ghidini, Han van der Aa, Simone Paolo Ponzetto

The extraction of process models from text refers to the problem of turning the information contained in unstructured textual process descriptions into a formal representation, i.e., a process model.

Benchmarking Model extraction +1

Diachronic Analysis of German Parliamentary Proceedings: Ideological Shifts through the Lens of Political Biases

1 code implementation 13 Aug 2021 Tobias Walter, Celina Kirschner, Steffen Eger, Goran Glavaš, Anne Lauscher, Simone Paolo Ponzetto

We analyze bias in historical corpora as encoded in diachronic distributional semantic models by focusing on two specific forms of bias, namely a political (i.e., anti-communism) and racist (i.e., antisemitism) one.

Diachronic Word Embeddings Word Embeddings

Large-scale Taxonomy Induction Using Entity and Word Embeddings

no code implementations 4 May 2021 Petar Ristoski, Stefano Faralli, Simone Paolo Ponzetto, Heiko Paulheim

Taxonomies are an important ingredient of knowledge organization, and serve as a backbone for more sophisticated knowledge representations in intelligent systems, such as formal ontologies.

Word Embeddings

FakeFlow: Fake News Detection by Modeling the Flow of Affective Information

1 code implementation EACL 2021 Bilal Ghanem, Simone Paolo Ponzetto, Paolo Rosso, Francisco Rangel

To capture this, we propose in this paper to model the flow of affective information in fake news articles using a neural architecture.

Fake News Detection

Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval

1 code implementation 21 Jan 2021 Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

Therefore, in this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a large number of language pairs.

Cross-Lingual Word Embeddings Representation Learning +4

Self-Supervised Learning for Visual Summary Identification in Scientific Publications

no code implementations 21 Dec 2020 Shintaro Yamamoto, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš, Shigeo Morishima

Providing visual summaries of scientific publications can increase information access for readers and thereby help deal with the exponential growth in the number of scientific publications.

Self-Supervised Learning

SemEval-2020 Task 2: Predicting Multilingual and Cross-Lingual (Graded) Lexical Entailment

no code implementations SEMEVAL 2020 Goran Glavaš, Ivan Vulić, Anna Korhonen, Simone Paolo Ponzetto

The shared task spans three dimensions: (1) monolingual vs. cross-lingual LE, (2) binary vs. graded LE, and (3) a set of 6 diverse languages (and 15 corresponding language pairs).

Lexical Entailment Natural Language Inference +1

AraWEAT: Multidimensional Analysis of Biases in Arabic Word Embeddings

no code implementations COLING (WANLP) 2020 Anne Lauscher, Rafik Takieddin, Simone Paolo Ponzetto, Goran Glavaš

Our analysis yields several interesting findings, e.g., that implicit gender bias in embeddings trained on Arabic news corpora steadily increases over time (between 2007 and 2017).

Word Embeddings

Word Sense Disambiguation for 158 Languages using Word Embeddings Only

no code implementations LREC 2020 Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko

We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages.

Word Embeddings Word Sense Disambiguation

Policy Preference Detection in Parliamentary Debate Motions

no code implementations CONLL 2019 Gavin Abercrombie, Federico Nanni, Riza Batista-Navarro, Simone Paolo Ponzetto

Debate motions (proposals) tabled in the UK Parliament contain information about the stated policy preferences of the Members of Parliament who propose them, and are key to the analysis of all subsequent speeches given in response to them.

General Classification

FacTweet: Profiling Fake News Twitter Accounts

no code implementations 15 Oct 2019 Bilal Ghanem, Simone Paolo Ponzetto, Paolo Rosso

We present an approach to detect fake news in Twitter at the account level using a neural recurrent model and a variety of different semantic and stylistic features.

A General Framework for Implicit and Explicit Debiasing of Distributional Word Vector Spaces

4 code implementations 13 Sep 2019 Anne Lauscher, Goran Glavaš, Simone Paolo Ponzetto, Ivan Vulić

Moreover, we successfully transfer debiasing models, by means of cross-lingual embedding spaces, and remove or attenuate biases in distributional word vector spaces of languages that lack readily available bias specifications.

Word Embeddings

Multilingual and Cross-Lingual Graded Lexical Entailment

no code implementations ACL 2019 Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

Starting from HyperLex, the only available GR-LE dataset in English, we construct new monolingual GR-LE datasets for three other languages, and combine those to create a set of six cross-lingual GR-LE datasets termed CL-HYPERLEX.

Lexical Entailment

Computational Analysis of Political Texts: Bridging Research Efforts Across Communities

no code implementations ACL 2019 Goran Glavaš, Federico Nanni, Simone Paolo Ponzetto

Political scientists created resources and used available NLP methods to process textual data largely in isolation from the NLP community.

Stance Detection

Unmasking Bias in News

no code implementations 11 Jun 2019 Javier Sánchez-Junquera, Paolo Rosso, Manuel Montes-y-Gómez, Simone Paolo Ponzetto

We present experiments on detecting hyperpartisanship in news using a 'masking' method that allows us to assess the role of style vs. content for the task at hand.

HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings

1 code implementation SEMEVAL 2019 Saba Anwar, Dmitry Ustalov, Nikolay Arefyev, Simone Paolo Ponzetto, Chris Biemann, Alexander Panchenko

We present our system for semantic frame induction that showed the best performance in Subtask B.1 and finished as the runner-up in Subtask A of the SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et al., 2019).

Clustering Task 2 +1

Knowledge-rich Image Gist Understanding Beyond Literal Meaning

no code implementations 18 Apr 2019 Lydia Weiland, Ioana Hulpus, Simone Paolo Ponzetto, Wolfgang Effelsberg, Laura Dietz

We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles.

Political Text Scaling Meets Computational Semantics

2 code implementations 12 Apr 2019 Federico Nanni, Goran Glavaš, Ines Rehbein, Simone Paolo Ponzetto, Heiner Stuckenschmidt

During the last fifteen years, automatic text scaling has become one of the key tools of the Text as Data community in political science.

feature selection

Event-based Access to Historical Italian War Memoirs

no code implementations 8 Apr 2019 Marco Rovera, Federico Nanni, Simone Paolo Ponzetto

The progressive digitization of historical archives provides new, often domain specific, textual resources that report on facts and events which have happened in the past; among these, memoirs are a very common type of primary source.

An Argument-Annotated Corpus of Scientific Publications

no code implementations WS 2018 Anne Lauscher, Goran Glavaš, Simone Paolo Ponzetto

We analyze the annotated argumentative structures and investigate the relations between argumentation and other rhetorical aspects of scientific writing, such as discourse roles and citation contexts.

Argument Mining

Unsupervised Sense-Aware Hypernymy Extraction

1 code implementation 17 Sep 2018 Dmitry Ustalov, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

In this paper, we show how unsupervised sense representations can be used to improve hypernymy extraction.

Watset: Local-Global Graph Clustering with Applications in Sense and Frame Induction

2 code implementations CL 2019 Dmitry Ustalov, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

We present a detailed theoretical and computational analysis of the Watset meta-algorithm for fuzzy graph clustering, which has been found to be widely applicable in a variety of domains.

Clustering Graph Clustering

Unsupervised Semantic Frame Induction using Triclustering

1 code implementation ACL 2018 Dmitry Ustalov, Alexander Panchenko, Andrei Kutuzov, Chris Biemann, Simone Paolo Ponzetto

We use dependency triples automatically extracted from a Web-scale corpus to perform unsupervised semantic frame induction.

Clustering

Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only

1 code implementation 2 May 2018 Robert Litschko, Goran Glavaš, Simone Paolo Ponzetto, Ivan Vulić

We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) which requires no bilingual data at all.

Cross-Lingual Information Retrieval Retrieval

Enriching Frame Representations with Distributionally Induced Senses

no code implementations LREC 2018 Stefano Faralli, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

We introduce a new lexical resource that enriches the Framester knowledge graph, which links FrameNet, WordNet, VerbNet and other resources, with semantic features from text corpora.

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

1 code implementation 19 Jan 2018 Goran Glavaš, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso

In contrast, we propose an unsupervised and a very resource-light approach for measuring semantic similarity between texts in different languages.

Cross-Lingual Information Retrieval Cross-Lingual Semantic Textual Similarity +9

A Framework for Enriching Lexical Semantic Resources with Distributional Semantics

no code implementations 23 Dec 2017 Chris Biemann, Stefano Faralli, Alexander Panchenko, Simone Paolo Ponzetto

While both kinds of semantic resources are available with high lexical coverage, our aligned resource combines the domain specificity and availability of contextual information from distributional models with the conciseness and high quality of manually crafted lexical networks.

Specificity Word Sense Disambiguation

Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl

no code implementations LREC 2018 Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann

We present DepCC, the largest-to-date linguistically analyzed corpus in English including 365 million documents, composed of 252 billion tokens and 7.5 billion named entity occurrences in 14.3 billion sentences from a web-scale crawl of the Common Crawl project.

Open Information Extraction Question Answering +1

Effects of Lexical Properties on Viewing Time per Word in Autistic and Neurotypical Readers

no code implementations WS 2017 Sanja Štajner, Victoria Yaneva, Ruslan Mitkov, Simone Paolo Ponzetto

Eye tracking studies from the past few decades have shaped the way we think of word complexity and cognitive load: words that are long, rare and ambiguous are more difficult to read.

Lexical Simplification

Topic-Based Agreement and Disagreement in US Electoral Manifestos

no code implementations EMNLP 2017 Stefano Menini, Federico Nanni, Simone Paolo Ponzetto, Sara Tonelli

We present a topic-based analysis of agreement and disagreement in political manifestos, which relies on a new method for topic detection based on key concept clustering.

Clustering

Cross-Lingual Classification of Topics in Political Texts

no code implementations WS 2017 Goran Glavaš, Federico Nanni, Simone Paolo Ponzetto

In this paper, we propose an approach for cross-lingual topical coding of sentences from electoral manifestos of political parties in different languages.

General Classification Text Classification +2

Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation

1 code implementation EMNLP 2017 Alexander Panchenko, Fide Marten, Eugen Ruppert, Stefano Faralli, Dmitry Ustalov, Simone Paolo Ponzetto, Chris Biemann

In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images.

Word Sense Disambiguation

Exploring Neural Text Simplification Models

1 code implementation ACL 2017 Sergiu Nisioi, Sanja Štajner, Simone Paolo Ponzetto, Liviu P. Dinu

Unlike the previously proposed automated TS systems, our neural text simplification (NTS) systems are able to simultaneously perform lexical simplification and content reduction.

Lexical Simplification Machine Translation +2

Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation

no code implementations WS 2017 Alexander Panchenko, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann

We introduce a new method for unsupervised knowledge-based word sense disambiguation (WSD) based on a resource that links two types of sense-aware lexical networks: one is induced from a corpus using distributional semantics, the other is manually constructed.

Machine Translation Translation +2

Improving Neural Knowledge Base Completion with Cross-Lingual Projections

no code implementations EACL 2017 Patrick Klein, Simone Paolo Ponzetto, Goran Glavaš

We exploit multilingual synsets from BabelNet to translate English triples to other languages and then augment the reference knowledge base with cross-lingual triples.

Knowledge Base Completion Link Prediction +2

Unsupervised Cross-Lingual Scaling of Political Texts

1 code implementation EACL 2017 Goran Glavaš, Federico Nanni, Simone Paolo Ponzetto

Political text scaling aims to linearly order parties and politicians across political dimensions (e.g., left-to-right ideology) based on textual content (e.g., politician speeches or party manifestos).

Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation

no code implementations EACL 2017 Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann

On the example of word sense induction and disambiguation (WSID), we show that it is possible to develop an interpretable model that matches the state-of-the-art models in accuracy.

Word Embeddings Word Sense Induction

A Large DataBase of Hypernymy Relations Extracted from the Web.

no code implementations LREC 2016 Julian Seitner, Christian Bizer, Kai Eckert, Stefano Faralli, Robert Meusel, Heiko Paulheim, Simone Paolo Ponzetto

Hypernymy relations (those where a hyponym term shares an "isa" relationship with its hypernym) play a key role in many Natural Language Processing (NLP) tasks, e.g., ontology learning, automatically building or extending knowledge bases, or word sense disambiguation and induction.

Word Sense Disambiguation

DBpedia Domains: augmenting DBpedia with domain information

no code implementations LREC 2014 Gregor Titze, Volha Bryl, Cäcilia Zirn, Simone Paolo Ponzetto

We present an approach for augmenting DBpedia, a very large ontology lying at the heart of the Linked Open Data (LOD) cloud, with domain information.

Clustering Open-Domain Question Answering
