Search Results for author: Ion Androutsopoulos

Found 63 papers, 25 papers with code

Context Sensitivity Estimation in Toxicity Detection

no code implementations ACL (WOAH) 2021 Alexandros Xenos, John Pavlopoulos, Ion Androutsopoulos

We introduce a new task, context-sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context (previous post) is also considered.

MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

1 code implementation EMNLP 2021 Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos

We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target).

Document Classification Topic Classification +1

From the Detection of Toxic Spans in Online Discussions to the Analysis of Toxic-to-Civil Transfer

1 code implementation ACL 2022 John Pavlopoulos, Leo Laugier, Alexandros Xenos, Jeffrey Sorensen, Ion Androutsopoulos

We study the task of toxic spans detection, which concerns the detection of the spans that make a text toxic, when detecting such spans is possible.

Toxic Spans Detection

Cache me if you Can: an Online Cost-aware Teacher-Student framework to Reduce the Calls to Large Language Models

1 code implementation 20 Oct 2023 Ilias Stogiannidis, Stavros Vassos, Prodromos Malakasiotis, Ion Androutsopoulos

We propose a framework that allows reducing the calls to LLMs by caching previous LLM responses and using them to train a local inexpensive model on the SME side.

Intent Detection Intent Recognition +1

Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer

no code implementations 2 Nov 2022 Dimitris Mamakas, Petros Tsotsi, Ion Androutsopoulos, Ilias Chalkidis

Even sparse-attention models, such as Longformer and BigBird, which increase the maximum input length to 4,096 sub-words, severely truncate texts in three of the six datasets of LexGLUE.

Document Classification

Data Augmentation for Biomedical Factoid Question Answering

1 code implementation BioNLP (ACL) 2022 Dimitris Pappas, Prodromos Malakasiotis, Ion Androutsopoulos

We study the effect of seven data augmentation (DA) methods in factoid question answering, focusing on the biomedical domain, where obtaining training instances is particularly difficult.

Data Augmentation Information Retrieval +8

FiNER: Financial Numeric Entity Recognition for XBRL Tagging

1 code implementation ACL 2022 Lefteris Loukas, Manos Fergadiotis, Ilias Chalkidis, Eirini Spyropoulou, Prodromos Malakasiotis, Ion Androutsopoulos, Georgios Paliouras

We, therefore, introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags.

TAG

Restoring and attributing ancient texts using deep neural networks

2 code implementations Nature 2022 Yannis Assael, Thea Sommerschield, Brendan Shillingford, Mahyar Bordbar, John Pavlopoulos, Marita Chatzipanagiotou, Ion Androutsopoulos, Jonathan Prag, Nando de Freitas

Ithaca can attribute inscriptions to their original location with an accuracy of 71% and can date them to less than 30 years of their ground-truth ranges, redating key texts of Classical Athens and contributing to topical debates in ancient history.

Ancient Text Restoration Attribute

MAGNEx: A Model Agnostic Global Neural Explainer

no code implementations 29 Sep 2021 Nikolaos Manginas, Prodromos Malakasiotis, Eirini Spyropoulou, Ion Androutsopoulos, Georgios Paliouras

Black-box decision models have been widely adopted both in industry and academia due to their excellent performance across many challenging tasks and domains.

Computational Efficiency

MultiEURLEX -- A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

1 code implementation 2 Sep 2021 Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos

We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target).

Document Classification Topic Classification +1

SemEval-2021 Task 5: Toxic Spans Detection

no code implementations SEMEVAL 2021 John Pavlopoulos, Jeffrey Sorensen, Léo Laugier, Ion Androutsopoulos

For the supervised sequence labeling approach and evaluation purposes, posts previously labeled as toxic were crowd-annotated for toxic spans.

Toxic Spans Detection

A Neural Model for Joint Document and Snippet Ranking in Question Answering for Large Document Collections

no code implementations ACL 2021 Dimitris Pappas, Ion Androutsopoulos

To test our key findings on another dataset, we modified the Natural Questions dataset so that it can also be used for document and snippet retrieval.

Natural Questions Question Answering +1

Diagnostic Captioning: A Survey

no code implementations 18 Jan 2021 John Pavlopoulos, Vasiliki Kougia, Ion Androutsopoulos, Dimitris Papamichail

Diagnostic Captioning (DC) concerns the automatic generation of a diagnostic text from a set of medical images of a patient collected during an examination.

Image Captioning

Neural Contract Element Extraction Revisited: Letters from Sesame Street

no code implementations 12 Jan 2021 Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos

Morpho-syntactic features in the form of POS tag and token shape embeddings, as well as context-aware ELMO embeddings do not improve performance.

POS TAG

BIOMRC: A Dataset for Biomedical Machine Reading Comprehension

1 code implementation WS 2020 Petros Stavropoulos, Dimitris Pappas, Ion Androutsopoulos, Ryan McDonald

Non-expert human performance is also higher on the new dataset compared to BIOREAD, and biomedical experts perform even better.

Machine Reading Comprehension

SumQE: a BERT-based Summary Quality Estimation Model

1 code implementation 2 Sep 2019 Stratos Xenouleas, Prodromos Malakasiotis, Marianna Apidianaki, Ion Androutsopoulos

We propose SumQE, a novel Quality Estimation model for summarization based on BERT.

Transfer Learning for Causal Sentence Detection

2 code implementations WS 2019 Manolis Kyriakakis, Ion Androutsopoulos, Joan Ginés i Ametllé, Artur Saudabayev

We consider the task of detecting sentences that express causality, as a step towards mining causal relations from texts.

Relation Relation Extraction +2

ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT

no code implementations SEMEVAL 2019 John Pavlopoulos, Nithum Thain, Lucas Dixon, Ion Androutsopoulos

This paper presents the application of two strong baseline systems for toxicity detection and evaluates their performance in identifying and categorizing offensive language in social media.

Language Identification

A Survey on Biomedical Image Captioning

2 code implementations WS 2019 Vasiliki Kougia, John Pavlopoulos, Ion Androutsopoulos

Image captioning applied to biomedical images can assist and accelerate the diagnosis process followed by clinicians.

Image Captioning

SEQ^3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression

1 code implementation 7 Apr 2019 Christos Baziotis, Ion Androutsopoulos, Ioannis Konstas, Alexandros Potamianos

The proposed model does not require parallel text-summary pairs, achieving promising results in unsupervised sentence compression on benchmark datasets.

Language Modelling Sentence +1

Extracting Linguistic Resources from the Web for Concept-to-Text Generation

no code implementations 31 Oct 2018 Gerasimos Lampouras, Ion Androutsopoulos

Many concept-to-text generation systems require domain-specific linguistic resources to produce high quality texts, but manually constructing these resources can be tedious and costly.

Concept-To-Text Generation Sentence

Generating Texts with Integer Linear Programming

no code implementations 31 Oct 2018 Gerasimos Lampouras, Ion Androutsopoulos

Content selection, for example, may greedily select the most important facts, which may require, however, too many words to express, and this may be undesirable when space is limited or expensive.

Concept-To-Text Generation Referring Expression +2

AUEB at BioASQ 6: Document and Snippet Retrieval

1 code implementation WS 2018 Georgios-Ioannis Brokos, Polyvios Liosis, Ryan McDonald, Dimitris Pappas, Ion Androutsopoulos

We present AUEB's submissions to the BioASQ 6 document and snippet retrieval tasks (parts of Task 6b, Phase A).

Retrieval

Deep Relevance Ranking Using Enhanced Document-Query Interactions

1 code implementation EMNLP 2018 Ryan McDonald, Georgios-Ioannis Brokos, Ion Androutsopoulos

We explore several new models for document relevance ranking, building upon the Deep Relevance Matching Model (DRMM) of Guo et al. (2016).

Ad-Hoc Information Retrieval Question Answering

Deeper Attention to Abusive User Content Moderation

no code implementations EMNLP 2017 John Pavlopoulos, Prodromos Malakasiotis, Ion Androutsopoulos

Experimenting with a new dataset of 1.6M user comments from a news portal and an existing dataset of 115K Wikipedia talk page comments, we show that an RNN operating on word embeddings outperforms the previous state of the art in moderation, which used logistic regression or an MLP classifier with character or word n-grams.

regression Word Embeddings

Improved Abusive Comment Moderation with User Embeddings

no code implementations WS 2017 John Pavlopoulos, Prodromos Malakasiotis, Juli Bakagianni, Ion Androutsopoulos

Experimenting with a dataset of approximately 1.6M user comments from a Greek news sports portal, we explore how a state-of-the-art RNN-based moderation method can be improved by adding user embeddings, user type embeddings, user biases, or user type biases.

Vocal Bursts Type Prediction

Deep Learning for User Comment Moderation

no code implementations WS 2017 John Pavlopoulos, Prodromos Malakasiotis, Ion Androutsopoulos

We also compare against a CNN and a word-list baseline, considering both fully automatic and semi-automatic moderation.

General Classification

LSHTC: A Benchmark for Large-Scale Text Classification

no code implementations 30 Mar 2015 Ioannis Partalas, Aris Kosmopoulos, Nicolas Baskiotis, Thierry Artieres, George Paliouras, Eric Gaussier, Ion Androutsopoulos, Massih-Reza Amini, Patrick Galinari

LSHTC is a series of challenges which aims to assess the performance of classification systems in large-scale classification in a large number of classes (up to hundreds of thousands).

General Classification text-classification +1

Generating Natural Language Descriptions from OWL Ontologies: the NaturalOWL System

no code implementations 24 Apr 2014 Ion Androutsopoulos, Gerasimos Lampouras, Dimitrios Galanis

We present NaturalOWL, a natural language generation system that produces texts describing individuals or classes of OWL ontologies.

Sentence Text Generation

A Survey of Paraphrasing and Textual Entailment Methods

no code implementations 18 Dec 2009 Ion Androutsopoulos, Prodromos Malakasiotis

Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information.

Machine Translation Natural Language Inference +3
