no code implementations • ACL (WOAH) 2021 • Alexandros Xenos, John Pavlopoulos, Ion Androutsopoulos
We introduce a new task, context-sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context (previous post) is also considered.
1 code implementation • ACL 2022 • John Pavlopoulos, Leo Laugier, Alexandros Xenos, Jeffrey Sorensen, Ion Androutsopoulos
We study the task of toxic spans detection, which concerns the detection of the spans that make a text toxic, when detecting such spans is possible.
1 code implementation • EMNLP 2021 • Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos
We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target).
no code implementations • 2 Nov 2022 • Dimitris Mamakas, Petros Tsotsi, Ion Androutsopoulos, Ilias Chalkidis
Even sparse-attention models, such as Longformer and BigBird, which increase the maximum input length to 4,096 sub-words, severely truncate texts in three of the six datasets of LexGLUE.
no code implementations • 8 Jun 2022 • Stratos Xenouleas, Alexia Tsoukara, Giannis Panagiotakis, Ilias Chalkidis, Ion Androutsopoulos
We consider zero-shot cross-lingual transfer in legal topic classification using the recent MultiEURLEX dataset.
1 code implementation • BioNLP (ACL) 2022 • Dimitris Pappas, Prodromos Malakasiotis, Ion Androutsopoulos
We study the effect of seven data augmentation (DA) methods in factoid question answering, focusing on the biomedical domain, where obtaining training instances is particularly difficult.
1 code implementation • ACL 2022 • Lefteris Loukas, Manos Fergadiotis, Ilias Chalkidis, Eirini Spyropoulou, Prodromos Malakasiotis, Ion Androutsopoulos, Georgios Paliouras
We, therefore, introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags.
2 code implementations • Nature 2022 • Yannis Assael, Thea Sommerschield, Brendan Shillingford, Mahyar Bordbar, John Pavlopoulos, Marita Chatzipanagiotou, Ion Androutsopoulos, Jonathan Prag, Nando de Freitas
Ithaca can attribute inscriptions to their original location with an accuracy of 71% and can date them to within 30 years of their ground-truth ranges, redating key texts of Classical Athens and contributing to topical debates in ancient history.
Ranked #1 on Ancient Text Restoration on I.PHI
no code implementations • 19 Nov 2021 • Alexandros Xenos, John Pavlopoulos, Ion Androutsopoulos, Lucas Dixon, Jeffrey Sorensen, Leo Laugier
User posts whose perceived toxicity depends on the conversational context are rare in current toxicity detection datasets.
1 code implementation • ACL 2022 • Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, Nikolaos Aletras
Laws and their interpretations, legal arguments and agreements are typically expressed in writing, leading to the production of vast corpora of legal text.
Ranked #1 on Natural Language Understanding on LexGLUE
no code implementations • 29 Sep 2021 • Nikolaos Manginas, Prodromos Malakasiotis, Eirini Spyropoulou, Ion Androutsopoulos, Georgios Paliouras
Black-box decision models have been widely adopted both in industry and academia due to their excellent performance across many challenging tasks and domains.
2 code implementations • EMNLP (ECONLP) 2021 • Lefteris Loukas, Manos Fergadiotis, Ion Androutsopoulos, Prodromos Malakasiotis
We use EDGAR-CORPUS to train and release EDGAR-W2V, which are WORD2VEC embeddings for the financial domain.
1 code implementation • 2 Sep 2021 • Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos
We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target).
no code implementations • SEMEVAL 2021 • John Pavlopoulos, Jeffrey Sorensen, Léo Laugier, Ion Androutsopoulos
For the supervised sequence labeling approach and evaluation purposes, posts previously labeled as toxic were crowd-annotated for toxic spans.
no code implementations • ACL 2021 • Dimitris Pappas, Ion Androutsopoulos
To test our key findings on another dataset, we modified the Natural Questions dataset so that it can also be used for document and snippet retrieval.
no code implementations • 26 May 2021 • Katerina Papantoniou, Panagiotis Papadakos, Theodore Patkos, Giorgos Flouris, Ion Androutsopoulos, Dimitris Plexousakis
Our focus is on automatic deception detection in text across cultures.
Cultural Vocal Bursts Intensity Prediction
Deception Detection
no code implementations • NAACL 2021 • Ilias Chalkidis, Manos Fergadiotis, Dimitrios Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis
We also release a new dataset comprising European Court of Human Rights cases, including annotations for paragraph-level rationales.
no code implementations • 18 Jan 2021 • John Pavlopoulos, Vasiliki Kougia, Ion Androutsopoulos, Dimitris Papamichail
Diagnostic Captioning (DC) concerns the automatic generation of a diagnostic text from a set of medical images of a patient collected during an examination.
no code implementations • 12 Jan 2021 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos
Morpho-syntactic features in the form of POS tag and token shape embeddings, as well as context-aware ELMO embeddings, do not improve performance.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos
Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains.
1 code implementation • EMNLP 2020 • Ilias Chalkidis, Manos Fergadiotis, Sotiris Kotitsas, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos
Furthermore, we show that Transformer-based approaches outperform the state-of-the-art in two of the datasets, and we propose a new state-of-the-art method which combines BERT with LWANs.
Multi-Label Classification
Multi Label Text Classification
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Giorgos Vernikos, Katerina Margatina, Alexandra Chronopoulou, Ion Androutsopoulos
To address this issue, we introduce a new regularization technique, AFTER: domain Adversarial Fine-Tuning as an Effective Regularizer.
1 code implementation • 27 Aug 2020 • John Koutsikakis, Ilias Chalkidis, Prodromos Malakasiotis, Ion Androutsopoulos
We expect these resources to boost NLP research and applications for modern Greek.
1 code implementation • ACL 2020 • John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon, Nithum Thain, Ion Androutsopoulos
Moderation is crucial to promoting healthy on-line discussions.
1 code implementation • WS 2020 • Petros Stavropoulos, Dimitris Pappas, Ion Androutsopoulos, Ryan McDonald
Non-expert human performance is also higher on the new dataset compared to BIOREAD, and biomedical experts perform even better.
no code implementations • IJCNLP 2019 • Stratos Xenouleas, Prodromos Malakasiotis, Marianna Apidianaki, Ion Androutsopoulos
We propose SUM-QE, a novel Quality Estimation model for summarization based on BERT.
no code implementations • NeurIPS Workshop Document_Intelligen 2019 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos
We investigate contract element extraction.
1 code implementation • 2 Sep 2019 • Stratos Xenouleas, Prodromos Malakasiotis, Marianna Apidianaki, Ion Androutsopoulos
We propose SumQE, a novel Quality Estimation model for summarization based on BERT.
2 code implementations • WS 2019 • Manolis Kyriakakis, Ion Androutsopoulos, Joan Ginés i Ametllé, Artur Saudabayev
We consider the task of detecting sentences that express causality, as a step towards mining causal relations from texts.
1 code implementation • WS 2019 • Sotiris Kotitsas, Dimitris Pappas, Ion Androutsopoulos, Ryan McDonald, Marianna Apidianaki
Many existing NE methods rely only on network structure, overlooking other information associated with the nodes, e.g., text describing the nodes.
1 code implementation • ACL 2019 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos
We consider Large-Scale Multi-Label Text Classification (LMTC) in the legal domain.
Ranked #1 on Multi-Label Text Classification on EUR-Lex
no code implementations • ACL 2019 • Ilias Chalkidis, Ion Androutsopoulos, Nikolaos Aletras
Legal judgment prediction is the task of automatically predicting the outcome of a court case, given a text describing the case's facts.
Ranked #1 on Binary text classification on ECHR Non-Anonymized
no code implementations • SEMEVAL 2019 • John Pavlopoulos, Nithum Thain, Lucas Dixon, Ion Androutsopoulos
This paper presents the application of two strong baseline systems for toxicity detection and evaluates their performance in identifying and categorizing offensive language in social media.
1 code implementation • NAACL 2019 • Christos Baziotis, Ion Androutsopoulos, Ioannis Konstas, Alexandros Potamianos
The proposed model does not require parallel text-summary pairs, achieving promising results in unsupervised sentence compression on benchmark datasets.
2 code implementations • WS 2019 • Vasiliki Kougia, John Pavlopoulos, Ion Androutsopoulos
Image captioning applied to biomedical images can assist and accelerate the diagnosis process followed by clinicians.
no code implementations • WS 2019 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos
We consider the task of Extreme Multi-Label Text Classification (XMTC) in the legal domain.
1 code implementation • 7 Apr 2019 • Christos Baziotis, Ion Androutsopoulos, Ioannis Konstas, Alexandros Potamianos
The proposed model does not require parallel text-summary pairs, achieving promising results in unsupervised sentence compression on benchmark datasets.
no code implementations • 31 Oct 2018 • Gerasimos Lampouras, Ion Androutsopoulos
Content selection, for example, may greedily select the most important facts, which may require, however, too many words to express, and this may be undesirable when space is limited or expensive.
no code implementations • 31 Oct 2018 • Gerasimos Lampouras, Ion Androutsopoulos
Many concept-to-text generation systems require domain-specific linguistic resources to produce high quality texts, but manually constructing these resources can be tedious and costly.
1 code implementation • WS 2018 • Georgios-Ioannis Brokos, Polyvios Liosis, Ryan McDonald, Dimitris Pappas, Ion Androutsopoulos
We present AUEB's submissions to the BioASQ 6 document and snippet retrieval tasks (parts of Task 6b, Phase A).
1 code implementation • EMNLP 2018 • Ryan McDonald, Georgios-Ioannis Brokos, Ion Androutsopoulos
We explore several new models for document relevance ranking, building upon the Deep Relevance Matching Model (DRMM) of Guo et al. (2016).
Ranked #7 on Ad-Hoc Information Retrieval on TREC Robust04
no code implementations • ACL 2018 • Ilias Chalkidis, Ion Androutsopoulos, Achilleas Michos
We consider the task of detecting contractual obligations and prohibitions.
no code implementations • EMNLP 2017 • John Pavlopoulos, Prodromos Malakasiotis, Ion Androutsopoulos
Experimenting with a new dataset of 1.6M user comments from a news portal and an existing dataset of 115K Wikipedia talk page comments, we show that an RNN operating on word embeddings outperforms the previous state of the art in moderation, which used logistic regression or an MLP classifier with character or word n-grams.
no code implementations • WS 2017 • John Pavlopoulos, Prodromos Malakasiotis, Juli Bakagianni, Ion Androutsopoulos
Experimenting with a dataset of approximately 1.6M user comments from a Greek news sports portal, we explore how a state-of-the-art RNN-based moderation method can be improved by adding user embeddings, user type embeddings, user biases, or user type biases.
no code implementations • WS 2017 • John Pavlopoulos, Prodromos Malakasiotis, Ion Androutsopoulos
We also compare against a CNN and a word-list baseline, considering both fully automatic and semi-automatic moderation.
no code implementations • SEMEVAL 2016 • Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad AL-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphée De Clercq, Véronique Hoste, Marianna Apidianaki, Xavier Tannier, Natalia Loukachevitch, Evgeniy Kotelnikov, Nuria Bel, Salud María Jiménez-Zafra, Gülşen Eryiğit
Aspect-Based Sentiment Analysis (ABSA)
Coreference Resolution
no code implementations • 9 May 2015 • Aris Kosmopoulos, Georgios Paliouras, Ion Androutsopoulos
Hierarchies are frequently used for the organization of objects.
no code implementations • 30 Mar 2015 • Ioannis Partalas, Aris Kosmopoulos, Nicolas Baskiotis, Thierry Artieres, George Paliouras, Eric Gaussier, Ion Androutsopoulos, Massih-Reza Amini, Patrick Galinari
LSHTC is a series of challenges which aims to assess the performance of classification systems in large-scale classification in a large number of classes (up to hundreds of thousands).
no code implementations • 24 Apr 2014 • Ion Androutsopoulos, Gerasimos Lampouras, Dimitrios Galanis
We present NaturalOWL, a natural language generation system that produces texts describing individuals or classes of OWL ontologies.
2 code implementations • 28 Jun 2013 • Aris Kosmopoulos, Ioannis Partalas, Eric Gaussier, Georgios Paliouras, Ion Androutsopoulos
Hierarchical classification addresses the problem of classifying items into a hierarchy of classes.
no code implementations • 18 Dec 2009 • Ion Androutsopoulos, Prodromos Malakasiotis
Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information.