1 code implementation • EMNLP 2021 • Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos
We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target).
1 code implementation • 20 Mar 2024 • Ilias Chalkidis, Stephanie Brandl
Instruction-finetuned Large Language Models inherit clear political leanings that have been shown to influence downstream task performance.
no code implementations • 25 Oct 2023 • Stephanie Brandl, Emanuele Bugliarello, Ilias Chalkidis
In order to build reliable and trustworthy NLP applications, models need to be both fair across different demographics and explainable.
no code implementations • 18 Oct 2023 • Oliver Eberle, Ilias Chalkidis, Laura Cabello, Stephanie Brandl
A cross-comparison between model-based rationales and human annotations, both in contrastive and non-contrastive settings, yields a high agreement between the two settings for models as well as for humans.
no code implementations • 9 Oct 2023 • Catalina Goanta, Nikolaos Aletras, Ilias Chalkidis, Sofia Ranchordas, Gerasimos Spanakis
Regulation studies are a rich source of knowledge on how to systematically deal with risk and uncertainty, as well as with scientific evidence, to evaluate and compare regulatory options.
2 code implementations • 15 Jun 2023 • Vishvaksenan Rasiah, Ronja Stern, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho, Joel Niklaus
In this paper, we introduce a novel NLP benchmark that poses challenges to current LLMs across four key dimensions: processing long documents (up to 50K tokens), utilizing domain-specific knowledge (embodied in legal texts), multilingual understanding (covering five languages), and multitasking (comprising legal document-to-document Information Retrieval, Court View Generation, Leading Decision Summarization, Citation Extraction, and eight challenging Text Classification tasks).
no code implementations • 5 Jun 2023 • Laura Cabello, Jiaang Li, Ilias Chalkidis
We then evaluate its ability to acquire new knowledge and include it in its reasoning process.
no code implementations • 3 Jun 2023 • Joel Niklaus, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho
Large, high-quality datasets are crucial for training Large Language Models (LLMs).
no code implementations • 25 May 2023 • Daniel Saggau, Mina Rezaei, Bernd Bischl, Ilias Chalkidis
Learning quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines.
no code implementations • 22 May 2023 • Ilias Chalkidis, Yova Kementchedjhieva
Multi-label text classification (MLC) is a challenging task in settings of large label sets, where label support follows a Zipfian distribution.
1 code implementation • 12 May 2023 • Ilias Chalkidis, Nicolas Garneau, Catalina Goanta, Daniel Martin Katz, Anders Søgaard
To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark (LegalLAMA) to facilitate training and detailed analysis of legal-oriented PLMs.
1 code implementation • 9 May 2023 • Yova Kementchedjhieva, Ilias Chalkidis
Standard methods for multi-label text classification largely rely on encoder-only pre-trained language models, whereas encoder-decoder models have proven more effective in other classification tasks.
1 code implementation • 9 Mar 2023 • Ilias Chalkidis
Following the hype around OpenAI's ChatGPT conversational agent, the last straw in the recent development of Large Language Models (LLMs) that demonstrate emergent unprecedented zero-shot capabilities, we audit the latest OpenAI's GPT-3.5 model, `gpt-3.5-turbo', the first available ChatGPT model, in the LexGLUE benchmark in a zero-shot fashion providing examples in a templated instruction-following format.
1 code implementation • 30 Jan 2023 • Joel Niklaus, Veton Matoshi, Pooja Rani, Andrea Galassi, Matthias Stürmer, Ilias Chalkidis
To provide a fair comparison, we propose two aggregate scores, one based on the datasets and one on the languages.
no code implementations • 2 Nov 2022 • Dimitris Mamakas, Petros Tsotsi, Ion Androutsopoulos, Ilias Chalkidis
Even sparse-attention models, such as Longformer and BigBird, which increase the maximum input length to 4,096 sub-words, severely truncate texts in three of the six datasets of LexGLUE.
no code implementations • 24 Oct 2022 • Stelios Maroudas, Sotiris Legkas, Prodromos Malakasiotis, Ilias Chalkidis
In the era of billion-parameter-sized Language Models (LMs), start-ups have to follow trends and adapt their technology accordingly.
no code implementations • 11 Oct 2022 • Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott
Non-hierarchical sparse-attention Transformer-based models, such as Longformer and BigBird, are popular approaches to working with long documents.
2 code implementations • 25 Sep 2022 • Joel Niklaus, Matthias Stürmer, Ilias Chalkidis
We find that in both settings (legal areas, origin regions), models trained across all groups perform overall better, while they also have improved results in the worst-case scenarios.
no code implementations • 8 Jun 2022 • Stratos Xenouleas, Alexia Tsoukara, Giannis Panagiotakis, Ilias Chalkidis, Ion Androutsopoulos
We consider zero-shot cross-lingual transfer in legal topic classification using the recent MultiEURLEX dataset.
1 code implementation • 14 Apr 2022 • Xiang Dai, Ilias Chalkidis, Sune Darkner, Desmond Elliott
The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs).
no code implementations • ACL 2022 • Daniel Hershcovich, Stella Frank, Heather Lent, Miryam de Lhoneux, Mostafa Abdou, Stephanie Brandl, Emanuele Bugliarello, Laura Cabello Piqueras, Ilias Chalkidis, Ruixiang Cui, Constanza Fierro, Katerina Margatina, Phillip Rust, Anders Søgaard
Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages.
1 code implementation • Findings (ACL) 2022 • Ilias Chalkidis, Anders Søgaard
In document classification for, e.g., legal and biomedical text, we often deal with hundreds of classes, including very infrequent ones, as well as temporal concept drift caused by the influence of real-world events, e.g., policy changes, conflicts, or pandemics.
1 code implementation • ACL 2022 • Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Felix Schwemer, Anders Søgaard
We present a benchmark suite of four datasets for evaluating the fairness of pre-trained language models and the techniques used to fine-tune them for downstream tasks.
1 code implementation • ACL 2022 • Lefteris Loukas, Manos Fergadiotis, Ilias Chalkidis, Eirini Spyropoulou, Prodromos Malakasiotis, Ion Androutsopoulos, Georgios Paliouras
We, therefore, introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags.
1 code implementation • ACL 2022 • Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, Nikolaos Aletras
Laws and their interpretations, legal arguments and agreements are typically expressed in writing, leading to the production of vast corpora of legal text.
Ranked #1 on Natural Language Understanding on LexGLUE
1 code implementation • EMNLP (NLLP) 2021 • Joel Niklaus, Ilias Chalkidis, Matthias Stürmer
We evaluate state-of-the-art BERT-based methods including two variants of BERT that overcome the BERT input (text) length limitation (up to 512 tokens).
1 code implementation • EMNLP (NLLP) 2021 • Christos Papaloukas, Ilias Chalkidis, Konstantinos Athinaios, Despina-Athanasia Pantazi, Manolis Koubarakis
In this work, we study the task of classifying legal texts written in the Greek language.
1 code implementation • 2 Sep 2021 • Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos
We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target).
no code implementations • NAACL 2021 • Ilias Chalkidis, Manos Fergadiotis, Dimitrios Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis
We also release a new dataset comprising European Court of Human Rights cases, including annotations for paragraph-level rationales.
no code implementations • EACL 2021 • Ilias Chalkidis, Manos Fergadiotis, Nikolaos Manginas, Eva Katakalou, Prodromos Malakasiotis
Major scandals in corporate history have urged the need for regulatory compliance, where organizations need to ensure that their controls (processes) comply with relevant laws, regulations, and policies.
no code implementations • 12 Jan 2021 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos
Morpho-syntactic features in the form of POS tag and token shape embeddings, as well as context-aware ELMo embeddings, do not improve performance.
no code implementations • EMNLP (spnlp) 2020 • Nikolaos Manginas, Ilias Chalkidis, Prodromos Malakasiotis
Although BERT is widely used by the NLP community, little is known about its inner workings.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos
Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains.
1 code implementation • EMNLP 2020 • Ilias Chalkidis, Manos Fergadiotis, Sotiris Kotitsas, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos
Furthermore, we show that Transformer-based approaches outperform the state-of-the-art in two of the datasets, and we propose a new state-of-the-art method which combines BERT with LWANs.
1 code implementation • 27 Aug 2020 • John Koutsikakis, Ilias Chalkidis, Prodromos Malakasiotis, Ion Androutsopoulos
We expect these resources to boost NLP research and applications for modern Greek.
no code implementations • NeurIPS Workshop Document_Intelligen 2019 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos
We investigate contract element extraction.
no code implementations • ACL 2019 • Ilias Chalkidis, Ion Androutsopoulos, Nikolaos Aletras
Legal judgment prediction is the task of automatically predicting the outcome of a court case, given a text describing the case's facts.
Ranked #1 on Binary text classification on ECHR Non-Anonymized
1 code implementation • ACL 2019 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos
We consider Large-Scale Multi-Label Text Classification (LMTC) in the legal domain.
Ranked #1 on Multi-Label Text Classification on EUR-Lex
no code implementations • WS 2019 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos
We consider the task of Extreme Multi-Label Text Classification (XMTC) in the legal domain.
no code implementations • ACL 2018 • Ilias Chalkidis, Ion Androutsopoulos, Achilleas Michos
We consider the task of detecting contractual obligations and prohibitions.