Search Results for author: Ilias Chalkidis

Found 40 papers, 19 papers with code

MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

1 code implementation EMNLP 2021 Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos

We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target).

Document Classification Topic Classification +1

Llama meets EU: Investigating the European Political Spectrum through the Lens of LLMs

1 code implementation 20 Mar 2024 Ilias Chalkidis, Stephanie Brandl

Instruction-finetuned Large Language Models inherit clear political leanings that have been shown to influence downstream task performance.

On the Interplay between Fairness and Explainability

no code implementations 25 Oct 2023 Stephanie Brandl, Emanuele Bugliarello, Ilias Chalkidis

In order to build reliable and trustworthy NLP applications, models need to be both fair across different demographics and explainable.

Fairness Multi Class Text Classification +2

Rather a Nurse than a Physician -- Contrastive Explanations under Investigation

no code implementations 18 Oct 2023 Oliver Eberle, Ilias Chalkidis, Laura Cabello, Stephanie Brandl

A cross-comparison between model-based rationales and human annotations, both in contrastive and non-contrastive settings, yields a high agreement between the two settings for models as well as for humans.

text-classification Text Classification

Regulation and NLP (RegNLP): Taming Large Language Models

no code implementations 9 Oct 2023 Catalina Goanta, Nikolaos Aletras, Ilias Chalkidis, Sofia Ranchordas, Gerasimos Spanakis

Regulation studies are a rich source of knowledge on how to systematically deal with risk and uncertainty, as well as with scientific evidence, to evaluate and compare regulatory options.


SCALE: Scaling up the Complexity for Advanced Language Model Evaluation

2 code implementations 15 Jun 2023 Vishvaksenan Rasiah, Ronja Stern, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho, Joel Niklaus

In this paper, we introduce a novel NLP benchmark that poses challenges to current LLMs across four key dimensions: processing long documents (up to 50K tokens), utilizing domain specific knowledge (embodied in legal texts), multilingual understanding (covering five languages), and multitasking (comprising legal document to document Information Retrieval, Court View Generation, Leading Decision Summarization, Citation Extraction, and eight challenging Text Classification tasks).

Information Retrieval Language Modelling +2

MultiLegalPile: A 689GB Multilingual Legal Corpus

no code implementations 3 Jun 2023 Joel Niklaus, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho

Large, high-quality datasets are crucial for training Large Language Models (LLMs).

Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning

no code implementations 25 May 2023 Daniel Saggau, Mina Rezaei, Bernd Bischl, Ilias Chalkidis

Learning quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines.

Contrastive Learning Information Retrieval +5

Retrieval-augmented Multi-label Text Classification

no code implementations 22 May 2023 Ilias Chalkidis, Yova Kementchedjhieva

Multi-label text classification (MLC) is a challenging task in settings of large label sets, where label support follows a Zipfian distribution.

Multi Label Text Classification Multi-Label Text Classification +2

LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development

1 code implementation 12 May 2023 Ilias Chalkidis, Nicolas Garneau, Catalina Goanta, Daniel Martin Katz, Anders Søgaard

To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark (LegalLAMA) to facilitate training and detailed analysis of legal-oriented PLMs.

Knowledge Probing Language Modelling

An Exploration of Encoder-Decoder Approaches to Multi-Label Classification for Legal and Biomedical Text

1 code implementation 9 May 2023 Yova Kementchedjhieva, Ilias Chalkidis

Standard methods for multi-label text classification largely rely on encoder-only pre-trained language models, whereas encoder-decoder models have proven more effective in other classification tasks.

Decoder Multi-Label Classification +3

ChatGPT may Pass the Bar Exam soon, but has a Long Way to Go for the LexGLUE benchmark

1 code implementation 9 Mar 2023 Ilias Chalkidis

Following the hype around OpenAI's ChatGPT conversational agent, the last straw in the recent development of Large Language Models (LLMs) that demonstrate emergent unprecedented zero-shot capabilities, we audit the latest OpenAI's GPT-3.5 model, `gpt-3.5-turbo', the first available ChatGPT model, in the LexGLUE benchmark in a zero-shot fashion providing examples in a templated instruction-following format.

Instruction Following

LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain

1 code implementation 30 Jan 2023 Joel Niklaus, Veton Matoshi, Pooja Rani, Andrea Galassi, Matthias Stürmer, Ilias Chalkidis

To provide a fair comparison, we propose two aggregate scores, one based on the datasets and one on the languages.


Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer

no code implementations 2 Nov 2022 Dimitris Mamakas, Petros Tsotsi, Ion Androutsopoulos, Ilias Chalkidis

Even sparse-attention models, such as Longformer and BigBird, which increase the maximum input length to 4,096 sub-words, severely truncate texts in three of the six datasets of LexGLUE.

Document Classification

An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification

no code implementations 11 Oct 2022 Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott

Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents.

Document Classification

An Empirical Study on Cross-X Transfer for Legal Judgment Prediction

2 code implementations 25 Sep 2022 Joel Niklaus, Matthias Stürmer, Ilias Chalkidis

We find that in both settings (legal areas, origin regions), models trained across all groups perform overall better, while they also have improved results in the worst-case scenarios.

Cross-Lingual Transfer Transfer Learning

Revisiting Transformer-based Models for Long Document Classification

1 code implementation 14 Apr 2022 Xiang Dai, Ilias Chalkidis, Sune Darkner, Desmond Elliott

The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs).

Document Classification text-classification

Improved Multi-label Classification under Temporal Concept Drift: Rethinking Group-Robust Algorithms in a Label-Wise Setting

1 code implementation Findings (ACL) 2022 Ilias Chalkidis, Anders Søgaard

In document classification for, e.g., legal and biomedical text, we often deal with hundreds of classes, including very infrequent ones, as well as temporal concept drift caused by the influence of real-world events, e.g., policy changes, conflicts, or pandemics.

Document Classification Multi-Label Classification

FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing

1 code implementation ACL 2022 Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Felix Schwemer, Anders Søgaard

We present a benchmark suite of four datasets for evaluating the fairness of pre-trained language models and the techniques used to fine-tune them for downstream tasks.


FiNER: Financial Numeric Entity Recognition for XBRL Tagging

1 code implementation ACL 2022 Lefteris Loukas, Manos Fergadiotis, Ilias Chalkidis, Eirini Spyropoulou, Prodromos Malakasiotis, Ion Androutsopoulos, Georgios Paliouras

We, therefore, introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags.


Swiss-Judgment-Prediction: A Multilingual Legal Judgment Prediction Benchmark

1 code implementation EMNLP (NLLP) 2021 Joel Niklaus, Ilias Chalkidis, Matthias Stürmer

We evaluate state-of-the-art BERT-based methods including two variants of BERT that overcome the BERT input (text) length limitation (up to 512 tokens).

MultiEURLEX -- A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

1 code implementation 2 Sep 2021 Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos

We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target).

Document Classification Topic Classification +1

Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations

no code implementations EACL 2021 Ilias Chalkidis, Manos Fergadiotis, Nikolaos Manginas, Eva Katakalou, Prodromos Malakasiotis

Major scandals in corporate history have urged the need for regulatory compliance, where organizations need to ensure that their controls (processes) comply with relevant laws, regulations, and policies.

domain classification Information Retrieval +2

Neural Contract Element Extraction Revisited: Letters from Sesame Street

no code implementations12 Jan 2021 Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos

Morpho-syntactic features in the form of POS tag and token shape embeddings, as well as context-aware ELMo embeddings, do not improve performance.

