no code implementations • EACL (Hackashop) 2021 • Senja Pollak, Marko Robnik-Šikonja, Matthew Purver, Michele Boggia, Ravi Shekhar, Marko Pranjić, Salla Salmela, Ivar Krustok, Tarmo Paju, Carl-Gustav Linden, Leo Leppänen, Elaine Zosa, Matej Ulčar, Linda Freienthal, Silver Traat, Luis Adrián Cabrera-Diego, Matej Martinc, Nada Lavrač, Blaž Škrlj, Martin Žnidaršič, Andraž Pelicon, Boshko Koloski, Vid Podpečan, Janez Kranjc, Shane Sheehan, Emanuela Boros, Jose G. Moreno, Antoine Doucet, Hannu Toivonen
This paper presents tools and data sources collected and released by the EMBEDDIA project, supported by the European Union’s Horizon 2020 research and innovation program.
no code implementations • COLING (TextGraphs) 2022 • Thi Hong Hanh Tran, Matej Martinc, Antoine Doucet, Senja Pollak
The results demonstrate that, compared with the statistical approach (e.g., TF-IDF), the contextual representation is better at capturing meaningful information despite not being pretrained on mathematical data, yielding a boost of around 3.00% MAP@500.
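MAP@500 is a mean average precision cut off at rank 500. As a rough illustration only, the sketch below assumes the common definition in which AP@K is normalised by min(|relevant|, K); the entry does not say which normalisation the paper used, and the function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """AP@k for one ranked list (assumes no duplicate items in `ranked`).

    Normalised by min(|relevant|, k), one common convention (an assumption here).
    """
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank, counted only at relevant ranks
    return score / min(len(relevant), k)


def mean_average_precision_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """MAP@k: the mean of AP@k over (ranked list, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [(["a", "b", "c", "d"], {"a", "c"}), (["x", "y"], {"y"})]
    print(round(mean_average_precision_at_k(runs, k=4), 4))  # 0.6667
```

With k=500, the same functions would score the top 500 retrieved items per query.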
no code implementations • SemEval (NAACL) 2022 • Emanuela Boros, Carlos-Emiliano González-Gallardo, Jose Moreno, Antoine Doucet
Also, we consider that using additional contexts from the training set could improve the performance of a NER system on short texts.
no code implementations • JEP/TALN/RECITAL 2022 • Stephen Mutuvi, Emanuela Boros, Antoine Doucet, Adam Jatowt, Gaël Lejeune, Moses Odeo
In this article, we explore several hypotheses concerning the factors that could influence the performance of an epidemiological event extraction system in a low-resource multilingual scenario: the type of pretrained model, the quality of the tokenizer, and the characteristics of the entities to be extracted.
no code implementations • JEP/TALN/RECITAL 2022 • Emanuela Boros, Jose Moreno, Antoine Doucet
In this article, we address a recent and under-studied paradigm for the event detection task by framing it as a question-answering problem with the possibility of multiple answers and the support of entities.
1 code implementation • EACL (BSNLP) 2021 • Luis Adrián Cabrera-Diego, Jose G. Moreno, Antoine Doucet
We present a collection of Named Entity Recognition (NER) systems for six Slavic languages: Bulgarian, Czech, Polish, Slovenian, Russian and Ukrainian.
no code implementations • 28 Sep 2023 • Julien Delaunay, Hanh Thi Hong Tran, Carlos-Emiliano González-Gallardo, Georgeta Bordea, Nicolas Sidere, Antoine Doucet
Document-level relation extraction (DocRE) is an active area of research in natural language processing (NLP) concerned with identifying and extracting relationships between entities beyond sentence boundaries.
1 code implementation • 30 Mar 2023 • Carlos-Emiliano González-Gallardo, Emanuela Boros, Nancy Girdhar, Ahmed Hamdi, Jose G. Moreno, Antoine Doucet
Large language models (LLMs) have been leveraged for several years now, obtaining state-of-the-art performance in recognizing entities from modern documents.
1 code implementation • 11 Feb 2023 • Štěpán Šimsa, Milan Šulc, Michal Uřičář, Yash Patel, Ahmed Hamdi, Matěj Kocián, Matyáš Skalický, Jiří Matas, Antoine Doucet, Mickaël Coustaty, Dimosthenis Karatzas
This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition.
no code implementations • LaTeCHCLfL (COLING) 2022 • Nicolas Gutehrlé, Antoine Doucet, Adam Jatowt
Archive collections are nowadays mostly available through search engine interfaces, which allow a user to retrieve documents by issuing queries.
1 code implementation • 20 Jan 2023 • Nhu Khoa Nguyen, Thierry Delahaut, Emanuela Boros, Antoine Doucet, Gaël Lejeune
Identifying and exploring emerging trends in the news is becoming more essential than ever with many changes occurring worldwide due to the global health crises.
no code implementations • 17 Jan 2023 • Hanh Thi Hong Tran, Matej Martinc, Jaya Caporusso, Antoine Doucet, Senja Pollak
Automatic term extraction (ATE) is a Natural Language Processing (NLP) task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms.
no code implementations • 12 Dec 2022 • Hanh Thi Hong Tran, Matej Martinc, Andraz Pelicon, Antoine Doucet, Senja Pollak
Automatic term extraction plays an essential role in domain language understanding and several natural language processing downstream tasks.
no code implementations • 4 Jul 2022 • Elvys Linhares Pontes, Mohamed Benjannet, Jose G. Moreno, Antoine Doucet
For the second sub-task, we combine the RoBERTa model with a feed-forward multi-layer perceptron in order to extract the context of sentences and classify them.
1 code implementation • 15 Dec 2021 • Tran Thi Hong Hanh, Antoine Doucet, Nicolas Sidere, Jose G. Moreno, Senja Pollak
Named entity recognition (NER) is an information extraction technique that aims to locate and classify named entities (e.g., organizations, locations, ...) within a document into predefined categories.
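NER is commonly cast as token-level sequence labelling with BIO tags (B- begins an entity, I- continues it, O is outside any entity). The hypothetical helper below is not the method of this paper; assuming a model has already predicted BIO tags, it only shows how a tag sequence becomes typed entity spans.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (entity_text, label) spans.

    An I- tag whose label differs from the open entity (or with no open entity)
    is leniently treated as the start of a new entity.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            if current_tokens and current_label is not None:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the open entity
        else:  # "O" closes any open entity
            if current_tokens and current_label is not None:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens and current_label is not None:
        spans.append((" ".join(current_tokens), current_label))
    return spans


if __name__ == "__main__":
    tokens = ["The", "University", "of", "La", "Rochelle", "is", "in", "France"]
    tags = ["O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O", "B-LOC"]
    print(bio_to_spans(tokens, tags))
    # [('University of La Rochelle', 'ORG'), ('France', 'LOC')]
```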
1 code implementation • 23 Sep 2021 • Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, Antoine Doucet
After decades of massive digitisation, an unprecedented amount of historical documents is available in digital format, along with their machine-readable texts.
no code implementations • ACL 2021 • Yi Yu, Adam Jatowt, Antoine Doucet, Kazunari Sugiyama, Masatoshi Yoshikawa
In this paper, we address a novel task, Multiple TimeLine Summarization (MTLS), which extends the flexibility and versatility of Time-Line Summarization (TLS).
1 code implementation • 14 Apr 2021 • Emanuela Boros, Jose G. Moreno, Antoine Doucet
In this paper, we propose a recent and under-researched paradigm for the task of event detection (ED) by casting it as a question-answering (QA) problem with the possibility of multiple answers and the support of entities.
no code implementations • 13 Apr 2021 • Emanuela Boros, Antoine Doucet
This paper summarizes the participation of the Laboratoire Informatique, Image et Interaction (L3i laboratory) of the University of La Rochelle in the Recognizing Ultra Fine-grained Entities (RUFES) track within the Text Analysis Conference (TAC) series of evaluation workshops.
no code implementations • COLING 2020 • Stephen Mutuvi, Emanuela Boros, Antoine Doucet, Adam Jatowt, Gaël Lejeune, Moses Odeo
We conduct a comparative study of different machine and deep learning text classification models using a dataset comprising news articles related to epidemic outbreaks from six languages, four low-resourced and two high-resourced, in order to analyze the influence of the nature of the language, the structure of the document, and the size of the data.
1 code implementation • CONLL 2020 • Emanuela Boros, Ahmed Hamdi, Elvys Linhares Pontes, Luis Adrián Cabrera-Diego, Jose G. Moreno, Nicolas Sidere, Antoine Doucet
This paper tackles the task of named entity recognition (NER) applied to digitized historical texts obtained from processing digital images of newspapers using optical character recognition (OCR) techniques.
no code implementations • LREC 2020 • Stephen Mutuvi, Antoine Doucet, Gaël Lejeune, Moses Odeo
This paper proposes a corpus for the development and evaluation of tools and techniques for identifying emerging infectious disease threats in online news text.
no code implementations • LREC 2020 • Esteban Frossard, Mickael Coustaty, Antoine Doucet, Adam Jatowt, Simon Hengchen
Languages change over time and, thanks to the abundance of digital corpora, their evolutionary analysis using computational techniques has recently gained much research attention.
no code implementations • 5 Mar 2020 • Cong Tri Pham, Mai Chi Luong, Dung Van Hoang, Antoine Doucet
The model performance is compared to that of 157 dermatologists from 12 university hospitals in Germany on the MClass-D dataset.
no code implementations • WS 2019 • Jose G. Moreno, Elvys Linhares Pontes, Mickael Coustaty, Antoine Doucet
This paper presents our participation in the shared task on multilingual named entity recognition at BSNLP 2019.
no code implementations • WS 2017 • Natalia Klyueva, Antoine Doucet, Milan Straka
In this paper, we describe the MUMULS system that participated in the 2017 shared task on automatic identification of verbal multiword expressions (VMWEs).
no code implementations • WS 2017 • Agata Savary, Carlos Ramisch, Silvio Cordeiro, Federico Sangati, Veronika Vincze, Behrang Qasemizadeh, Marie Candito, Fabienne Cap, Voula Giouli, Ivelina Stoyanova, Antoine Doucet
This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.