Search Results for author: Thomas Demeester

Found 69 papers, 41 papers with code

Joint entity recognition and relation extraction as a multi-head selection problem

6 code implementations 20 Apr 2018 Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder

State-of-the-art models for joint entity recognition and relation extraction strongly rely on external natural language processing (NLP) tools such as POS (part-of-speech) taggers and dependency parsers.

POS Relation

Adversarial training for multi-context joint entity and relation extraction

1 code implementation EMNLP 2018 Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder

Adversarial training (AT) is a regularization method that can be used to improve the robustness of neural network methods by adding small perturbations in the training data.

Joint Entity and Relation Extraction Relation

In-Context Learning for Extreme Multi-Label Classification

2 code implementations 22 Jan 2024 Karel D'Oosterlinck, Omar Khattab, François Remy, Thomas Demeester, Chris Develder, Christopher Potts

Multi-label classification problems with thousands of classes are hard to solve with in-context learning alone, as language models (LMs) might lack prior knowledge about the precise classes or how to assign them, and it is generally infeasible to demonstrate every class in a prompt.

Classification Extreme Multi-Label Classification +2

Jack the Reader - A Machine Reading Framework

2 code implementations 20 Jun 2018 Dirk Weissenborn, Pasquale Minervini, Tim Dettmers, Isabelle Augenstein, Johannes Welbl, Tim Rocktäschel, Matko Bošnjak, Jeff Mitchell, Thomas Demeester, Pontus Stenetorp, Sebastian Riedel

For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions.

Link Prediction Natural Language Inference +3

Jack the Reader -- A Machine Reading Framework

1 code implementation ACL 2018 Dirk Weissenborn, Pasquale Minervini, Isabelle Augenstein, Johannes Welbl, Tim Rocktäschel, Matko Bošnjak, Jeff Mitchell, Thomas Demeester, Tim Dettmers, Pontus Stenetorp, Sebastian Riedel

For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions.

Information Retrieval Link Prediction +4

DeepProbLog: Neural Probabilistic Logic Programming

4 code implementations NeurIPS 2018 Robin Manhaeve, Sebastijan Dumančić, Angelika Kimmig, Thomas Demeester, Luc De Raedt

We introduce DeepProbLog, a probabilistic logic programming language that incorporates deep learning by means of neural predicates.

Program induction

A Self-Training Approach for Short Text Clustering

1 code implementation WS 2019 Amir Hadifar, Lucas Sterckx, Thomas Demeester, Chris Develder

Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts.

Clustering Deep Clustering +4

Adversarial Sets for Regularising Neural Link Predictors

1 code implementation 24 Jul 2017 Pasquale Minervini, Thomas Demeester, Tim Rocktäschel, Sebastian Riedel

The training objective is defined as a minimax problem, where an adversary finds the most offending adversarial examples by maximising the inconsistency loss, and the model is trained by jointly minimising a supervised loss and the inconsistency loss on the adversarial examples.

Link Prediction Relational Reasoning

DWIE: an entity-centric dataset for multi-task document-level information extraction

2 code implementations 26 Sep 2020 Klim Zaporojets, Johannes Deleu, Chris Develder, Thomas Demeester

Second, the document-level multi-task annotations require the models to transfer information between entity mentions located in different parts of the document, as well as between different tasks, in a joint learning setting.

 Ranked #1 on Coreference Resolution on DWIE (Avg. F1 metric)

coreference-resolution Entity Linking +5

Representation learning for very short texts using weighted word embedding aggregation

1 code implementation 2 Jul 2016 Cedric De Boom, Steven Van Canneyt, Thomas Demeester, Bart Dhoedt

Traditional textual representations, such as tf-idf, have difficulty grasping the semantic meaning of such texts, which is important in applications such as event detection, opinion mining, news recommendation, etc.

Event Detection News Recommendation +5

BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World Pharmacovigilance

1 code implementation 22 May 2023 Karel D'Oosterlinck, François Remy, Johannes Deleu, Thomas Demeester, Chris Develder, Klim Zaporojets, Aneiss Ghodsi, Simon Ellershaw, Jack Collins, Christopher Potts

We introduce BioDEX, a large-scale resource for Biomedical adverse Drug Event Extraction, rooted in the historical output of drug safety reporting in the U.S. BioDEX consists of 65k abstracts and 19k full-text biomedical papers with 256k associated document-level safety reports created by medical experts.

Event Extraction

Character-level Recurrent Neural Networks in Practice: Comparing Training and Sampling Schemes

2 code implementations 2 Jan 2018 Cedric De Boom, Thomas Demeester, Bart Dhoedt

Recurrent neural networks are nowadays successfully used in an abundance of applications, going from text, speech and image processing to recommender systems.

Recommendation Systems

System Identification with Time-Aware Neural Sequence Models

1 code implementation 21 Nov 2019 Thomas Demeester

Established recurrent neural networks are well-suited to solve a wide variety of prediction tasks involving discrete sequences.

Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?

1 code implementation EMNLP 2018 Fréderic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve, Thomas Demeester

In this paper, we investigate which character-level patterns neural networks learn and if those patterns coincide with manually-defined word segmentations and annotations.

Morphological Tagging

Next-Year Bankruptcy Prediction from Textual Data: Benchmark and Baselines

1 code implementation 24 Aug 2022 Henri Arno, Klaas Mulier, Joke Baeck, Thomas Demeester

Models for bankruptcy prediction are useful in several real-world scenarios, and multiple research contributions have been devoted to the task, based on structured (numerical) as well as unstructured (textual) data.

Towards Consistent Document-level Entity Linking: Joint Models for Entity Linking and Coreference Resolution

1 code implementation ACL 2022 Klim Zaporojets, Johannes Deleu, Yiwei Jiang, Thomas Demeester, Chris Develder

We consider the task of document-level entity linking (EL), where it is important to make consistent decisions for entity mentions over the full document jointly.

coreference-resolution Entity Linking +1

Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction

1 code implementation 13 Sep 2022 Jens-Joris Decorte, Jeroen Van Hautte, Johannes Deleu, Chris Develder, Thomas Demeester

We introduce a manually annotated evaluation benchmark for skill extraction based on the ESCO taxonomy, on which we validate our models.

Predefined Sparseness in Recurrent Sequence Models

1 code implementation CONLL 2018 Thomas Demeester, Johannes Deleu, Fréderic Godin, Chris Develder

Inducing sparseness while training neural networks has been shown to yield models with a lower memory footprint but similar effectiveness to dense models.

Language Modelling Word Embeddings

Injecting Knowledge Base Information into End-to-End Joint Entity and Relation Extraction and Coreference Resolution

1 code implementation Findings (ACL) 2021 Severine Verlinden, Klim Zaporojets, Johannes Deleu, Thomas Demeester, Chris Develder

The used KB entity representations are learned from either (i) hyperlinked text documents (Wikipedia), or (ii) a knowledge graph (Wikidata), and appear complementary in raising IE performance.

coreference-resolution Entity Linking +4

Efficiency Evaluation of Character-level RNN Training Schedules

1 code implementation 9 May 2016 Cedric De Boom, Sam Leroux, Steven Bohez, Pieter Simoens, Thomas Demeester, Bart Dhoedt

We present four training and prediction schedules from the same character-level recurrent neural network.

EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain

1 code implementation 12 Oct 2022 Amir Hadifar, Semere Kiros Bitew, Johannes Deleu, Chris Develder, Thomas Demeester

Thus, our versatile dataset can be used for both question and distractor generation, as well as to explore new challenges such as question format conversion.

Distractor Generation Multiple-choice +3

CookDial: A dataset for task-oriented dialogs grounded in procedural documents

1 code implementation 17 Jun 2022 Yiwei Jiang, Klim Zaporojets, Johannes Deleu, Thomas Demeester, Chris Develder

This work presents a new dialog dataset, CookDial, that facilitates research on task-oriented dialog systems with procedural knowledge understanding.

Decision Making Response Generation

CAW-coref: Conjunction-Aware Word-level Coreference Resolution

1 code implementation 9 Oct 2023 Karel D'Oosterlinck, Semere Kiros Bitew, Brandon Papineau, Christopher Potts, Thomas Demeester, Chris Develder

State-of-the-art coreference resolution systems depend on multiple LLM calls per document and are thus prohibitively expensive for many use cases (e.g., information extraction with large corpora).

coreference-resolution

Sub-event detection from Twitter streams as a sequence labeling problem

1 code implementation NAACL 2019 Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder

This paper introduces improved methods for sub-event detection in social media streams, by applying neural sequence models not only on the level of individual posts, but also directly on the stream level.

Event Detection

TempEL: Linking Dynamically Evolving and Newly Emerging Entities

1 code implementation 5 Feb 2023 Klim Zaporojets, Lucie-Aimee Kaffee, Johannes Deleu, Thomas Demeester, Chris Develder, Isabelle Augenstein

For that study, we introduce TempEL, an entity linking dataset that consists of time-stratified English Wikipedia snapshots from 2013 to 2022, from which we collect both anchor mentions of entities, and these target entities' descriptions.

Entity Disambiguation Entity Linking

An attentive neural architecture for joint segmentation and parsing and its application to real estate ads

1 code implementation 27 Sep 2017 Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder

In this work, we propose a new joint model that is able to tackle the two tasks simultaneously and construct the property tree by (i) avoiding the error propagation that would arise from the subtasks one after the other in a pipelined fashion, and (ii) exploiting the interactions between the subtasks.

Dependency Parsing

Reconstructing the house from the ad: Structured prediction on real estate classifieds

1 code implementation EACL 2017 Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder

In this paper, we address the (to the best of our knowledge) new problem of extracting a structured description of real estate properties from their natural language descriptions in classifieds.

Dependency Parsing Named Entity Recognition (NER) +1

Distractor generation for multiple-choice questions with predictive prompting and large language models

1 code implementation 30 Jul 2023 Semere Kiros Bitew, Johannes Deleu, Chris Develder, Thomas Demeester

We also show the gains of our approach in generating high-quality distractors by comparing it with a zero-shot ChatGPT and a few-shot ChatGPT prompted with static examples.

Distractor Generation Multiple-choice

Neural Bayesian Network Understudy

1 code implementation 15 Nov 2022 Paloma Rabaey, Cedric De Boom, Thomas Demeester

Bayesian Networks may be appealing for clinical decision-making due to their inclusion of causal knowledge, but their practical adoption remains limited as a result of their inability to deal with unstructured data.

Decision Making

IDAS: Intent Discovery with Abstractive Summarization

1 code implementation 31 May 2023 Maarten De Raedt, Fréderic Godin, Thomas Demeester, Chris Develder

Intent discovery is the task of inferring latent intents from a set of unlabeled utterances, and is a useful step towards the efficient creation of new conversational agents.

Abstractive Text Summarization Descriptive +4

Block-wise Dynamic Sparseness

1 code implementation 14 Jan 2020 Amir Hadifar, Johannes Deleu, Chris Develder, Thomas Demeester

In this paper, we present a new method for dynamic sparseness, whereby part of the computations are omitted dynamically, based on the input.

Language Modelling

An Emotional Journey: Detecting Emotion Trajectories in Dutch Customer Service Dialogues

2 code implementations COLING (WNUT) 2022 Sofie Labat, Amir Hadifar, Thomas Demeester, Veronique Hoste

The ability to track fine-grained emotions in customer service dialogues has many real-world applications, but has not been studied extensively.

Robustifying Sentiment Classification by Maximally Exploiting Few Counterfactuals

1 code implementation 21 Oct 2022 Maarten De Raedt, Fréderic Godin, Chris Develder, Thomas Demeester

We demonstrate the effectiveness of our approach in sentiment classification, using IMDb data for training and other sets for OOD tests (i.e., Amazon, SemEval and Yelp).

counterfactual Sentiment Analysis +3

Lifted Rule Injection for Relation Embeddings

no code implementations EMNLP 2016 Thomas Demeester, Tim Rocktäschel, Sebastian Riedel

Methods based on representation learning currently hold the state-of-the-art in many natural language processing and knowledge base inference tasks.

Relation Representation Learning

Knowledge Base Population using Semantic Label Propagation

no code implementations 19 Nov 2015 Lucas Sterckx, Thomas Demeester, Johannes Deleu, Chris Develder

We propose to combine distant supervision with minimal manual supervision in a technique called feature labeling, to eliminate noise from the large and noisy initial training set, resulting in a significant increase of precision.

Knowledge Base Population Relation

Learning Semantic Similarity for Very Short Texts

no code implementations 2 Dec 2015 Cedric De Boom, Steven Van Canneyt, Steven Bohez, Thomas Demeester, Bart Dhoedt

We therefore investigated several text representations as a combination of word embeddings in the context of semantic pair matching.

Information Retrieval Retrieval +5

Prior Attention for Style-aware Sequence-to-Sequence Models

no code implementations 25 Jun 2018 Lucas Sterckx, Johannes Deleu, Chris Develder, Thomas Demeester

We extend sequence-to-sequence models with the possibility to control the characteristics or style of the generated output, via attention that is generated a priori (before decoding) from a latent code vector.

Lexical Simplification Sentence

Neural Probabilistic Logic Programming in DeepProbLog

no code implementations NeurIPS 2018 Robin Manhaeve, Sebastijan Dumančić, Angelika Kimmig, Thomas Demeester, Luc De Raedt

We introduce DeepProbLog, a neural probabilistic logic programming language that incorporates deep learning by means of neural predicates.

Program induction

Solving Arithmetic Word Problems by Scoring Equations with Recursive Neural Networks

no code implementations 11 Sep 2020 Klim Zaporojets, Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder

Recent works use automatic extraction and ranking of candidate solution equations providing the answer to arithmetic word problems.

A Million Tweets Are Worth a Few Points: Tuning Transformers for Customer Service Tasks

1 code implementation NAACL 2021 Amir Hadifar, Sofie Labat, Véronique Hoste, Chris Develder, Thomas Demeester

In online domain-specific customer service applications, many companies struggle to deploy advanced NLP models successfully, due to the limited availability of and noise in their datasets.

UGent-T2K at the 2nd DialDoc Shared Task: A Retrieval-Focused Dialog System Grounded in Multiple Documents

no code implementations dialdoc (ACL) 2022 Yiwei Jiang, Amir Hadifar, Johannes Deleu, Thomas Demeester, Chris Develder

Further, error analysis reveals two major failure cases, to be addressed in future work: (i) in case of topic shift within the dialog, retrieval often fails to select the correct grounding document(s), and (ii) generation sometimes fails to use the correctly retrieved grounding passage.

Passage Retrieval Response Generation +1

Variation in the Expression and Annotation of Emotions: A Wizard of Oz Pilot Study

no code implementations NLPerspectives (LREC) 2022 Sofie Labat, Naomi Ackaert, Thomas Demeester, Veronique Hoste

Finally, for the third premise, we observed a positive correlation between the internal-external agreement on emotion labels and the personality traits conscientiousness and extraversion.

BioLORD: Learning Ontological Representations from Definitions (for Biomedical Concepts and their Textual Descriptions)

no code implementations 21 Oct 2022 François Remy, Kris Demuynck, Thomas Demeester

This work introduces BioLORD, a new pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts.

Contrastive Learning text similarity

Learning to Reuse Distractors to support Multiple Choice Question Generation in Education

1 code implementation 25 Oct 2022 Semere Kiros Bitew, Amir Hadifar, Lucas Sterckx, Johannes Deleu, Chris Develder, Thomas Demeester

This paper studies how a large existing set of manually created answers and distractors for questions over a variety of domains, subjects, and languages can be leveraged to help teachers in creating new MCQs, by the smart reuse of existing distractors.

Multiple-choice Question Generation +1

Detecting Idiomatic Multiword Expressions in Clinical Terminology using Definition-Based Representation Learning

no code implementations 11 May 2023 François Remy, Alfiya Khabibullina, Thomas Demeester

This paper shines a light on the potential of definition-based semantic models for detecting idiomatic and semi-idiomatic multiword expressions (MWEs) in clinical terminology.

Language Modelling Representation Learning

Learning from Partially Annotated Data: Example-aware Creation of Gap-filling Exercises for Language Learning

1 code implementation 2 Jun 2023 Semere Kiros Bitew, Johannes Deleu, A. Seza Doğruöz, Chris Develder, Thomas Demeester

Since performing exercises (including, e.g., practice tests) forms a crucial component of learning, and creating such exercises requires non-trivial effort from the teacher, there is great value in automatic exercise generation in digital tools in education.

Extreme Multi-Label Skill Extraction Training using Large Language Models

no code implementations 20 Jul 2023 Jens-Joris Decorte, Severine Verlinden, Jeroen Van Hautte, Johannes Deleu, Chris Develder, Thomas Demeester

Online job ads serve as a valuable source of information for skill requirements, playing a crucial role in labor market analysis and e-recruitment processes.

Contrastive Learning Extreme Multi-Label Classification

EmoTwiCS: A Corpus for Modelling Emotion Trajectories in Dutch Customer Service Dialogues on Twitter

no code implementations 10 Oct 2023 Sofie Labat, Thomas Demeester, Véronique Hoste

In our business-oriented corpus, we view emotions as dynamic attributes of the customer that can change at each utterance of the conversation.

Career Path Prediction using Resume Representation Learning and Skill-based Matching

no code implementations 24 Oct 2023 Jens-Joris Decorte, Jeroen Van Hautte, Johannes Deleu, Chris Develder, Thomas Demeester

The impact of person-job fit on job satisfaction and performance is widely acknowledged, which highlights the importance of providing workers with next steps at the right time in their career.

Representation Learning

Zero-Shot Cross-Lingual Sentiment Classification under Distribution Shift: an Exploratory Study

no code implementations 11 Nov 2023 Maarten De Raedt, Semere Kiros Bitew, Fréderic Godin, Thomas Demeester, Chris Develder

The brittleness of finetuned language model performance on out-of-distribution (OOD) test samples in unseen domains has been well-studied for English, yet is unexplored for multi-lingual models.

Cross-Lingual Sentiment Classification Sentiment Analysis +3

BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights

no code implementations 27 Nov 2023 François Remy, Kris Demuynck, Thomas Demeester

Our new multilingual model enables a range of languages to benefit from our advancements in biomedical semantic representation learning, opening a new avenue for bioinformatics researchers around the world.

Clinical Knowledge Contrastive Learning +2

Accelerating Hierarchical Associative Memory: A Deep Equilibrium Approach

1 code implementation 27 Nov 2023 Cédric Goemaere, Johannes Deleu, Thomas Demeester

Hierarchical Associative Memory models have recently been proposed as a versatile extension of continuous Hopfield networks.

Retrieval

Training a Hopfield Variational Autoencoder with Equilibrium Propagation

no code implementations 25 Nov 2023 Tom Van Der Meersch, Johannes Deleu, Thomas Demeester

In spite of its theoretical guarantees, its application in the AI domain remains limited to the discriminative setting.

Exploring the Temperature-Dependent Phase Transition in Modern Hopfield Networks

no code implementations 30 Nov 2023 Felix Koulischer, Cédric Goemaere, Tom Van Der Meersch, Johannes Deleu, Thomas Demeester

To achieve this, the distribution of energy minima is tracked in a simplified MHN in which equidistant normalised patterns are stored.

Clinical Reasoning over Tabular Data and Text with Bayesian Networks

1 code implementation 14 Mar 2024 Paloma Rabaey, Johannes Deleu, Stefan Heytens, Thomas Demeester

Bayesian networks are well-suited for clinical reasoning on tabular data, but are less compatible with natural language data, for which neural networks provide a successful framework.
