Search Results for author: Walter Daelemans

Found 70 papers, 21 papers with code

MBT: A Memory-Based Part of Speech Tagger-Generator

1 code implementation 11 Jul 1996 Walter Daelemans, Jakub Zavrel, Peter Berck, Steven Gillis

In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases.

Incremental Learning Morphological Analysis +2

Forgetting Exceptions is Harmful in Language Learning

no code implementations 22 Dec 1998 Walter Daelemans, Antal Van den Bosch, Jakub Zavrel

We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.

Chunking Part-Of-Speech Tagging +1

ConanDoyle-neg: Annotation of negation cues and their scope in Conan Doyle stories

no code implementations LREC 2012 Roser Morante, Walter Daelemans

In this paper we present ConanDoyle-neg, a corpus of stories by Conan Doyle annotated with negation information.

Negation

The Netlog Corpus. A Resource for the Study of Flemish Dutch Internet Language

no code implementations LREC 2012 Mike Kestemont, Claudia Peersman, Benny De Decker, Guy De Pauw, Kim Luyckx, Roser Morante, Frederik Vaassen, Janneke van de Loo, Walter Daelemans

Although in recent years numerous forms of Internet communication – such as e-mail, blogs, chat rooms and social network environments – have emerged, balanced corpora of Internet speech with trustworthy meta-information (e.g. age and gender) or linguistic annotations are still limited.

Lemmatization POS +2

"Vreselijk mooi!" (terribly beautiful): A Subjectivity Lexicon for Dutch Adjectives.

no code implementations LREC 2012 Tom De Smedt, Walter Daelemans

The lexicon is a dictionary of 1,100 adjectives that occur frequently in online product reviews, manually annotated with polarity strength, subjectivity and intensity, for each word sense.

BIG-bench Machine Learning Lemmatization +3

The Effects of Age, Gender and Region on Non-standard Linguistic Variation in Online Social Networks

no code implementations 11 Jan 2016 Claudia Peersman, Walter Daelemans, Reinhild Vandekerckhove, Bram Vandekerckhove, Leona Van Vaerenbergh

We present a corpus-based analysis of the effects of age, gender and region of origin on the production of both "netspeak" or "chatspeak" features and regional speech features in Flemish Dutch posts that were collected from a Belgian online social network platform.

Predicting the Effectiveness of Self-Training: Application to Sentiment Classification

no code implementations 13 Jan 2016 Vincent Van Asch, Walter Daelemans

The goal of this paper is to investigate the connection between the performance gain that can be obtained by self-training and the similarity between the corpora used in this approach.

Classification General Classification +2

Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

1 code implementation LREC 2016 Stéphan Tulkens, Chris Emmery, Walter Daelemans

With this research, we provide the embeddings themselves, the relation evaluation task benchmark for use in further research, and demonstrate how the benchmarked embeddings prove a useful unsupervised linguistic resource, effectively used in a downstream task.

Dialect Identification Relation +1

Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts

2 code implementations WS 2016 Stéphan Tulkens, Simon Šuster, Walter Daelemans

In this paper, we report a knowledge-based method for Word Sense Disambiguation in the domains of biomedical and clinical text.

Word Sense Disambiguation

A Dictionary-based Approach to Racism Detection in Dutch Social Media

1 code implementation 31 Aug 2016 Stéphan Tulkens, Lisa Hilte, Elise Lodewyckx, Ben Verhoeven, Walter Daelemans

The best-performing model used the manually cleaned dictionary and obtained an F-score of 0.46 for the racist class on a test set consisting of unseen Dutch comments, retrieved from the same sites used for the training set.

A Short Review of Ethical Challenges in Clinical Natural Language Processing

1 code implementation WS 2017 Simon Šuster, Stéphan Tulkens, Walter Daelemans

Clinical NLP has an immense potential in contributing to how clinical practice will be revolutionized by the advent of large scale processing of clinical records.

Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings

no code implementations WS 2017 Pieter Fivez, Simon Šuster, Walter Daelemans

We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings.

Spelling Correction

Simple Queries as Distant Labels for Predicting Gender on Twitter

no code implementations WS 2017 Chris Emmery, Grzegorz Chrupała, Walter Daelemans

The majority of research on extracting missing user attributes from social media profiles uses costly hand-annotated labels for supervised learning.

Gender Classification General Classification

Towards the Improvement of Automatic Emotion Pre-annotation with Polarity and Subjective Information

no code implementations RANLP 2017 Lea Canales, Walter Daelemans, Ester Boldrini, Patricio Martínez-Barco

Our objective in this paper is to show the pre-annotation process, as well as to evaluate the usability of subjective and polarity information in this process.

Unsupervised Context-Sensitive Spelling Correction of English and Dutch Clinical Free-Text with Word and Character N-Gram Embeddings

1 code implementation 19 Oct 2017 Pieter Fivez, Simon Šuster, Walter Daelemans

We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings.

Spelling Correction

Unsupervised patient representations from clinical notes with interpretable classification decisions

no code implementations 14 Nov 2017 Madhumita Sushil, Simon Šuster, Kim Luyckx, Walter Daelemans

To understand and interpret the representations, we explore the best encoded features within the patient representations obtained from the autoencoder model.

Classification Denoising +1

Automatic Detection of Cyberbullying in Social Media Text

no code implementations 17 Jan 2018 Cynthia Van Hee, Gilles Jacobs, Chris Emmery, Bart Desmet, Els Lefever, Ben Verhoeven, Guy De Pauw, Walter Daelemans, Véronique Hoste

While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online.

Binary Classification

Patient representation learning and interpretable evaluation using clinical notes

no code implementations 3 Jul 2018 Madhumita Sushil, Simon Šuster, Kim Luyckx, Walter Daelemans

We compare the model performance of the feature set constructed from a bag of words to that obtained from medical concepts.

Denoising General Classification +1

Exploring Classifier Combinations for Language Variety Identification

no code implementations COLING 2018 Tim Kreutz, Walter Daelemans

This paper describes CLiPS's submissions for the Discriminating between Dutch and Flemish in Subtitles (DFS) shared task at VarDial 2018.

Language Identification POS

Rule induction for global explanation of trained models

1 code implementation WS 2018 Madhumita Sushil, Simon Šuster, Walter Daelemans

We find that the output rule-sets can explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80.

Feature Importance text-classification +1

Multilingual Cross-domain Perspectives on Online Hate Speech

no code implementations 11 Sep 2018 Tom De Smedt, Sylvia Jaki, Eduan Kotzé, Leïla Saoud, Maja Gwóźdź, Guy De Pauw, Walter Daelemans

In this report, we present a study of eight corpora of online hate speech, by demonstrating the NLP techniques that we used to collect and analyze the jihadist, extremist, racist, and sexist content.

General Classification text-classification +1

Predicting Adolescents' Educational Track from Chat Messages on Dutch Social Media

no code implementations WS 2018 Lisa Hilte, Walter Daelemans, Reinhild Vandekerckhove

We aim to predict Flemish adolescents' educational track based on their Dutch social media writing.

Revisiting neural relation classification in clinical notes with external information

1 code implementation WS 2018 Simon Šuster, Madhumita Sushil, Walter Daelemans

Recently, segment convolutional neural networks have been proposed for end-to-end relation extraction in the clinical domain, achieving results comparable to or outperforming the approaches with heavy manual feature engineering.

Classification Feature Engineering +5

From Strings to Other Things: Linking the Neighborhood and Transposition Effects in Word Reading

1 code implementation CONLL 2018 Stéphan Tulkens, Dominiek Sandra, Walter Daelemans

We conclude that the neighborhood effect is unlikely to have a perceptual basis, but is more likely to be the result of items co-activating after recognition.

A weakly supervised sequence tagging and grammar induction approach to semantic frame slot filling

no code implementations 15 Jun 2019 Janneke van de Loo, Guy De Pauw, Walter Daelemans

This paper describes continuing work on semantic frame slot filling for a command and control task using a weakly-supervised approach.

slot-filling Slot Filling

Why can't memory networks read effectively?

no code implementations 16 Oct 2019 Simon Šuster, Madhumita Sushil, Walter Daelemans

Memory networks have been a popular choice among neural architectures for machine reading comprehension and question answering.

Machine Reading Comprehension Question Answering

Orthographic Codes and the Neighborhood Effect: Lessons from Information Theory

no code implementations LREC 2020 Stéphan Tulkens, Dominiek Sandra, Walter Daelemans

We consider the orthographic neighborhood effect: the effect that words with more orthographic similarity to other words are read faster.

Streaming Language-Specific Twitter Data with Optimal Keywords

no code implementations LREC 2020 Tim Kreutz, Walter Daelemans

In these cases, key phrases that limit finding the competitive language are selected, and overall recall on the target language also decreases.

Distilling neural networks into skipgram-level decision lists

2 code implementations 14 May 2020 Madhumita Sushil, Simon Šuster, Walter Daelemans

For evaluation of explanations, we create a synthetic sepsis-identification dataset, as well as apply our technique on additional clinical and sentiment analysis datasets.

Sentiment Analysis

Character-level Transformer-based Neural Machine Translation

no code implementations 22 May 2020 Nikolay Banar, Walter Daelemans, Mike Kestemont

To stimulate further research in this area and close the gap with subword-level NMT, we make all our code and models publicly available.

Machine Translation NMT +1

Sarcasm Detection Using an Ensemble Approach

no code implementations WS 2020 Jens Lemmens, Ben Burtenshaw, Ehsan Lotfi, Ilia Markov, Walter Daelemans

We present an ensemble approach for the detection of sarcasm in Reddit and Twitter responses in the context of The Second Workshop on Figurative Language Processing held in conjunction with ACL 2020.

Sarcasm Detection

A Deep Generative Approach to Native Language Identification

no code implementations COLING 2020 Ehsan Lotfi, Ilia Markov, Walter Daelemans

Native language identification (NLI) – identifying the native language (L1) of a person based on his/her writing in the second language (L2) – is useful for a variety of purposes, including marketing, security, and educational applications.

BIG-bench Machine Learning Language Modelling +3

Conceptual Grounding Constraints for Truly Robust Biomedical Name Representations

1 code implementation EACL 2021 Pieter Fivez, Simon Šuster, Walter Daelemans

Effective representation of biomedical names for downstream NLP tasks requires the encoding of both lexical as well as domain-specific semantic information.

ConveRT for FAQ Answering

1 code implementation 2 Aug 2021 Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, Walter Daelemans

While powerful and efficient retrieval-based models exist for English, it is rarely the case for other languages for which the same amount of training data is not available.

Chatbot Retrieval

Teach Me What to Say and I Will Learn What to Pick: Unsupervised Knowledge Selection Through Response Generation with Pretrained Generative Models

no code implementations EMNLP (NLP4ConvAI) 2021 Ehsan Lotfi, Maxime De Bruyn, Jeska Buhmann, Walter Daelemans

In this work we study the unsupervised selection abilities of pre-trained generative models (e.g. BART) and show that by adding a score-and-aggregate module between encoder and decoder, they are capable of learning to pick the proper knowledge through minimising the language modelling loss (i.e. without having access to knowledge labels).

Language Modelling Response Generation +1

Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations

1 code implementation LREC 2022 Chris Emmery, Ákos Kádár, Grzegorz Chrupała, Walter Daelemans

The perturbed data, models, and code are available for reproduction at https://github.com/cmry/augtox

PersonalityChat: Conversation Distillation for Personalized Dialog Modeling with Facts and Traits

no code implementations 14 Jan 2024 Ehsan Lotfi, Maxime De Bruyn, Jeska Buhmann, Walter Daelemans

The new wave of Large Language Models (LLMs) has offered an efficient tool to curate sizeable conversational datasets.

Contextual explanation rules for neural clinical classifiers

no code implementations NAACL (BioNLP) 2021 Madhumita Sushil, Simon Šuster, Walter Daelemans

For evaluation of explanations, we create a synthetic sepsis-identification dataset, as well as apply our technique on additional clinical and sentiment analysis datasets.

Sentiment Analysis

Exploring Stylometric and Emotion-Based Features for Multilingual Cross-Domain Hate Speech Detection

no code implementations EACL (WASSA) 2021 Ilia Markov, Nikola Ljubešić, Darja Fišer, Walter Daelemans

In this paper, we describe experiments designed to evaluate the impact of stylometric and emotion-based features on hate speech detection: the task of classifying textual content into hate or non-hate speech classes.

Hate Speech Detection

Scalable Few-Shot Learning of Robust Biomedical Name Representations

1 code implementation NAACL (BioNLP) 2021 Pieter Fivez, Simon Šuster, Walter Daelemans

Recent research on robust representations of biomedical names has focused on modeling large amounts of fine-grained conceptual distinctions using complex neural encoders.

Continual Learning Few-Shot Learning

Improving Cross-Domain Hate Speech Detection by Reducing the False Positive Rate

no code implementations NAACL (NLP4IF) 2021 Ilia Markov, Walter Daelemans

Hate speech detection is an actively growing field of research with a variety of recently proposed approaches that have pushed the state-of-the-art results.

Blocking Hate Speech Detection

Improving Hate Speech Type and Target Detection with Hateful Metaphor Features

no code implementations NAACL (NLP4IF) 2021 Jens Lemmens, Ilia Markov, Walter Daelemans

We study the usefulness of hateful metaphors as features for the identification of the type and target of hate speech in Dutch Facebook comments.

Vocal Bursts Type Prediction

The Role of Context in Detecting the Target of Hate Speech

no code implementations TRAC (COLING) 2022 Ilia Markov, Walter Daelemans

Online hate speech detection is an inherently challenging task that has recently received much attention from the natural language processing community.

Hate Speech Detection Language Modelling

The LiLaH Emotion Lexicon of Croatian, Dutch and Slovene

no code implementations COLING (PEOPLES) 2020 Nikola Ljubešić, Ilia Markov, Darja Fišer, Walter Daelemans

We further showcase the usage of the lexicons by calculating the difference in emotion distributions in texts containing and not containing socially unacceptable discourse, comparing them across four languages (English, Croatian, Dutch, Slovene) and two topics (migrants and LGBT).

Translation

Mapping probability word problems to executable representations

no code implementations EMNLP 2021 Simon Šuster, Pieter Fivez, Pietro Totis, Angelika Kimmig, Jesse Davis, Luc De Raedt, Walter Daelemans

While solving math word problems automatically has received considerable attention in the NLP community, few works have addressed probability word problems specifically.

Contextualised Word Representations Math +2

Integrating Higher-Level Semantics into Robust Biomedical Name Representations

1 code implementation EACL (Louhi) 2021 Pieter Fivez, Simon Šuster, Walter Daelemans

It has not yet been empirically confirmed that training biomedical name encoders on fine-grained distinctions automatically leads to bottom-up encoding of such higher-level semantics.
