Search Results for author: Mike Kestemont

Found 18 papers, 3 papers with code

A Dutch Dataset for Cross-lingual Multilabel Toxicity Detection

no code implementations RANLP (BUCC) 2021 Ben Burtenshaw, Mike Kestemont

Multi-label toxicity detection is highly prominent, with many research groups, companies, and individuals engaging with it through shared tasks and dedicated venues.

Multi Label Text Classification Multi-Label Text Classification +1

Quantifying Contextual Aspects of Inter-annotator Agreement in Intertextuality Research

no code implementations EMNLP (LaTeCHCLfL, CLFL, LaTeCH) 2021 Enrique Manjavacas Arevalo, Laurence Mellerin, Mike Kestemont

We report on an inter-annotator agreement experiment involving instances of text reuse focusing on the well-known case of biblical intertextuality in medieval literature.

From exemplar to copy: the scribal appropriation of a Hadewijch manuscript computationally explored

1 code implementation 25 Oct 2022 Wouter Haverals, Mike Kestemont

This study is devoted to two of the oldest known manuscripts in which the oeuvre of the medieval mystical author Hadewijch has been preserved: Brussels, KBR, 2879-2880 (ms. A) and Brussels, KBR, 2877-2878 (ms. B).

UAntwerp at SemEval-2021 Task 5: Spans are Spans, stacking a binary word level approach to toxic span detection

no code implementations SEMEVAL 2021 Ben Burtenshaw, Mike Kestemont

This paper describes the system developed by the Antwerp Centre for Digital Humanities and Literary Criticism [UAntwerp] for toxic span detection.

RFC-0000 - RFC on RFCs

no code implementations TimeMachine RFC 2031 Frédéric Kaplan, Kevin Baumer, Mike Kestemont, Daniel Jeller

Reaching consensus on the technology options to pursue in a programme as large as Time Machine is a complex issue.

Character-level Transformer-based Neural Machine Translation

no code implementations 22 May 2020 Nikolay Banar, Walter Daelemans, Mike Kestemont

To stimulate further research in this area and close the gap with subword-level NMT, we make all our code and models publicly available.

Machine Translation NMT +1

On the Transferability of Winning Tickets in Non-Natural Image Datasets

no code implementations 11 May 2020 Matthia Sabatelli, Mike Kestemont, Pierre Geurts

We study the generalization properties of pruned neural networks that are the winners of the lottery ticket hypothesis on datasets of natural images.

Detecting Direct Speech in Multilingual Collection of 19th-century Novels

no code implementations LREC 2020 Joanna Byszuk, Michał Woźniak, Mike Kestemont, Albert Leśniak, Wojciech Łukasik, Artjoms Šeļa, Maciej Eder

Fictional prose can be broadly divided into narrative and discursive forms with direct speech being central to any discourse representation (alongside indirect reported speech and free indirect discourse).


On the Feasibility of Automated Detection of Allusive Text Reuse

no code implementations WS 2019 Enrique Manjavacas, Brian Long, Mike Kestemont

The detection of allusive text reuse is particularly challenging due to the sparse evidence on which allusive references rely, commonly based on none or very few shared words.

Information Retrieval Retrieval

Improving Lemmatization of Non-Standard Languages with Joint Learning

2 code implementations NAACL 2019 Enrique Manjavacas, Ákos Kádár, Mike Kestemont

Lemmatization of standard languages is concerned with (i) abstracting over morphological differences and (ii) resolving token-lemma ambiguities of inflected words in order to map them to a dictionary headword.

Language Modelling LEMMA +4

Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning

1 code implementation 4 Mar 2016 Mike Kestemont, Jeroen De Gussem

In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization.

LEMMA Lemmatization +4

The Netlog Corpus. A Resource for the Study of Flemish Dutch Internet Language

no code implementations LREC 2012 Mike Kestemont, Claudia Peersman, Benny De Decker, Guy De Pauw, Kim Luyckx, Roser Morante, Frederik Vaassen, Janneke van de Loo, Walter Daelemans

Although in recent years numerous forms of Internet communication (such as e-mail, blogs, chat rooms and social network environments) have emerged, balanced corpora of Internet speech with trustworthy meta-information (e.g. age and gender) or linguistic annotations are still limited.

Lemmatization POS +2
