Multi-label toxicity detection is highly prominent, with many research groups, companies, and individuals engaging with it through shared tasks and dedicated venues.
We report on an inter-annotator agreement experiment involving instances of text reuse focusing on the well-known case of biblical intertextuality in medieval literature.
This study is devoted to two of the oldest known manuscripts in which the oeuvre of the medieval mystical author Hadewijch has been preserved: Brussels, KBR, 2879-2880 (ms. A) and Brussels, KBR, 2877-2878 (ms. B).
This paper describes the system developed by the Antwerp Centre for Digital humanities and literary Criticism [UAntwerp] for toxic span detection.
To stimulate further research in this area and close the gap with subword-level NMT, we make all our code and models publicly available.
We study the generalization properties of pruned neural networks that are the winners of the lottery ticket hypothesis on datasets of natural images.
Fictional prose can be broadly divided into narrative and discursive forms with direct speech being central to any discourse representation (alongside indirect reported speech and free indirect discourse).
The detection of allusive text reuse is particularly challenging due to the sparse evidence on which allusive references rely---commonly based on none or very few shared words.
Lemmatization of standard languages is concerned with (i) abstracting over morphological differences and (ii) resolving token-lemma ambiguities of inflected words in order to map them to a dictionary headword.
Recent applications of neural language models have led to an increased interest in the automatic generation of natural language.
In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization.
Although in recent years numerous forms of Internet communication ― such as e-mail, blogs, chat rooms and social network environments ― have emerged, balanced corpora of Internet speech with trustworthy meta-information (e. g. age and gender) or linguistic annotations are still limited.