1 code implementation • LREC 2022 • Florian Borchert, Christina Lohr, Luise Modersohn, Jonas Witt, Thomas Langer, Markus Follmann, Matthias Gietzelt, Bert Arnrich, Udo Hahn, Matthieu-P. Schapranow
Despite remarkable advances in the development of language resources over the recent years, there is still a shortage of annotated, publicly available corpora covering (German) medical language.
no code implementations • LREC 2022 • Elisabeth Eder, Michael Wiegand, Ulrike Krieg-Holz, Udo Hahn
The exploding amount of user-generated content has spurred NLP research to deal with documents from various digital communication formats (tweets, chats, emails, etc.).
no code implementations • 15 Aug 2023 • Sven Buechel, Udo Hahn
Human emotion is expressed in many communication modalities and media formats and so their computational study is equally diversified into natural language processing, audio signal analysis, computer vision, etc.
1 code implementation • EACL 2017 • Sven Buechel, Udo Hahn
We describe EmoBank, a corpus of 10k English sentences balancing multiple genres, which we annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format.
1 code implementation • EACL 2021 • Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn
To track different levels of formality in written discourse, we introduce a novel type of lexicon for the German language, with entries ordered by their degree of (in)formality.
no code implementations • EMNLP 2021 • Sven Buechel, Luise Modersohn, Udo Hahn
Research in emotion analysis is scattered across different label formats (e.g., polarity types, basic emotion categories, and affective dimensions), linguistic levels (word vs. sentence vs. discourse), and, of course, (few well-resourced but much more under-resourced) natural languages and text genres (e.g., product reviews, tweets, news).
1 code implementation • EMNLP (Louhi) 2020 • Florian Borchert, Christina Lohr, Luise Modersohn, Thomas Langer, Markus Follmann, Jan Philipp Sachs, Udo Hahn, Matthieu-P. Schapranow
The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing.
no code implementations • 4 Jun 2020 • Erik Faessler, Michel Oleynik, Udo Hahn
From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials.
1 code implementation • ACL 2020 • Sven Buechel, Susanna Rücker, Udo Hahn
Emotion lexicons describe the affective meaning of words and thus constitute a centerpiece for advanced sentiment and emotion analysis.
no code implementations • LREC 2020 • Bernd Kampe, Tinghui Duan, Udo Hahn
The massive digitization efforts related to historical newspapers over the past decades have focused on mass media sources and ordinary people as their primary recipients.
no code implementations • LREC 2020 • Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn
The vast amount of social communication distributed over various electronic media channels (tweets, blogs, emails, etc.
no code implementations • LREC 2020 • Erik Faessler, Luise Modersohn, Christina Lohr, Udo Hahn
Genes and proteins constitute the fundamental entities of molecular genetics.
no code implementations • WS 2019 • Sven Buechel, Simon Junker, Thore Schlaak, Claus Michelsen, Udo Hahn
We examine the affective content of central bank press statements using emotion analysis.
no code implementations • RANLP 2019 • Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn
We deal with the pseudonymization of those stretches of text in emails that might allow to identify real individual persons.
1 code implementation • WS 2019 • Christina Lohr, Johannes Kiesel, Stephanie Luther, Johannes Hellrich, Tobias Kolditz, Benno Stein, Udo Hahn
Today's widely used annotation tools were designed for annotating typically short textual mentions of entities or relations, making their interface cumbersome to use for long(er) stretches of text, e.g., sentences running over several lines in a document.
no code implementations • WS 2019 • Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn
In this paper, we describe a workflow for the data-driven acquisition and semantic scaling of a lexicon that covers lexical items from the lower end of the German language register: terms typically considered as rough, vulgar or obscene.
no code implementations • WS 2019 • Johannes Hellrich, Bernd Kampe, Udo Hahn
The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns.
no code implementations • COLING 2018 • Johannes Hellrich, Sven Buechel, Udo Hahn
We here introduce a substantially extended version of JeSemE, an interactive website for visually exploring computationally derived time-variant information on word meanings and lexical emotions assembled from five large diachronic text corpora.
2 code implementations • 11 Jul 2018 • Johannes Hellrich, Sven Buechel, Udo Hahn
We here introduce a substantially extended version of JeSemE, an interactive website for visually exploring computationally derived time-variant information on word meanings and lexical emotions assembled from five large diachronic text corpora.
1 code implementation • LREC 2018 • Sven Buechel, Udo Hahn
In the past years, sentiment analysis has increasingly shifted attention to representational frameworks more expressive than semantic polarity (being positive, negative or neutral).
no code implementations • WS 2018 • Sebastian G. M. Händschke, Sven Buechel, Jan Goldenstein, Philipp Poschmann, Tinghui Duan, Peter Walgenbach, Udo Hahn
We introduce JOCo, a novel text corpus for NLP analytics in the field of economics, business and management.
1 code implementation • COLING 2018 • Sven Buechel, Udo Hahn
Emotion Representation Mapping (ERM) has the goal to convert existing emotion ratings from one representation format into another one, e.g., mapping Valence-Arousal-Dominance annotations for words or sentences into Ekman's Basic Emotions and vice versa.
no code implementations • WS 2019 • Johannes Hellrich, Sven Buechel, Udo Hahn
To understand historical texts, we must be aware that language, including the emotional connotation attached to words, changes over time.
1 code implementation • NAACL 2018 • Sven Buechel, Udo Hahn
Predicting the emotional value of lexical items is a well-known problem in sentiment analysis.
1 code implementation • WS 2017 • Sven Buechel, Udo Hahn
We here examine how different perspectives of understanding written discourse, like the reader's, the writer's or the text's point of view, affect the quality of emotion annotations.
1 code implementation • COLING 2016 • Johannes Hellrich, Udo Hahn
We assess the reliability and accuracy of (neural) word embeddings for both modern and historical English and German.
no code implementations • WS 2016 • Sven Buechel, Johannes Hellrich, Udo Hahn
We here describe a novel methodology for measuring affective language in historical text by expanding an affective lexicon and jointly adapting it to prior language stages.
no code implementations • LREC 2016 • Udo Hahn, Franz Matthies, Erik Faessler, Johannes Hellrich
We introduce JCoRe 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab.
no code implementations • LREC 2016 • Ulrike Krieg-Holz, Christian Schuschnig, Franz Matthies, Benjamin Redling, Udo Hahn
We introduce CODE ALLTAG, a text corpus composed of German-language e-mails.
no code implementations • LREC 2014 • Johannes Hellrich, Simon Clematide, Udo Hahn, Dietrich Rebholz-Schuhmann
The coverage of multilingual biomedical resources is high for the English language, yet sparse for non-English languages, an observation which holds for seemingly well-resourced, yet still dramatically low-resourced ones such as Spanish, French or German but even more so for really under-resourced ones such as Dutch.
no code implementations • LREC 2014 • Erik Faessler, Johannes Hellrich, Udo Hahn
Confidential corpora from the medical, enterprise, security or intelligence domains often contain sensitive raw data which lead to severe restrictions as far as the public accessibility and distribution of such language resources are concerned.
no code implementations • LREC 2012 • Şenay Kafkas, Ian Lewin, David Milward, Erik van Mulligen, Jan Kors, Udo Hahn, Dietrich Rebholz-Schuhmann
These usually lead to implementation of trained solutions (1) for a limited number of semantic entity types and (2) lacking in generalization capability.
no code implementations • LREC 2012 • Udo Hahn, Elena Beisswanger, Ekaterina Buyko, Erik Faessler, Jenny Traumüller, Susann Schröder, Kerstin Hornbostel
We here discuss a methodology for dealing with the annotation of semantically hard to delineate, i.e., sloppy, named entity types.