Search Results for author: Udo Hahn

Found 38 papers, 15 papers with code

“Beste Grüße, Maria Meyer” — Pseudonymization of Privacy-Sensitive Information in Emails

no code implementations LREC 2022 Elisabeth Eder, Michael Wiegand, Ulrike Krieg-Holz, Udo Hahn

The exploding amount of user-generated content has spurred NLP research to deal with documents from various digital communication formats (tweets, chats, emails, etc.).

Emotion Embeddings --- Learning Stable and Homogeneous Abstractions from Heterogeneous Affective Datasets

no code implementations 15 Aug 2023 Sven Buechel, Udo Hahn

Human emotion is expressed in many communication modalities and media formats and so their computational study is equally diversified into natural language processing, audio signal analysis, computer vision, etc.

Emotion Recognition

EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis

1 code implementation EACL 2017 Sven Buechel, Udo Hahn

We describe EmoBank, a corpus of 10k English sentences balancing multiple genres, which we annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format.

Emotion Recognition

Acquiring a Formality-Informed Lexical Resource for Style Analysis

1 code implementation EACL 2021 Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn

To track different levels of formality in written discourse, we introduce a novel type of lexicon for the German language, with entries ordered by their degree of (in)formality.

regression Sentence

Towards Label-Agnostic Emotion Embeddings

no code implementations EMNLP 2021 Sven Buechel, Luise Modersohn, Udo Hahn

Research in emotion analysis is scattered across different label formats (e.g., polarity types, basic emotion categories, and affective dimensions), linguistic levels (word vs. sentence vs. discourse), and, of course, (few well-resourced but much more under-resourced) natural languages and text genres (e.g., product reviews, tweets, news).

Emotion Recognition Sentence

What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way

no code implementations 4 Jun 2020 Erik Faessler, Michel Oleynik, Udo Hahn

From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials.

Retrieval SMAC+ +1

Learning and Evaluating Emotion Lexicons for 91 Languages

1 code implementation ACL 2020 Sven Buechel, Susanna Rücker, Udo Hahn

Emotion lexicons describe the affective meaning of words and thus constitute a centerpiece for advanced sentiment and emotion analysis.

Emotion Recognition Translation +1

Allgemeine Musikalische Zeitung as a Searchable Online Corpus

no code implementations LREC 2020 Bernd Kampe, Tinghui Duan, Udo Hahn

The massive digitization efforts related to historical newspapers over the past decades have focused on mass media sources and ordinary people as their primary recipients.

Philosophy

CodE Alltag 2.0 --- A Pseudonymized German-Language Email Corpus

no code implementations LREC 2020 Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn

The vast amount of social communication distributed over various electronic media channels (tweets, blogs, emails, etc.

De-identification

De-Identification of Emails: Pseudonymizing Privacy-Sensitive Data in a German Email Corpus

no code implementations RANLP 2019 Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn

We deal with the pseudonymization of those stretches of text in emails that might allow real individual persons to be identified.

De-identification

Continuous Quality Control and Advanced Text Segment Annotation with WAT-SL 2.0

1 code implementation WS 2019 Christina Lohr, Johannes Kiesel, Stephanie Luther, Johannes Hellrich, Tobias Kolditz, Benno Stein, Udo Hahn

Today's widely used annotation tools were designed for annotating typically short textual mentions of entities or relations, making their interface cumbersome to use for long(er) stretches of text, e.g., sentences running over several lines in a document.

At the Lower End of Language---Exploring the Vulgar and Obscene Side of German

no code implementations WS 2019 Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn

In this paper, we describe a workflow for the data-driven acquisition and semantic scaling of a lexicon that covers lexical items from the lower end of the German language register: terms typically considered as rough, vulgar or obscene.

The Influence of Down-Sampling Strategies on SVD Word Embedding Stability

no code implementations WS 2019 Johannes Hellrich, Bernd Kampe, Udo Hahn

The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns.

Word Embeddings

JeSemE: Interleaving Semantics and Emotions in a Web Service for the Exploration of Language Change Phenomena

no code implementations COLING 2018 Johannes Hellrich, Sven Buechel, Udo Hahn

We here introduce a substantially extended version of JeSemE, an interactive website for visually exploring computationally derived time-variant information on word meanings and lexical emotions assembled from five large diachronic text corpora.

Sentiment Analysis Word Embeddings

JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion

2 code implementations 11 Jul 2018 Johannes Hellrich, Sven Buechel, Udo Hahn

We here introduce a substantially extended version of JeSemE, an interactive website for visually exploring computationally derived time-variant information on word meanings and lexical emotions assembled from five large diachronic text corpora.

Representation Mapping: A Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons

1 code implementation LREC 2018 Sven Buechel, Udo Hahn

In the past years, sentiment analysis has increasingly shifted attention to representational frameworks more expressive than semantic polarity (being positive, negative or neutral).

Sentiment Analysis

Emotion Representation Mapping for Automatic Lexicon Construction (Mostly) Performs on Human Level

1 code implementation COLING 2018 Sven Buechel, Udo Hahn

Emotion Representation Mapping (ERM) aims to convert existing emotion ratings from one representation format into another, e.g., mapping Valence-Arousal-Dominance annotations for words or sentences into Ekman's Basic Emotions and vice versa.

Modeling Word Emotion in Historical Language: Quantity Beats Supposed Stability in Seed Word Selection

no code implementations WS 2019 Johannes Hellrich, Sven Buechel, Udo Hahn

To understand historical texts, we must be aware that language -- including the emotional connotation attached to words -- changes over time.

Readers vs. Writers vs. Texts: Coping with Different Perspectives of Text Understanding in Emotion Annotation

1 code implementation WS 2017 Sven Buechel, Udo Hahn

We here examine how different perspectives of understanding written discourse, like the reader's, the writer's or the text's point of view, affect the quality of emotion annotations.

Reading Comprehension

Bad Company---Neighborhoods in Neural Embedding Spaces Considered Harmful

1 code implementation COLING 2016 Johannes Hellrich, Udo Hahn

We assess the reliability and accuracy of (neural) word embeddings for both modern and historical English and German.

Word Embeddings

Feelings from the Past---Adapting Affective Lexicons for Historical Emotion Analysis

no code implementations WS 2016 Sven Buechel, Johannes Hellrich, Udo Hahn

We here describe a novel methodology for measuring affective language in historical text by expanding an affective lexicon and jointly adapting it to prior language stages.

Emotion Recognition Word Embeddings

UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central ― State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines

no code implementations LREC 2016 Udo Hahn, Franz Matthies, Erik Faessler, Johannes Hellrich

We introduce JCoRe 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab.

Management

Collaboratively Annotating Multilingual Parallel Corpora in the Biomedical Domain---some MANTRAs

no code implementations LREC 2014 Johannes Hellrich, Simon Clematide, Udo Hahn, Dietrich Rebholz-Schuhmann

The coverage of multilingual biomedical resources is high for the English language, yet sparse for non-English languages―an observation which holds for seemingly well-resourced, yet still dramatically low-resourced ones such as Spanish, French or German but even more so for really under-resourced ones such as Dutch.

Named Entity Recognition (NER) Translation

Disclose Models, Hide the Data - How to Make Use of Confidential Corpora without Seeing Sensitive Raw Data

no code implementations LREC 2014 Erik Faessler, Johannes Hellrich, Udo Hahn

Confidential corpora from the medical, enterprise, security or intelligence domains often contain sensitive raw data which lead to severe restrictions as far as the public accessibility and distribution of such language resources are concerned.

POS POS Tagging +1

CALBC: Releasing the Final Corpora

no code implementations LREC 2012 Şenay Kafkas, Ian Lewin, David Milward, Erik van Mulligen, Jan Kors, Udo Hahn, Dietrich Rebholz-Schuhmann

These usually lead to implementation of trained solutions (1) for a limited number of semantic entity types and (2) lacking in generalization capability.

named-entity-recognition Named Entity Recognition +1
