Search Results for author: Udo Hahn

Found 38 papers, 15 papers with code

GGPONC 2.0 - The German Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline NER Taggers

1 code implementation • LREC 2022 • Florian Borchert, Christina Lohr, Luise Modersohn, Jonas Witt, Thomas Langer, Markus Follmann, Matthias Gietzelt, Bert Arnrich, Udo Hahn, Matthieu-P. Schapranow

Despite remarkable advances in the development of language resources over the recent years, there is still a shortage of annotated, publicly available corpora covering (German) medical language.

Named Entity Recognition +1

Paper
Code

“Beste Grüße, Maria Meyer” — Pseudonymization of Privacy-Sensitive Information in Emails

no code implementations • LREC 2022 • Elisabeth Eder, Michael Wiegand, Ulrike Krieg-Holz, Udo Hahn

The exploding amount of user-generated content has spurred NLP research to deal with documents from various digital communication formats (tweets, chats, emails, etc.).

Paper

Emotion Embeddings - Learning Stable and Homogeneous Abstractions from Heterogeneous Affective Datasets

no code implementations • 15 Aug 2023 • Sven Buechel, Udo Hahn

Human emotion is expressed in many communication modalities and media formats and so their computational study is equally diversified into natural language processing, audio signal analysis, computer vision, etc.

Emotion Recognition

Paper

EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis

1 code implementation • EACL 2017 • Sven Buechel, Udo Hahn

We describe EmoBank, a corpus of 10k English sentences balancing multiple genres, which we annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format.

Emotion Recognition

Paper
Code

Acquiring a Formality-Informed Lexical Resource for Style Analysis

1 code implementation • EACL 2021 • Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn

To track different levels of formality in written discourse, we introduce a novel type of lexicon for the German language, with entries ordered by their degree of (in)formality.

regression Sentence

Paper
Code

Towards Label-Agnostic Emotion Embeddings

no code implementations • EMNLP 2021 • Sven Buechel, Luise Modersohn, Udo Hahn

Research in emotion analysis is scattered across different label formats (e.g., polarity types, basic emotion categories, and affective dimensions), linguistic levels (word vs. sentence vs. discourse), and, of course, (few well-resourced but much more under-resourced) natural languages and text genres (e.g., product reviews, tweets, news).

Emotion Recognition Sentence

Paper

GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines

1 code implementation • EMNLP (Louhi) 2020 • Florian Borchert, Christina Lohr, Luise Modersohn, Thomas Langer, Markus Follmann, Jan Philipp Sachs, Udo Hahn, Matthieu-P. Schapranow

The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing.

Paper
Code

What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way

no code implementations • 4 Jun 2020 • Erik Faessler, Michel Oleynik, Udo Hahn

From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials.

Retrieval SMAC+ +1

Paper

Learning and Evaluating Emotion Lexicons for 91 Languages

1 code implementation • ACL 2020 • Sven Buechel, Susanna Rücker, Udo Hahn

Emotion lexicons describe the affective meaning of words and thus constitute a centerpiece for advanced sentiment and emotion analysis.

Emotion Recognition Translation +1

Paper
Code

Allgemeine Musikalische Zeitung as a Searchable Online Corpus

no code implementations • LREC 2020 • Bernd Kampe, Tinghui Duan, Udo Hahn

The massive digitization efforts related to historical newspapers over the past decades have focused on mass media sources and ordinary people as their primary recipients.

Philosophy

Paper

CodE Alltag 2.0 - A Pseudonymized German-Language Email Corpus

no code implementations • LREC 2020 • Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn

The vast amount of social communication distributed over various electronic media channels (tweets, blogs, emails, etc.

De-identification

Paper

ProGene - A Large-scale, High-Quality Protein-Gene Annotated Benchmark Corpus

no code implementations • LREC 2020 • Erik Faessler, Luise Modersohn, Christina Lohr, Udo Hahn

Genes and proteins constitute the fundamental entities of molecular genetics.

Vocal Bursts Intensity Prediction

Paper

A Time Series Analysis of Emotional Loading in Central Bank Statements

no code implementations • WS 2019 • Sven Buechel, Simon Junker, Thore Schlaak, Claus Michelsen, Udo Hahn

We examine the affective content of central bank press statements using emotion analysis.

Emotion Recognition Time Series +1

Paper

De-Identification of Emails: Pseudonymizing Privacy-Sensitive Data in a German Email Corpus

no code implementations • RANLP 2019 • Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn

We deal with the pseudonymization of those stretches of text in emails that might allow to identify real individual persons.

De-identification

Paper

Continuous Quality Control and Advanced Text Segment Annotation with WAT-SL 2.0

1 code implementation • WS 2019 • Christina Lohr, Johannes Kiesel, Stephanie Luther, Johannes Hellrich, Tobias Kolditz, Benno Stein, Udo Hahn

Today's widely used annotation tools were designed for annotating typically short textual mentions of entities or relations, making their interface cumbersome to use for long(er) stretches of text, e.g., sentences running over several lines in a document.

Paper
Code

At the Lower End of Language - Exploring the Vulgar and Obscene Side of German

no code implementations • WS 2019 • Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn

In this paper, we describe a workflow for the data-driven acquisition and semantic scaling of a lexicon that covers lexical items from the lower end of the German language register - terms typically considered as rough, vulgar or obscene.

Paper

The Influence of Down-Sampling Strategies on SVD Word Embedding Stability

no code implementations • WS 2019 • Johannes Hellrich, Bernd Kampe, Udo Hahn

The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns.

Word Embeddings

Paper

JeSemE: Interleaving Semantics and Emotions in a Web Service for the Exploration of Language Change Phenomena

no code implementations • COLING 2018 • Johannes Hellrich, Sven Buechel, Udo Hahn

We here introduce a substantially extended version of JeSemE, an interactive website for visually exploring computationally derived time-variant information on word meanings and lexical emotions assembled from five large diachronic text corpora.

Sentiment Analysis Word Embeddings

Paper

JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion

2 code implementations • 11 Jul 2018 • Johannes Hellrich, Sven Buechel, Udo Hahn

Paper
Code

Representation Mapping: A Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons

1 code implementation • LREC 2018 • Sven Buechel, Udo Hahn

In the past years, sentiment analysis has increasingly shifted attention to representational frameworks more expressive than semantic polarity (being positive, negative or neutral).

Sentiment Analysis

Paper
Code

A Corpus of Corporate Annual and Social Responsibility Reports: 280 Million Tokens of Balanced Organizational Writing

no code implementations • WS 2018 • Sebastian G. M. Händschke, Sven Buechel, Jan Goldenstein, Philipp Poschmann, Tinghui Duan, Peter Walgenbach, Udo Hahn

We introduce JOCo, a novel text corpus for NLP analytics in the field of economics, business and management.

Management

Paper

Emotion Representation Mapping for Automatic Lexicon Construction (Mostly) Performs on Human Level

1 code implementation • COLING 2018 • Sven Buechel, Udo Hahn

Emotion Representation Mapping (ERM) has the goal to convert existing emotion ratings from one representation format into another one, e.g., mapping Valence-Arousal-Dominance annotations for words or sentences into Ekman's Basic Emotions and vice versa.

Paper
Code

Modeling Word Emotion in Historical Language: Quantity Beats Supposed Stability in Seed Word Selection

no code implementations • WS 2019 • Johannes Hellrich, Sven Buechel, Udo Hahn

To understand historical texts, we must be aware that language -- including the emotional connotation attached to words -- changes over time.

Paper

Word Emotion Induction for Multiple Languages as a Deep Multi-Task Learning Problem

1 code implementation • NAACL 2018 • Sven Buechel, Udo Hahn

Predicting the emotional value of lexical items is a well-known problem in sentiment analysis.

Multi-Task Learning Sentiment Analysis

Paper
Code

Sharing Copies of Synthetic Clinical Corpora without Physical Distribution - A Case Study to Get Around IPRs and Privacy Constraints Featuring the German JSYNCC Corpus

1 code implementation • LREC 2018 • Christina Lohr, Sven Buechel, Udo Hahn

Paper
Code

Semedico: A Comprehensive Semantic Search Engine for the Life Sciences

no code implementations • ACL 2017 • Erik Faessler, Udo Hahn

Information Retrieval

Paper

Exploring Diachronic Lexical Semantics with JeSemE

1 code implementation • ACL 2017 • Johannes Hellrich, Udo Hahn

Information Retrieval Word Embeddings

Paper
Code

Readers vs. Writers vs. Texts: Coping with Different Perspectives of Text Understanding in Emotion Annotation

1 code implementation • WS 2017 • Sven Buechel, Udo Hahn

We here examine how different perspectives of understanding written discourse, like the reader's, the writer's or the text's point of view, affect the quality of emotion annotations.

Reading Comprehension

Paper
Code

Bad Company - Neighborhoods in Neural Embedding Spaces Considered Harmful

1 code implementation • COLING 2016 • Johannes Hellrich, Udo Hahn

We assess the reliability and accuracy of (neural) word embeddings for both modern and historical English and German.

Word Embeddings

Paper
Code

Feelings from the Past - Adapting Affective Lexicons for Historical Emotion Analysis

no code implementations • WS 2016 • Sven Buechel, Johannes Hellrich, Udo Hahn

We here describe a novel methodology for measuring affective language in historical text by expanding an affective lexicon and jointly adapting it to prior language stages.

Emotion Recognition Word Embeddings

Paper

An Assessment of Experimental Protocols for Tracing Changes in Word Semantics Relative to Accuracy and Reliability

1 code implementation • WS 2016 • Johannes Hellrich, Udo Hahn

Semantic Textual Similarity

Paper
Code

Do Enterprises Have Emotions?

no code implementations • WS 2016 • Sven Buechel, Udo Hahn, Jan Goldenstein, Sebastian G. M. Händschke, Peter Walgenbach

Paper

UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central - State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines

no code implementations • LREC 2016 • Udo Hahn, Franz Matthies, Erik Faessler, Johannes Hellrich

We introduce JCoRe 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab.

Management

Paper

CodE Alltag: A German-Language E-Mail Corpus

no code implementations • LREC 2016 • Ulrike Krieg-Holz, Christian Schuschnig, Franz Matthies, Benjamin Redling, Udo Hahn

We introduce CODE ALLTAG, a text corpus composed of German-language e-mails.

Descriptive

Paper

Collaboratively Annotating Multilingual Parallel Corpora in the Biomedical Domain - some MANTRAs

no code implementations • LREC 2014 • Johannes Hellrich, Simon Clematide, Udo Hahn, Dietrich Rebholz-Schuhmann

The coverage of multilingual biomedical resources is high for the English language, yet sparse for non-English languages - an observation which holds for seemingly well-resourced, yet still dramatically low-resourced ones such as Spanish, French or German but even more so for really under-resourced ones such as Dutch.

Named Entity Recognition (NER) Translation

Paper

Disclose Models, Hide the Data - How to Make Use of Confidential Corpora without Seeing Sensitive Raw Data

no code implementations • LREC 2014 • Erik Faessler, Johannes Hellrich, Udo Hahn

Confidential corpora from the medical, enterprise, security or intelligence domains often contain sensitive raw data which lead to severe restrictions as far as the public accessibility and distribution of such language resources are concerned.

POS POS Tagging +1

Paper

CALBC: Releasing the Final Corpora

no code implementations • LREC 2012 • Şenay Kafkas, Ian Lewin, David Milward, Erik van Mulligen, Jan Kors, Udo Hahn, Dietrich Rebholz-Schuhmann

These usually lead to implementation of trained solutions (1) for a limited number of semantic entity types and (2) lacking in generalization capability.

Named Entity Recognition +1

Paper

Iterative Refinement and Quality Checking of Annotation Guidelines - How to Deal Effectively with Semantically Sloppy Named Entity Types, such as Pathological Phenomena

no code implementations • LREC 2012 • Udo Hahn, Elena Beisswanger, Ekaterina Buyko, Erik Faessler, Jenny Traumüller, Susann Schröder, Kerstin Hornbostel

We here discuss a methodology for dealing with the annotation of semantically hard to delineate, i.e., sloppy, named entity types.

Descriptive Named Entity Recognition (NER)

Paper
