However, in a second experiment, we show that our model does not generalize equally well to data from time periods and localities outside our training sample.
To reduce the difficulty of argument stance classification, the task of same-side stance classification (S3C) has been proposed.
We approach aspect-based argument mining as a supervised machine learning task, classifying arguments into semantically coherent groups that refer to the same predefined aspect categories.
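For illustration, a minimal sketch of such a supervised aspect classification setup, here with a TF-IDF baseline in scikit-learn; the texts, aspect categories, and classifier choice are placeholder assumptions, not the actual setup:

```python
# Minimal sketch (assumed setup): classify arguments into predefined
# aspect categories with a TF-IDF based linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

arguments = [
    "Nuclear power plants emit little CO2.",
    "Reactor accidents endanger the population.",
    "Building new plants is extremely expensive.",
]
aspects = ["environment", "safety", "costs"]  # assumed aspect categories

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(arguments, aspects)
print(clf.predict(["Radiation leaks threaten nearby residents."]))
```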
This article introduces the interactive Leipzig Corpus Miner (iLCM), a newly released open-source software for automatic content analysis.
With the increasing number of user comments in diverse domains, including comments on online journalism and e-commerce websites, the manual content analysis of these comments becomes time-consuming and challenging.
Fine-tuning of pre-trained transformer networks such as BERT yields state-of-the-art results for text classification tasks.
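As a minimal sketch of such a fine-tuning setup, assuming the Hugging Face transformers library; the model choice, toy data, and hyperparameters are illustrative assumptions, not the reported configuration:

```python
# Hedged sketch: fine-tune a pre-trained BERT model for binary text
# classification. Data and hyperparameters are toy placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2)

# Toy batch: two texts with binary labels.
batch = tokenizer(["great product", "terrible service"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few illustrative gradient steps
    outputs = model(**batch, labels=labels)  # loss computed internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
print(float(outputs.loss))
```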
In the case of multi-page documents, preserving the document context is a major requirement.
Since vectors of the same word type can vary depending on their context, they implicitly provide a model for word sense disambiguation (WSD).
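A small sketch of this effect, assuming a BERT model via the transformers library; the probe word, sentences, and similarity measure are illustrative assumptions:

```python
# Illustrative sketch: contextualized vectors of the same word type
# differ by context, which can be exploited for WSD.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")

def word_vector(sentence, word):
    """Return the contextual embedding of `word`'s first occurrence."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return hidden[tokens.index(word)]

v_money = word_vector("She deposited cash at the bank.", "bank")
v_river = word_vector("They walked along the river bank.", "bank")
sim = torch.cosine_similarity(v_money, v_river, dim=0)
print(f"similarity across senses: {sim.item():.3f}")  # typically well below 1.0
```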
De-identification is the task of detecting protected health information (PHI) in medical text.
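As a hedged sketch, de-identification can be framed as token-level sequence labeling; the example below uses a generic Hugging Face NER pipeline, whereas a real system would use a model trained on PHI annotations (names, dates, IDs) rather than the general-purpose default assumed here:

```python
# Hedged sketch: de-identification as token-level sequence labeling.
# The default model is a general-purpose NER model, not PHI-specific.
from transformers import pipeline

ner = pipeline("token-classification", aggregation_strategy="simple")
note = "Patient John Smith was admitted to St. Mary Hospital on 3 March."
for entity in ner(note):
    print(entity["entity_group"], entity["word"], round(entity["score"], 2))
```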
We present a neural-network-based transfer learning approach for offensive language detection.
Best results are achieved by pre-training our model on unsupervised topic clusters of tweets, combined with thematic user cluster information.
We evaluate the performance of different word and character embeddings on two standard German datasets, with a special focus on out-of-vocabulary words.
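To illustrate how subword information can cover out-of-vocabulary words, here is a minimal sketch using gensim's FastText; the toy corpus and hyperparameters are assumptions, not the evaluated setup:

```python
# Hedged sketch: character n-gram (subword) embeddings compose vectors
# for out-of-vocabulary words. Corpus and hyperparameters are toys.
from gensim.models import FastText

sentences = [["der", "hund", "bellt"], ["die", "katze", "schläft"]]
model = FastText(sentences, vector_size=32, window=3,
                 min_count=1, min_n=3, max_n=5, epochs=10)

# "hündchen" never appears in the corpus, but its vector can still be
# composed from overlapping character n-grams shared with seen words.
oov_vector = model.wv["hündchen"]
print(oov_vector.shape)  # (32,)
```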
We introduce an advanced information extraction pipeline to automatically process very large collections of unstructured textual data for the purpose of investigative journalism.
In recent years, investigative journalism has been confronted with two major challenges: 1) vast amounts of unstructured data originating from large text collections such as leaks or answers to Freedom of Information requests, and 2) multi-lingual data due to intensified global cooperation and communication in politics, business, and civil society.
The iLCM project pursues the development of an integrated research environment for the analysis of structured and unstructured data in a "Software as a Service" (SaaS) architecture.
In recent years, (retro-)digitizing paper-based files has become a major undertaking for private and public archives, as well as an important task in electronic mailroom applications.
This paper presents the "Leipzig Corpus Miner", a technical infrastructure for supporting qualitative and quantitative content analysis.
In terminology work, natural language processing, and digital humanities, several studies analyze variations in the context and meaning of terms in order to detect semantic change and the evolution of terminology.
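One common technique for such analyses, assumed here purely for illustration (the cited studies may proceed differently), is to train separate embeddings per time slice and compare a term's nearest neighbors:

```python
# Hedged sketch: train per-period embeddings and inspect neighborhood
# shifts of a term. Corpora are tiny placeholders for real time slices.
from gensim.models import Word2Vec

corpus_1900 = [["awful", "majesty", "grandeur"], ["awful", "solemn", "dread"]]
corpus_2000 = [["awful", "terrible", "bad"], ["awful", "boring", "movie"]]

m_old = Word2Vec(corpus_1900, vector_size=16, min_count=1, epochs=50)
m_new = Word2Vec(corpus_2000, vector_size=16, min_count=1, epochs=50)

# A shift in the neighborhood of "awful" hints at a change in usage.
print(m_old.wv.most_similar("awful", topn=2))
print(m_new.wv.most_similar("awful", topn=2))
```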