Search Results for author: Chris Biemann

Found 125 papers, 25 papers with code

Probing Pre-trained Language Models for Semantic Attributes and their Values

1 code implementation Findings (EMNLP) 2021 Meriem Beloucif, Chris Biemann

Pretrained language models (PTLMs) yield state-of-the-art performance on many natural language processing tasks, including syntax, semantics and commonsense.

Social Media Unrest Prediction during the COVID-19 Pandemic: Neural Implicit Motive Pattern Recognition as Psychometric Signs of Severe Crises

no code implementations COLING (PEOPLES) 2020 Dirk Johannßen, Chris Biemann

We employ this model to investigate a change of language towards social unrest during the COVID-19 pandemic by comparing established psychological predictors on samples of tweets from spring 2019 with spring 2020.

Error Analysis of using BART for Multi-Document Summarization: A Study for English and German Language

1 code implementation NoDaLiDa 2021 Timo Johner, Abhik Jana, Chris Biemann

Recent research using pre-trained language models for multi-document summarization task lacks deep investigation of potential erroneous cases and their possible application on other languages.

Document Summarization Language Modelling +1

Generating Lexical Representations of Frames using Lexical Substitution

no code implementations PaM 2020 Saba Anwar, Artem Shelmanov, Alexander Panchenko, Chris Biemann

We investigate a simple yet effective method, lexical substitution with word representation models, to automatically expand a small set of frame-annotated sentences with new words for their respective roles and LUs.

How Hateful are Movies? A Study and Prediction on Movie Subtitles

1 code implementation KONVENS (WS) 2021 Niklas von Boguszewski, Sana Moin, Anirban Bhowmick, Seid Muhie Yimam, Chris Biemann

Hence, we show that transfer learning from the social media domain is efficacious in classifying hate and offensive speech in movies through subtitles.

Domain Adaptation Transfer Learning

ActiveAnno: General-Purpose Document-Level Annotation Tool with Active Learning Integration

no code implementations NAACL 2021 Max Wiechmann, Seid Muhie Yimam, Chris Biemann

ActiveAnno is built with extensible design and easy deployment in mind, all to enable users to perform annotation tasks with high efficiency and high-quality annotation results.

Active Learning

Word Complexity is in the Eye of the Beholder

no code implementations NAACL 2021 Sian Gooding, Ekaterina Kochmar, Seid Muhie Yimam, Chris Biemann

Lexical complexity is a highly subjective notion, yet this factor is often neglected in lexical simplification and readability systems which use a "one-size-fits-all" approach.

Lexical Simplification

Towards Multi-Modal Text-Image Retrieval to improve Human Reading

no code implementations NAACL 2021 Florian Schneider, Özge Alaçam, Xintong Wang, Chris Biemann

In primary school children's books, as well as in modern language learning apps, multi-modal learning strategies like illustrations of terms and phrases are used to support reading comprehension.

Reading Comprehension Text-Image Retrieval

SCoT: Sense Clustering over Time: a tool for the analysis of lexical change

no code implementations EACL 2021 Christian Haase, Saba Anwar, Seid Muhie Yimam, Alexander Friedrich, Chris Biemann

There are two main approaches to the exploration of dynamic networks: the discrete one compares a series of clustered graphs from separate points in time.

Forum 4.0: An Open-Source User Comment Analysis Framework

no code implementations EACL 2021 Marlo Haering, Jakob Smedegaard Andersen, Chris Biemann, Wiebke Loosen, Benjamin Milde, Tim Pietz, Christian Stöcker, Gregor Wiedemann, Olaf Zukunft, Walid Maalej

With the increasing number of user comments in diverse domains, including comments on online journalism and e-commerce websites, the manual content analysis of these comments becomes time-consuming and challenging.

HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

5 code implementations 18 Dec 2020 Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, Animesh Mukherjee

We also observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities.

Hate Speech Detection Text Classification

Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models

no code implementations COLING 2020 Seid Muhie Yimam, Hizkiel Mitiku Alemayehu, Abinew Ayele, Chris Biemann

To advance the sentiment analysis research in Amharic and other related low-resource languages, we release the dataset, the annotation tool, source code, and models publicly under a permissive license.

Decision Making Sentiment Analysis

Individual corpora predict fast memory retrieval during reading

no code implementations COLING (CogALex) 2020 Markus J. Hofmann, Lara Müller, Andre Rölke, Ralph Radach, Chris Biemann

Then we trained word2vec models from individual corpora and a 70 million-sentence newspaper corpus to obtain individual and norm-based long-term memory structure.

Language Modelling

Estimating the influence of auxiliary tasks for multi-task learning of sequence tagging tasks

no code implementations ACL 2020 Fynn Schröder, Chris Biemann

We propose new methods to automatically assess the similarity of sequence tagging datasets to identify beneficial auxiliary data for MTL or TL setups.

Multi-Task Learning

Neural Entity Linking: A Survey of Models Based on Deep Learning

no code implementations 31 May 2020 Özge Sevgili, Artem Shelmanov, Mikhail Arkhipov, Alexander Panchenko, Chris Biemann

In this survey, we provide a comprehensive description of recent neural entity linking (EL) systems developed since 2015 as a result of the "deep learning revolution" in NLP.

Entity Embeddings Entity Linking

Word Sense Disambiguation for 158 Languages using Word Embeddings Only

no code implementations LREC 2020 Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko

We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages.

Word Embeddings Word Sense Disambiguation

Analysis of the Ethiopic Twitter Dataset for Abusive Speech in Amharic

no code implementations 9 Dec 2019 Seid Muhie Yimam, Abinew Ali Ayele, Chris Biemann

Since several languages can be written using the Fidel script, we have used the existing Amharic, Tigrinya and Ge'ez corpora to retain only the Amharic tweets.

Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings

1 code implementation 23 Sep 2019 Gregor Wiedemann, Steffen Remus, Avi Chawla, Chris Biemann

Since vectors of the same word type can vary depending on the respective context, they implicitly provide a model for word sense disambiguation (WSD).

General Classification Translation +1
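The idea sketched in this abstract (contextualized vectors of the same word type vary with context, which implicitly disambiguates senses) can be illustrated with a minimal nearest-centroid sketch. This is pure Python with invented toy vectors standing in for BERT embeddings; the sense labels and numbers are hypothetical, not from the paper:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm

# Toy stand-ins for contextualized vectors of "bank" in sense-labeled sentences.
sense_examples = {
    "bank/finance": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "bank/river":   [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

centroids = {sense: centroid(vs) for sense, vs in sense_examples.items()}

def disambiguate(context_vector):
    # Assign the sense whose centroid is most similar to the new context vector.
    return max(centroids, key=lambda s: cosine(context_vector, centroids[s]))

print(disambiguate([0.85, 0.15, 0.05]))  # a finance-like context
```

With real contextualized embeddings, the same nearest-neighbor logic applies; only the vectors come from a pretrained encoder instead of being hand-written.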

Improving Neural Entity Disambiguation with Graph Embeddings

no code implementations ACL 2019 Özge Sevgili, Alexander Panchenko, Chris Biemann

Entity Disambiguation (ED) is the task of linking an ambiguous entity mention to a corresponding entry in a knowledge base.

Entity Disambiguation

TARGER: Neural Argument Mining at Your Fingertips

1 code implementation ACL 2019 Artem Chernodub, Oleksiy Oliynyk, Philipp Heidenreich, Alex Bondarenko, Matthias Hagen, Chris Biemann, Alexander Panchenko

We present TARGER, an open source neural argument mining framework for tagging arguments in free input texts and for keyword-based retrieval of arguments from an argument-tagged web-scale corpus.

Argument Mining

On the Compositionality Prediction of Noun Phrases using Poincaré Embeddings

no code implementations ACL 2019 Abhik Jana, Dmitry Puzyrev, Alexander Panchenko, Pawan Goyal, Chris Biemann, Animesh Mukherjee

In particular, we use hypernymy information of the multiword and its constituents encoded in the form of the recently introduced Poincaré embeddings in addition to the distributional information to detect compositionality for noun phrases.

Every child should have parents: a taxonomy refinement algorithm based on hyperbolic term embeddings

1 code implementation ACL 2019 Rami Aly, Shantanu Acharya, Alexander Ossa, Arne Köhn, Chris Biemann, Alexander Panchenko

We introduce the use of Poincaré embeddings to improve existing state-of-the-art approaches to domain-specific taxonomy induction from text as a signal for both relocating wrong hyponym terms within a (pre-induced) taxonomy as well as for attaching disconnected terms in a taxonomy.

Reviving a psychometric measure: Classification and prediction of the Operant Motive Test

no code implementations WS 2019 Dirk Johannßen, Chris Biemann, David Scheffer

In addition, we found a significant correlation of r = .2 between subsequent academic success and data automatically labeled with our model in an extrinsic evaluation.

General Classification

LT Expertfinder: An Evaluation Framework for Expert Finding Methods

1 code implementation NAACL 2019 Tim Fischer, Steffen Remus, Chris Biemann

Particularly for dynamic systems, where topics are not predefined but formulated as a search query, we believe a more informative approach is to perform user studies for directly comparing different methods in the same view.

Information Retrieval

HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings

1 code implementation SEMEVAL 2019 Saba Anwar, Dmitry Ustalov, Nikolay Arefyev, Simone Paolo Ponzetto, Chris Biemann, Alexander Panchenko

We present our system for semantic frame induction that showed the best performance in Subtask B.1 and finished as the runner-up in Subtask A of the SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et al., 2019).

Word Embeddings

Answering Comparative Questions: Better than Ten-Blue-Links?

no code implementations 15 Jan 2019 Matthias Schildwächter, Alexander Bondarenko, Julian Zenker, Matthias Hagen, Chris Biemann, Alexander Panchenko

We present CAM (comparative argumentative machine), a novel open-domain IR system to argumentatively compare objects with respect to information extracted from the Common Crawl.

Transfer Learning from LDA to BiLSTM-CNN for Offensive Language Detection in Twitter

no code implementations 7 Nov 2018 Gregor Wiedemann, Eugen Ruppert, Raghav Jindal, Chris Biemann

Best results are achieved from pre-training our model on the unsupervised topic clustering of tweets in combination with thematic user cluster information.

General Classification Transfer Learning

microNER: A Micro-Service for German Named Entity Recognition based on BiLSTM-CRF

no code implementations 7 Nov 2018 Gregor Wiedemann, Raghav Jindal, Chris Biemann

We evaluate the performance of different word and character embeddings on two standard German datasets and with a special focus on out-of-vocabulary words.

Named Entity Recognition NER +1

Categorizing Comparative Sentences

3 code implementations WS 2019 Alexander Panchenko, Alexander Bondarenko, Mirco Franzek, Matthias Hagen, Chris Biemann

We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., "Python has better NLP libraries than MATLAB" => (Python, better, MATLAB)).

Argument Mining Sentence Embeddings
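The tuple extraction shown in the abstract ("Python has better NLP libraries than MATLAB" => (Python, better, MATLAB)) can be mimicked with a naive pattern sketch. The actual system uses trained classifiers over sentence representations; this regex, the pattern shape, and the function name are toy assumptions for illustration only:

```python
import re

# Naive pattern for "X has <comparative> ... than Z"-style sentences.
# Real comparative-sentence categorization is a learned classification task;
# this hard-coded pattern only illustrates the target output format.
PATTERN = re.compile(r"(\w+) has (\w+) \w+ \w+ than (\w+)")

def extract_comparison(sentence):
    match = PATTERN.search(sentence)
    if not match:
        return None
    subject, marker, other = match.groups()
    return (subject, marker, other)

print(extract_comparison("Python has better NLP libraries than MATLAB"))
# -> ('Python', 'better', 'MATLAB')
```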

Unsupervised Sense-Aware Hypernymy Extraction

1 code implementation 17 Sep 2018 Dmitry Ustalov, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

In this paper, we show how unsupervised sense representations can be used to improve hypernymy extraction.

A Multilingual Information Extraction Pipeline for Investigative Journalism

no code implementations EMNLP 2018 Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann

We introduce an advanced information extraction pipeline to automatically process very large collections of unstructured textual data for the purpose of investigative journalism.

Entity Extraction using GAN

Watset: Local-Global Graph Clustering with Applications in Sense and Frame Induction

2 code implementations CL 2019 Dmitry Ustalov, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

We present a detailed theoretical and computational analysis of the Watset meta-algorithm for fuzzy graph clustering, which has been found to be widely applicable in a variety of domains.

Graph Clustering

New/s/leak 2.0 - Multilingual Information Extraction and Visualization for Investigative Journalism

no code implementations 13 Jul 2018 Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann

Investigative journalism in recent years is confronted with two major challenges: 1) vast amounts of unstructured data originating from large text collections such as leaks or answers to Freedom of Information requests, and 2) multi-lingual data due to intensified global cooperation and communication in politics, business and civil society.

Efficient Exploration

Par4Sim -- Adaptive Paraphrasing for Text Simplification

no code implementations COLING 2018 Seid Muhie Yimam, Chris Biemann

Learning from a real-world data stream and continuously updating the model without explicit supervision is a new challenge for NLP applications with machine learning components.

Learning-To-Rank Text Simplification

Document-based Recommender System for Job Postings using Dense Representations

no code implementations NAACL 2018 Ahmed Elsafty, Martin Riedl, Chris Biemann

Detecting the similarity between job advertisements is important for job recommendation systems as it allows, for example, the application of item-to-item based recommendations.

Document Embedding Recommendation Systems

Unsupervised Semantic Frame Induction using Triclustering

1 code implementation ACL 2018 Dmitry Ustalov, Alexander Panchenko, Andrei Kutuzov, Chris Biemann, Simone Paolo Ponzetto

We use dependency triples automatically extracted from a Web-scale corpus to perform unsupervised semantic frame induction.

Unspeech: Unsupervised Speech Context Embeddings

no code implementations 18 Apr 2018 Benjamin Milde, Chris Biemann

We introduce "Unspeech" embeddings, which are based on unsupervised learning of context feature representations for spoken language.

Enriching Frame Representations with Distributionally Induced Senses

no code implementations LREC 2018 Stefano Faralli, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

We introduce a new lexical resource that enriches the Framester knowledge graph, which links FrameNet, WordNet, VerbNet and other resources, with semantic features from text corpora.

What do we need to build explainable AI systems for the medical domain?

no code implementations 28 Dec 2017 Andreas Holzinger, Chris Biemann, Constantinos S. Pattichis, Douglas B. Kell

In this paper we outline some of our research topics in the context of the relatively new area of explainable-AI with a focus on the application in medicine, which is a very special domain.

Autonomous Driving Game of Go +2

A Framework for Enriching Lexical Semantic Resources with Distributional Semantics

no code implementations 23 Dec 2017 Chris Biemann, Stefano Faralli, Alexander Panchenko, Simone Paolo Ponzetto

While both kinds of semantic resources are available with high lexical coverage, our aligned resource combines the domain specificity and availability of contextual information from distributional models with the conciseness and high quality of manually crafted lexical networks.

Word Sense Disambiguation

Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl

no code implementations LREC 2018 Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann

We present DepCC, the largest-to-date linguistically analyzed corpus in English including 365 million documents, composed of 252 billion tokens and 7.5 billion named entity occurrences in 14.3 billion sentences from a web-scale crawl of the Common Crawl project.

Open Information Extraction Question Answering +1

Fighting with the Sparsity of Synonymy Dictionaries

no code implementations 30 Aug 2017 Dmitry Ustalov, Mikhail Chernoskutov, Chris Biemann, Alexander Panchenko

Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph.

Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation

1 code implementation EMNLP 2017 Alexander Panchenko, Fide Marten, Eugen Ruppert, Stefano Faralli, Dmitry Ustalov, Simone Paolo Ponzetto, Chris Biemann

In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images.

Word Sense Disambiguation

Watset: Automatic Induction of Synsets from a Graph of Synonyms

1 code implementation ACL 2017 Dmitry Ustalov, Alexander Panchenko, Chris Biemann

This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings.

Word Embeddings Word Sense Induction
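The abstract describes inducing synsets from synonymy dictionaries. Watset itself performs fuzzy local-global clustering with sense induction; as a much-simplified illustration of the underlying idea (clustering a synonymy graph into synsets), here is a connected-components sketch over toy data, not the actual algorithm:

```python
# Toy synonymy graph: edges between words that a dictionary lists as synonyms.
edges = [
    ("car", "automobile"), ("automobile", "motorcar"),
    ("stream", "brook"), ("brook", "creek"),
]

def connected_components(edges):
    # Build an undirected adjacency map, then collect components by DFS.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components

# Each component approximates one induced synset.
for synset in connected_components(edges):
    print(sorted(synset))
```

One design point this sketch misses: a polysemous word (e.g. "bank") would incorrectly merge unrelated synsets into a single component, which is exactly why Watset first disambiguates nodes before the global clustering step.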

Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation

no code implementations WS 2017 Alexander Panchenko, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann

We introduce a new method for unsupervised knowledge-based word sense disambiguation (WSD) based on a resource that links two types of sense-aware lexical networks: one is induced from a corpus using distributional semantics, the other is manually constructed.

Machine Translation Translation +2

Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation

no code implementations EACL 2017 Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann

On the example of word sense induction and disambiguation (WSID), we show that it is possible to develop an interpretable model that matches the state-of-the-art models in accuracy.

Word Embeddings Word Sense Induction

Towards a resource based on users' knowledge to overcome the Tip of the Tongue problem.

no code implementations WS 2016 Michael Zock, Chris Biemann

To this end, we asked crowdworkers to provide some cues to describe a given target and to specify then how each one of them relates to the target, in the hope that this could help others to find the elusive word.

Vectors or Graphs? On Differences of Representations for Distributional Semantic Models

no code implementations WS 2016 Chris Biemann

Distributional Semantic Models (DSMs) have recently received increased attention, together with the rise of neural architectures for scalable training of dense vector embeddings.

Information Retrieval

Domain-Specific Corpus Expansion with Focused Webcrawling

no code implementations LREC 2016 Steffen Remus, Chris Biemann

This work presents a straightforward method for extending or creating in-domain web corpora by focused webcrawling.

That's sick dude!: Automatic identification of word sense change across different timescales

no code implementations ACL 2014 Sunny Mitra, Ritwik Mitra, Martin Riedl, Chris Biemann, Animesh Mukherjee, Pawan Goyal

In this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books.

Word Sense Disambiguation

Distributed Distributional Similarities of Google Books Over the Centuries

no code implementations LREC 2014 Martin Riedl, Richard Steuer, Chris Biemann

This paper introduces a distributional thesaurus and sense clusters computed on the complete Google Syntactic N-grams, which is extracted from Google Books, a very large corpus of digitized books published between 1520 and 2008.

Graph Clustering

NoSta-D Named Entity Annotation for German: Guidelines and Dataset

no code implementations LREC 2014 Darina Benikova, Chris Biemann, Marc Reznicek

We describe our approach to creating annotation guidelines based on linguistic and semantic considerations, and how we iteratively refined and tested them in the early stages of annotation in order to arrive at the largest publicly available dataset for German NER, consisting of over 31,000 manually annotated sentences (over 591,000 tokens) from German Wikipedia and German online news.

Named Entity Recognition NER +2

Turk Bootstrap Word Sense Inventory 2.0: A Large-Scale Resource for Lexical Substitution

no code implementations LREC 2012 Chris Biemann

This lexical resource, created by a crowdsourcing process using Amazon Mechanical Turk (http://www.mturk.com), encompasses a sense inventory for lexical substitution for 1,012 highly frequent English common nouns.

Machine Translation Question Answering +1
