1 code implementation • Findings (EMNLP) 2021 • Timo Spinde, Manuel Plank, Jan-David Krieger, Terry Ruas, Bela Gipp, Akiko Aizawa
Fine-tuning and evaluating the model on our proposed supervised data set, we achieve a macro F1-score of 0.804, outperforming existing methods.
1 code implementation • 26 Feb 2025 • Jonas Becker, Lars Benedikt Kaesberg, Andreas Stephan, Jan Philip Wahle, Terry Ruas, Bela Gipp
To identify the reasons for this issue, we perform a human study with eight experts on discussions suffering from problem drift, who find the most common issues are a lack of progress (35% of cases), low-quality feedback (26% of cases), and a lack of clarity (25% of cases).
1 code implementation • 26 Feb 2025 • Lars Benedikt Kaesberg, Jonas Becker, Jan Philip Wahle, Terry Ruas, Bela Gipp
Our results show that voting protocols improve performance by 13.2% in reasoning tasks and consensus protocols by 2.8% in knowledge tasks over the other decision protocol.
no code implementations • 18 Feb 2025 • Frederic Kirstein, Muneeb Khan, Jan Philip Wahle, Terry Ruas, Bela Gipp
Meeting summarization suffers from limited high-quality data, mainly due to privacy restrictions and expensive collection processes.
1 code implementation • 17 Feb 2025 • Shamsuddeen Hassan Muhammad, Nedjma Ousidhoum, Idris Abdulmumin, Jan Philip Wahle, Terry Ruas, Meriem Beloucif, Christine de Kock, Nirmal Surange, Daniela Teodorescu, Ibrahim Said Ahmad, David Ifeoluwa Adelani, Alham Fikri Aji, Felermino D. M. A. Ali, Ilseyar Alimova, Vladimir Araujo, Nikolay Babakov, Naomi Baes, Ana-Maria Bucur, Andiswa Bukula, Guanqun Cao, Rodrigo Tufino Cardenas, Rendi Chevi, Chiamaka Ijeoma Chukwuneke, Alexandra Ciobotaru, Daryna Dementieva, Murja Sani Gadanya, Robert Geislinger, Bela Gipp, Oumaima Hourrane, Oana Ignat, Falalu Ibrahim Lawan, Rooweither Mabuya, Rahmad Mahendra, Vukosi Marivate, Andrew Piper, Alexander Panchenko, Charles Henrique Porto Ferreira, Vitaly Protasov, Samuel Rutunda, Manish Shrivastava, Aura Cristina Udrea, Lilian Diana Awuor Wanzare, Sophie Wu, Florian Valentin Wunderlich, Hanif Muhammad Zhafran, Tianhui Zhang, Yi Zhou, Saif M. Mohammad
In this paper, we present BRIGHTER, a collection of multilabeled emotion-annotated datasets in 28 different languages.
no code implementations • 13 Dec 2024 • Anastasia Zhukova, Christian E. Matt, Bela Gipp
Collecting test datasets in a narrow domain is time-consuming and requires skilled human resources with domain knowledge and training for the annotation task.
no code implementations • 27 Nov 2024 • Frederic Kirstein, Terry Ruas, Bela Gipp
The quality of meeting summaries generated by natural language generation (NLG) systems is hard to measure automatically.
no code implementations • 17 Nov 2024 • Tomas Horych, Christoph Mandl, Terry Ruas, Andre Greiner-Petter, Bela Gipp, Akiko Aizawa, Timo Spinde
Our classifier, fine-tuned on this dataset, surpasses all of the annotator LLMs by 5-9 percent in Matthews Correlation Coefficient (MCC) and performs close to or outperforms the model trained on human-labeled data when evaluated on two media bias benchmark datasets (BABE and BASIL).
1 code implementation • 18 Oct 2024 • Frederic Kirstein, Terry Ruas, Robert Kratel, Bela Gipp
Previous attempts to address these issues by considering related supplementary resources (e.g., presentation slides) alongside transcripts are hindered by models' limited context sizes and handling the additional complexities of the multi-source tasks, such as identifying relevant information in additional files and seamlessly aligning it with the meeting content.
1 code implementation • 16 Jul 2024 • Frederic Kirstein, Terry Ruas, Bela Gipp
Meeting summarization has become a critical task since digital encounters have become a common practice.
1 code implementation • 3 Jul 2024 • Lars Benedikt Kaesberg, Terry Ruas, Jan Philip Wahle, Bela Gipp
We present CiteAssist, a system to automate the generation of BibTeX entries for preprints, streamlining the process of bibliographic annotation.
1 code implementation • 2 Jul 2024 • Dominik Meier, Jan Philip Wahle, Terry Ruas, Bela Gipp
The dataset also provides a human preference ranking of paraphrases with different types that can be used to fine-tune models with RLHF and DPO methods.
2 code implementations • 28 Jun 2024 • Jan Philip Wahle, Terry Ruas, Yang Xu, Bela Gipp
In particular, changes in morphology and lexicon, i.e., the vocabulary used, showed promise in improving prompts.
no code implementations • 11 Jun 2024 • Frederic Kirstein, Jan Philip Wahle, Bela Gipp, Terry Ruas
Abstractive dialogue summarization is the task of distilling conversations into informative and concise summaries.
Abstractive Dialogue Summarization
Abstractive Text Summarization
1 code implementation • 24 May 2024 • Jonas Becker, Jan Philip Wahle, Bela Gipp, Terry Ruas
Text generation has become more accessible than ever, and the increasing interest in these systems, especially those using large language models, has spurred an increasing number of related publications.
1 code implementation • 17 Apr 2024 • Frederic Kirstein, Jan Philip Wahle, Terry Ruas, Bela Gipp
Meeting summarization has become a critical task considering the increase in online interactions.
1 code implementation • 30 Mar 2024 • Ankit Satpute, Noah Giessing, Andre Greiner-Petter, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, Bela Gipp
In this study, we adopted a two-step approach for investigating the proficiency of LLMs in answering mathematical questions.
1 code implementation • 27 Feb 2024 • Tomáš Horych, Martin Wessel, Jan Philip Wahle, Terry Ruas, Jerome Waßmuth, André Greiner-Petter, Akiko Aizawa, Bela Gipp, Timo Spinde
MAGPIE confirms that MTL is a promising approach for addressing media bias detection, enhancing the accuracy and efficiency of existing models.
2 code implementations • 19 Feb 2024 • Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad
This study examines the tendency to cite older work across 20 fields of study over 43 years (1980–2023).
1 code implementation • 5 Feb 2024 • Andreas Stephan, Lukas Miklautz, Kevin Sidak, Jan Philip Wahle, Bela Gipp, Claudia Plant, Benjamin Roth
We, therefore, propose Text-Guided Image Clustering, i.e., generating text using image captioning and visual question-answering (VQA) models and subsequently clustering the generated text.
1 code implementation • 30 Jan 2024 • Ankit Satpute, Andre Greiner-Petter, Noah Gießing, Isabel Beckenbach, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, Bela Gipp
Second, we analyze the best-performing approaches to detect plagiarism and mathematical content similarity on the newly established taxonomy.
1 code implementation • 26 Dec 2023 • Timo Spinde, Smi Hinterreiter, Fabian Haak, Terry Ruas, Helge Giese, Norman Meuschke, Bela Gipp
However, we have identified a lack of interdisciplinarity in existing projects, and a need for more awareness of the various types of media bias to support methodologically thorough performance evaluations of media bias detection systems.
1 code implementation • 23 Oct 2023 • Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad
We analyzed ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers.
3 code implementations • 23 Oct 2023 • Jan Philip Wahle, Bela Gipp, Terry Ruas
Current approaches in paraphrase generation and detection heavily rely on a single general similarity score, ignoring the intricate linguistic properties of language.
no code implementations • 28 Jun 2023 • Anastasia Zhukova, Lukas von Sperl, Christian E. Matt, Bela Gipp
Generative UX research employs domain users at the initial stages of prototype development, i.e., ideation and concept evaluation, and the last stage for evaluating system usefulness and user utility.
no code implementations • 25 May 2023 • Felix Petersen, Moritz Schubotz, Andre Greiner-Petter, Bela Gipp
We tackle the problem of neural machine translation of mathematical formulae between ambiguous presentation languages and unambiguous content languages.
1 code implementation • 22 May 2023 • Ankit Satpute, André Greiner-Petter, Moritz Schubotz, Norman Meuschke, Akiko Aizawa, Olaf Teschke, Bela Gipp
This demo paper presents TEIMMA, the first tool to annotate the reuse of text, images, and mathematical formulae in a document pair.
no code implementations • 12 May 2023 • Bela Gipp, André Greiner-Petter, Moritz Schubotz, Norman Meuschke
This project investigated new approaches and technologies to enhance the accessibility of mathematical content and its semantic information for a broad range of information retrieval applications.
1 code implementation • 25 Apr 2023 • Martin Wessel, Tomáš Horych, Terry Ruas, Akiko Aizawa, Bela Gipp, Timo Spinde
A unified benchmark encourages the development of more robust systems and shifts the current paradigm in media bias detection evaluation towards solutions that tackle not one but multiple media bias types simultaneously.
1 code implementation • 24 Mar 2023 • Jonas Becker, Jan Philip Wahle, Terry Ruas, Bela Gipp
Additionally, we identify four datasets as the most diverse and challenging for paraphrase detection.
no code implementations • 17 Mar 2023 • Norman Meuschke, Apurva Jagdale, Timo Spinde, Jelena Mitrović, Bela Gipp
Using the new framework, we benchmark ten freely available tools in extracting document metadata, bibliographic references, tables, and other content elements from academic PDF documents.
1 code implementation • 3 Mar 2023 • Philipp Scharpf, Moritz Schubotz, Howard S. Cohl, Corinna Breitinger, Bela Gipp
Our long-term goal is to generalize citation-based IR methods and apply this generalized method to both classical references and mathematical concepts.
1 code implementation • 15 Nov 2022 • Philipp Scharpf, Moritz Schubotz, Andreas Spitz, Andre Greiner-Petter, Bela Gipp
To address this need, we propose a multilingual Wikimedia framework that allows for collaborative worldwide teacher knowledge engineering and subsequent AI-aided question generation, test, and correction.
1 code implementation • 12 Nov 2022 • Philipp Scharpf, Moritz Schubotz, Bela Gipp
In this paper, we aim to bridge the gap by presenting data mining methods and benchmark results to employ Mathematical Entity Linking (MathEL) and Unsupervised Formula Labeling (UFL) for semantic formula search and mathematical question answering (MathQA) on the arXiv preprint repository, Wikipedia, and Wikidata, which is part of the Wikimedia ecosystem of free knowledge.
no code implementations • 8 Nov 2022 • Moritz Schubotz, Ankit Satpute, Andre Greiner-Petter, Akiko Aizawa, Bela Gipp
In that case, the overall effort to iteratively improve the software and rerun the experiments creates significant time pressure on the researchers.
no code implementations • 7 Nov 2022 • Timo Spinde, Jan-David Krieger, Terry Ruas, Jelena Mitrović, Franz Götz-Hahn, Akiko Aizawa, Bela Gipp
Media has a substantial impact on the public perception of events.
1 code implementation • 26 Oct 2022 • Frederic Kirstein, Jan Philip Wahle, Terry Ruas, Bela Gipp
Further, we find that choice and combinations of task families influence downstream performance more than the training scheme, supporting the use of task families for abstractive text summarization.
2 code implementations • 13 Oct 2022 • Terry Ruas, Jan Philip Wahle, Lennart Küll, Saif M. Mohammad, Bela Gipp
This paper presents CS-Insights, an interactive web application to analyze computer science publications from DBLP through multiple perspectives.
3 code implementations • 7 Oct 2022 • Jan Philip Wahle, Terry Ruas, Frederic Kirstein, Bela Gipp
The recent success of large language models for text generation poses a severe threat to academic integrity, as plagiarists can generate realistic paraphrases indistinguishable from original work.
1 code implementation • 29 Sep 2022 • Timo Spinde, Manuel Plank, Jan-David Krieger, Terry Ruas, Bela Gipp, Akiko Aizawa
Fine-tuning and evaluating the model on our proposed supervised data set, we achieve a macro F1-score of 0.804, outperforming existing methods.
1 code implementation • 22 May 2022 • Jan-David Krieger, Timo Spinde, Terry Ruas, Juhi Kulshrestha, Bela Gipp
We present DA-RoBERTa, a new state-of-the-art transformer-based model adapted to the media bias domain which identifies sentence-level bias with an F1 score of 0.814.
1 code implementation • LREC 2022 • Jan Philip Wahle, Terry Ruas, Saif M. Mohammad, Bela Gipp
We present an initial analysis focused on the volume of computer science research (e.g., number of papers, authors, research activity), trends in topics of interest, and citation patterns.
1 code implementation • 28 Mar 2022 • Malte Ostendorff, Till Blume, Terry Ruas, Bela Gipp, Georg Rehm
We compare and analyze three generic document embeddings, six specialized document embeddings and a pairwise classification baseline in the context of research paper recommendations.
1 code implementation • 14 Feb 2022 • Malte Ostendorff, Nils Rethmeier, Isabelle Augenstein, Bela Gipp, Georg Rehm
Learning scientific document representations can be substantially improved through contrastive learning objectives, where the challenge lies in creating positive and negative training samples that encode the desired similarity semantics.
Ranked #1 on Document Classification on SciDocs (MeSH)
no code implementations • 15 Dec 2021 • Franziska Weeber, Felix Hamborg, Karsten Donnay, Bela Gipp
Large amounts of annotated data have become more important than ever, especially since the rise of deep learning techniques.
no code implementations • 14 Dec 2021 • Timo Spinde, Christina Kreuter, Wolfgang Gaissmaier, Felix Hamborg, Bela Gipp, Helge Giese
To name an example: Intending to measure bias in a news article, should we ask, "How biased is the article?"
no code implementations • 14 Dec 2021 • Timo Spinde, David Krieger, Manuel Plank, Bela Gipp
Our results demonstrate the existing crowdsourcing approaches' lack of data quality, underlining the need for a trained expert framework to gather a more reliable dataset.
no code implementations • 14 Dec 2021 • Timo Spinde, Kanishka Sinha, Norman Meuschke, Bela Gipp
We present a free and open-source tool for creating web-based surveys that include text annotation tasks.
no code implementations • 14 Dec 2021 • Timo Spinde, Lada Rudnitckaia, Felix Hamborg, Bela Gipp
The underlying idea is that the context of biased words in different news outlets varies more strongly than the one of non-biased words, since the perception of a word as being biased differs depending on its context.
1 code implementation • 13 Dec 2021 • Anastasia Zhukova, Felix Hamborg, Bela Gipp
Named entity recognition (NER) is an important task that aims to resolve universal categories of named entities, e.g., persons, locations, organizations, and times.
1 code implementation • 18 Nov 2021 • Johannes Stegmüller, Fabian Bauer-Marquart, Norman Meuschke, Terry Ruas, Moritz Schubotz, Bela Gipp
Identifying cross-language plagiarism is challenging, especially for distant language pairs and sense-for-sense translations.
1 code implementation • 15 Nov 2021 • Jan Philip Wahle, Nischal Ashok, Terry Ruas, Norman Meuschke, Tirthankar Ghosal, Bela Gipp
We expect that evaluating a broad spectrum of datasets and models will benefit future research in developing misinformation detection systems.
no code implementations • 18 Oct 2021 • Felix Hamborg, Timo Spinde, Kim Heinser, Karsten Donnay, Bela Gipp
We present an in-progress system for news recommendation that is the first to automate the manual procedure of content analysis to reveal person-targeting biases in news articles reporting on policy issues.
no code implementations • 18 Oct 2021 • Felix Hamborg, Kim Heinser, Anastasia Zhukova, Karsten Donnay, Bela Gipp
Our study further suggests that our content-driven identification method detects groups of similarly slanted news articles due to substantial biases present in individual news articles.
1 code implementation • 16 Sep 2021 • Malte Ostendorff, Corinna Breitinger, Bela Gipp
We conclude that users of literature recommendation systems can benefit most from hybrid approaches that combine both link- and text-based approaches, where the user's information needs and preferences should control the weighting for the approaches used.
1 code implementation • LREC 2022 • Anastasia Zhukova, Felix Hamborg, Bela Gipp
In this paper, we qualitatively and quantitatively compare the annotation schemes of ECB+, a CDCR dataset with identity coreference relations, and NewsWCL50, a CDCR dataset with a mix of loose context-dependent and strict coreference relations.
coreference-resolution
Cross Document Coreference Resolution
no code implementations • Information for a Better World: Shaping the Global Future: 17th International Conference, iConference 2022, Virtual Event 2022 • Anastasia Zhukova, Felix Hamborg, Karsten Donnay, Bela Gipp
Bridging and loose coreference relations trigger associations that may lead to exposing news readers to bias by word choice and labeling.
coreference-resolution
Cross Document Coreference Resolution
1 code implementation • 2 Sep 2021 • Philipp Scharpf, Moritz Schubotz, Bela Gipp
The results indicate that mathematical entities have the potential to provide high explainability as they are a crucial part of a STEM document.
no code implementations • 2 Jul 2021 • Anastasia Zhukova, Felix Hamborg, Karsten Donnay, Bela Gipp
Specifically, the approach clusters mentions of groups of persons that act as non-named entity actors in the texts, e.g., "migrant families" = "asylum-seekers."
2 code implementations • 15 Jun 2021 • Jan Philip Wahle, Terry Ruas, Norman Meuschke, Bela Gipp
We present two supervised (pre-)training methods to incorporate gloss definitions from lexical resources into neural language models (LMs).
1 code implementation • 20 May 2021 • Felix Hamborg, Karsten Donnay, Bela Gipp
Extensive research on target-dependent sentiment classification (TSC) has led to strong classification performances in domains where authors tend to explicitly express sentiment about specific entities or topics, such as in reviews or on social media.
1 code implementation • 28 Apr 2021 • Malte Ostendorff, Elliott Ash, Terry Ruas, Bela Gipp, Julian Moreno-Schneider, Georg Rehm
Simultaneously, legal recommender systems are typically evaluated in small-scale user studies without any publicly available benchmark datasets.
no code implementations • 23 Mar 2021 • Jan Philip Wahle, Terry Ruas, Norman Meuschke, Bela Gipp
The rise of language models such as BERT allows for high-quality text paraphrasing.
2 code implementations • 22 Mar 2021 • Jan Philip Wahle, Terry Ruas, Tomáš Foltýnek, Norman Meuschke, Bela Gipp
Employing paraphrasing tools to conceal plagiarized text is a severe threat to academic integrity.
1 code implementation • COLING 2020 • Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm
Our findings motivate future research of aspect-based document similarity and the development of a recommender system based on the evaluated techniques.
1 code implementation • 25 May 2020 • Moritz Schubotz, Philipp Scharpf, Olaf Teschke, Andreas Kuehnemund, Corinna Breitinger, Bela Gipp
Moreover, we find that the method's confidence score allows for reducing the effort by 86% compared to the manual coarse-grained classification effort while maintaining a precision of 81% for automatically classified articles.
1 code implementation • 23 May 2020 • Cornelius Ihle, Moritz Schubotz, Norman Meuschke, Bela Gipp
Plagiarism detection systems are essential tools for safeguarding academic and educational integrity.
no code implementations • 22 May 2020 • Philipp Scharpf, Moritz Schubotz, Abdou Youssef, Felix Hamborg, Norman Meuschke, Bela Gipp
In this paper, we show how selecting and combining encodings of natural and mathematical language affect classification and clustering of documents with mathematical content.
4 code implementations • 22 Mar 2020 • Malte Ostendorff, Terry Ruas, Moritz Schubotz, Georg Rehm, Bela Gipp
In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task.
no code implementations • 20 Mar 2020 • Moritz Schubotz, André Greiner-Petter, Norman Meuschke, Olaf Teschke, Bela Gipp
This poster summarizes our contributions to Wikimedia's processing pipeline for mathematical formulae.
1 code implementation • 7 Feb 2020 • Andre Greiner-Petter, Moritz Schubotz, Fabian Mueller, Corinna Breitinger, Howard S. Cohl, Akiko Aizawa, Bela Gipp
The contributions of our presented research are as follows: (1) we present the first distributional analysis of mathematical formulae on arXiv and zbMATH; (2) we retrieve relevant mathematical objects for given textual search queries (e.g., linking $P_{n}^{(\alpha, \beta)}\!\left(x\right)$ with 'Jacobi polynomial'); (3) we extend zbMATH's search engine by providing relevant mathematical formulae; and (4) we exemplify the applicability of the results by presenting auto-completion for math inputs as the first contribution to math recommendation systems.
no code implementations • 23 Sep 2019 • Felix Hamborg, Philipp Meschenmoser, Moritz Schubotz, Bela Gipp
In scientific publications, citations allow readers to assess the authenticity of the presented information and verify it in the original context.
1 code implementation • KONVENS / GermEval 2019 2019 • Malte Ostendorff, Peter Bourgonje, Maria Berger, Julian Moreno-Schneider, Georg Rehm, Bela Gipp
In this paper, we focus on the classification of books using short descriptive texts (cover blurbs) and additional metadata.
2 code implementations • 6 Sep 2019 • Felix Hamborg, Corinna Breitinger, Bela Gipp
Event extraction from news articles is a commonly required prerequisite for various tasks, such as article summarization, article clustering, and news aggregation.
1 code implementation • 28 Jun 2019 • Moritz Schubotz, Philipp Scharpf, Kaushal Dudhat, Yash Nagar, Felix Hamborg, Bela Gipp
These formulae originate from the knowledge base Wikidata.
no code implementations • 27 Jun 2019 • Norman Meuschke, Vincent Stange, Moritz Schubotz, Michael Karmer, Bela Gipp
Overall, we show that analyzing the similarity of mathematical content and academic citations is a striking supplement for conventional text-based detection approaches for academic literature in the STEM disciplines.
no code implementations • 20 May 2019 • André Greiner-Petter, Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky, Bela Gipp
Nowadays, Machine Learning (ML) is seen as the universal solution to improve the effectiveness of information retrieval (IR) methods.
no code implementations • 10 Nov 2018 • Felix Petersen, Moritz Schubotz, Bela Gipp
We implemented the first translator for mathematical formulae based on recursive neural networks.