Search Results for author: Iryna Gurevych

Found 364 papers, 207 papers with code

IMPLI: Investigating NLI Models’ Performance on Figurative Language

1 code implementation ACL 2022 Kevin Stowe, Prasetya Utama, Iryna Gurevych

Natural language inference (NLI) has been widely used as a task to train and evaluate models for language understanding.

Natural Language Inference

Evaluating Coreference Resolvers on Community-based Question Answering: From Rule-based to State of the Art

1 code implementation COLING (CRAC) 2022 Haixia Chai, Nafise Sadat Moosavi, Iryna Gurevych, Michael Strube

The results of our extrinsic evaluation show that while there is a significant difference between the performance of the rule-based system and the state-of-the-art neural model on coreference resolution datasets, we do not observe a considerable difference in their impact on downstream models.

Answer Selection coreference-resolution +1

Exploring Metaphoric Paraphrase Generation

1 code implementation CoNLL (EMNLP) 2021 Kevin Stowe, Nils Beck, Iryna Gurevych

Metaphor generation is a difficult task, and has seen tremendous improvement with the advent of deep pretrained models.

Paraphrase Generation Sentence

Event Coreference Data (Almost) for Free: Mining Hyperlinks from Online News

1 code implementation AKBC 2021 Michael Bugert, Iryna Gurevych

Cross-document event coreference resolution (CDCR) is the task of identifying which event mentions refer to the same events throughout a collection of documents.

coreference-resolution Event Coreference Resolution

FIRE: Fact-checking with Iterative Retrieval and Verification

1 code implementation 17 Oct 2024 Zhuohan Xie, Rui Xing, Yuxia Wang, Jiahui Geng, Hasan Iqbal, Dhruv Sahnan, Iryna Gurevych, Preslav Nakov

The typical approach to fact-checking these atomic claims involves retrieving a fixed number of pieces of evidence, followed by a verification step.

Claim Verification Fact Checking +3
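
For illustration, here is a toy sketch of the fixed-evidence baseline the abstract describes: retrieve a fixed number of evidence pieces for an atomic claim, then run a single verification step. The retriever and verifier below are placeholder stand-ins, not FIRE's actual components.

    # Toy sketch of the fixed-evidence fact-checking baseline described above.
    # The retriever and verifier are placeholders, not FIRE's actual components.

    def retrieve(claim: str, corpus: list[str], k: int) -> list[str]:
        # Placeholder retriever: rank documents by word overlap with the claim.
        claim_words = set(claim.lower().split())
        ranked = sorted(corpus, key=lambda doc: -len(claim_words & set(doc.lower().split())))
        return ranked[:k]

    def verify(claim: str, evidence: list[str]) -> str:
        # Placeholder verifier: a real system would use an NLI model or an LLM.
        return "supported" if evidence else "not enough info"

    def fact_check(claim: str, corpus: list[str], k: int = 5) -> str:
        evidence = retrieve(claim, corpus, k)   # fixed evidence budget
        return verify(claim, evidence)          # one-shot verification

    corpus = [
        "The Eiffel Tower is located in Paris.",
        "Paris is the capital of France.",
        "The Great Wall of China is a fortification.",
    ]
    print(fact_check("The Eiffel Tower is in Paris", corpus, k=2))  # -> "supported"

FIRE, as its title suggests, instead interleaves retrieval and verification iteratively rather than fixing the evidence budget up front.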

Transforming Scholarly Landscapes: Influence of Large Language Models on Academic Fields beyond Computer Science

1 code implementation 29 Sep 2024 Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych

Large Language Models (LLMs) have ushered in a transformative era in Natural Language Processing (NLP), reshaping research and extending NLP's influence to other fields of study.

Few-Shot Learning

The Nature of NLP: Analyzing Contributions in NLP Papers

1 code implementation 29 Sep 2024 Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych

In this work, we quantitatively investigate what constitutes NLP by examining research papers.

The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification

no code implementations 26 Sep 2024 Andreas Waldis, Joel Birrer, Anne Lauscher, Iryna Gurevych

Gender-fair language, an evolving German linguistic variation, fosters inclusion by addressing all genders or using neutral forms.

Stance Detection text-classification +2

Efficient Performance Tracking: Leveraging Large Language Models for Automated Construction of Scientific Leaderboards

1 code implementation 19 Sep 2024 Furkan Şahinuç, Thy Thy Tran, Yulia Grishina, Yufang Hou, Bei Chen, Iryna Gurevych

Building on this dataset, we propose three experimental settings that simulate real-world scenarios where TDM triples are fully defined, partially defined, or undefined during leaderboard construction.

Benchmarking

RIRAG: Regulatory Information Retrieval and Answer Generation

3 code implementations 9 Sep 2024 Tuba Gokhan, Kexin Wang, Iryna Gurevych, Ted Briscoe

Regulatory documents, issued by governmental regulatory bodies, establish rules, guidelines, and standards that organizations must adhere to for legal compliance.

Answer Generation Information Retrieval +2

Diagnostic Reasoning in Natural Language: Computational Model and Application

no code implementations 9 Sep 2024 Nils Dycke, Matej Zečević, Ilia Kuznetsov, Beatrix Suess, Kristian Kersting, Iryna Gurevych

To close this gap, we investigate diagnostic abductive reasoning (DAR) in the context of language-grounded tasks (NL-DAR).

Decision Making

DOCE: Finding the Sweet Spot for Execution-Based Code Generation

1 code implementation 25 Aug 2024 Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins

Recently, a diverse set of decoding and reranking procedures have been shown effective for LLM-based code generation.

Code Generation

Grounding Fallacies Misrepresenting Scientific Publications in Evidence

1 code implementation 23 Aug 2024 Max Glockner, Yufang Hou, Preslav Nakov, Iryna Gurevych

Health-related misinformation claims often falsely cite a credible biomedical publication as evidence, which superficially appears to support the false claim.

Fact Checking Logical Fallacies +3

"Image, Tell me your story!" Predicting the original meta-context of visual misinformation

1 code implementation 19 Aug 2024 Jonathan Tonglet, Marie-Francine Moens, Iryna Gurevych

By explaining what is actually true about the image, fact-checkers can better detect misinformation, focus their efforts on check-worthy visual content, engage in counter-messaging before misinformation spreads widely, and make their explanation more convincing.

Fact Checking Misinformation

Problem Solving Through Human-AI Preference-Based Cooperation

no code implementations 14 Aug 2024 Subhabrata Dutta, Timo Kaufmann, Goran Glavaš, Ivan Habernal, Kristian Kersting, Frauke Kreuter, Mira Mezini, Iryna Gurevych, Eyke Hüllermeier, Hinrich Schuetze

While there is a widespread belief that artificial general intelligence (AGI) -- or even superhuman AI -- is imminent, complex problems in expert domains are far from being solved.

OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

2 code implementations 6 Aug 2024 Hasan Iqbal, Yuxia Wang, Minghan Wang, Georgi Georgiev, Jiahui Geng, Iryna Gurevych, Preslav Nakov

The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate.

Fact Checking

A Course Shared Task on Evaluating LLM Output for Clinical Questions

1 code implementation 31 Jul 2024 Yufang Hou, Thy Thy Tran, Doan Nam Long Vu, Yiwen Cao, Kai Li, Lukas Rohde, Iryna Gurevych

This paper presents a shared task that we organized at the Foundations of Language Technology (FoLT) course in 2023/2024 at the Technical University of Darmstadt, which focuses on evaluating the output of Large Language Models (LLMs) in generating harmful answers to health-related clinical questions.

Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment

1 code implementation 20 Jul 2024 Yongxin Huang, Kexin Wang, Goran Glavaš, Iryna Gurevych

Another limitation of multilingual sentence encoders is the trade-off between monolingual and cross-lingual performance.

Contrastive Learning Multiple-choice +3

GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics

no code implementations 17 Jul 2024 Fengyu Cai, Xinran Zhao, Hongming Zhang, Iryna Gurevych, Heinz Koeppl

Recent advances in measuring hardness-wise properties of data guide language models in sample selection within low-resource scenarios.

Natural Language Understanding

Localizing and Mitigating Errors in Long-form Question Answering

1 code implementation 16 Jul 2024 Rachneet Sachdeva, Yixiao Song, Mohit Iyyer, Iryna Gurevych

This work introduces HaluQuestQA, the first hallucination dataset with localized error annotations for human-written and model-generated LFQA answers.

Hallucination Long Form Question Answering

InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback

no code implementations 16 Jul 2024 Haishuo Fang, Xiaodan Zhu, Iryna Gurevych

A crucial requirement for deploying LLM-based agents in real-life applications is the robustness against risky or even irreversible mistakes.

Decision Making

MixGR: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity

1 code implementation 15 Jul 2024 Fengyu Cai, Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Iryna Gurevych, Heinz Koeppl

Recent studies show the growing significance of document retrieval in the generation of LLMs, i.e., RAG, within the scientific domain by bridging their knowledge gap.

Question Answering RAG +1

HDT: Hierarchical Document Transformer

no code implementations 11 Jul 2024 Haoyu He, Markus Flicke, Jan Buchmann, Iryna Gurevych, Andreas Geiger

We address the technical challenge of implementing HDT's sample-dependent hierarchical attention pattern by developing a novel sparse attention kernel that considers the hierarchical structure of documents.

Inductive Bias

Attribute or Abstain: Large Language Models as Long Document Assistants

1 code implementation 10 Jul 2024 Jan Buchmann, Xiao Liu, Iryna Gurevych

This is crucially different from the long document setting, where retrieval is not needed, but could help.

Attribute RAG +2

Systematic Task Exploration with LLMs: A Study in Citation Text Generation

1 code implementation 4 Jul 2024 Furkan Şahinuç, Ilia Kuznetsov, Yufang Hou, Iryna Gurevych

Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks.

Text Generation

LLM Roleplay: Simulating Human-Chatbot Interaction

1 code implementation 4 Jul 2024 Hovhannes Tamoyan, Hendrik Schuff, Iryna Gurevych

The development of chatbots requires collecting a large number of human-chatbot dialogues to reflect the breadth of users' sociodemographic backgrounds and conversational goals.

Chatbot

M2QA: Multi-domain Multilingual Question Answering

1 code implementation 1 Jul 2024 Leon Engländer, Hannah Sterz, Clifton Poth, Jonas Pfeiffer, Ilia Kuznetsov, Iryna Gurevych

While adapting NLP models to new languages within a single domain, or to new domains within a single language, is widely studied, research in joint adaptation is hampered by the lack of evaluation datasets.

Question Answering

DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs

1 code implementation 11 Jun 2024 Haishuo Fang, Xiaodan Zhu, Iryna Gurevych

Answering Questions over Knowledge Graphs (KGQA) is key to well-functioning autonomous language agents in various real-life applications.

In-Context Learning Knowledge Graphs +1

SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models

1 code implementation 7 Jun 2024 Md Imbesat Hassan Rizvi, Xiaodan Zhu, Iryna Gurevych

In this work, we present a comprehensive study of the capability of current state-of-the-art large language models (LLMs) on spatial reasoning.

Spatial Reasoning

Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art

no code implementations 6 Jun 2024 Chen Cecilia Liu, Iryna Gurevych, Anna Korhonen

The surge of interest in culturally aware and adapted Natural Language Processing (NLP) has inspired much recent research.

Missci: Reconstructing Fallacies in Misrepresented Science

1 code implementation 5 Jun 2024 Max Glockner, Yufang Hou, Preslav Nakov, Iryna Gurevych

Unlike previous fallacy detection datasets, Missci (i) focuses on implicit fallacies between the relevant content of the cited publication and the inaccurate claim, and (ii) requires models to verbalize the fallacious reasoning in addition to classifying it.

Fact Checking Misinformation

Re3: A Holistic Framework and Dataset for Modeling Collaborative Document Revision

no code implementations 31 May 2024 Qian Ruan, Ilia Kuznetsov, Iryna Gurevych

Collaborative review and revision of textual documents is the core of knowledge work and a promising target for empirical analysis and NLP assistance.

Holmes: A Benchmark to Assess the Linguistic Competence of Language Models

no code implementations 29 Apr 2024 Andreas Waldis, Yotam Perlitz, Leshem Choshen, Yufang Hou, Iryna Gurevych

We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs) -- their unconscious understanding of linguistic phenomena.

Part-Of-Speech Tagging

Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning

no code implementations 19 Apr 2024 Ahmed Elshabrawy, Yongxin Huang, Iryna Gurevych, Alham Fikri Aji

While Large Language Models (LLMs) exhibit remarkable capabilities in zero-shot and few-shot scenarios, they often require computationally prohibitive sizes.

Diversity Zero-shot Generalization

Constrained C-Test Generation via Mixed-Integer Programming

1 code implementation 12 Apr 2024 Ji-Ung Lee, Marc E. Pfetsch, Iryna Gurevych

This work proposes a novel method to generate C-Tests: a variant of cloze tests (a gap-filling exercise) in which only the last part of a word is turned into a gap.
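
To illustrate the exercise format only (not the paper's mixed-integer programming approach), here is a minimal sketch that applies a simplified version of the common C-test convention: the second half of every second word is blanked out.

    # Minimal sketch of the C-test gap format: blank out the second half of
    # every second word. This is a simplified illustration of the exercise
    # type, not the paper's mixed-integer programming method.

    def make_c_test(sentence: str, start: int = 1, step: int = 2) -> str:
        """Blank out the second half of every `step`-th word, starting at `start`."""
        words = sentence.split()
        gapped = []
        for i, word in enumerate(words):
            if i >= start and (i - start) % step == 0 and len(word) > 3:
                keep = len(word) // 2                      # keep the first half
                gapped.append(word[:keep] + "_" * (len(word) - keep))
            else:
                gapped.append(word)
        return " ".join(gapped)

    print(make_c_test("Language models are trained on large collections of text."))
    # -> "Language mod___ are tra____ on la___ collections of text."

The paper itself formulates C-test generation as a mixed-integer program rather than relying on such a fixed rule.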

Early Period of Training Impacts Out-of-Distribution Generalization

no code implementations 22 Mar 2024 Chen Cecilia Liu, Iryna Gurevych

Prior research has found that differences in the early period of neural network training significantly impact the performance of in-distribution (ID) tasks.

Out-of-Distribution Generalization

Multimodal Large Language Models to Support Real-World Fact-Checking

no code implementations 6 Mar 2024 Jiahui Geng, Yova Kementchedjhieva, Preslav Nakov, Iryna Gurevych

To the best of our knowledge, we are the first to evaluate MLLMs for real-world fact-checking.

Fact Checking

Socratic Reasoning Improves Positive Text Rewriting

no code implementations 5 Mar 2024 Anmol Goel, Nico Daheim, Iryna Gurevych

In this work, we address this gap by augmenting open-source datasets for positive text rewriting with synthetically-generated Socratic rationales using a novel framework called SocraticReframe.

Language Modelling Large Language Model

Variational Learning is Effective for Large Deep Networks

1 code implementation 27 Feb 2024 Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff

We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks.

Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon

no code implementations 3 Feb 2024 Fajri Koto, Tilman Beck, Zeerak Talat, Iryna Gurevych, Timothy Baldwin

Improving the capabilities of multilingual language models in low-resource languages is generally difficult due to the scarcity of large-scale data in those languages.

Sentence Sentiment Analysis

Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization

1 code implementation 2 Feb 2024 Andreas Waldis, Yufang Hou, Iryna Gurevych

Pre-trained language models (LMs) perform well in In-Topic setups, where training and testing data come from the same topics.

Document Structure in Long Document Transformers

no code implementations 31 Jan 2024 Jan Buchmann, Max Eichler, Jan-Micha Bodensohn, Ilia Kuznetsov, Iryna Gurevych

Long documents often exhibit structure with hierarchically organized elements of different functions, such as section headers and paragraphs.

Learning from Implicit User Feedback, Emotions and Demographic Information in Task-Oriented and Document-Grounded Dialogues

1 code implementation 17 Jan 2024 Dominic Petrak, Thy Thy Tran, Iryna Gurevych

Implicit user feedback, user emotions and demographic information have been shown to be promising sources for improving the accuracy and user engagement of responses generated by dialogue systems.

A Survey of Confidence Estimation and Calibration in Large Language Models

no code implementations 14 Nov 2023 Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, Iryna Gurevych

Assessing their confidence and calibrating them across different tasks can help mitigate risks and enable LLMs to produce better generations.

Language Modelling

A Template Is All You Meme

1 code implementation 11 Nov 2023 Luke Bates, Peter Ebert Christensen, Preslav Nakov, Iryna Gurevych

Here, to aid understanding of memes, we release a knowledge base of memes and information found on www.knowyourmeme.com, which we call the Know Your Meme Knowledge Base (KYMKB), composed of more than 54,000 images.

Exploring Jiu-Jitsu Argumentation for Writing Peer Review Rebuttals

1 code implementation 7 Nov 2023 Sukannya Purkayastha, Anne Lauscher, Iryna Gurevych

In this work, we are the first to explore Jiu-Jitsu argumentation for peer review by proposing the novel task of attitude and theme-guided rebuttal generation.

Sentence

Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend Existing Ones?

1 code implementation 24 Oct 2023 Dominic Petrak, Nafise Sadat Moosavi, Ye Tian, Nikolai Rozanov, Iryna Gurevych

Learning from free-text human feedback is essential for dialog systems, but annotated data is scarce and usually covers only a small fraction of error types known in conversational AI.

Chatbot Response Generation +1

Model Merging by Uncertainty-Based Gradient Matching

1 code implementation 19 Oct 2023 Nico Daheim, Thomas Möllenhoff, Edoardo Maria Ponti, Iryna Gurevych, Mohammad Emtiyaz Khan

Models trained on different datasets can be merged by a weighted averaging of their parameters, but why does it work and when can it fail?

Task Arithmetic
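
Below is a minimal sketch of the weighted parameter averaging that the abstract refers to. It shows only the generic merging baseline, not the paper's uncertainty-based gradient-matching scheme; the torch Linear modules are stand-ins for fine-tuned copies of the same architecture.

    # Minimal sketch of merging models by weighted parameter averaging.
    # Assumes all models share the same architecture; this is the generic
    # baseline, not the paper's uncertainty-based gradient-matching method.

    import torch

    def merge_state_dicts(state_dicts, weights):
        """Return a parameter-wise weighted average of the given state dicts."""
        total = sum(weights)
        merged = {}
        for name in state_dicts[0]:
            merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts)) / total
        return merged

    # Usage: average two models of identical shape with equal weight.
    m1 = torch.nn.Linear(4, 2)
    m2 = torch.nn.Linear(4, 2)
    merged = merge_state_dicts([m1.state_dict(), m2.state_dict()], weights=[0.5, 0.5])
    m_avg = torch.nn.Linear(4, 2)
    m_avg.load_state_dict(merged)

The paper's contribution concerns why and when such merging works, which it approaches via uncertainty-based gradient matching; the uniform average above is only the baseline operation being analyzed.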

Measuring Pointwise V-Usable Information In-Context-ly

1 code implementation 18 Oct 2023 Sheng Lu, Shan Chen, Yingya Li, Danielle Bitterman, Guergana Savova, Iryna Gurevych

In-context learning (ICL) is a new learning paradigm that has gained popularity along with the development of large language models.

In-Context Learning

Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings

1 code implementation 15 Sep 2023 Chen Cecilia Liu, Fajri Koto, Timothy Baldwin, Iryna Gurevych

Large language models (LLMs) are highly adept at question answering and reasoning tasks, but when reasoning in a situational context, human expectations vary depending on the relevant cultural common ground.

Question Answering

CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration

1 code implementation 14 Sep 2023 Rachneet Sachdeva, Martin Tutek, Iryna Gurevych

In recent years, large language models (LLMs) have shown remarkable capabilities at scale, particularly at generating text conditioned on a prompt.

counterfactual Data Augmentation +3

Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting

1 code implementation 13 Sep 2023 Tilman Beck, Hendrik Schuff, Anne Lauscher, Iryna Gurevych

However, the available NLP literature disagrees on the efficacy of this technique - it remains unclear for which tasks and scenarios it can help, and the role of the individual factors in sociodemographic prompting is still unexplored.

Hate Speech Detection Zero-Shot Learning

Are Emergent Abilities in Large Language Models just In-Context Learning?

1 code implementation 4 Sep 2023 Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych

Large language models, comprising billions of parameters and pre-trained on extensive web-scale corpora, have been claimed to acquire certain capabilities without having been specifically trained on them.

In-Context Learning Instruction Following +1

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

1 code implementation 19 Jul 2023 Nandan Thakur, Kexin Wang, Iryna Gurevych, Jimmy Lin

In this work, we provide SPRINT, a unified Python toolkit based on Pyserini and Lucene, supporting a common interface for evaluating neural sparse retrieval.

Information Retrieval Retrieval

Analyzing Dataset Annotation Quality Management in the Wild

2 code implementations 16 Jul 2023 Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych

A majority of the annotated publications apply good or excellent quality management.

Management

Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research

no code implementations 29 Jun 2023 Ji-Ung Lee, Haritz Puerto, Betty van Aken, Yuki Arase, Jessica Zosa Forde, Leon Derczynski, Andreas Rücklé, Iryna Gurevych, Roy Schwartz, Emma Strubell, Jesse Dodge

Many recent improvements in NLP stem from the development and use of large pre-trained language models (PLMs) with billions of parameters.

UKP-SQuARE: An Interactive Tool for Teaching Question Answering

1 code implementation 31 May 2023 Haishuo Fang, Haritz Puerto, Iryna Gurevych

To evaluate the effectiveness of UKP-SQuARE in teaching scenarios, we adopted it in a postgraduate NLP course and surveyed the students after the course.

Information Retrieval Question Answering +1

Dior-CVAE: Pre-trained Language Models and Diffusion Priors for Variational Dialog Generation

1 code implementation 24 May 2023 Tianyu Yang, Thy Thy Tran, Iryna Gurevych

These models also suffer from posterior collapse, i.e., the decoder tends to ignore latent variables and directly access information captured in the encoder through the cross-attention mechanism.

Decoder Diversity +2

DAPR: A Benchmark on Document-Aware Passage Retrieval

3 code implementations 23 May 2023 Kexin Wang, Nils Reimers, Iryna Gurevych

This drives us to build a benchmark for this task including multiple datasets from heterogeneous domains.

Passage Retrieval Retrieval

MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems

1 code implementation 23 May 2023 Jakub Macina, Nico Daheim, Sankalan Pal Chowdhury, Tanmay Sinha, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan

While automatic dialogue tutors hold great potential in making education personalized and more accessible, research on such systems has been hampered by a lack of sufficiently large and high-quality datasets.

Language Modelling Large Language Model +1

A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why?

no code implementations 22 May 2023 Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych

In this study, we propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques.

Causal Discovery

Romanization-based Large-scale Adaptation of Multilingual Language Models

no code implementations 18 Apr 2023 Sukannya Purkayastha, Sebastian Ruder, Jonas Pfeiffer, Iryna Gurevych, Ivan Vulić

In order to boost the capacity of mPLMs to deal with low-resource and unseen languages, we explore the potential of leveraging transliteration on a massive scale.

Cross-Lingual Transfer Transliteration

UKP-SQuARE v3: A Platform for Multi-Agent QA Research

1 code implementation 31 Mar 2023 Haritz Puerto, Tim Baumgärtner, Rachneet Sachdeva, Haishuo Fang, Hao Zhang, Sewin Tariverdian, Kexin Wang, Iryna Gurevych

To ease research in multi-agent models, we extend UKP-SQuARE, an online platform for QA research, to support three families of multi-agent systems: i) agent selection, ii) early-fusion of agents, and iii) late-fusion of agents.

Question Answering

Elastic Weight Removal for Faithful and Abstractive Dialogue Generation

1 code implementation 30 Mar 2023 Nico Daheim, Nouha Dziri, Mrinmaya Sachan, Iryna Gurevych, Edoardo M. Ponti

We evaluate our method -- using different variants of Flan-T5 as a backbone language model -- on multiple datasets for information-seeking dialogue generation and compare our method with state-of-the-art techniques for faithfulness, such as CTRL, Quark, DExperts, and Noisy Channel reranking.

Dialogue Generation Language Modelling

CARE: Collaborative AI-Assisted Reading Environment

1 code implementation 24 Feb 2023 Dennis Zyska, Nils Dycke, Jan Buchmann, Ilia Kuznetsov, Iryna Gurevych

Recent years have seen impressive progress in AI-assisted writing, yet the developments in AI-assisted reading are lacking.

Question Answering text-classification +1

Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification

1 code implementation 17 Feb 2023 Luke Bates, Iryna Gurevych

Few-shot text classification systems have impressive capabilities but are infeasible to deploy and use reliably due to their dependence on prompting and billion-parameter language models.

Contrastive Learning Few-Shot Text Classification +2

Opportunities and Challenges in Neural Dialog Tutoring

1 code implementation 24 Jan 2023 Jakub Macina, Nico Daheim, Lingzhi Wang, Tanmay Sinha, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan

Designing dialog tutors has been challenging as it involves modeling the diverse and complex pedagogical strategies employed by human tutors.

FUN with Fisher: Improving Generalization of Adapter-Based Cross-lingual Transfer with Scheduled Unfreezing

1 code implementation 13 Jan 2023 Chen Cecilia Liu, Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych

Our experiments reveal that scheduled unfreezing induces different learning dynamics compared to standard fine-tuning, and provide evidence that the dynamics of Fisher Information during training correlate with cross-lingual generalization performance.

Cross-Lingual Transfer Transfer Learning

Python Code Generation by Asking Clarification Questions

1 code implementation 19 Dec 2022 Haau-Sing Li, Mohsen Mesgar, André F. T. Martins, Iryna Gurevych

We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.

Code Generation Language Modelling

CiteBench: A Benchmark for Scientific Citation Text Generation

1 code implementation 19 Dec 2022 Martin Funkquist, Ilia Kuznetsov, Yufang Hou, Iryna Gurevych

To address this challenge, we propose CiteBench: a benchmark for citation text generation that unifies multiple diverse datasets and enables standardized evaluation of citation text generation models across task designs and domains.

Text Generation

NLP meets psychotherapy: Using predicted client emotions and self-reported client emotions to measure emotional coherence

no code implementations 22 Nov 2022 Neha Warikoo, Tobias Mayer, Dana Atzil-Slonim, Amir Eliassaf, Shira Haimovitz, Iryna Gurevych

No study has examined EC between the subjective experience of emotions and emotion expression in therapy or whether this coherence is associated with clients' well-being.

Emotion Recognition

GDPR Compliant Collection of Therapist-Patient-Dialogues

no code implementations 22 Nov 2022 Tobias Mayer, Neha Warikoo, Oliver Grimm, Andreas Reif, Iryna Gurevych

While these conversations are part of the daily routine of clinicians, gathering them is usually hindered by various ethical (purpose of data usage), legal (data privacy) and technical (data formatting) limitations.

NLPeer: A Unified Resource for the Computational Study of Peer Review

1 code implementation 12 Nov 2022 Nils Dycke, Ilia Kuznetsov, Iryna Gurevych

Peer review constitutes a core component of scholarly publishing; yet it demands substantial expertise and training, and is susceptible to errors and biases.

An Inclusive Notion of Text

no code implementations 10 Nov 2022 Ilia Kuznetsov, Iryna Gurevych

Natural language processing (NLP) researchers develop models of grammar, meaning and communication based on written text.

Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5

1 code implementation 31 Oct 2022 Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Aline Villavicencio, Iryna Gurevych

We compare sequential fine-tuning with a model for multi-task learning in the context where we are interested in boosting performance on two tasks, one of which depends on the other.

Multi-Task Learning Natural Language Inference

Missing Counter-Evidence Renders NLP Fact-Checking Unrealistic for Misinformation

1 code implementation 25 Oct 2022 Max Glockner, Yufang Hou, Iryna Gurevych

In our analysis, we show that, by design, existing NLP task definitions for fact-checking cannot refute misinformation as professional fact-checkers do for the majority of claims.

Fact Checking Misinformation

Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking

1 code implementation 19 Oct 2022 Tim Baumgärtner, Leonardo F. R. Ribeiro, Nils Reimers, Iryna Gurevych

Pairing a lexical retriever with a neural re-ranking model has set state-of-the-art performance on large-scale information retrieval datasets.

Argument Retrieval Information Retrieval +4

One does not fit all! On the Complementarity of Vision Encoders for Vision and Language Tasks

no code implementations 12 Oct 2022 Gregor Geigle, Chen Cecilia Liu, Jonas Pfeiffer, Iryna Gurevych

While many VEs -- of different architectures, trained on different data and objectives -- are publicly available, they are not designed for the downstream V+L tasks.

The Devil is in the Details: On Models and Training Regimes for Few-Shot Intent Classification

no code implementations 12 Oct 2022 Mohsen Mesgar, Thy Thy Tran, Goran Glavaš, Iryna Gurevych

First, the unexplored combination of the cross-encoder architecture (with parameterized similarity scoring function) and episodic meta-learning consistently yields the best FSIC performance.

intent-classification Intent Classification +1

Transformers with Learnable Activation Functions

2 code implementations 30 Aug 2022 Haishuo Fang, Ji-Ung Lee, Nafise Sadat Moosavi, Iryna Gurevych

In contrast to conventional, predefined activation functions, RAFs can adaptively learn optimal activation functions during training according to input data.

UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA

1 code implementation 19 Aug 2022 Rachneet Sachdeva, Haritz Puerto, Tim Baumgärtner, Sewin Tariverdian, Hao Zhang, Kexin Wang, Hossain Shaikh Saadi, Leonardo F. R. Ribeiro, Iryna Gurevych

In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations.

Adversarial Attack Explainable Models +2

TexPrax: A Messaging Application for Ethical, Real-time Data Collection and Annotation

1 code implementation 16 Aug 2022 Lorenz Stangier, Ji-Ung Lee, Yuxi Wang, Marvin Müller, Nicholas Frick, Joachim Metternich, Iryna Gurevych

We evaluate TexPrax in a user-study with German factory employees who ask their colleagues for solutions on problems that arise during their daily work.

Chatbot Sentence

Mining Legal Arguments in Court Decisions

1 code implementation 12 Aug 2022 Ivan Habernal, Daniel Faber, Nicola Recchia, Sebastian Bretthauer, Iryna Gurevych, Indra Spiecker genannt Döhmann, Christoph Burchard

Identifying, classifying, and analyzing arguments in legal discourse has been a prominent area of research since the inception of the argument mining field.

Argument Mining

Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future

1 code implementation 5 Jun 2022 Jan-Christoph Klie, Bonnie Webber, Iryna Gurevych

While researchers show that their approaches work well on their newly introduced datasets, they rarely compare their methods to previous work or on the same datasets.

text-classification Text Classification

Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets

1 code implementation 23 May 2022 Benjamin Schiller, Johannes Daxenberger, Andreas Waldis, Iryna Gurevych

The task of Argument Mining, that is, extracting and classifying argument components for a specific topic from large document sources, is inherently difficult for machine learning models and humans alike, as large Argument Mining datasets are rare and recognizing argument components requires expert knowledge.

Argument Mining Benchmarking +2

Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained Language Models

2 code implementations 13 May 2022 Dominic Petrak, Nafise Sadat Moosavi, Iryna Gurevych

In this paper, we propose a new extended pretraining approach called Arithmetic-Based Pretraining that jointly addresses both in one extended pretraining step without requiring architectural changes or pretraining from scratch.

Contrastive Learning Reading Comprehension +1

Adaptable Adapters

1 code implementation NAACL 2022 Nafise Sadat Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych

The resulting adapters (a) contain about 50% of the learning parameters of the standard adapter and are therefore more efficient at training and inference, and require less storage space, and (b) achieve considerably higher performances in low-data settings.

Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review

1 code implementation 22 Apr 2022 Ilia Kuznetsov, Jan Buchmann, Max Eichler, Iryna Gurevych

While existing NLP studies focus on the analysis of individual texts, editorial assistance often requires modeling interactions between pairs of texts -- yet general frameworks and datasets to support this scenario are missing.

FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations

3 code implementations NAACL 2022 Leonardo F. R. Ribeiro, Mengwen Liu, Iryna Gurevych, Markus Dreyer, Mohit Bansal

Despite recent improvements in abstractive summarization, most current approaches generate summaries that are not factually consistent with the source document, severely restricting their trust and usage in real-world applications.

Abstractive Text Summarization ARC

UKP-SQUARE: An Online Platform for Question Answering Research

1 code implementation ACL 2022 Tim Baumgärtner, Kexin Wang, Rachneet Sachdeva, Max Eichler, Gregor Geigle, Clifton Poth, Hannah Sterz, Haritz Puerto, Leonardo F. R. Ribeiro, Jonas Pfeiffer, Nils Reimers, Gözde Gül Şahin, Iryna Gurevych

Recent advances in NLP and information retrieval have given rise to a diverse set of question answering tasks that are of different formats (e.g., extractive, abstractive), require different model architectures (e.g., generative, discriminative), and different setups (e.g., with or without retrieval).

Explainable Models Information Retrieval +2

Delving Deeper into Cross-lingual Visual Question Answering

1 code implementation 15 Feb 2022 Chen Liu, Jonas Pfeiffer, Anna Korhonen, Ivan Vulić, Iryna Gurevych

2) We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers, and identify question types that are the most difficult to improve on.

Inductive Bias Question Answering +1

ArgSciChat: A Dataset for Argumentative Dialogues on Scientific Papers

2 code implementations 14 Feb 2022 Federico Ruggeri, Mohsen Mesgar, Iryna Gurevych

The applications of conversational agents for scientific disciplines (as expert domains) are understudied due to the lack of dialogue data to train such agents.

Fact Selection Response Generation

Yes-Yes-Yes: Proactive Data Collection for ACL Rolling Review and Beyond

1 code implementation 27 Jan 2022 Nils Dycke, Ilia Kuznetsov, Iryna Gurevych

The shift towards publicly available text sources has enabled language processing at unprecedented scale, yet leaves under-serviced the domains where public and openly licensed data is scarce.

MetaQA: Combining Expert Agents for Multi-Skill Question Answering

1 code implementation 3 Dec 2021 Haritz Puerto, Gözde Gül Şahin, Iryna Gurevych

The recent explosion of question answering (QA) datasets and models has increased the interest in the generalization of models across multiple domains and formats by either training on multiple datasets or by combining multiple models.

Question Answering

TxT: Crossmodal End-to-End Learning with Transformers

no code implementations 9 Sep 2021 Jan-Martin O. Steitz, Jonas Pfeiffer, Iryna Gurevych, Stefan Roth

Reasoning over multiple modalities, e.g., in Visual Question Answering (VQA), requires an alignment of semantic concepts across domains.

Multimodal Reasoning Question Answering +1

Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning

1 code implementation EMNLP 2021 Prasetya Ajie Utama, Nafise Sadat Moosavi, Victor Sanh, Iryna Gurevych

Recent prompt-based approaches allow pretrained language models to achieve strong performances on few-shot finetuning by reformulating downstream tasks as a language modeling problem.

Language Modelling Sentence +1

Assisting Decision Making in Scholarly Peer Review: A Preference Learning Perspective

no code implementations 2 Sep 2021 Nils Dycke, Edwin Simpson, Ilia Kuznetsov, Iryna Gurevych

Peer review is the primary means of quality control in academia; as an outcome of a peer review process, program and area chairs make acceptance decisions for each paper based on the review reports and scores they received.

Decision Making Fairness

AdapterHub Playground: Simple and Flexible Few-Shot Learning with Adapters

1 code implementation ACL 2022 Tilman Beck, Bela Bohlender, Christina Viehmann, Vincent Hane, Yanik Adamson, Jaber Khuri, Jonas Brossmann, Jonas Pfeiffer, Iryna Gurevych

The open-access dissemination of pretrained language models through online repositories has led to a democratization of state-of-the-art natural language processing (NLP) research.

Few-Shot Learning Transfer Learning

Scientia Potentia Est -- On the Role of Knowledge in Computational Argumentation

no code implementations 1 Jul 2021 Anne Lauscher, Henning Wachsmuth, Iryna Gurevych, Goran Glavaš

Despite extensive research efforts in recent years, computational argumentation (CA) remains one of the most challenging areas of natural language processing.

Common Sense Reasoning Natural Language Understanding

Annotation Curricula to Implicitly Train Non-Expert Annotators

1 code implementation CL (ACL) 2022 Ji-Ung Lee, Jan-Christoph Klie, Iryna Gurevych

Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.

Sentence

Metaphor Generation with Conceptual Mappings

1 code implementation ACL 2021 Kevin Stowe, Tuhin Chakrabarty, Nanyun Peng, Smaranda Muresan, Iryna Gurevych

Guided by conceptual metaphor theory, we propose to control the generation process by encoding conceptual mappings between cognitive domains to generate meaningful metaphoric expressions.

Sentence

Investigating label suggestions for opinion mining in German Covid-19 social media

1 code implementation ACL 2021 Tilman Beck, Ji-Ung Lee, Christina Viehmann, Marcus Maurer, Oliver Quiring, Iryna Gurevych

This work investigates the use of interactively updated label suggestions to improve upon the efficiency of gathering annotations on the task of opinion mining in German Covid-19 social media data.

Opinion Mining Transfer Learning

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

3 code implementations 17 Apr 2021 Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych

To address this, and to facilitate researchers to broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval.

Argument Retrieval Benchmarking +11

Learning to Reason for Text Generation from Scientific Tables

2 code implementations 16 Apr 2021 Nafise Sadat Moosavi, Andreas Rücklé, Dan Roth, Iryna Gurevych

In this paper, we introduce SciGen, a new challenge dataset for the task of reasoning-aware data-to-text generation consisting of tables from scientific articles and their corresponding descriptions.

Arithmetic Reasoning Data-to-Text Generation

What to Pre-Train on? Efficient Intermediate Task Selection

1 code implementation EMNLP 2021 Clifton Poth, Jonas Pfeiffer, Andreas Rücklé, Iryna Gurevych

Our best methods achieve an average Regret@3 of less than 1% across all target tasks, demonstrating that we are able to efficiently identify the best datasets for intermediate training.

Multiple-choice Question Answering +1

Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval

1 code implementation 22 Mar 2021 Gregor Geigle, Jonas Pfeiffer, Nils Reimers, Ivan Vulić, Iryna Gurevych

Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.

Cross-Modal Retrieval Retrieval

Structural Adapters in Pretrained Language Models for AMR-to-text Generation

1 code implementation EMNLP 2021 Leonardo F. R. Ribeiro, Yue Zhang, Iryna Gurevych

Pretrained language models (PLM) have recently advanced graph-to-text generation, where the input graph is linearized into a sequence and fed into the PLM to obtain its representation.

AMR-to-Text Generation Data-to-Text Generation