no code implementations • CoNLL (EMNLP) 2021 • Daniel Deutsch, Dan Roth
Reference-based metrics such as ROUGE or BERTScore evaluate the content quality of a summary by comparing the summary to a reference.
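The core idea of a reference-based content metric can be sketched as a token-overlap F1 between the candidate summary and the reference. This is a simplified illustration in the spirit of ROUGE-1, not the official ROUGE implementation (which adds stemming, stopword options, and n-gram variants); the function name is illustrative.

```python
from collections import Counter

def unigram_f1(summary: str, reference: str) -> float:
    """Toy content-overlap score in the spirit of ROUGE-1:
    F1 over the unigram counts shared by summary and reference."""
    sum_counts = Counter(summary.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each shared token counts at most min(count_s, count_r) times.
    overlap = sum((sum_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(sum_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

BERTScore replaces the exact-match overlap above with soft similarity between contextual token embeddings, but the precision/recall/F1 framing is the same.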
1 code implementation • 2 Apr 2024 • Marcel Nawrath, Agnieszka Nowak, Tristan Ratz, Danilo C. Walenta, Juri Opitz, Leonardo F. R. Ribeiro, João Sedoc, Daniel Deutsch, Simon Mille, Yixin Liu, Lining Zhang, Sebastian Gehrmann, Saad Mahamood, Miruna Clinciu, Khyathi Chandu, Yufang Hou
At the heart of the Pyramid evaluation method for text summarization lie human-written summary content units (SCUs).
no code implementations • 1 Apr 2024 • Parker Riley, Daniel Deutsch, George Foster, Viresh Ratnakar, Ali Dabirmoghaddam, Markus Freitag
Reliable human evaluation is critical to the development of successful natural language generation models, but achieving it is notoriously difficult.
no code implementations • 15 Nov 2023 • Wenda Xu, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Biao Zhang, Zhongtao Liu, William Yang Wang, Lei Li, Markus Freitag
Recent large language models (LLMs) are leveraging human feedback to improve their generation quality.
no code implementations • 9 Nov 2023 • Jan-Thorsten Peter, David Vilar, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Markus Freitag
Quality Estimation (QE), the evaluation of machine translation output without the need for explicit references, has seen large improvements in recent years with the use of neural metrics.
1 code implementation • 30 Oct 2023 • Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, Steffen Eger
Specifically, we propose a novel competition setting in which we select a list of allowed LLMs and disallow fine-tuning to ensure a focus on prompting.
no code implementations • 25 Aug 2023 • Daniel Deutsch, Juraj Juraska, Mara Finkelstein, Markus Freitag
As research on machine translation moves to translating text beyond the sentence level, it remains unclear how effective automatic evaluation metrics are at scoring longer translations.
no code implementations • 14 Aug 2023 • Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André F. T. Martins, Graham Neubig, Ankush Garg, Jonathan H. Clark, Markus Freitag, Orhan Firat
Automatic evaluation of machine translation (MT) is a critical tool driving the rapid iterative development of MT systems.
1 code implementation • 23 May 2023 • Daniel Deutsch, George Foster, Markus Freitag
Kendall's tau is frequently used to meta-evaluate how well machine translation (MT) evaluation metrics score individual translations.
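Kendall's tau compares every pair of translations: it rewards pairs the metric ranks in the same order as the human scores and penalizes pairs it ranks in the opposite order. A minimal tau-a sketch (ignoring the tie corrections that the paper's proposed variants address):

```python
from itertools import combinations

def kendall_tau(metric_scores: list, human_scores: list) -> float:
    """Tau-a: (concordant - discordant) pairs over all pairs.
    Tied pairs count as neither concordant nor discordant."""
    assert len(metric_scores) == len(human_scores)
    concordant = discordant = 0
    for (m1, h1), (m2, h2) in combinations(zip(metric_scores, human_scores), 2):
        s = (m1 - m2) * (h1 - h2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(metric_scores) * (len(metric_scores) - 1) // 2
    return (concordant - discordant) / n_pairs
```

A perfectly agreeing metric scores 1.0, a perfectly reversed one -1.0; how ties are handled is exactly where the common tau variants differ.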
1 code implementation • 20 Dec 2022 • Lining Zhang, Simon Mille, Yufang Hou, Daniel Deutsch, Elizabeth Clark, Yixin Liu, Saad Mahamood, Sebastian Gehrmann, Miruna Clinciu, Khyathi Chandu, João Sedoc
To prevent the costly and inefficient use of resources on low-quality annotations, we want a method for creating a pool of dependable annotators who can effectively complete difficult tasks, such as evaluating automatic summarization.
no code implementations • 22 Oct 2022 • Daniel Deutsch, Rotem Dror, Dan Roth
There is significant interest in developing evaluation metrics that accurately estimate the quality of generated text without the aid of a human-written reference text, which can be time-consuming and expensive to collect or entirely unavailable in online applications.
no code implementations • 22 Jun 2022 • Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou
This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.
1 code implementation • 29 Apr 2022 • Daniel Deutsch, Dan Roth
We introduce Repro, an open-source library that aims to improve the reproducibility and usability of research code.
no code implementations • Findings (ACL) 2022 • Daniel Deutsch, Dan Roth
Question answering-based summarization evaluation metrics must automatically determine whether the QA model's prediction is correct or not, a task known as answer verification.
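A common baseline for answer verification is SQuAD-style normalized exact match: the QA model's prediction counts as correct if it equals the expected answer after lowercasing and stripping punctuation, articles, and extra whitespace. A minimal sketch of that baseline (not the verification method the paper proposes):

```python
import re
import string

def normalize(answer: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation,
    remove articles, and collapse whitespace."""
    answer = answer.lower()
    answer = "".join(ch for ch in answer if ch not in string.punctuation)
    answer = re.sub(r"\b(a|an|the)\b", " ", answer)
    return " ".join(answer.split())

def exact_match(prediction: str, gold: str) -> bool:
    """Answer verification via normalized exact match."""
    return normalize(prediction) == normalize(gold)
```

Exact match is brittle for paraphrased answers, which is precisely why learned verification is worth studying.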
no code implementations • NAACL 2022 • Daniel Deutsch, Rotem Dror, Dan Roth
How reliably an automatic summarization evaluation metric replicates human judgments of summary quality is quantified by system-level correlations.
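A system-level correlation is typically computed by averaging each system's per-summary scores and correlating those per-system averages with the corresponding human averages. A minimal sketch under that standard setup (function names are illustrative):

```python
from statistics import mean

def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def system_level_correlation(metric_scores: dict, human_scores: dict) -> float:
    """metric_scores / human_scores map system name -> list of
    per-summary scores; correlate the per-system means."""
    systems = sorted(metric_scores)
    metric_means = [mean(metric_scores[s]) for s in systems]
    human_means = [mean(human_scores[s]) for s in systems]
    return pearson(metric_means, human_means)
```

Because each system's scores are averaged before correlating, the result depends heavily on how many summaries per system are annotated, which is one source of the reliability concerns the paper examines.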
no code implementations • 15 Nov 2021 • Daniel Deutsch, Dan Roth
In this work, we propose a method for incorporating question-answering (QA) signals into a summarization model.
1 code implementation • 31 Mar 2021 • Daniel Deutsch, Rotem Dror, Dan Roth
After evaluating which of the proposed methods is most appropriate for summarization through two simulation experiments, we analyze the results of applying these methods to several different automatic evaluation metrics across three sets of human annotations.
no code implementations • COLING 2020 • Disha Jindal, Daniel Deutsch, Dan Roth
Identifying the key events in a document is critical to holistically understanding its important information.
1 code implementation • 23 Oct 2020 • Daniel Deutsch, Dan Roth
Reference-based metrics such as ROUGE or BERTScore evaluate the content quality of a summary by comparing the summary to a reference.
2 code implementations • 1 Oct 2020 • Daniel Deutsch, Tania Bedrax-Weiss, Dan Roth
A desirable property of a reference-based evaluation metric that measures the content quality of a summary is that it should estimate how much information that summary has in common with a reference.
1 code implementation • EMNLP (NLPOSS) 2020 • Daniel Deutsch, Dan Roth
We present SacreROUGE, an open-source library for using and developing summarization evaluation metrics.
no code implementations • CoNLL 2019 • Daniel Deutsch, Shyam Upadhyay, Dan Roth
We experimentally show the benefits of our algorithm on constituency parsing and semantic role labeling.
no code implementations • IJCNLP 2019 • Daniel Deutsch, Dan Roth
A key challenge in topic-focused summarization is determining what information should be included in the summary, a problem known as content selection.
1 code implementation • ACL 2018 • Daniel Deutsch, John Hewitt, Dan Roth
Modeling derivational morphology to generate words with particular semantics is useful in many text generation tasks, such as machine translation or abstractive question answering.