Search Results for author: Elizabeth Clark

Found 18 papers, 3 papers with code

Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting

no code implementations • 24 May 2023 • Akhila Yerukola, Xuhui Zhou, Elizabeth Clark, Maarten Sap

Most existing stylistic text rewriting methods and evaluation metrics operate on a sentence level, but ignoring the broader context of the text can lead to preferring generic, ambiguous, and incoherent rewrites.

Sentence

mFACE: Multilingual Summarization with Factual Consistency Evaluation

no code implementations • 20 Dec 2022 • Roee Aharoni, Shashi Narayan, Joshua Maynez, Jonathan Herzig, Elizabeth Clark, Mirella Lapata

Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets.

Abstractive Text Summarization
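
A common recipe for the kind of factual-consistency signal this line of work evaluates is entailment-based scoring: treat the source document as an NLI premise and the summary as the hypothesis, then read off the entailment probability. The sketch below illustrates that general idea only; it is not the mFACE system, and the checkpoint name (facebook/bart-large-mnli) and its label ordering are assumptions about that particular public English model.

```python
# Hedged sketch of entailment-based factual-consistency scoring (not mFACE itself).
# Assumes the public "facebook/bart-large-mnli" checkpoint, whose labels are
# ordered (contradiction, neutral, entailment).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
nli = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

def consistency_score(source: str, summary: str) -> float:
    """Probability that the source entails the summary (higher = more consistent)."""
    inputs = tok(source, summary, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli(**inputs).logits
    return logits.softmax(dim=-1).squeeze(0)[2].item()  # index 2 = entailment

print(consistency_score("The meeting was moved to Friday.", "The meeting is on Friday."))
```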

Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization

1 code implementation • 20 Dec 2022 • Lining Zhang, Simon Mille, Yufang Hou, Daniel Deutsch, Elizabeth Clark, Yixin Liu, Saad Mahamood, Sebastian Gehrmann, Miruna Clinciu, Khyathi Chandu, João Sedoc

To prevent the costly and inefficient use of resources on low-quality annotations, we want a method for creating a pool of dependable annotators who can effectively complete difficult tasks, such as evaluating automatic summarization.
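
As a toy illustration of one ingredient such a screening pipeline might use (not the paper's actual procedure), the sketch below keeps only workers whose answers on a small qualification task match a gold key above a threshold; the data layout, question IDs, and the 0.8 cutoff are invented for the example.

```python
# Toy worker-screening sketch: retain workers who agree with a gold answer key
# on a qualification round. All identifiers and the threshold are invented.
GOLD = {"q1": "B", "q2": "A", "q3": "C"}

def reliable_workers(responses: dict[str, dict[str, str]],
                     min_accuracy: float = 0.8) -> list[str]:
    """responses maps worker_id -> {question_id: answer}; returns the dependable pool."""
    pool = []
    for worker, answers in responses.items():
        correct = sum(answers.get(q) == gold for q, gold in GOLD.items())
        if correct / len(GOLD) >= min_accuracy:
            pool.append(worker)
    return pool

# Only w1 clears the 80% bar on the qualification items.
print(reliable_workers({"w1": {"q1": "B", "q2": "A", "q3": "C"},
                        "w2": {"q1": "B", "q2": "C", "q3": "A"}}))
```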

Dialect-robust Evaluation of Generated Text

no code implementations • 2 Nov 2022 • Jiao Sun, Thibault Sellam, Elizabeth Clark, Tu Vu, Timothy Dozat, Dan Garrette, Aditya Siddhant, Jacob Eisenstein, Sebastian Gehrmann

Evaluation metrics that are not robust to dialect variation make it impossible to tell how well systems perform for many groups of users, and can even penalize systems for producing text in lower-resource dialects.

NLG Evaluation
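
As a hedged illustration of the robustness criterion the abstract describes (not the paper's proposed approach), a dialect-robust metric should assign nearly equal scores to meaning-equivalent outputs written in different dialects; the sketch below measures that gap for any reference-based scorer, using a deliberately brittle unigram-overlap stand-in to show how surface metrics penalize dialectal variation.

```python
# Illustrative dialect-robustness check; `metric` stands in for any
# reference-based scorer. The example sentences and the overlap metric are
# invented for the sketch and are not from the paper.
from typing import Callable

def unigram_overlap(hyp: str, ref: str) -> float:
    ref_words = set(ref.lower().split())
    return len(set(hyp.lower().split()) & ref_words) / len(ref_words)

def dialect_gap(metric: Callable[[str, str], float], reference: str,
                output_dialect_a: str, output_dialect_b: str) -> float:
    """Absolute score gap between two dialectal variants of the same content."""
    return abs(metric(output_dialect_a, reference) - metric(output_dialect_b, reference))

gap = dialect_gap(unigram_overlap,
                  "the children are playing outside",
                  "the children are playing outside",   # same dialect as the reference
                  "the bairns are playing ootside")     # Scots-flavoured variant
print(f"score gap across dialects: {gap:.2f}")          # large gap = not dialect-robust
```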

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

no code implementations • 22 Jun 2022 • Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.

Benchmarking, Text Generation
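
For readers curious what the "single line of code" in the title might look like in practice, here is a minimal sketch using the Hugging Face `datasets` library; the dataset name (GEM/web_nlg), the `en` config, and the split are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of loading one GEM task via Hugging Face `datasets`.
# Dataset name, config, and split are assumptions; older script-based GEM
# loaders may also require trust_remote_code=True on recent `datasets` releases.
from datasets import load_dataset

data = load_dataset("GEM/web_nlg", "en", split="validation")

# Each example pairs a model input with reference target(s) for NLG evaluation.
print(data[0])
```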

Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text

no code implementations • 14 Feb 2022 • Sebastian Gehrmann, Elizabeth Clark, Thibault Sellam

We summarize, categorize, and discuss how researchers have been addressing these issues and what their findings mean for the current state of model evaluations.

NLG Evaluation, Text Generation

All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

no code implementations • ACL 2021 • Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, Noah A. Smith

Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge machine-generated text?

NLG Evaluation, Text Generation

Exploring the Effect of Author and Reader Identity in Online Story Writing: the StoriesInTheWild Corpus

no code implementations • WS 2020 • Tal August, Maarten Sap, Elizabeth Clark, Katharina Reinecke, Noah A. Smith

We analyze the effect of author and reader characteristics and story writing setup on the quality of stories in a short storytelling task.

Evaluation of Text Generation: A Survey

no code implementations • 26 Jun 2020 • Asli Celikyilmaz, Elizabeth Clark, Jianfeng Gao

The paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years.

NLG Evaluation, Text Generation, +1

Counterfactual Story Reasoning and Generation

1 code implementation • IJCNLP 2019 • Lianhui Qin, Antoine Bosselut, Ari Holtzman, Chandra Bhagavatula, Elizabeth Clark, Yejin Choi

Counterfactual reasoning requires predicting how alternative events, contrary to what actually happened, might have resulted in different outcomes.

Counterfactual, Counterfactual Reasoning, +1

Sentence Mover's Similarity: Automatic Evaluation for Multi-Sentence Texts

no code implementations • ACL 2019 • Elizabeth Clark, Asli Celikyilmaz, Noah A. Smith

For evaluating machine-generated texts, automatic methods hold the promise of avoiding collection of human judgments, which can be expensive and time-consuming.

Semantic Similarity, Semantic Textual Similarity, +2
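
The metric's core idea is to treat each document as a bag of sentence embeddings and compute an earth mover's distance between the two bags. The sketch below follows that idea in simplified form and is not the authors' released implementation: it assumes sentence-transformers embeddings (all-MiniLM-L6-v2) instead of averaged word vectors, uniform sentence weights, and the POT library for the transport problem.

```python
# Simplified sentence mover's distance sketch (not the paper's exact setup).
# Requires: pip install pot sentence-transformers
import numpy as np
import ot  # Python Optimal Transport
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def sentence_movers_distance(doc_a: list[str], doc_b: list[str]) -> float:
    """Earth mover's distance between two documents viewed as bags of sentences."""
    emb_a = _model.encode(doc_a)                      # (n_a, d) sentence embeddings
    emb_b = _model.encode(doc_b)                      # (n_b, d)
    w_a = np.full(len(doc_a), 1.0 / len(doc_a))       # uniform sentence weights
    w_b = np.full(len(doc_b), 1.0 / len(doc_b))       # (the paper weights by length)
    cost = ot.dist(emb_a, emb_b, metric="euclidean")  # pairwise transport costs
    return ot.emd2(w_a, w_b, cost)                    # minimal "moving" cost

# Lower distance = more similar, e.g. a generated summary vs. its reference.
reference = ["The council approved the budget.", "Construction starts in May."]
candidate = ["The budget was approved by the council.", "Work begins this May."]
print(sentence_movers_distance(candidate, reference))
```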
