Search Results for author: Tanya Goyal

Found 20 papers, 15 papers with code

Contemporary NLP Modeling in Six Comprehensive Programming Assignments

no code implementations • NAACL (TeachingNLP) 2021 • Greg Durrett, Jifan Chen, Shrey Desai, Tanya Goyal, Lucas Kabela, Yasumasa Onoe, Jiacheng Xu

We present a series of programming assignments, adaptable to a range of experience levels from advanced undergraduate to PhD, to teach students design and implementation of modern NLP systems.

FABLES: Evaluating faithfulness and content selection in book-length summarization

3 code implementations • 1 Apr 2024 • Yekyung Kim, Yapei Chang, Marzena Karpinska, Aparna Garimella, Varun Manjunatha, Kyle Lo, Tanya Goyal, Mohit Iyyer

While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially with regard to detecting unfaithful claims.

Long-Context Understanding
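For intuition, here is a minimal sketch of the claim-level evaluation loop the abstract describes: score each claim against the source, then correlate the scores with human labels. The rater below is a toy lexical-overlap stand-in of my own, not one of the paper's LLM raters; it only assumes scipy.

```python
# Toy claim-level faithfulness evaluation: rate each claim against the source
# text, then correlate with human annotations. The overlap rater is a
# placeholder, NOT one of the paper's LLM raters.
from scipy.stats import pearsonr

def overlap_rater(claim: str, source: str) -> float:
    """Fraction of claim tokens that appear anywhere in the source."""
    claim_toks = set(claim.lower().split())
    source_toks = set(source.lower().split())
    return len(claim_toks & source_toks) / max(len(claim_toks), 1)

def correlate_with_humans(claims, source, human_labels):
    auto_scores = [overlap_rater(c, source) for c in claims]
    r, p = pearsonr(auto_scores, human_labels)  # weak r mirrors the finding above
    return r, p
```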

Evaluating Large Language Models at Evaluating Instruction Following

1 code implementation • 11 Oct 2023 • Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen

As research in large language models (LLMs) continues to accelerate, LLM-based evaluation has emerged as a scalable and cost-effective alternative to human evaluations for comparing the ever-increasing list of models.

Instruction Following
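A minimal pairwise LLM-as-judge sketch in the spirit of this setup follows; the prompt wording and model name are assumptions rather than the paper's protocol, and it presumes the openai>=1.0 Python client with an API key in the environment.

```python
# Pairwise LLM-as-judge sketch: ask a model which of two outputs better
# follows an instruction. Prompt wording and model choice are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Instruction: {instruction}\n\n"
    "Output (a): {a}\n\n"
    "Output (b): {b}\n\n"
    'Which output follows the instruction better? Answer "a" or "b" only.'
)

def judge(instruction: str, a: str, b: str, model: str = "gpt-4o") -> str:
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user", "content": PROMPT.format(
            instruction=instruction, a=a, b=b)}],
    )
    return resp.choices[0].message.content.strip().lower()
```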

A Long Way to Go: Investigating Length Correlations in RLHF

1 code implementation • 5 Oct 2023 • Prasann Singhal, Tanya Goyal, Jiacheng Xu, Greg Durrett

Furthermore, we find that even running RLHF with a reward based solely on length can reproduce most of the downstream improvements over the initial policy model, showing that reward models in these settings have a long way to go.

Question Answering
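The length-only reward ablation mentioned above fits in a few lines; the whitespace tokenization and cap below are assumptions, not the paper's exact setup.

```python
# Sketch of a length-only reward: score a response purely by its length,
# ignoring content entirely. Tokenization and the cap are assumptions.
def length_reward(response: str, cap: int = 256) -> float:
    """Monotone in token count up to `cap`, then flat; returns a value in [0, 1]."""
    return min(len(response.split()), cap) / cap
```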

BooookScore: A systematic exploration of book-length summarization in the era of LLMs

2 code implementations • 1 Oct 2023 • Yapei Chang, Kyle Lo, Tanya Goyal, Mohit Iyyer

We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than those generated by open-source models.
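For reference, a rough sketch of hierarchical merging, one of the book-length summarization workflows the paper studies; the chunking scheme and the placeholder summarize() call are assumptions.

```python
# Rough sketch of hierarchical merging for book-length summarization:
# summarize fixed-size chunks, then repeatedly merge adjacent summary pairs.
def summarize(text: str) -> str:
    # toy stand-in: truncation; replace with an actual LLM call
    return text[:300]

def hierarchical_summary(book: str, chunk_chars: int = 8000) -> str:
    chunks = [book[i:i + chunk_chars] for i in range(0, len(book), chunk_chars)]
    level = [summarize(c) for c in chunks]
    while len(level) > 1:  # merge until a single summary remains
        level = [summarize("\n\n".join(level[i:i + 2]))
                 for i in range(0, len(level), 2)]
    return level[0]
```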

WiCE: Real-World Entailment for Claims in Wikipedia

1 code implementation • 2 Mar 2023 • Ryo Kamoi, Tanya Goyal, Juan Diego Rodriguez, Greg Durrett

Textual entailment models are increasingly applied in settings like fact-checking, presupposition verification in question answering, or summary evaluation.

Fact Checking • Natural Language Inference • +3
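A simple baseline for this setting runs an off-the-shelf NLI model over (evidence, claim) pairs; the model choice below is an assumption, not the paper's method.

```python
# Baseline sketch: does cited evidence entail a Wikipedia claim? Uses an
# off-the-shelf NLI model (model choice is an assumption, not the paper's).
# Requires: pip install transformers torch
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def entails(evidence: str, claim: str) -> bool:
    pred = nli({"text": evidence, "text_pair": claim})[0]
    return pred["label"] == "ENTAILMENT"
```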

News Summarization and Evaluation in the Era of GPT-3

1 code implementation • 26 Sep 2022 • Tanya Goyal, Junyi Jessy Li, Greg Durrett

Finally, we evaluate models on a setting beyond generic summarization, specifically keyword-based summarization, and show how dominant fine-tuning approaches compare to prompting.

News Summarization • Text Summarization
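The keyword-based summarization setting contrasted with fine-tuning above can be prompted roughly as follows; the prompt template is an assumption, not the paper's.

```python
# Sketch of a keyword-based summarization prompt (template is an assumption).
def keyword_summary_prompt(article: str, keyword: str) -> str:
    return (
        f"Article: {article}\n\n"
        f"Write a short summary of the article focused on: {keyword}\n"
        "Summary:"
    )
```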

Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors

1 code implementation • 25 May 2022 • Liyan Tang, Tanya Goyal, Alexander R. Fabbri, Philippe Laban, Jiacheng Xu, Semih Yavuz, Wojciech Kryściński, Justin F. Rousseau, Greg Durrett

We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.

Abstractive Text Summarization
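The stratified comparison the abstract describes amounts to scoring each metric separately within each summarizer-type stratum; the sketch below uses balanced accuracy, with field names as assumptions.

```python
# Sketch of stratified metric evaluation: score a factuality metric's binary
# predictions within each summarizer-type stratum. Field names are assumptions.
# Requires scikit-learn.
from collections import defaultdict
from sklearn.metrics import balanced_accuracy_score

def per_stratum_scores(examples):
    """examples: dicts with 'stratum', 'human_label' (0/1), 'metric_pred' (0/1)."""
    by_stratum = defaultdict(lambda: ([], []))
    for ex in examples:
        labels, preds = by_stratum[ex["stratum"]]
        labels.append(ex["human_label"])
        preds.append(ex["metric_pred"])
    return {s: balanced_accuracy_score(y, p) for s, (y, p) in by_stratum.items()}
```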

SNaC: Coherence Error Detection for Narrative Summarization

1 code implementation • 19 May 2022 • Tanya Goyal, Junyi Jessy Li, Greg Durrett

In this work, we introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries.

Benchmarking • Coherence Evaluation • +1

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

2 code implementations • 6 Dec 2021 • Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.

Data Augmentation
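A toy perturbation in the spirit of NL-Augmenter's transformations; the real framework packages these as operation classes, so this standalone function is only an illustrative sketch.

```python
# Toy robustness perturbation in the spirit of NL-Augmenter: swap one random
# pair of adjacent words. The framework wraps such logic in operation classes.
import random

def swap_adjacent_words(sentence: str, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = sentence.split()
    if len(words) < 2:
        return sentence
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)
```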

Training Dynamics for Text Summarization Models

no code implementations • Findings (ACL) 2022 • Tanya Goyal, Jiacheng Xu, Junyi Jessy Li, Greg Durrett

Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process.

Hallucination • News Summarization • +1
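One property tracked across checkpoints, abstractiveness, can be probed with a novel-n-gram measure like the sketch below; the bigram definition is an assumption, not necessarily the paper's exact metric.

```python
# Sketch of an abstractiveness probe: fraction of summary bigrams that do not
# appear in the source (exact metric definition is an assumption).
def novel_bigram_ratio(source: str, summary: str) -> float:
    def bigrams(text: str) -> set:
        toks = text.lower().split()
        return set(zip(toks, toks[1:]))
    summ = bigrams(summary)
    return len(summ - bigrams(source)) / len(summ) if summ else 0.0
```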

HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models

1 code implementation • 8 Oct 2021 • Tanya Goyal, Nazneen Fatema Rajani, Wenhao Liu, Wojciech Kryściński

Summarization systems make numerous "decisions" about summary properties during inference, e.g., degree of copying, specificity, and length of outputs.

Abstractive Text Summarization • Specificity
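At inference time, the multi-decoder idea reduces to gating between decoders' next-token distributions; the two-decoder, scalar-gate form below is a simplified sketch, not HydraSum's exact learned parameterization.

```python
# Simplified sketch of multi-decoder mixing: interpolate next-token
# distributions from two decoder heads with a scalar gate (HydraSum's gating
# is learned; the scalar form here is an assumption). Requires torch.
import torch

def mixed_next_token_probs(logits_a: torch.Tensor,
                           logits_b: torch.Tensor,
                           gate: float) -> torch.Tensor:
    """gate=1.0 uses decoder A only; gate=0.0 uses decoder B only."""
    p_a = torch.softmax(logits_a, dim=-1)
    p_b = torch.softmax(logits_b, dim=-1)
    return gate * p_a + (1.0 - gate) * p_b
```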

HydraSum - Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models

no code implementations • 29 Sep 2021 • Tanya Goyal, Nazneen Rajani, Wenhao Liu, Wojciech Maciej Kryscinski

Existing abstractive summarization models lack explicit control mechanisms that would allow users to influence the stylistic features of the model outputs.

Abstractive Text Summarization • Specificity

Annotating and Modeling Fine-grained Factuality in Summarization

2 code implementations • NAACL 2021 • Tanya Goyal, Greg Durrett

Recent pre-trained abstractive summarization systems have started to achieve credible performance, but a major barrier to their use in practice is their propensity to output summaries that are not faithful to the input and that contain factual errors.

Abstractive Text Summarization • Sentence

Evaluating Factuality in Generation with Dependency-level Entailment

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tanya Goyal, Greg Durrett

Experiments show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods or those based on question generation, while additionally localizing the erroneous parts of the generation.

Natural Language Inference • Question Generation • +3
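The dependency-arc decomposition underlying the model can be illustrated with spaCy; the arc-level entailment classifier itself is omitted, and the parser choice is an assumption.

```python
# Illustration of the dependency-arc decomposition: split a generated sentence
# into (head, relation, child) arcs, the units an arc-level entailment model
# scores. Requires spaCy and: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_arcs(sentence: str):
    doc = nlp(sentence)
    return [(tok.head.text, tok.dep_, tok.text)
            for tok in doc if tok.dep_ != "ROOT"]
```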

Neural Syntactic Preordering for Controlled Paraphrase Generation

2 code implementations • ACL 2020 • Tanya Goyal, Greg Durrett

Paraphrasing natural language sentences is a multifaceted process: it might involve replacing individual words or short phrases, local rearrangement of content, or high-level restructuring like topicalization or passivization.

Machine Translation • Paraphrase Generation • +2

Embedding time expressions for deep temporal ordering models

3 code implementations • ACL 2019 • Tanya Goyal, Greg Durrett

Data-driven models have demonstrated state-of-the-art performance in inferring the temporal ordering of events in text.
