Search Results for author: Rebecca Qian

Found 11 papers, 3 papers with code

TRAIL: Trace Reasoning and Agentic Issue Localization

no code implementations • 13 May 2025 • Darshan Deshpande, Varun Gangal, Hersh Mehta, Jitin Krishnan, Anand Kannappan, Rebecca Qian

The increasing adoption of agentic workflows across diverse domains brings a critical need to scalably and systematically evaluate the complex traces these systems generate.

Information Retrieval

Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning

no code implementations • 24 Mar 2025 • Sky CH-Wang, Darshan Deshpande, Smaranda Muresan, Anand Kannappan, Rebecca Qian

We introduce Browsing Lost Unformed Recollections, a tip-of-the-tongue known-item search and reasoning benchmark for general AI assistants.

FinanceBench: A New Benchmark for Financial Question Answering

2 code implementations • 20 Nov 2023 • Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, Bertie Vidgen

We test 16 state of the art model configurations (including GPT-4-Turbo, Llama2 and Claude2, with vector stores and long context prompts) on a sample of 150 cases from FinanceBench, and manually review their answers (n=2,400).

Question Answering

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

no code implementations • 14 Nov 2023 • Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A. Hale, Paul Röttger

While some of the models do not give a single unsafe response, most give unsafe responses to more than 20% of the prompts, with over 50% unsafe responses in the extreme.

Perturbation Augmentation for Fairer NLP

1 code implementation • 25 May 2022 • Rebecca Qian, Candace Ross, Jude Fernandes, Eric Smith, Douwe Kiela, Adina Williams

Unwanted and often harmful social biases are becoming ever more salient in NLP research, affecting both models and datasets.

Fairness

Many Episode Learning in a Modular Embodied Agent via End-to-End Interaction

no code implementations • 19 Apr 2022 • Yuxuan Sun, Ethan Carlson, Rebecca Qian, Kavya Srinet, Arthur Szlam

In this work we give a case study of an embodied machine-learning (ML) powered agent that improves itself via interactions with crowd-workers.

droidlet: modular, heterogenous, multi-modal agents

1 code implementation • 25 Jan 2021 • Anurag Pratik, Soumith Chintala, Kavya Srinet, Dhiraj Gandhi, Rebecca Qian, Yuxuan Sun, Ryan Drew, Sara Elkafrawy, Anoushka Tiwari, Tucker Hart, Mary Williamson, Abhinav Gupta, Arthur Szlam

In recent years, there have been significant advances in building end-to-end Machine Learning (ML) systems that learn at scale.
