Search Results for author: Valerie Chen

Found 15 papers, 4 papers with code

The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers

1 code implementation • 3 Apr 2024 • Hussein Mozannar, Valerie Chen, Mohammed Alsobay, Subhro Das, Sebastian Zhao, Dennis Wei, Manish Nagireddy, Prasanna Sattigeri, Ameet Talwalkar, David Sontag

Evaluation of large language models (LLMs) for code has primarily relied on static benchmarks, including HumanEval (Chen et al., 2021), which measure the ability of LLMs to generate complete code that passes unit tests.
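
The snippet above names the mechanism behind static benchmarks like HumanEval: a generated completion counts as correct only if it passes the task's unit tests. A minimal sketch of that pass/fail check follows; the task, completion, and tests are made up for illustration and not drawn from the benchmark:

```python
# Sketch of HumanEval-style functional-correctness checking: a completion
# is correct only if it passes the task's unit tests. Task, completion,
# and tests are invented for illustration.

def run_candidate(candidate_code: str, test_code: str) -> bool:
    """Execute a candidate completion against unit tests; True if all pass."""
    namespace = {}
    try:
        exec(candidate_code, namespace)  # define the candidate function
        exec(test_code, namespace)       # assertions raise on failure
        return True
    except Exception:
        return False

candidate = """
def add(a, b):
    return a + b
"""
tests = """
assert add(1, 2) == 3
assert add(-1, 1) == 0
"""

completions = [candidate]  # in practice, many samples per problem
pass_rate = sum(run_candidate(c, tests) for c in completions) / len(completions)
print(f"fraction passing unit tests: {pass_rate:.2f}")
```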

Do LLMs exhibit human-like response biases? A case study in survey design

1 code implementation • 7 Nov 2023 • Lindia Tjuatja, Valerie Chen, Sherry Tongshuang Wu, Ameet Talwalkar, Graham Neubig

As large language models (LLMs) become more capable, there is growing excitement about the possibility of using LLMs as proxies for humans in real-world tasks where subjective labels are desired, such as in surveys and opinion polling.

AdvisingNets: Learning to Distinguish Correct and Wrong Classifications via Nearest-Neighbor Explanations

no code implementations • 25 Aug 2023 • Giang Nguyen, Valerie Chen, Anh Nguyen

Besides providing insights into how an image classifier makes its predictions, nearest-neighbor examples also help humans make more accurate decisions.

Image Classification
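
The AdvisingNets snippet turns on nearest-neighbor examples serving as explanations. A minimal sketch of retrieving such neighbors in an embedding space follows; the random vectors stand in for real image embeddings, and this is not the paper's pipeline:

```python
# Sketch: retrieve the nearest training examples in feature space so a
# human can compare them with the query before trusting the prediction.
# Random vectors stand in for real image embeddings.
import numpy as np

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(100, 64))      # embeddings of training images
train_labels = rng.integers(0, 10, size=100)  # their class labels
query_feat = rng.normal(size=64)              # embedding of the test image

# Cosine similarity between the query and every training embedding.
sims = train_feats @ query_feat / (
    np.linalg.norm(train_feats, axis=1) * np.linalg.norm(query_feat)
)
top_k = np.argsort(-sims)[:5]  # indices of the 5 nearest neighbors
print("neighbor indices:", top_k, "labels:", train_labels[top_k])
```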

Learning Personalized Decision Support Policies

no code implementations • 13 Apr 2023 • Umang Bhatt, Valerie Chen, Katherine M. Collins, Parameswaran Kamalaruban, Emma Kallina, Adrian Weller, Ameet Talwalkar

In this work, we propose learning a decision support policy that, for a given input, chooses which form of support, if any, to provide.

Multi-Armed Bandits
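
The Multi-Armed Bandits tag suggests one way such a support policy could be learned online. Below is a hedged epsilon-greedy sketch with forms of support as the arms; it is context-free for brevity (the paper's policy chooses per input), and the arm names and reward probabilities are invented:

```python
# Epsilon-greedy sketch: pick a form of decision support per round.
# Illustrative baseline only; arm names and reward probabilities are invented.
import random

ARMS = ["no_support", "show_explanation", "defer_to_expert"]
counts = {a: 0 for a in ARMS}    # times each arm was chosen
values = {a: 0.0 for a in ARMS}  # running mean reward per arm
EPSILON = 0.1                    # exploration rate

def choose_arm() -> str:
    if random.random() < EPSILON:              # explore
        return random.choice(ARMS)
    return max(ARMS, key=lambda a: values[a])  # exploit current best estimate

def update(arm: str, reward: float) -> None:
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

for _ in range(1000):
    arm = choose_arm()
    # Hypothetical reward: did the human decide correctly given this support?
    p_correct = {"no_support": 0.60, "show_explanation": 0.75,
                 "defer_to_expert": 0.70}[arm]
    update(arm, float(random.random() < p_correct))

print({a: round(values[a], 2) for a in ARMS})
```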

Assisting Human Decisions in Document Matching

1 code implementation • 16 Feb 2023 • Joon Sik Kim, Valerie Chen, Danish Pruthi, Nihar B. Shah, Ameet Talwalkar

Many practical applications, ranging from paper-reviewer assignment in peer review to job-applicant matching for hiring, require human decision makers to identify relevant matches by combining their expertise with predictions from machine learning models.
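
As a purely hypothetical instance of the model predictions a decision maker might combine with their expertise, here is a TF-IDF similarity ranking of reviewer profiles against a submission; it is illustrative only, not the paper's matching model:

```python
# Sketch: score candidate reviewers against a submission with TF-IDF cosine
# similarity; a human then vets the top-ranked matches. Profiles are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

submission = "interpretable machine learning for fraud detection"
reviewer_profiles = [
    "deep reinforcement learning for robotics",
    "explainable ML and model interpretability",
    "fraud detection with anomaly detection methods",
]

vec = TfidfVectorizer().fit([submission] + reviewer_profiles)
scores = cosine_similarity(
    vec.transform([submission]), vec.transform(reviewer_profiles)
)[0]
for score, profile in sorted(zip(scores, reviewer_profiles), reverse=True):
    print(f"{score:.2f}  {profile}")
```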

A Case Study on Designing Evaluations of ML Explanations with Simulated User Studies

no code implementations • 15 Feb 2023 • Ada Martin, Valerie Chen, Sérgio Jesus, Pedro Saleiro

We hope that this work motivates further study of when and how SimEvals should be used to aid in the design of real-world evaluations.

Decision Making • Fraud Detection

Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations

no code implementations • 18 Jan 2023 • Valerie Chen, Q. Vera Liao, Jennifer Wortman Vaughan, Gagan Bansal

AI explanations are often mentioned as a way to improve human-AI decision-making, but empirical studies have not found consistent evidence of explanations' effectiveness and, on the contrary, suggest that they can increase overreliance when the AI system is wrong.

Decision Making

On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods

no code implementations • 24 Jun 2022 • Kasun Amarasinghe, Kit T. Rodolfa, Sérgio Jesus, Valerie Chen, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro, Ameet Talwalkar, Rayid Ghani

Most existing evaluations of explainable machine learning (ML) methods rely on simplifying assumptions or proxies that do not reflect real-world use cases; the handful of more robust evaluations in real-world settings have shortcomings in their design, resulting in limited conclusions about methods' real-world utility.

Experimental Design • Fraud Detection

Use-Case-Grounded Simulations for Explanation Evaluation

no code implementations • 5 Jun 2022 • Valerie Chen, Nari Johnson, Nicholay Topin, Gregory Plumb, Ameet Talwalkar

SimEvals involve training algorithmic agents that take as input the information content (such as model explanations) that would be presented to each participant in a human subject study, and learn to predict answers to the use case of interest.

Counterfactual Reasoning
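
The abstract describes the SimEvals recipe concretely enough to sketch: train an agent whose input is exactly the information content a participant would see, and whose target is the use-case answer. A toy version with synthetic stand-in data follows (the "trust" use case and its ground truth are invented):

```python
# Sketch of the SimEvals recipe from the abstract: train an algorithmic agent
# on the same information content a participant would see (a prediction plus
# an explanation) to predict the use-case answer. All data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
model_pred = rng.integers(0, 2, size=n)  # prediction shown to the participant
explanation = rng.normal(size=(n, 4))    # e.g., feature-attribution scores
# Invented ground truth for the use case "should I trust this prediction?":
# trust whenever the leading attribution is positive.
use_case_label = (explanation[:, 0] > 0).astype(int)

X = np.column_stack([model_pred, explanation])  # the participant's information
agent = LogisticRegression().fit(X[:400], use_case_label[:400])
print("agent accuracy on the use case:", agent.score(X[400:], use_case_label[400:]))
```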

Perspectives on Incorporating Expert Feedback into Model Updates

no code implementations • 13 May 2022 • Valerie Chen, Umang Bhatt, Hoda Heidari, Adrian Weller, Ameet Talwalkar

A practitioner may receive feedback from an expert at the observation- or domain-level, and convert this feedback into updates to the dataset, loss function, or parameter space.
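
As a hedged illustration of one conversion route named above (observation-level feedback folded into the loss function), the sketch below relabels expert-corrected examples and up-weights them; the weighting scheme is an assumption for illustration, not the paper's prescription:

```python
# Sketch: convert observation-level expert feedback into a loss update by
# relabeling the flagged examples and up-weighting them. The 5x weight is
# an illustrative assumption, not the paper's prescription.
import numpy as np

def weighted_logistic_loss(w, X, y, sample_weight):
    """Logistic loss with per-example weights; y takes values in {0, 1}."""
    margins = (2 * y - 1) * (X @ w)
    return np.mean(sample_weight * np.log1p(np.exp(-margins)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)

weights = np.ones(len(y))
expert_flagged = [3, 17, 42]               # observations the expert corrected
y[expert_flagged] = 1 - y[expert_flagged]  # apply the expert's relabels
weights[expert_flagged] = 5.0              # emphasize them in the loss

w0 = np.zeros(5)
print("loss at initialization:", weighted_logistic_loss(w0, X, y, weights))
```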

Bayesian Persuasion for Algorithmic Recourse

no code implementations • 12 Dec 2021 • Keegan Harris, Valerie Chen, Joon Sik Kim, Ameet Talwalkar, Hoda Heidari, Zhiwei Steven Wu

While the decision maker's problem of finding the optimal Bayesian incentive-compatible (BIC) signaling policy takes the form of optimization over infinitely many variables, we show that this optimization can be cast as a linear program over finitely many regions of the space of possible assessment rules.

Decision Making
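
A toy version of the reformulation described above: a textbook two-state persuasion problem written as a finite linear program, with obedience constraints enforcing Bayesian incentive compatibility. The paper's LP ranges over regions of assessment rules; this sketch only shows the signaling-as-LP pattern, and the payoffs and prior are invented:

```python
# Toy: a two-state persuasion problem as a finite linear program.
# Variables x = [P(accept | qualified), P(accept | unqualified)];
# obedience constraints make the accept/reject recommendation BIC.
# Textbook example, not the paper's recourse model.
from scipy.optimize import linprog

p = 0.3  # prior probability the agent is qualified
# Receiver payoff: +1 for accepting qualified, -1 for accepting unqualified.
c = [-p, -(1 - p)]  # linprog minimizes, so negate acceptance probability

# Obedience after "accept":  p*x_q - (1-p)*x_u >= 0
# Obedience after "reject":  p*(1-x_q) - (1-p)*(1-x_u) <= 0
A_ub = [[-p, 1 - p],
        [-p, 1 - p]]
b_ub = [0.0, 1 - 2 * p]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1), (0, 1)])
x_q, x_u = res.x
print(f"P(accept|qualified) = {x_q:.3f}, P(accept|unqualified) = {x_u:.3f}")
print(f"overall acceptance probability = {-res.fun:.3f}")
```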

Interpretable Machine Learning: Moving From Mythos to Diagnostics

no code implementations • 10 Mar 2021 • Valerie Chen, Jeffrey Li, Joon Sik Kim, Gregory Plumb, Ameet Talwalkar

Despite increasing interest in the field of Interpretable Machine Learning (IML), a significant gap persists between the technical objectives targeted by researchers' methods and the high-level goals of consumers' use cases.

BIG-bench Machine Learning • Interpretable Machine Learning

Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning

1 code implementation • ICLR 2021 • Valerie Chen, Abhinav Gupta, Kenneth Marino

We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting and to learn quickly from a few demonstrations.

Multi-Task Learning • reinforcement-learning • +1

Novelty Detection via Network Saliency in Visual-based Deep Learning

no code implementations • 9 Jun 2019 • Valerie Chen, Man-Ki Yoon, Zhong Shao

Machine-learning driven safety-critical autonomous systems, such as self-driving cars, must be able to detect situations where their trained models are not able to make a trustworthy prediction.

Novelty Detection • Self-Driving Cars
