Search Results for author: Ece Kamar

Found 34 papers, 4 papers with code

Teaching Language Models to Hallucinate Less with Synthetic Tasks

no code implementations 10 Oct 2023 Erik Jones, Hamid Palangi, Clarisse Simões, Varun Chandrasekaran, Subhabrata Mukherjee, Arindam Mitra, Ahmed Awadallah, Ece Kamar

We also find that optimizing the system message rather than the model weights can be critical; fine-tuning the entire model on the synthetic task can counterintuitively increase hallucination.

Abstractive Text Summarization, Hallucination

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

no code implementations 26 Sep 2023 Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text.

Sparks of Artificial General Intelligence: Early experiments with GPT-4

2 code implementations 22 Mar 2023 Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models.

Arithmetic Reasoning, Math Word Problem Solving

Benchmarking Spatial Relationships in Text-to-Image Generation

1 code implementation 20 Dec 2022 Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang

We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.

Benchmarking, Text-to-Image Generation

Artificial Intelligence and Life in 2030: The One Hundred Year Study on Artificial Intelligence

no code implementations 31 Oct 2022 Peter Stone, Rodney Brooks, Erik Brynjolfsson, Ryan Calo, Oren Etzioni, Greg Hager, Julia Hirschberg, Shivaram Kalyanakrishnan, Ece Kamar, Sarit Kraus, Kevin Leyton-Brown, David Parkes, William Press, AnnaLee Saxenian, Julie Shah, Milind Tambe, Astro Teller

In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society.

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

1 code implementation ACL 2022 Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, Ece Kamar

To help mitigate these issues, we create ToxiGen, a new large-scale and machine-generated dataset of 274k toxic and benign statements about 13 minority groups.

Hate Speech Detection, Language Modelling

Investigations of Performance and Bias in Human-AI Teamwork in Hiring

no code implementations 21 Feb 2022 Andi Peng, Besmira Nushi, Emre Kiciman, Kori Inkpen, Ece Kamar

In AI-assisted decision-making, effective hybrid (human-AI) teamwork is not solely dependent on AI performance, but also on its impact on human decision-making.

Decision Making

A Bayesian Approach to Identifying Representational Errors

no code implementations 28 Mar 2021 Ramya Ramakrishnan, Vaibhav Unhelkar, Ece Kamar, Julie Shah

Trained AI systems and expert decision makers can make errors that are often difficult to identify and understand.

Bayesian Inference

Designing Disaggregated Evaluations of AI Systems: Choices, Considerations, and Tradeoffs

no code implementations 10 Mar 2021 Solon Barocas, Anhong Guo, Ece Kamar, Jacquelyn Krones, Meredith Ringel Morris, Jennifer Wortman Vaughan, Duncan Wadsworth, Hanna Wallach

Disaggregated evaluations of AI systems, in which system performance is assessed and reported separately for different groups of people, are conceptually simple.

Algorithmic Recourse in the Wild: Understanding the Impact of Data and Model Shifts

no code implementations 22 Dec 2020 Kaivalya Rawal, Ece Kamar, Himabindu Lakkaraju

Our theoretical results establish a lower bound on the probability of recourse invalidation due to model shifts, and show the existence of a tradeoff between this invalidation probability and typical notions of "cost" minimized by modern recourse generation algorithms.

Understanding Failures of Deep Networks via Robust Feature Extraction

1 code implementation CVPR 2021 Sahil Singla, Besmira Nushi, Shital Shah, Ece Kamar, Eric Horvitz

Traditional evaluation metrics for learned models that report aggregate scores over a test set are insufficient for surfacing important and informative patterns of failure over features and instances.

Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems

no code implementations 24 Aug 2020 Sandhya Saisubramanian, Shlomo Zilberstein, Ece Kamar

Learning to recognize and avoid such negative side effects of an agent's actions is critical to improve the safety and reliability of autonomous systems.

An Empirical Analysis of Backward Compatibility in Machine Learning Systems

no code implementations 11 Aug 2020 Megha Srivastava, Besmira Nushi, Ece Kamar, Shital Shah, Eric Horvitz

In many applications of machine learning (ML), updates are performed with the goal of enhancing model performance.

BIG-bench Machine Learning

Security and Machine Learning in the Real World

no code implementations 13 Jul 2020 Ivan Evtimov, Weidong Cui, Ece Kamar, Emre Kiciman, Tadayoshi Kohno, Jerry Li

Machine learning (ML) models deployed in many safety- and business-critical systems are vulnerable to exploitation through adversarial examples.

BIG-bench Machine Learning

Learning to Complement Humans

no code implementations 1 May 2020 Bryan Wilder, Eric Horvitz, Ece Kamar

A rising vision for AI in the open world centers on the development of systems that can complement humans for perceptual, diagnostic, and reasoning tasks.

BIG-bench Machine Learning, Medical Diagnosis

Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork

no code implementations 27 Apr 2020 Gagan Bansal, Besmira Nushi, Ece Kamar, Eric Horvitz, Daniel S. Weld

To optimize the team performance for this setting we maximize the team's expected utility, expressed in terms of the quality of the final decision, cost of verifying, and individual accuracies of people and machines.

Decision Making

Personalization in Human-AI Teams: Improving the Compatibility-Accuracy Tradeoff

no code implementations 5 Apr 2020 Jonathan Martinez, Kobi Gal, Ece Kamar, Levi H. S. Lelis

AI systems that model and interact with users can update their models over time to reflect new information and changes in the environment.

SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions

no code implementations CVPR 2020 Ramprasaath R. Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Ribeiro, Besmira Nushi, Ece Kamar

We quantify the extent to which this phenomenon occurs by creating a new Reasoning split of the VQA dataset and collecting VQA-introspect, a new dataset which consists of 238K new perception questions which serve as sub-questions corresponding to the set of perceptual tasks needed to effectively answer the complex reasoning questions in the Reasoning split.

Visual Question Answering (VQA)

What You See Is What You Get? The Impact of Representation Criteria on Human Bias in Hiring

no code implementations 8 Sep 2019 Andi Peng, Besmira Nushi, Emre Kiciman, Kori Inkpen, Siddharth Suri, Ece Kamar

Although systematic biases in decision-making are widely documented, the ways in which they emerge from different sources are less understood.

Decision Making

A Case for Backward Compatibility for Human-AI Teams

no code implementations 4 Jun 2019 Gagan Bansal, Besmira Nushi, Ece Kamar, Dan Weld, Walter Lasecki, Eric Horvitz

We introduce the notion of the compatibility of an AI update with prior user experience and present methods for studying the role of compatibility in human-AI teams.

Decision Making

Towards Accountable AI: Hybrid Human-Machine Analyses for Characterizing System Failure

no code implementations 19 Sep 2018 Besmira Nushi, Ece Kamar, Eric Horvitz

We present Pandora, a set of hybrid human-machine methods and tools for describing and explaining system failures.

BIG-bench Machine Learning, Image Captioning

Investigating Human + Machine Complementarity for Recidivism Predictions

no code implementations 28 Aug 2018 Sarah Tan, Julius Adebayo, Kori Inkpen, Ece Kamar

Dressel and Farid (2018) asked Mechanical Turk workers to evaluate a subset of defendants in the ProPublica COMPAS data for risk of recidivism, and concluded that COMPAS predictions were no more accurate or fair than predictions made by humans.

Decision Making, Fairness

Discovering Blind Spots in Reinforcement Learning

no code implementations 23 May 2018 Ramya Ramakrishnan, Ece Kamar, Debadeepta Dey, Julie Shah, Eric Horvitz

Agents trained in simulation may make errors in the real world due to mismatches between training and execution environments.

reinforcement-learning, Reinforcement Learning (RL)

Interpretable & Explorable Approximations of Black Box Models

no code implementations 4 Jul 2017 Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec

To the best of our knowledge, this is the first approach which can produce global explanations of the behavior of any given black box model through joint optimization of unambiguity, fidelity, and interpretability, while also allowing users to explore model behavior based on their preferences.

Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration

no code implementations 28 Oct 2016 Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Eric Horvitz

Predictive models deployed in the real world may assign incorrect labels to instances with high confidence.

Metareasoning for Planning Under Uncertainty

no code implementations 3 May 2015 Christopher H. Lin, Andrey Kolobov, Ece Kamar, Eric Horvitz

Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking.

Stochastic Privacy

no code implementations 22 Apr 2014 Adish Singla, Eric Horvitz, Ece Kamar, Ryen White

Users may be willing to share private information in return for better quality of service or for incentives, or in return for assurances about the nature and extent of the logging of data.
