no code implementations • 7 Nov 2024 • Adam Fourney, Gagan Bansal, Hussein Mozannar, Cheng Tan, Eduardo Salinas, Erkang Zhu, Friederike Niedtner, Grace Proebsting, Griffin Bassman, Jack Gerrits, Jacob Alber, Peter Chang, Ricky Loynd, Robert West, Victor Dibia, Ahmed Awadallah, Ece Kamar, Rafah Hosn, Saleema Amershi
Magentic-One uses a multi-agent architecture where a lead agent, the Orchestrator, plans, tracks progress, and re-plans to recover from errors.
no code implementations • 14 Jun 2024 • Susan Landau, James X. Dempsey, Ece Kamar, Steven M. Bellovin, Robert Pool
In an October 2023 executive order (EO), President Biden issued a detailed but largely aspirational road map for the safe and responsible development and use of artificial intelligence (AI).
no code implementations • 4 Mar 2024 • Susan Landau, James X. Dempsey, Ece Kamar, Steven M. Bellovin
Contestability -- the ability to effectively challenge a decision -- is critical to the implementation of fairness.
no code implementations • 10 Oct 2023 • Erik Jones, Hamid Palangi, Clarisse Simões, Varun Chandrasekaran, Subhabrata Mukherjee, Arindam Mitra, Ahmed Awadallah, Ece Kamar
We also find that optimizing the system message rather than the model weights can be critical; fine-tuning the entire model on the synthetic task can counterintuitively increase hallucination.
1 code implementation • 26 Sep 2023 • Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi
We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text.
no code implementations • 7 Jun 2023 • John Joon Young Chung, Ece Kamar, Saleema Amershi
However, creating high-quality datasets with LLMs can be challenging.
3 code implementations • 22 Mar 2023 • Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang
We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models.
Ranked #33 on Arithmetic Reasoning on GSM8K
1 code implementation • 20 Dec 2022 • Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang
We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
no code implementations • 31 Oct 2022 • Peter Stone, Rodney Brooks, Erik Brynjolfsson, Ryan Calo, Oren Etzioni, Greg Hager, Julia Hirschberg, Shivaram Kalyanakrishnan, Ece Kamar, Sarit Kraus, Kevin Leyton-Brown, David Parkes, William Press, AnnaLee Saxenian, Julie Shah, Milind Tambe, Astro Teller
In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society.
1 code implementation • ACL 2022 • Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, Ece Kamar
To help mitigate these issues, we create ToxiGen, a new large-scale and machine-generated dataset of 274k toxic and benign statements about 13 minority groups.
no code implementations • 21 Feb 2022 • Andi Peng, Besmira Nushi, Emre Kiciman, Kori Inkpen, Ece Kamar
In AI-assisted decision-making, effective hybrid (human-AI) teamwork depends not only on AI performance but also on its impact on human decision-making.
no code implementations • 28 Mar 2021 • Ramya Ramakrishnan, Vaibhav Unhelkar, Ece Kamar, Julie Shah
Trained AI systems and expert decision makers can make errors that are often difficult to identify and understand.
no code implementations • 10 Mar 2021 • Solon Barocas, Anhong Guo, Ece Kamar, Jacquelyn Krones, Meredith Ringel Morris, Jennifer Wortman Vaughan, Duncan Wadsworth, Hanna Wallach
Disaggregated evaluations of AI systems, in which system performance is assessed and reported separately for different groups of people, are conceptually simple.
no code implementations • 22 Dec 2020 • Kaivalya Rawal, Ece Kamar, Himabindu Lakkaraju
Our theoretical results establish a lower bound on the probability of recourse invalidation due to model shifts, and show the existence of a tradeoff between this invalidation probability and typical notions of "cost" minimized by modern recourse generation algorithms.
1 code implementation • CVPR 2021 • Sahil Singla, Besmira Nushi, Shital Shah, Ece Kamar, Eric Horvitz
Traditional evaluation metrics for learned models that report aggregate scores over a test set are insufficient for surfacing important and informative patterns of failure over features and instances.
no code implementations • 24 Aug 2020 • Sandhya Saisubramanian, Shlomo Zilberstein, Ece Kamar
Learning to recognize and avoid such negative side effects of an agent's actions is critical to improve the safety and reliability of autonomous systems.
no code implementations • 11 Aug 2020 • Megha Srivastava, Besmira Nushi, Ece Kamar, Shital Shah, Eric Horvitz
In many applications of machine learning (ML), updates are performed with the goal of enhancing model performance.
no code implementations • 13 Jul 2020 • Ivan Evtimov, Weidong Cui, Ece Kamar, Emre Kiciman, Tadayoshi Kohno, Jerry Li
Machine learning (ML) models deployed in many safety- and business-critical systems are vulnerable to exploitation through adversarial examples.
no code implementations • 26 Jun 2020 • Gagan Bansal, Tongshuang Wu, Joyce Zhou, Raymond Fok, Besmira Nushi, Ece Kamar, Marco Tulio Ribeiro, Daniel S. Weld
However, prior studies observed improvements from explanations only when the AI, alone, outperformed both the human and the best team.
no code implementations • 1 May 2020 • Bryan Wilder, Eric Horvitz, Ece Kamar
A rising vision for AI in the open world centers on the development of systems that can complement humans for perceptual, diagnostic, and reasoning tasks.
no code implementations • 27 Apr 2020 • Gagan Bansal, Besmira Nushi, Ece Kamar, Eric Horvitz, Daniel S. Weld
To optimize team performance in this setting, we maximize the team's expected utility, expressed in terms of the quality of the final decision, the cost of verifying, and the individual accuracies of people and machines.
no code implementations • 5 Apr 2020 • Jonathan Martinez, Kobi Gal, Ece Kamar, Levi H. S. Lelis
AI systems that model and interact with users can update their models over time to reflect new information and changes in the environment.
no code implementations • CVPR 2020 • Ramprasaath R. Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Ribeiro, Besmira Nushi, Ece Kamar
We quantify the extent to which this phenomenon occurs by creating a new Reasoning split of the VQA dataset and collecting VQA-introspect, a new dataset consisting of 238K new perception questions that serve as sub-questions corresponding to the set of perceptual tasks needed to effectively answer the complex reasoning questions in the Reasoning split.
no code implementations • 8 Sep 2019 • Andi Peng, Besmira Nushi, Emre Kiciman, Kori Inkpen, Siddharth Suri, Ece Kamar
Although systematic biases in decision-making are widely documented, the ways in which they emerge from different sources are less understood.
no code implementations • 4 Jul 2019 • Anhong Guo, Ece Kamar, Jennifer Wortman Vaughan, Hanna Wallach, Meredith Ringel Morris
AI technologies have the potential to dramatically impact the lives of people with disabilities (PWD).
no code implementations • 4 Jun 2019 • Gagan Bansal, Besmira Nushi, Ece Kamar, Dan Weld, Walter Lasecki, Eric Horvitz
We introduce the notion of the compatibility of an AI update with prior user experience and present methods for studying the role of compatibility in human-AI teams.
no code implementations • 19 Sep 2018 • Besmira Nushi, Ece Kamar, Eric Horvitz
We present Pandora, a set of hybrid human-machine methods and tools for describing and explaining system failures.
no code implementations • 28 Aug 2018 • Sarah Tan, Julius Adebayo, Kori Inkpen, Ece Kamar
Dressel and Farid (2018) asked Mechanical Turk workers to evaluate a subset of defendants in the ProPublica COMPAS data for risk of recidivism, and concluded that COMPAS predictions were no more accurate or fair than predictions made by humans.
no code implementations • 23 May 2018 • Ramya Ramakrishnan, Ece Kamar, Debadeepta Dey, Julie Shah, Eric Horvitz
Agents trained in simulation may make errors in the real world due to mismatches between training and execution environments.
no code implementations • 4 Jul 2017 • Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec
To the best of our knowledge, this is the first approach which can produce global explanations of the behavior of any given black box model through joint optimization of unambiguity, fidelity, and interpretability, while also allowing users to explore model behavior based on their preferences.
no code implementations • 24 Nov 2016 • Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann
We study the problem of troubleshooting machine learning systems that rely on analytical pipelines of distinct components.
no code implementations • 28 Oct 2016 • Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Eric Horvitz
Predictive models deployed in the real world may assign incorrect labels to instances with high confidence.
no code implementations • 3 May 2015 • Christopher H. Lin, Andrey Kolobov, Ece Kamar, Eric Horvitz
Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking.
no code implementations • 22 Apr 2014 • Adish Singla, Eric Horvitz, Ece Kamar, Ryen White
Users may be willing to share private information in return for better quality of service or for incentives, or in return for assurances about the nature and extent of the logging of data.