no code implementations • 15 Jul 2024 • David Abel, Mark K. Ho, Anna Harutyunyan
Modern reinforcement learning has been conditioned by at least three dogmas.
2 code implementations • 30 Nov 2023 • Carlos G. Correa, Sophia Sanborn, Mark K. Ho, Frederick Callaway, Nathaniel D. Daw, Thomas L. Griffiths
Human behavior is often assumed to be hierarchically structured, made up of abstract actions that can be decomposed into concrete actions.
no code implementations • 3 Oct 2023 • Ruiqi He, Carlos G. Correa, Thomas L. Griffiths, Mark K. Ho
How are people able to plan so efficiently despite limited cognitive resources?
no code implementations • 5 May 2023 • Dilip Arumugam, Mark K. Ho, Noah D. Goodman, Benjamin Van Roy
All biological and artificial agents must learn and make decisions given limits on their ability to process information.
no code implementations • 7 Nov 2022 • Carlos G. Correa, Mark K. Ho, Frederick Callaway, Nathaniel D. Daw, Thomas L. Griffiths
Human behavior emerges from planning over elaborate decompositions of tasks into goals, subgoals, and low-level actions.
no code implementations • 30 Oct 2022 • Dilip Arumugam, Mark K. Ho, Noah D. Goodman, Benjamin Van Roy
Throughout the cognitive-science literature, there is widespread agreement that decision-making agents operating in the real world do so under limited information-processing capabilities and without access to unbounded cognitive or computational resources.
no code implementations • 11 Apr 2022 • Theodore R. Sumers, Robert D. Hawkins, Mark K. Ho, Thomas L. Griffiths, Dylan Hadfield-Menell
We then define a pragmatic listener that performs inverse reward design by jointly inferring the speaker's latent horizon and rewards.
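As an illustration of the kind of inference this describes, here is a minimal sketch of a pragmatic listener that inverts a toy softmax speaker model to recover a joint posterior over rewards and planning horizon. The feature set, hypothesis grids, and rationality parameter beta are invented for the example and are not the paper's model.

```python
import itertools
import numpy as np

# Toy setup (illustrative only): the speaker mentions one of three features,
# and the listener jointly infers which rewards and planning horizon would
# have made that utterance a sensible thing to say.
features = ["red", "blue", "green"]
utterances = features                                            # speaker can mention any feature
reward_hyps = list(itertools.permutations([1.0, 0.0, -1.0]))     # reward assigned to each feature
horizon_hyps = [1, 5]                                            # short- vs long-horizon speaker

def speaker_prob(utterance, rewards, horizon, beta=2.0):
    """Noisily rational speaker: P(utterance | rewards, horizon) via softmax."""
    # Stand-in utility: mentioning a feature matters more over longer horizons.
    utils = np.array([rewards[i] * horizon for i in range(len(utterances))])
    probs = np.exp(beta * (utils - utils.max()))
    probs /= probs.sum()
    return probs[utterances.index(utterance)]

def pragmatic_listener(utterance):
    """Posterior over (rewards, horizon) obtained by inverting the speaker model."""
    posterior = {}
    for rewards in reward_hyps:
        for horizon in horizon_hyps:
            prior = 1.0 / (len(reward_hyps) * len(horizon_hyps))    # uniform prior
            posterior[(rewards, horizon)] = prior * speaker_prob(utterance, rewards, horizon)
    total = sum(posterior.values())
    return {hyp: p / total for hyp, p in posterior.items()}

posterior = pragmatic_listener("red")
rewards, horizon = max(posterior, key=posterior.get)
print("MAP rewards per feature:", dict(zip(features, rewards)), "| horizon:", horizon)
```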
no code implementations • NeurIPS 2021 • David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh
We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists.
no code implementations • 1 Sep 2021 • Mark K. Ho, Thomas L. Griffiths
Those designing autonomous systems that interact with humans will invariably face questions about how humans think and make decisions.
1 code implementation • 25 May 2021 • Theodore R. Sumers, Robert D. Hawkins, Mark K. Ho, Thomas L. Griffiths
Speakers communicate to influence their partner's beliefs and shape their actions.
no code implementations • 14 May 2021 • Mark K. Ho, David Abel, Carlos G. Correa, Michael L. Littman, Jonathan D. Cohen, Thomas L. Griffiths
We propose a computational account of this simplification process and, in a series of pre-registered behavioral experiments, show that it is subject to online cognitive control and that people optimally balance the complexity of a task representation and its utility for planning and acting.
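A minimal sketch of the trade-off described here, with made-up numbers: each candidate task representation is scored by its utility for planning minus a cost proportional to its complexity, and the highest-scoring one is kept. The candidate representations and the cost weight are illustrative assumptions, not the paper's model.

```python
# Illustrative sketch: pick the task representation that best trades off
# its usefulness for planning against its complexity.
candidate_representations = {
    # name: (utility for planning, complexity, e.g. number of features tracked)
    "full state":       (10.0, 12),
    "goal + obstacles": (9.0, 4),
    "goal only":        (6.0, 1),
}
complexity_cost = 0.8   # assumed cost per unit of representational complexity

def value_of_representation(utility, complexity):
    """Net value of planning with a simplified task representation."""
    return utility - complexity_cost * complexity

best = max(candidate_representations,
           key=lambda name: value_of_representation(*candidate_representations[name]))
print("chosen representation:", best)   # "goal + obstacles" under these made-up numbers
```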
no code implementations • 16 Dec 2020 • Theodore R. Sumers, Mark K. Ho, Thomas L. Griffiths
Nonetheless, a teacher and learner may not always experience or attend to the same aspects of the environment.
1 code implementation • 30 Sep 2020 • Theodore R. Sumers, Mark K. Ho, Robert D. Hawkins, Karthik Narasimhan, Thomas L. Griffiths
The sentiment models outperform the inference network, with the "pragmatic" model approaching human performance.
Aspect-Based Sentiment Analysis (ABSA)
no code implementations • 5 Sep 2020 • Yun-Shiuan Chuang, Xuezhou Zhang, Yuzhe Ma, Mark K. Ho, Joseph L. Austerweil, Xiaojin Zhu
To solve the machine teaching optimization problem, we use a deep learning approximation method which simulates learners in the environment and learns to predict how feedback affects the learner's internal states.
no code implementations • 27 Jul 2020 • Carlos G. Correa, Mark K. Ho, Fred Callaway, Thomas L. Griffiths
That is, rather than planning over a monolithic representation of a task, they decompose the task into simpler subtasks and then plan to accomplish those.
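A toy sketch of that idea, assuming a one-dimensional navigation task and hand-chosen subgoals: the overall plan is assembled from short plans for each subtask rather than computed over the monolithic task. All details are invented for illustration.

```python
# Illustrative sketch: rather than planning one long route in a single pass,
# decompose it into subgoals and plan each short leg separately.
def plan_leg(start, subgoal):
    """Trivial planner for one subtask on a 1D line: step toward the subgoal."""
    step = 1 if subgoal > start else -1
    return list(range(start, subgoal, step)) + [subgoal]

def hierarchical_plan(start, subgoals):
    """Solve the full task as a sequence of short subtask plans."""
    plan, position = [], start
    for subgoal in subgoals:
        leg = plan_leg(position, subgoal)
        plan.extend(leg[1:] if plan else leg)   # avoid repeating junction states
        position = subgoal
    return plan

print(hierarchical_plan(0, subgoals=[3, 7, 10]))   # [0, 1, 2, ..., 10]
```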
no code implementations • 13 Feb 2020 • Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths
Thus, people should plan their actions, but they should also be smart about how they deploy the resources used for planning.
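One way to make "being smart about planning resources" concrete is a toy value-of-computation rule: simulate options only while the expected benefit of another mental rollout exceeds a fixed cost of thinking. The options, noise level, and stopping proxy below are assumptions for illustration, not the paper's account.

```python
import random

random.seed(1)

# Toy "value of computation" rule (illustrative): keep simulating options only
# while the expected benefit of one more rollout exceeds the cost of thinking.
true_values = {"route A": 0.6, "route B": 0.5}           # unknown to the planner
cost_of_thinking = 0.005
samples = {option: [] for option in true_values}

def simulate(option):
    """One noisy mental rollout of an option's value."""
    return true_values[option] + random.gauss(0.0, 0.2)

def expected_gain(option):
    """Crude proxy for the value of one more rollout: uncertainty shrinks with samples."""
    return 0.2 / (len(samples[option]) + 1)

while True:
    option = max(samples, key=expected_gain)
    if expected_gain(option) < cost_of_thinking:
        break                                             # more planning isn't worth its cost
    samples[option].append(simulate(option))

choice = max(samples, key=lambda o: sum(samples[o]) / len(samples[o]))
print({o: len(v) for o, v in samples.items()}, "rollouts; chose", choice)
```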
2 code implementations • NeurIPS 2019 • Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan
While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves.
no code implementations • 3 Jun 2019 • Mark K. Ho, Joanna Korman, Thomas L. Griffiths
Speech acts can have literal as well as pragmatic meaning, but both involve consequences typically intended by a speaker.
no code implementations • NeurIPS 2018 • Marcell Vazquez-Chanlatte, Susmit Jha, Ashish Tiwari, Mark K. Ho, Sanjit A. Seshia
In this paper, we formulate the specification inference task as a maximum a posteriori (MAP) probability inference problem, apply the principle of maximum entropy to derive an analytic demonstration likelihood model, and give an efficient approach to searching for the most likely specification in a large candidate pool of specifications.
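The following sketch illustrates the overall recipe under strong simplifying assumptions: candidate specifications are plain Python predicates over trajectories, the maximum-entropy likelihood is replaced by a crude stand-in that discounts specifications random behavior would satisfy anyway, and the demonstrations are invented. It is not the paper's derivation.

```python
import math
import random

random.seed(0)

# Illustrative sketch: score each candidate specification by a maxent-style
# likelihood in which demonstrations are exp(beta) times more likely to satisfy
# the spec than random behavior is, then pick the MAP spec under a uniform prior.
def reaches_goal(traj):           # candidate spec 1: eventually visit state 3
    return 3 in traj

def avoids_hazard(traj):          # candidate spec 2: never visit state 0 after the start
    return 0 not in traj[1:]

def trivially_true(traj):         # candidate spec 3: satisfied by anything
    return True

candidate_specs = {"eventually reach 3": reaches_goal,
                   "always avoid 0": avoids_hazard,
                   "true": trivially_true}

demos = [[1, 2, 3], [1, 2, 2, 3], [2, 3, 3]]             # assumed expert demonstrations

def random_trajectory(length=4, n_states=4):
    return [random.randrange(n_states) for _ in range(length)]

def log_likelihood(spec, demos, beta=3.0, n_samples=2000):
    # Base rate: how often undirected (random) behavior satisfies the spec.
    p_sat = sum(spec(random_trajectory()) for _ in range(n_samples)) / n_samples
    p_sat = min(max(p_sat, 1e-6), 1 - 1e-6)
    log_z = math.log(math.exp(beta) * p_sat + (1.0 - p_sat))
    return sum(beta * spec(d) - log_z for d in demos)

scores = {name: log_likelihood(spec, demos) for name, spec in candidate_specs.items()}
print(max(scores, key=scores.get))                        # MAP specification
```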
no code implementations • ICML 2017 • James MacGlashan, Mark K. Ho, Robert Loftin, Bei Peng, Guan Wang, David Roberts, Matthew E. Taylor, Michael L. Littman
This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback.
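As a rough illustration of learning from positive and negative trainer feedback (not the algorithm introduced in the paper), the sketch below nudges a softmax policy toward actions a simulated teacher rates +1 and away from actions rated -1; the action set, learning rate, and teacher are all invented.

```python
import numpy as np

np.random.seed(0)

# Illustrative sketch: a softmax policy over a few actions is shifted toward
# actions the trainer rates +1 and away from actions rated -1.
actions = ["left", "right", "forward"]
preferences = np.zeros(len(actions))        # learned action preferences
learning_rate = 0.5

def policy():
    """Softmax distribution over actions given current preferences."""
    exps = np.exp(preferences - preferences.max())
    return exps / exps.sum()

def update(action_idx, feedback):
    """Shift probability toward (or away from) the action the trainer just rated."""
    global preferences
    grad = -policy()
    grad[action_idx] += 1.0                 # gradient of log pi(action)
    preferences = preferences + learning_rate * feedback * grad

def simulated_teacher(action_idx):
    """Stand-in for a human trainer who likes 'forward' and dislikes the rest."""
    return 1.0 if actions[action_idx] == "forward" else -1.0

for _ in range(200):
    a = np.random.choice(len(actions), p=policy())
    update(a, simulated_teacher(a))

print({name: round(p, 2) for name, p in zip(actions, policy())})   # mass concentrates on 'forward'
```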
no code implementations • NeurIPS 2016 • Mark K. Ho, Michael L. Littman, James MacGlashan, Fiery Cushman, Joseph L. Austerweil
Stark differences arise when demonstrators are intentionally teaching a task versus simply performing a task.