no code implementations • 12 Apr 2024 • Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva
A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hallucinations.
no code implementations • 20 Dec 2023 • Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva
In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation.
no code implementations • 16 May 2023 • James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas
However, the coagent framework is not just an alternative to BDL; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches.
1 code implementation • 24 Jan 2023 • Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskill, Philip S. Thomas
Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary.
no code implementations • 30 Aug 2022 • Rushiv Arora, Bruno Castro da Silva, Eliot Moss
We found that an optimal policy trained on the discovered dynamics of the underlying system can generalize well.
no code implementations • 24 Aug 2022 • Aline Weber, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Bruno Castro da Silva
Recent research has shown that seemingly fair machine learning models, when used to inform decisions that have an impact on people's lives or well-being (e.g., applications involving education, employment, and lending), can inadvertently increase social inequality in the long term.
no code implementations • ICLR 2022 • Stephen Giguere, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Scott Niekum, Bruno Castro da Silva
Recent studies have demonstrated that using machine learning for social applications can lead to injustice in the form of racist, sexist, and otherwise unfair and discriminatory outcomes.
1 code implementation • NeurIPS 2021 • Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas
When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy.
no code implementations • 27 Nov 2020 • Vieri Giuliano Santucci, Davide Montella, Bruno Castro da Silva, Gianluca Baldassarre
These situations pose two challenges: (a) to recognise the different contexts that need different policies; (b) to quickly learn the policies to accomplish the same tasks in the newly discovered contexts.
no code implementations • 6 Jan 2020 • Manuel Del Verme, Bruno Castro da Silva, Gianluca Baldassarre
Reinforcement learning can greatly benefit from the use of options as a way of encoding recurring behaviours and fostering exploration.
no code implementations • 7 May 2019 • Vieri Giuliano Santucci, Emilio Cartoni, Bruno Castro da Silva, Gianluca Baldassarre
Autonomy is fundamental for artificial agents acting in complex real-world scenarios.
no code implementations • 17 Aug 2017 • Philip S. Thomas, Bruno Castro da Silva, Andrew G. Barto, Emma Brunskill
We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors.