Search Results for author: Bruno Castro da Silva

Found 12 papers, 2 papers with code

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs

no code implementations • 12 Apr 2024 • Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva

A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hallucinations.

Language Modelling reinforcement-learning

From Past to Future: Rethinking Eligibility Traces

no code implementations • 20 Dec 2023 • Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation.

Coagent Networks: Generalized and Scaled

no code implementations • 16 May 2023 • James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

However, the coagent framework is not just an alternative to BDL; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches.

Reinforcement Learning (RL)

Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

1 code implementation • 24 Jan 2023 • Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskill, Philip S. Thomas

Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary.

counterfactual Counterfactual Reasoning +2

Model-Based Reinforcement Learning with SINDy

no code implementations • 30 Aug 2022 • Rushiv Arora, Bruno Castro da Silva, Eliot Moss

We found that an optimal policy trained on the discovered dynamics of the underlying system can generalize well.

Model-based Reinforcement Learning reinforcement-learning +1

Enforcing Delayed-Impact Fairness Guarantees

no code implementations • 24 Aug 2022 • Aline Weber, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Bruno Castro da Silva

Recent research has shown that seemingly fair machine learning models, when used to inform decisions that have an impact on people's lives or well-being (e.g., applications involving education, employment, and lending), can inadvertently increase social inequality in the long term.

Fairness

Fairness Guarantees under Demographic Shift

no code implementations • ICLR 2022 • Stephen Giguere, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Scott Niekum, Bruno Castro da Silva

Recent studies have demonstrated that using machine learning for social applications can lead to injustice in the form of racist, sexist, and otherwise unfair and discriminatory outcomes.

Fairness

Universal Off-Policy Evaluation

1 code implementation • NeurIPS 2021 • Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy.

counterfactual Decision Making +1

Autonomous learning of multiple, context-dependent tasks

no code implementations • 27 Nov 2020 • Vieri Giuliano Santucci, Davide Montella, Bruno Castro da Silva, Gianluca Baldassarre

These situations pose two challenges: (a) to recognise the different contexts that need different policies; (b) to quickly learn the policies to accomplish the same tasks in the newly discovered contexts.

Transfer Learning

Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints

no code implementations • 6 Jan 2020 • Manuel Del Verme, Bruno Castro da Silva, Gianluca Baldassarre

Reinforcement learning can greatly benefit from the use of options as a way of encoding recurring behaviours and fostering exploration.

reinforcement-learning Reinforcement Learning (RL)

Autonomous Open-Ended Learning of Interdependent Tasks

no code implementations • 7 May 2019 • Vieri Giuliano Santucci, Emilio Cartoni, Bruno Castro da Silva, Gianluca Baldassarre

Autonomy is fundamental for artificial agents acting in complex real-world scenarios.

Decision Making Open-Ended Question Answering

On Ensuring that Intelligent Machines Are Well-Behaved

no code implementations • 17 Aug 2017 • Philip S. Thomas, Bruno Castro da Silva, Andrew G. Barto, Emma Brunskill

We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors.

BIG-bench Machine Learning
