Search Results for author: Francesco Faccio

Found 21 papers, 16 papers with code

On the Convergence and Stability of Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning, and Online Decision Transformers

1 code implementation • 8 Feb 2025 • Miroslav Štrupl, Oleg Szehr, Francesco Faccio, Dylan R. Ashley, Rupesh Kumar Srivastava, Jürgen Schmidhuber

This article provides a rigorous analysis of convergence and stability of Episodic Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning and Online Decision Transformers.

Reinforcement Learning

Upside Down Reinforcement Learning with Policy Generators

1 code implementation • 27 Jan 2025 • Jacopo Di Ventura, Dylan R. Ashley, Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber

Our method, dubbed Upside Down Reinforcement Learning with Policy Generators (UDRLPG), streamlines comparable techniques by removing the need for an evaluator or critic to update the weights of the generator.

Reinforcement Learning
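As a rough illustration of the idea described in the snippet above (this is a minimal sketch under assumed shapes, not the authors' UDRLPG implementation): a generator maps a desired return to a flat vector of policy weights and is trained purely by supervised learning on (achieved return, policy weights) pairs, with no critic or evaluator.

    # Minimal sketch (not the authors' code) of a return-conditioned policy generator.
    import torch
    import torch.nn as nn

    POLICY_DIM = 64  # hypothetical size of the flattened policy weight vector

    class PolicyGenerator(nn.Module):
        def __init__(self, policy_dim: int = POLICY_DIM):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(1, 128), nn.ReLU(),
                nn.Linear(128, policy_dim),
            )

        def forward(self, desired_return: torch.Tensor) -> torch.Tensor:
            # desired_return: (batch, 1) -> flat policy weights: (batch, policy_dim)
            return self.net(desired_return)

    # One supervised update on a buffer of (achieved return, policy weights) pairs:
    gen = PolicyGenerator()
    opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
    returns = torch.randn(32, 1)           # placeholder achieved returns
    weights = torch.randn(32, POLICY_DIM)  # placeholder policy weight vectors
    loss = nn.functional.mse_loss(gen(returns), weights)
    opt.zero_grad()
    loss.backward()
    opt.step()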

How to Correctly do Semantic Backpropagation on Language-based Agentic Systems

1 code implementation • 4 Dec 2024 • Wenyi Wang, Hisham A. Alyahya, Dylan R. Ashley, Oleg Serikov, Dmitrii Khizbullin, Francesco Faccio, Jürgen Schmidhuber

Language-based agentic systems have shown great promise in recent years, transitioning from solving small-scale research problems to being deployed in challenging real-world tasks.

GSM8K

Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

no code implementations • 12 Jun 2024 • Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber

The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL).

Reinforcement Learning (RL)
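For context, the recursion that a VIN unrolls as differentiable layers on its latent MDP is ordinary value iteration; the paper's 5000 layers correspond to 5000 such updates. The sketch below is a plain NumPy version of that recursion on a toy random MDP, not the VIN architecture itself.

    # Minimal sketch of value iteration: V(s) <- max_a [ R(s,a) + gamma * E[V(s')] ].
    import numpy as np

    def value_iteration(P, R, gamma=0.99, iters=100):
        """P: (A, S, S) transition probabilities, R: (S, A) rewards -> V: (S,)."""
        V = np.zeros(P.shape[1])
        for _ in range(iters):
            Q = R + gamma * np.einsum('ast,t->as', P, V).T  # Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
            V = Q.max(axis=1)                               # V(s) = max_a Q(s,a)
        return V

    # Tiny random MDP for illustration:
    A, S = 2, 5
    P = np.random.dirichlet(np.ones(S), size=(A, S))  # each row sums to 1 over s'
    R = np.random.rand(S, A)
    print(value_iteration(P, R))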

Highway Value Iteration Networks

no code implementations • 5 Jun 2024 • Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber

To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment -- into the structure of VINs.

Diversity, Safe Exploration

Highway Reinforcement Learning

no code implementations • 28 May 2024 • Yuhui Wang, Miroslav Štrupl, Francesco Faccio, Qingyuan Wu, Haozhe Liu, Michał Grudzień, Xiaoyang Tan, Jürgen Schmidhuber

We show, however, that such IS-free methods underestimate the optimal value function (VF), especially for large $n$, restricting their capacity to efficiently utilize information from distant future time steps.

Q-Learning, Reinforcement Learning +2

Towards a Robust Soft Baby Robot With Rich Interaction Ability for Advanced Machine Learning Algorithms

1 code implementation • 11 Apr 2024 • Mohannad Alhakami, Dylan R. Ashley, Joel Dunham, Yanning Dai, Francesco Faccio, Eric Feron, Jürgen Schmidhuber

Advanced machine learning algorithms require platforms that are extremely robust and equipped with rich sensory feedback to handle extensive trial-and-error learning without relying on strong inductive biases.

Language Agents as Optimizable Graphs

2 code implementations • 26 Feb 2024 • Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber

Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases.

Prompt Engineering

Learning to Identify Critical States for Reinforcement Learning from Videos

1 code implementation • ICCV 2023 • Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber

Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions.

Deep Reinforcement Learning, Reinforcement Learning

General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

1 code implementation • 4 Jul 2022 • Francesco Faccio, Aditya Ramesh, Vincent Herrmann, Jean Harb, Jürgen Schmidhuber

In continuous control problems with infinitely many states, our value function minimizes its prediction error by simultaneously learning a small set of `probing states' and a mapping from actions produced in probing states to the policy's return.

Continuous Control +2
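The snippet above describes a concrete mechanism, sketched below under assumed shapes (this is not the paper's code): a value function that learns a small set of "probing states" and maps the actions a policy produces in them to the policy's return.

    # Minimal sketch of a probing-state value function.
    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM, NUM_PROBES = 8, 2, 16  # hypothetical sizes

    class ProbingValueFunction(nn.Module):
        def __init__(self):
            super().__init__()
            # The probing states themselves are learnable parameters.
            self.probing_states = nn.Parameter(torch.randn(NUM_PROBES, STATE_DIM))
            self.head = nn.Sequential(nn.Linear(NUM_PROBES * ACTION_DIM, 64),
                                      nn.ReLU(), nn.Linear(64, 1))

        def forward(self, policy: nn.Module) -> torch.Tensor:
            actions = policy(self.probing_states)  # actions taken in probing states
            return self.head(actions.flatten())    # predicted return of the policy

    # Training would regress this prediction onto observed returns of sampled
    # policies, updating both the probing states and the head.
    vf = ProbingValueFunction()
    toy_policy = nn.Linear(STATE_DIM, ACTION_DIM)  # stand-in deterministic policy
    print(vf(toy_policy))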

Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules

2 code implementations • 3 Jun 2022 • Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber

Neural ordinary differential equations (ODEs) have attracted much attention as continuous-time counterparts of deep residual neural networks (NNs), and numerous extensions for recurrent NNs have been proposed.

Time Series, Time Series Analysis +1
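The residual-net/ODE correspondence mentioned in the snippet is the standard one: a residual block h_{t+1} = h_t + f(h_t) is a unit-step Euler discretization of dh/dt = f(h, t). A minimal sketch with a fixed-step Euler solver (the function names and sizes are illustrative, not from the paper):

    import torch
    import torch.nn as nn

    class ODEFunc(nn.Module):
        """The learned vector field f(h, t) defining dh/dt."""
        def __init__(self, dim=16):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

        def forward(self, t, h):
            return self.net(h)

    def euler_integrate(f, h0, t0=0.0, t1=1.0, steps=20):
        # Fixed-step Euler solver; libraries such as torchdiffeq provide adaptive ones.
        h, t = h0, t0
        dt = (t1 - t0) / steps
        for _ in range(steps):
            h = h + dt * f(t, h)  # one Euler step = one "residual layer"
            t += dt
        return h

    h1 = euler_integrate(ODEFunc(), torch.randn(4, 16))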

Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets

1 code implementation • 13 May 2022 • Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh Kumar Srivastava

Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions and uses only supervised learning, where the targets for given inputs in a dataset do not change over time.

Reinforcement Learning (RL)
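To make the setup in the snippet concrete, here is a minimal sketch (assumed shapes and names, not the paper's code) of the UDRL training step it refers to: a command-conditioned policy maps (state, desired return, desired horizon) to an action and is trained by plain supervised learning on relabeled episodes, with no value function.

    import torch
    import torch.nn as nn

    STATE_DIM, NUM_ACTIONS = 4, 2  # hypothetical sizes

    policy = nn.Sequential(
        nn.Linear(STATE_DIM + 2, 64), nn.ReLU(),  # +2 for (desired_return, horizon)
        nn.Linear(64, NUM_ACTIONS),
    )
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    # One supervised step: the command is the return/horizon actually achieved
    # from each state, and the target is the action that was taken there.
    states = torch.randn(32, STATE_DIM)
    commands = torch.randn(32, 2)
    actions_taken = torch.randint(0, NUM_ACTIONS, (32,))
    logits = policy(torch.cat([states, commands], dim=1))
    loss = nn.functional.cross_entropy(logits, actions_taken)
    opt.zero_grad()
    loss.backward()
    opt.step()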

Reward-Weighted Regression Converges to a Global Optimum

1 code implementation • 19 Jul 2021 • Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Rupesh Kumar Srivastava, Jürgen Schmidhuber

Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework.

Regression, Reinforcement Learning (RL)
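For orientation, one RWR iteration fits the new policy by reward-weighted maximum likelihood on data from the current policy (the EM-style M-step). The sketch below is an illustrative discrete-action version with assumed non-negative rewards, not the paper's analysis or code.

    import torch
    import torch.nn as nn

    STATE_DIM, NUM_ACTIONS = 4, 3  # hypothetical sizes
    policy = nn.Linear(STATE_DIM, NUM_ACTIONS)
    opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

    states = torch.randn(64, STATE_DIM)
    actions = torch.randint(0, NUM_ACTIONS, (64,))
    rewards = torch.rand(64)               # assumed non-negative rewards
    weights = rewards / rewards.sum()      # normalized reward weights

    # Reward-weighted negative log-likelihood of the taken actions.
    log_probs = torch.log_softmax(policy(states), dim=1)[torch.arange(64), actions]
    loss = -(weights * log_probs).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()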

Bayesian brains and the Rényi divergence

no code implementations • 12 Jul 2021 • Noor Sajid, Francesco Faccio, Lancelot Da Costa, Thomas Parr, Jürgen Schmidhuber, Karl Friston

Under the Bayesian brain hypothesis, behavioural variations can be attributed to different priors over generative model parameters.

Bayesian Inference, Variational Inference
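For reference, the Rényi divergence of order α named in the title is the standard quantity below; it recovers the KL divergence in the limit α → 1.

    D_{\alpha}(P \,\|\, Q) \;=\; \frac{1}{\alpha - 1}\,
        \log \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, \mathrm{d}x,
    \qquad \alpha > 0,\ \alpha \neq 1,
    \qquad \lim_{\alpha \to 1} D_{\alpha}(P \,\|\, Q) = D_{\mathrm{KL}}(P \,\|\, Q).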

Parameter-Based Value Functions

1 code implementation • ICLR 2021 • Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber

We introduce a class of value functions called Parameter-Based Value Functions (PBVFs) whose inputs include the policy parameters.

Continuous Control +1
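A minimal sketch of the core idea in the snippet above (assumed sizes, one of several PBVF variants, and not the paper's code): a critic whose input is the flattened policy parameter vector, so it predicts a policy's expected return directly from its weights.

    import torch
    import torch.nn as nn

    POLICY_PARAM_DIM = 256  # hypothetical size of the flattened policy parameters

    pbvf = nn.Sequential(
        nn.Linear(POLICY_PARAM_DIM, 128), nn.ReLU(),
        nn.Linear(128, 1),
    )

    def flatten_params(policy: nn.Module) -> torch.Tensor:
        return torch.cat([p.detach().flatten() for p in policy.parameters()])

    toy_policy = nn.Linear(15, 16)         # 15*16 weights + 16 biases = 256 parameters
    value = pbvf(flatten_params(toy_policy))
    # Training would regress 'value' onto the returns observed when running the
    # policy; because the critic is differentiable in the policy parameters, the
    # policy can also be improved by gradient ascent through the critic.
    print(value)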
