1 code implementation • 8 Feb 2025 • Miroslav Štrupl, Oleg Szehr, Francesco Faccio, Dylan R. Ashley, Rupesh Kumar Srivastava, Jürgen Schmidhuber
This article provides a rigorous analysis of convergence and stability of Episodic Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning and Online Decision Transformers.
1 code implementation • 27 Jan 2025 • Jacopo Di Ventura, Dylan R. Ashley, Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber
Our method, dubbed Upside-Down Reinforcement Learning with Policy Generators (UDRLPG), streamlines comparable techniques by removing the need for an evaluator or critic to update the weights of the generator.
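One hedged reading of such a setup (not necessarily the paper's exact architecture; all names below are illustrative) is a generator that maps a desired return directly to a flat vector of policy weights and is trained purely by supervised regression on previously observed (return, weights) pairs, with no critic appearing anywhere in the update:

```python
import torch
import torch.nn as nn

class PolicyGenerator(nn.Module):
    """Illustrative sketch: map a desired return (the command) to flat policy weights."""
    def __init__(self, n_policy_params: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_policy_params),
        )

    def forward(self, desired_return: torch.Tensor) -> torch.Tensor:
        return self.net(desired_return)  # (batch, n_policy_params)

def generator_update(gen, optimizer, returns, weights):
    """One supervised step: make gen(return_i) match the stored weights_i -- no critic."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(gen(returns), weights)
    loss.backward()
    optimizer.step()
    return loss.item()
```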
1 code implementation • 4 Dec 2024 • Wenyi Wang, Hisham A. Alyahya, Dylan R. Ashley, Oleg Serikov, Dmitrii Khizbullin, Francesco Faccio, Jürgen Schmidhuber
Language-based agentic systems have shown great promise in recent years, transitioning from solving small-scale research problems to being deployed in challenging real-world tasks.
no code implementations • 12 Jun 2024 • Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber
The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL).
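For reference, the value-iteration module at the heart of a VIN (in the spirit of Tamar et al., 2016) can be sketched as a convolution over a latent reward map and the current value map, followed by a max over action channels, iterated K times; sizes and names below are illustrative:

```python
import torch
import torch.nn as nn

class VIModule(nn.Module):
    """Minimal value-iteration module of a VIN (illustrative sizes/names).

    Each iteration: Q = conv([reward_map; value_map]), V = max over action channels.
    """
    def __init__(self, n_actions: int = 8, K: int = 20):
        super().__init__()
        self.K = K
        # 2 input channels: latent reward map and current value map
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size=3, padding=1, bias=False)

    def forward(self, reward_map: torch.Tensor) -> torch.Tensor:
        # reward_map: (batch, 1, H, W) latent reward over the grid MDP
        value_map = torch.zeros_like(reward_map)
        for _ in range(self.K):
            q = self.q_conv(torch.cat([reward_map, value_map], dim=1))  # (B, A, H, W)
            value_map, _ = q.max(dim=1, keepdim=True)                   # max over actions
        return value_map
```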
no code implementations • 5 Jun 2024 • Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber
To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment -- into the structure of VINs.
no code implementations • 28 May 2024 • Yuhui Wang, Miroslav Strupl, Francesco Faccio, Qingyuan Wu, Haozhe Liu, Michał Grudzień, Xiaoyang Tan, Jürgen Schmidhuber
We show, however, that such IS-free methods underestimate the optimal value function (VF), especially for large $n$, restricting their capacity to efficiently utilize information from distant future time steps.
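The intuition can be seen on a standard uncorrected $n$-step target of the kind such IS-free methods use (a sketch, not the paper's exact estimator):

$$ G_t^{(n)} = \sum_{k=0}^{n-1} \gamma^k r_{t+k} + \gamma^n \max_{a} Q(s_{t+n}, a). $$

Here the rewards $r_{t+k}$ come from the (possibly suboptimal) behavior policy; without importance-sampling corrections, bootstrapping from the optimal value gives $\mathbb{E}[G_t^{(n)}] \le V^*(s_t)$ whenever the intermediate behavior actions are suboptimal, and the gap tends to grow with $n$.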
1 code implementation • 11 Apr 2024 • Mohannad Alhakami, Dylan R. Ashley, Joel Dunham, Yanning Dai, Francesco Faccio, Eric Feron, Jürgen Schmidhuber
Advanced machine learning algorithms require platforms that are extremely robust and equipped with rich sensory feedback to handle extensive trial-and-error learning without relying on strong inductive biases.
1 code implementation • 3 Apr 2024 • Haozhe Liu, Wentian Zhang, Jinheng Xie, Francesco Faccio, Mengmeng Xu, Tao Xiang, Mike Zheng Shou, Juan-Manuel Perez-Rua, Jürgen Schmidhuber
We explore the role of the attention mechanism during inference in text-conditional diffusion models.
1 code implementation • 18 Mar 2024 • Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber
Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers.
2 code implementations • 26 Feb 2024 • Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber
Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases.
1 code implementation • 20 Sep 2023 • Aleksandar Stanić, Dylan Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
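A compute-matched comparison of this kind reduces to converting one fixed accelerator-hour budget into per-model training budgets from measured throughput; the helper below is only an illustration of that bookkeeping, with made-up numbers rather than the paper's exact protocol:

```python
def tokens_for_budget(accelerator_hours: float, tokens_per_second: float) -> int:
    """Convert a fixed accelerator-hour budget into a training-token budget."""
    return int(accelerator_hours * 3600 * tokens_per_second)

# Illustrative: two models with different measured throughputs receive the same
# 100 accelerator-hour budget, hence different numbers of training tokens.
budget_hours = 100.0
throughput = {"transformer": 25_000.0, "lstm": 18_000.0}  # tokens/s, measured empirically
token_budget = {name: tokens_for_budget(budget_hours, tps) for name, tps in throughput.items()}
print(token_budget)
```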
1 code implementation • ICCV 2023 • Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber
Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions.
no code implementations • 26 May 2023 • Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber
What should be the social structure of a natural language-based society of mind (NLSOM)?
1 code implementation • 4 Jul 2022 • Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber
A form of weight-sharing HyperNetwork, combined with policy embeddings, scales our method to the generation of deep NNs.
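A minimal sketch of such a weight-sharing hypernetwork (illustrative names and sizes, not the paper's exact design): one shared generator is reused for every target layer, conditioned on a policy/command embedding plus a learned per-layer embedding, so its parameter count does not grow with the depth of the generated network.

```python
import torch
import torch.nn as nn

class SharedLayerGenerator(nn.Module):
    """Illustrative weight-sharing hypernetwork: one generator, reused for every layer."""
    def __init__(self, emb_dim: int, layer_shape, n_layers: int):
        super().__init__()
        self.layer_shape = layer_shape                        # e.g. (out_features, in_features)
        self.layer_emb = nn.Embedding(n_layers, emb_dim)      # one learned embedding per target layer
        out = layer_shape[0] * layer_shape[1]
        self.gen = nn.Sequential(nn.Linear(2 * emb_dim, 256), nn.ReLU(), nn.Linear(256, out))

    def forward(self, policy_emb: torch.Tensor, layer_idx: int) -> torch.Tensor:
        # Condition the shared generator on (policy embedding, layer embedding).
        z = torch.cat([policy_emb, self.layer_emb.weight[layer_idx]], dim=-1)
        return self.gen(z).view(*self.layer_shape)            # weights for one target layer
```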
1 code implementation • 4 Jul 2022 • Francesco Faccio, Aditya Ramesh, Vincent Herrmann, Jean Harb, Jürgen Schmidhuber
In continuous control problems with infinitely many states, our value function minimizes its prediction error by simultaneously learning a small set of "probing states" and a mapping from actions produced in probing states to the policy's return.
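Read literally, that evaluation path can be sketched as follows (illustrative names and sizes): a small set of learnable probing states is fed to the policy under evaluation, and the concatenated probing actions are mapped to a scalar return prediction.

```python
import torch
import torch.nn as nn

class ProbingValueFunction(nn.Module):
    """Illustrative sketch: evaluate a policy by its actions in a few learned probing states."""
    def __init__(self, n_probe: int, state_dim: int, action_dim: int):
        super().__init__()
        self.probing_states = nn.Parameter(torch.randn(n_probe, state_dim))  # learned jointly
        self.head = nn.Sequential(nn.Linear(n_probe * action_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, policy: nn.Module) -> torch.Tensor:
        probing_actions = policy(self.probing_states)                # (n_probe, action_dim)
        return self.head(probing_actions.reshape(1, -1)).squeeze()   # predicted return
```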
2 code implementations • 3 Jun 2022 • Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber
Neural ordinary differential equations (ODEs) have attracted much attention as continuous-time counterparts of deep residual neural networks (NNs), and numerous extensions for recurrent NNs have been proposed.
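The continuous-time correspondence referred to here is the standard one: a residual layer updates its hidden state as

$$ h_{t+1} = h_t + f(h_t, \theta_t), $$

which can be read as one Euler step of the ODE

$$ \frac{dh(t)}{dt} = f(h(t), t, \theta), $$

whose solution a neural ODE computes with a numerical solver instead of a fixed stack of layers.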
1 code implementation • 13 May 2022 • Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh Kumar Srivastava
Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions and uses only supervised learning, where the targets for given inputs in a dataset do not change over time.
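The supervised setup behind UDRL can be sketched in a few lines (illustrative names; discrete actions assumed): the command given to the policy is built from what actually happened in a logged trajectory, so the regression targets are fixed once and for all.

```python
import torch
import torch.nn as nn

# Minimal UDRL-style supervised step (illustrative): predict the action that was actually
# taken, given the state and a command (observed return-to-go, remaining horizon). Because
# these targets come from logged trajectories, they never change as training proceeds.
def udrl_step(policy, optimizer, states, returns_to_go, horizons, actions_taken):
    commands = torch.stack([returns_to_go, horizons], dim=-1)   # (batch, 2)
    logits = policy(torch.cat([states, commands], dim=-1))      # (batch, n_actions)
    loss = nn.functional.cross_entropy(logits, actions_taken)   # plain supervised learning
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```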
1 code implementation • 19 Jul 2021 • Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Rupesh Kumar Srivastava, Jürgen Schmidhuber
Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework.
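In its EM formulation, each RWR iteration is a reward-weighted maximum-likelihood step of roughly the following form (up to a reward transformation that keeps the weights non-negative):

$$ \pi_{k+1} = \arg\max_{\pi}\; \mathbb{E}_{(s,a)\sim \pi_k}\!\left[\, \tilde{R} \,\log \pi(a \mid s) \,\right], $$

i.e., the new policy fits the data generated by the old one, with each sample weighted by its transformed reward $\tilde{R}$.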
no code implementations • 12 Jul 2021 • Noor Sajid, Francesco Faccio, Lancelot Da Costa, Thomas Parr, Jürgen Schmidhuber, Karl Friston
Under the Bayesian brain hypothesis, behavioural variations can be attributed to different priors over generative model parameters.
1 code implementation • ICLR 2021 • Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber
We introduce a class of value functions called Parameter-Based Value Functions (PBVFs) whose inputs include the policy parameters.
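A minimal sketch of the simplest such PBVF (illustrative names and sizes): a network that predicts a policy's return from its flattened parameter vector; once trained, it can be differentiated with respect to those parameters to improve the policy directly.

```python
import torch
import torch.nn as nn

class ParameterBasedValueFunction(nn.Module):
    """Illustrative PBVF: predict a policy's return from its flattened parameter vector."""
    def __init__(self, n_policy_params: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_policy_params, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, policy_params: torch.Tensor) -> torch.Tensor:
        return self.net(policy_params)

def improve(theta: torch.Tensor, pbvf: ParameterBasedValueFunction, lr: float = 1e-3):
    """One policy-improvement step: gradient ascent on the learned value w.r.t. the parameters."""
    theta = theta.clone().requires_grad_(True)
    value = pbvf(theta).squeeze()
    value.backward()
    with torch.no_grad():
        return theta + lr * theta.grad  # ascend the estimated return
```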
2 code implementations • NeurIPS 2018 • Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli
Policy optimization is an effective reinforcement learning approach for solving continuous control tasks.