Search Results for author: Herke van Hoof

Found 40 papers, 21 papers with code

Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

no code implementations • 22 Mar 2024 • Guillermo Infante, David Kuric, Anders Jonsson, Vicenç Gómez, Herke van Hoof

Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems.

Reinforcement Learning (RL)

Hierarchical Reinforcement Learning for Power Network Topology Control

1 code implementation • 3 Nov 2023 • Blazej Manczak, Jan Viebahn, Herke van Hoof

Whereas at the highest level a purely rule-based policy is still chosen for all agents in this study, at the intermediate level the policy is trained using different state-of-the-art RL algorithms.

Hierarchical Reinforcement Learning · reinforcement-learning +2

Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes

1 code implementation • 11 Sep 2023 • Tim Bakker, Herke van Hoof, Max Welling

In this work, we propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem with an Attentive Conditional Neural Process model.

Active Learning

Uncoupled Learning of Differential Stackelberg Equilibria with Commitments

1 code implementation • 7 Feb 2023 • Robert Loftin, Mustafa Mert Çelikok, Herke van Hoof, Samuel Kaski, Frans A. Oliehoek

A natural solution concept in such settings is the Stackelberg equilibrium, in which the "leader" agent selects the strategy that maximizes its own payoff given that the "follower" agent will choose its best response to this strategy.

Deep Reinforcement Learning · Multi-agent Reinforcement Learning
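
As a toy illustration of the solution concept (not of the paper's uncoupled learning method), the sketch below enumerates pure leader commitments in a made-up bimatrix game and selects the one that is best given the follower's best response:

```python
import numpy as np

# Made-up payoff matrices: rows index leader actions, columns follower actions.
leader_payoff = np.array([[3.0, 1.0],
                          [4.0, 0.0]])
follower_payoff = np.array([[2.0, 1.0],
                            [0.0, 3.0]])

def follower_best_response(a_leader: int) -> int:
    # Follower observes the commitment and maximizes its own payoff.
    return int(np.argmax(follower_payoff[a_leader]))

# Leader evaluates each commitment under the follower's best response.
values = [leader_payoff[a, follower_best_response(a)] for a in range(2)]
a_star = int(np.argmax(values))
print(f"leader commits to {a_star}, follower plays {follower_best_response(a_star)}, "
      f"leader payoff {values[a_star]}")   # here: leader commits to 0, payoff 3.0
```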

Reusable Options through Gradient-based Meta Learning

1 code implementation • 22 Dec 2022 • David Kuric, Herke van Hoof

Subsequently, we formulate the desiderata for reusable options and use these to frame the problem of learning options as a gradient-based meta-learning problem.

Meta-Learning

Exposure-Aware Recommendation using Contextual Bandits

no code implementations • 4 Sep 2022 • Masoud Mansoury, Bamshad Mobasher, Herke van Hoof

This is especially problematic when bias is amplified over time: a few items (e.g., popular ones) are repeatedly over-represented in recommendation lists, and users' interactions with those items further amplify the bias toward them, resulting in a feedback loop.

Multi-Armed Bandits · Recommendation Systems

Calculus on MDPs: Potential Shaping as a Gradient

no code implementations • 20 Aug 2022 • Erik Jenner, Herke van Hoof, Adam Gleave

In reinforcement learning, different reward functions can be equivalent in terms of the optimal policies they induce.

Math
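
The equivalence class behind this observation includes classical potential-based shaping; below is a minimal sketch with a made-up potential phi, showing the shaping term gamma * phi(s') - phi(s) that leaves optimal policies unchanged:

```python
gamma = 0.99

def phi(s) -> float:
    # Hypothetical potential: any real-valued function of state is admissible.
    return -abs(s)

def shaped_reward(r: float, s: float, s_next: float) -> float:
    # Adding gamma * phi(s') - phi(s) telescopes along any trajectory, so it
    # shifts every policy's return by the same state-dependent constant and
    # therefore leaves the optimal policy unchanged.
    return r + gamma * phi(s_next) - phi(s)

print(shaped_reward(1.0, s=3.0, s_next=2.0))  # 1.0 + 0.99 * (-2) - (-3) = 2.02
```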

Neural Topological Ordering for Computation Graphs

no code implementations • 13 Jul 2022 • Mukul Gagrani, Corrado Rainone, Yang Yang, Harris Teague, Wonseok Jeon, Herke van Hoof, Weiliang Will Zeng, Piero Zappi, Christopher Lott, Roberto Bondesan

Recent works on machine learning for combinatorial optimization have shown that learning-based approaches can outperform heuristic methods in terms of speed and performance.

2k · BIG-bench Machine Learning +3

Logic-based AI for Interpretable Board Game Winner Prediction with Tsetlin Machine

no code implementations • 8 Mar 2022 • Charul Giri, Ole-Christoffer Granmo, Herke van Hoof, Christian D. Blakely

Hex is a turn-based two-player connection game with a high branching factor, making the game arbitrarily complex with increasing board sizes.

Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

no code implementations • 7 Mar 2022 • Alexander Long, Alan Blair, Herke van Hoof

We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample-efficient and computation-efficient.

Atari Games 100k · reinforcement-learning +1

Leveraging class abstraction for commonsense reinforcement learning via residual policy gradient methods

1 code implementation • 28 Jan 2022 • Niklas Höpner, Ilaria Tiddi, Herke van Hoof

Enabling reinforcement learning (RL) agents to leverage a knowledge base while learning from experience promises to advance RL in knowledge intensive domains.

Knowledge Graphs · Policy Gradient Methods +2

Multi-Agent MDP Homomorphic Networks

1 code implementation • ICLR 2022 • Elise van der Pol, Herke van Hoof, Frans A. Oliehoek, Max Welling

This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems.

Value Refinement Network (VRN)

no code implementations • 29 Sep 2021 • Jan Wöhlke, Felix Schmitt, Herke van Hoof

Combining the benefits of planning and learning values, we propose the Value Refinement Network (VRN), an architecture that locally refines a plan in a (simpler) state space abstraction, represented by a pre-computed value function, with respect to the full agent state.

Q-Learning · Reinforcement Learning (RL)

Hierarchies of Planning and Reinforcement Learning for Robot Navigation

no code implementations • 23 Sep 2021 • Jan Wöhlke, Felix Schmitt, Herke van Hoof

In simulated robotic navigation tasks, VI-RL yields consistent, strong improvements over vanilla RL. It is on par with vanilla hierarchical RL on single layouts but is more broadly applicable to multiple layouts, and it is on par with trainable HL path-planning baselines except on a parking task with difficult non-holonomic dynamics, where it shows marked improvements.

reinforcement-learning · Reinforcement Learning +2

A Survey of Exploration Methods in Reinforcement Learning

no code implementations • 1 Sep 2021 • Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup

Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments.

reinforcement-learning · Reinforcement Learning +2

Combining Reward Information from Multiple Sources

no code implementations • 22 Mar 2021 • Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof

We study this problem in the setting with two conflicting reward functions learned from different sources.

Informativeness

Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models

no code implementations • 16 Feb 2021 • Qi Wang, Herke van Hoof

Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications.

Decision Making · Meta Reinforcement Learning +5

Deep Coherent Exploration For Continuous Control

no code implementations • 1 Jan 2021 • Yijie Zhang, Herke van Hoof

In policy search methods for reinforcement learning (RL), exploration is often performed by injecting noise either in action space at each step independently or in parameter space over each full trajectory.

continuous-control · Continuous Control +1
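
The abstract contrasts two standard exploration schemes (the paper's own coherent scheme sits between these extremes). Here is a runnable sketch of both, with a toy linear policy standing in for a real one; all sizes and noise scales are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(2, 4))          # toy linear policy: a = theta @ s

def act(params, s):
    return params @ s

# (1) Action-space noise: an independent perturbation at every step.
s = rng.normal(size=4)
a = act(theta, s) + 0.1 * rng.normal(size=2)

# (2) Parameter-space noise: one perturbation, reused for the whole trajectory.
theta_tilde = theta + 0.1 * rng.normal(size=theta.shape)
for _ in range(100):                     # entire episode uses theta_tilde
    s = rng.normal(size=4)               # stand-in for environment states
    a = act(theta_tilde, s)
```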

Experimental design for MRI by greedy policy search

2 code implementations • NeurIPS 2020 • Tim Bakker, Herke van Hoof, Max Welling

In today's clinical practice, magnetic resonance imaging (MRI) is routinely accelerated through subsampling of the associated Fourier domain.

Experimental Design · Policy Gradient Methods

Doubly Stochastic Variational Inference for Neural Processes with Hierarchical Latent Variables

no code implementations • ICML 2020 • Qi Wang, Herke van Hoof

Neural processes (NPs) constitute a family of variational approximate models for stochastic processes with promising properties in computational efficiency and uncertainty quantification.

Computational Efficiency · regression +2

Social Navigation with Human Empowerment driven Deep Reinforcement Learning

1 code implementation • 18 Mar 2020 • Tessa van der Heiden, Florian Mirus, Herke van Hoof

In contrast to self-empowerment, a robot employing our approach strives for the empowerment of people in its environment, so they are not disturbed by the robot's presence and motion.

Deep Reinforcement Learning · reinforcement-learning +2

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

1 code implementation • ICLR 2020 • Wouter Kool, Herke van Hoof, Max Welling

We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples.

Structured Prediction
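
A compact numpy sketch of an importance-weighted estimator of this kind: draw k items without replacement via Gumbel perturbations (the Gumbel-Top-k trick from the 2019 paper further down this list) and reweight each sampled item i by p_i / q_i(kappa), where kappa is the (k+1)-th largest perturbed log-probability and q_i(kappa) = P(Gumbel(phi_i) > kappa). The toy distribution and function values below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.log(np.array([0.5, 0.2, 0.2, 0.05, 0.05]))   # log-probs of a toy categorical
f = np.array([1.0, 3.0, -2.0, 10.0, 0.0])             # arbitrary function values
k = 3

g = phi + rng.gumbel(size=phi.shape)        # Gumbel-perturbed log-probs
order = np.argsort(-g)
S, kappa = order[:k], g[order[k]]           # top-k sample; (k+1)-th value as threshold
q = 1.0 - np.exp(-np.exp(phi[S] - kappa))   # q_i(kappa) = P(Gumbel(phi_i) > kappa)
estimate = np.sum(np.exp(phi[S]) / q * f[S])
print(estimate)   # unbiased (over random draws) for E[f] = sum_i p_i f(i) = 1.2
```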

Unifying Variational Inference and PAC-Bayes for Supervised Learning that Scales

1 code implementation • 23 Oct 2019 • Sanjay Thakur, Herke van Hoof, Gunshi Gupta, David Meger

PAC-Bayes is a generalized framework that is more resistant to overfitting and yields performance bounds that hold with arbitrarily high probability, even under unjustified extrapolation.

Variational Inference

Reinforcement Learning with Non-uniform State Representations for Adaptive Search

no code implementations • 15 Jun 2019 • Sandeep Manjanna, Herke van Hoof, Gregory Dudek

In this paper, we present a search algorithm that generates efficient trajectories that optimize the rate at which probability mass is covered by a searcher.

reinforcement-learning · Reinforcement Learning +1

Buy 4 REINFORCE Samples, Get a Baseline for Free!

no code implementations • ICLR Workshop drlStructPred 2019 • Wouter Kool, Herke van Hoof, Max Welling

REINFORCE can be used to train models in structured prediction settings to directly optimize the test-time objective.

Structured Prediction
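
A minimal sketch of the leave-one-out baseline idea the title alludes to: with k samples per input, each sample's baseline is the mean reward of the other k-1 samples, so no learned value network is needed (the paper additionally combines this with sampling without replacement). The log-probabilities and rewards below are toy stand-ins:

```python
import torch

k = 4
log_probs = torch.randn(k, requires_grad=True)     # stand-in for sum of log p(y_i)
rewards = torch.tensor([0.2, 0.9, 0.4, 0.7])       # stand-in for R(y_i)

baseline = (rewards.sum() - rewards) / (k - 1)     # leave-one-out mean reward
loss = -((rewards - baseline) * log_probs).mean()  # REINFORCE surrogate loss
loss.backward()                                    # gradients with "free" baseline
```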

Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement

4 code implementations • 14 Mar 2019 • Wouter Kool, Herke van Hoof, Max Welling

We show how to implicitly apply this 'Gumbel-Top-k' trick to a factorized distribution over sequences, allowing us to draw exact samples without replacement using Stochastic Beam Search.

Sentence · Translation
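
A minimal sketch of the Gumbel-Top-k trick itself, on a flat categorical distribution (the paper's contribution is applying it implicitly to factorized sequence models via Stochastic Beam Search):

```python
import numpy as np

rng = np.random.default_rng(0)
log_p = np.log(np.array([0.4, 0.3, 0.2, 0.1]))   # toy categorical distribution

# Perturb each log-probability with i.i.d. Gumbel noise; the indices of the
# k largest perturbed values are an exact sample without replacement.
perturbed = log_p + rng.gumbel(size=log_p.shape)
k = 2
sample_without_replacement = np.argsort(-perturbed)[:k]
print(sample_without_replacement)
```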

Uncertainty Aware Learning from Demonstrations in Multiple Contexts using Bayesian Neural Networks

1 code implementation • 13 Mar 2019 • Sanjay Thakur, Herke van Hoof, Juan Camilo Gamboa Higuera, Doina Precup, David Meger

Learned controllers such as neural networks typically do not have a notion of uncertainty that allows one to diagnose an offset between training and testing conditions and potentially intervene.

Diversity
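
The paper uses Bayesian neural networks; as a loose stand-in, the sketch below uses MC dropout (a different but related technique) to get a predictive-variance signal that can flag an offset between training and testing conditions. Architecture and sizes are made up:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Dropout(0.1), nn.Linear(64, 2))
net.train()                                       # keep dropout active at test time

x = torch.randn(1, 4)                             # stand-in for a test input
preds = torch.stack([net(x) for _ in range(50)])  # 50 stochastic forward passes
uncertainty = preds.var(dim=0).mean()
# A value much larger than validation-time levels suggests the controller is
# operating outside its training conditions and intervention may be warranted.
```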

Deep Generative Modeling of LiDAR Data

1 code implementation • 4 Dec 2018 • Lucas Caccia, Herke van Hoof, Aaron Courville, Joelle Pineau

In this work, we show that one can adapt deep generative models for this task by unravelling lidar scans into a 2D point map.

Point Cloud Generation
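
A rough sketch of the "unravelling" step described above: bin each lidar point by azimuth and elevation and store its range, yielding an image-like array that ordinary 2D generative models can consume. Grid resolution and field of view are made-up values:

```python
import numpy as np

def lidar_to_range_image(points: np.ndarray, h: int = 64, w: int = 512,
                         elev_fov=(-0.4363, 0.0873)) -> np.ndarray:
    # elev_fov is roughly -25 to +5 degrees, in radians (an assumption).
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)                        # in [-pi, pi)
    elevation = np.arcsin(z / np.maximum(r, 1e-8))
    col = ((azimuth + np.pi) / (2 * np.pi) * w).astype(int) % w
    row = ((elevation - elev_fov[0]) / (elev_fov[1] - elev_fov[0]) * h).astype(int)
    img = np.zeros((h, w), dtype=np.float32)
    valid = (row >= 0) & (row < h)
    img[row[valid], col[valid]] = r[valid]            # store range per cell
    return img
```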

BanditSum: Extractive Summarization as a Contextual Bandit

1 code implementation • EMNLP 2018 • Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie Chi Kit Cheung

In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels.

Extractive Summarization · Extractive Text Summarization +1

Attention, Learn to Solve Routing Problems!

15 code implementations • ICLR 2019 • Wouter Kool, Herke van Hoof, Max Welling

The recently presented idea to learn heuristics for combinatorial optimization problems is promising as it can save costly development.

Combinatorial Optimization

Addressing Function Approximation Error in Actor-Critic Methods

67 code implementations • ICML 2018 • Scott Fujimoto, Herke van Hoof, David Meger

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies.

Continuous Control · OpenAI Gym +4
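
The fix this paper (TD3) proposes is a clipped double-Q target with target policy smoothing: take the minimum of two target critics, evaluated at a smoothed target action, so approximation error is less likely to inflate the value target. A condensed sketch, where q1_target, q2_target, and actor_target are hypothetical pre-built network handles:

```python
import torch

def td3_target(q1_target, q2_target, actor_target, reward, next_state, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    with torch.no_grad():
        # Target policy smoothing: clipped noise on the target action.
        noise = (torch.randn_like(actor_target(next_state)) * noise_std
                 ).clamp(-noise_clip, noise_clip)
        next_action = (actor_target(next_state) + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: minimum over the two target critics.
        q_next = torch.min(q1_target(next_state, next_action),
                           q2_target(next_state, next_action))
        return reward + gamma * (1.0 - done) * q_next
```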

An inference-based policy gradient method for learning options

no code implementations • ICLR 2018 • Matthew J. A. Smith, Herke van Hoof, Joelle Pineau

In this work we develop a novel policy gradient method for the automatic learning of policies with options.

Policy Search with High-Dimensional Context Variables

no code implementations • 10 Nov 2016 • Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama

A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored.

Dimensionality Reduction · Vocal Bursts Intensity Prediction
