Search Results for author: Herke van Hoof

Found 30 papers, 15 papers with code

Logic-based AI for Interpretable Board Game Winner Prediction with Tsetlin Machine

no code implementations 8 Mar 2022 Charul Giri, Ole-Christoffer Granmo, Herke van Hoof, Christian D. Blakely

Hex is a turn-based two-player connection game with a high branching factor, making the game arbitrarily complex with increasing board sizes.

Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

no code implementations 7 Mar 2022 Alexander Long, Alan Blair, Herke van Hoof

We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample-efficient and computation-efficient.

Atari Games 100k reinforcement-learning

Leveraging class abstraction for commonsense reinforcement learning via residual policy gradient methods

1 code implementation 28 Jan 2022 Niklas Höpner, Ilaria Tiddi, Herke van Hoof

Enabling reinforcement learning (RL) agents to leverage a knowledge base while learning from experience promises to advance RL in knowledge-intensive domains.

Knowledge Graphs Policy Gradient Methods +1

Multi-Agent MDP Homomorphic Networks

1 code implementation ICLR 2022 Elise van der Pol, Herke van Hoof, Frans A. Oliehoek, Max Welling

This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems.
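The equivariance property such networks rely on (transforming the input by a symmetry and then applying a layer gives the same result as applying the layer first) can be checked on a toy permutation-equivariant layer. The layer below is ours, purely for illustration; it is not the paper's architecture, which handles symmetries of the joint state-action space of cooperative multi-agent systems.

```python
import numpy as np

def equivariant_layer(x, w_self=0.7, w_other=0.3):
    """Toy permutation-equivariant layer over agents: each agent's output
    mixes its own features with the mean over all agents, so permuting
    the agents permutes the outputs identically."""
    return w_self * x + w_other * x.mean(axis=0, keepdims=True)

x = np.arange(6.0).reshape(3, 2)      # 3 agents, 2 features each
perm = np.array([2, 0, 1])            # a relabeling of the agents
lhs = equivariant_layer(x[perm])      # permute, then apply the layer
rhs = equivariant_layer(x)[perm]      # apply the layer, then permute
assert np.allclose(lhs, rhs)          # equivariance: f(Px) = P f(x)
```

Because the mean over agents is permutation-invariant, the check passes for any permutation; layers with this property let agents share experience across symmetric situations.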

Value Refinement Network (VRN)

no code implementations 29 Sep 2021 Jan Wöhlke, Felix Schmitt, Herke van Hoof

Combining the benefits of planning and learning values, we propose the Value Refinement Network (VRN), an architecture that locally refines a plan in a (simpler) state space abstraction, represented by a pre-computed value function, with respect to the full agent state.


Hierarchies of Planning and Reinforcement Learning for Robot Navigation

no code implementations 23 Sep 2021 Jan Wöhlke, Felix Schmitt, Herke van Hoof

In simulated robotic navigation tasks, VI-RL yields consistent, strong improvements over vanilla RL; it is on par with vanilla hierarchical RL on single layouts but applies more broadly to multiple layouts, and it matches trainable HL path-planning baselines except on a parking task with difficult non-holonomic dynamics, where it shows marked improvements.

reinforcement-learning Robot Navigation

A Survey of Exploration Methods in Reinforcement Learning

no code implementations 1 Sep 2021 Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup

Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments.


Combining Reward Information from Multiple Sources

no code implementations 22 Mar 2021 Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof

We study this problem in the setting with two conflicting reward functions learned from different sources.


Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models

no code implementations 16 Feb 2021 Qi Wang, Herke van Hoof

Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications.

Decision Making Meta Reinforcement Learning +1

Deep Coherent Exploration For Continuous Control

no code implementations1 Jan 2021 Yijie Zhang, Herke van Hoof

In policy search methods for reinforcement learning (RL), exploration is often performed by injecting noise either in action space at each step independently or in parameter space over each full trajectory.

Continuous Control
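The two standard noise-injection schemes this abstract contrasts can be sketched in a few lines, using a toy linear-tanh policy of our own invention (this is not the paper's coherent-exploration method, only the baselines it improves upon):

```python
import numpy as np

rng = np.random.default_rng(1)

def policy(params, obs):
    """Toy deterministic policy: linear map followed by tanh squashing."""
    return np.tanh(params @ obs)

params = rng.normal(size=(2, 3))   # 3-dim observation, 2-dim action
obs = rng.normal(size=3)

# (a) Action-space noise: perturb the action independently at each step.
a_noisy = policy(params, obs) + 0.1 * rng.normal(size=2)

# (b) Parameter-space noise: perturb the weights once, then act
# deterministically with them for the whole trajectory.
perturbed = params + 0.1 * rng.normal(size=params.shape)
a_param_noise = policy(perturbed, obs)
```

Action-space noise gives uncorrelated jitter step to step, while parameter-space noise yields temporally consistent behavior within a trajectory; the paper's method addresses the limitations of both.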

Experimental design for MRI by greedy policy search

1 code implementation NeurIPS 2020 Tim Bakker, Herke van Hoof, Max Welling

In today's clinical practice, magnetic resonance imaging (MRI) is routinely accelerated through subsampling of the associated Fourier domain.

Experimental Design Policy Gradient Methods
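What "subsampling of the associated Fourier domain" means can be shown concretely: acquire only a subset of k-space and reconstruct with a zero-filled inverse FFT. This is a deliberately naive sketch on random data; the paper's contribution, learning the subsampling policy with greedy policy search, is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))          # stand-in for an MR image

kspace = np.fft.fft2(img)              # fully sampled Fourier measurements
mask = np.zeros((8, 8))
mask[::2, :] = 1                       # keep every other line: 2x acceleration

recon = np.fft.ifft2(kspace * mask).real   # naive zero-filled reconstruction
```

With the full k-space, the inverse FFT recovers the image exactly; discarding lines speeds up acquisition at the cost of aliasing artifacts, which is why the choice of which lines to acquire is worth optimizing.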

Doubly Stochastic Variational Inference for Neural Processes with Hierarchical Latent Variables

no code implementations ICML 2020 Qi Wang, Herke van Hoof

Neural processes (NPs) constitute a family of variational approximate models for stochastic processes with promising properties in computational efficiency and uncertainty quantification.

Variational Inference

An Autonomous Free Airspace En-route Controller using Deep Reinforcement Learning Techniques

no code implementations 3 Jul 2020 Joris Mollinga, Herke van Hoof

Air traffic control is becoming an increasingly complex task due to the growing number of aircraft.


Social Navigation with Human Empowerment driven Deep Reinforcement Learning

1 code implementation 18 Mar 2020 Tessa van der Heiden, Florian Mirus, Herke van Hoof

In contrast to self-empowerment, a robot employing our approach strives for the empowerment of people in its environment, so they are not disturbed by the robot's presence and motion.

reinforcement-learning Robot Navigation

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

1 code implementation ICLR 2020 Wouter Kool, Herke van Hoof, Max Welling

We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples.

Structured Prediction

Unifying Variational Inference and PAC-Bayes for Supervised Learning that Scales

1 code implementation 23 Oct 2019 Sanjay Thakur, Herke van Hoof, Gunshi Gupta, David Meger

PAC-Bayes is a generalized framework that is more resistant to overfitting and yields performance bounds that hold with arbitrarily high probability, even on unjustified extrapolations.

Variational Inference

Reinforcement Learning with Non-uniform State Representations for Adaptive Search

no code implementations 15 Jun 2019 Sandeep Manjanna, Herke van Hoof, Gregory Dudek

In this paper, we present a search algorithm that generates efficient trajectories that optimize the rate at which probability mass is covered by a searcher.


Buy 4 REINFORCE Samples, Get a Baseline for Free!

no code implementations ICLR Workshop drlStructPred 2019 Wouter Kool, Herke van Hoof, Max Welling

REINFORCE can be used to train models in structured prediction settings to directly optimize the test-time objective.

Structured Prediction
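The paper's premise, reusing the other Monte Carlo samples as a baseline instead of training a separate one, can be illustrated with a leave-one-out REINFORCE estimator. The paper itself draws its samples without replacement; the toy version below uses plain independent samples, and all names are ours.

```python
import numpy as np

def reinforce_loo_grad(logp_grads, rewards):
    """REINFORCE gradient with a leave-one-out baseline: each sample's
    baseline is the mean reward of the OTHER samples, so no separate
    baseline network or extra rollouts are needed."""
    rewards = np.asarray(rewards, dtype=float)
    k = len(rewards)
    baselines = (rewards.sum() - rewards) / (k - 1)  # leave-one-out means
    advantages = rewards - baselines
    return sum(a * g for a, g in zip(advantages, logp_grads)) / k

# Toy log-probability gradients and rewards for k = 4 samples.
grads = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
         np.array([1.0, 1.0]), np.array([-1.0, 0.0])]
rewards = [2.0, 1.0, 3.0, 2.0]
g = reinforce_loo_grad(grads, rewards)
```

Because each baseline is computed without its own sample's reward, the estimator stays unbiased while its variance drops, which is the "baseline for free" in the title.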

Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement

3 code implementations 14 Mar 2019 Wouter Kool, Herke van Hoof, Max Welling

We show how to implicitly apply this 'Gumbel-Top-$k$' trick to a factorized distribution over sequences, allowing exact samples to be drawn without replacement using a Stochastic Beam Search.
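For a plain categorical distribution, the trick the abstract names can be sketched directly: perturb each log-probability with independent Gumbel noise and keep the top-k indices, which yields an exact sample of k distinct items without replacement. The sketch below is illustrative (the function name and toy distribution are ours); the paper's contribution is applying the trick implicitly over sequence distributions via Stochastic Beam Search.

```python
import numpy as np

def gumbel_top_k(log_probs, k, rng):
    """Sample k distinct indices without replacement by perturbing the
    log-probabilities with i.i.d. Gumbel noise and taking the top k."""
    gumbel = rng.gumbel(size=len(log_probs))
    return np.argsort(log_probs + gumbel)[::-1][:k]

rng = np.random.default_rng(0)
log_p = np.log(np.array([0.5, 0.3, 0.15, 0.05]))
sample = gumbel_top_k(log_p, 3, rng)   # 3 distinct category indices
```

The top-1 case is the classic Gumbel-max trick; taking the top k instead gives a sample with the same distribution as drawing k times without replacement.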


Uncertainty Aware Learning from Demonstrations in Multiple Contexts using Bayesian Neural Networks

1 code implementation 13 Mar 2019 Sanjay Thakur, Herke van Hoof, Juan Camilo Gamboa Higuera, Doina Precup, David Meger

Learned controllers such as neural networks typically do not have a notion of uncertainty that allows to diagnose an offset between training and testing conditions, and potentially intervene.

Deep Generative Modeling of LiDAR Data

1 code implementation 4 Dec 2018 Lucas Caccia, Herke van Hoof, Aaron Courville, Joelle Pineau

In this work, we show that one can adapt deep generative models for this task by unravelling LiDAR scans into a 2D point map.

Point Cloud Generation

BanditSum: Extractive Summarization as a Contextual Bandit

1 code implementation EMNLP 2018 Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie Chi Kit Cheung

In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically generated extractive labels.

Extractive Summarization Extractive Text Summarization +1

Attention, Learn to Solve Routing Problems!

11 code implementations ICLR 2019 Wouter Kool, Herke van Hoof, Max Welling

The recently presented idea to learn heuristics for combinatorial optimization problems is promising, as it can save costly development effort.

Combinatorial Optimization

Addressing Function Approximation Error in Actor-Critic Methods

47 code implementations ICML 2018 Scott Fujimoto, Herke van Hoof, David Meger

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies.

OpenAI Gym Q-Learning +1
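The central fix this paper (TD3) introduces for the overestimation problem is clipped double Q-learning: the bootstrapped target uses the minimum of two critics' estimates of the next state-action value. Below is a minimal sketch of just that target computation, omitting the paper's other components (target-policy smoothing and delayed policy updates); the function name is ours.

```python
import numpy as np

def clipped_double_q_target(r, done, q1_next, q2_next, gamma=0.99):
    """TD3-style target: take the minimum of two independent critic
    estimates to counteract overestimation from function approximation."""
    return r + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

# One transition: the target bootstraps from the smaller estimate (4.0),
# not the possibly overestimated 5.0.
y = clipped_double_q_target(r=1.0, done=0.0, q1_next=5.0, q2_next=4.0)
```

In training, both critics regress toward this shared target `y`, so neither critic's overestimation can propagate through bootstrapping.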

An inference-based policy gradient method for learning options

no code implementations ICLR 2018 Matthew J. A. Smith, Herke van Hoof, Joelle Pineau

In this work we develop a novel policy gradient method for the automatic learning of policies with options.

Policy Search with High-Dimensional Context Variables

no code implementations 10 Nov 2016 Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama

A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored.

Dimensionality Reduction
