no code implementations • 8 Mar 2022 • Charul Giri, Ole-Christoffer Granmo, Herke van Hoof, Christian D. Blakely
Hex is a turn-based two-player connection game with a high branching factor, making the game arbitrarily complex with increasing board sizes.
no code implementations • 7 Mar 2022 • Alexander Long, Alan Blair, Herke van Hoof
We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample-efficient and computation-efficient.
Ranked #6 on Atari Games 100k on Atari 100k
1 code implementation • 7 Mar 2022 • Tessa van der Heiden, Herke van Hoof, Efstratios Gavves, Christoph Salge
We consider multi-agent reinforcement learning (MARL) for cooperative communication and coordination tasks.
1 code implementation • 28 Jan 2022 • Niklas Höpner, Ilaria Tiddi, Herke van Hoof
Enabling reinforcement learning (RL) agents to leverage a knowledge base while learning from experience promises to advance RL in knowledge-intensive domains.
1 code implementation • ICLR 2022 • Elise van der Pol, Herke van Hoof, Frans A. Oliehoek, Max Welling
This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems.
no code implementations • 29 Sep 2021 • Jan Wöhlke, Felix Schmitt, Herke van Hoof
Combining the benefits of planning and learning values, we propose the Value Refinement Network (VRN), an architecture that locally refines a plan in a (simpler) state space abstraction, represented by a pre-computed value function, with respect to the full agent state.
no code implementations • 23 Sep 2021 • Jan Wöhlke, Felix Schmitt, Herke van Hoof
In simulated robotic navigation tasks, VI-RL yields consistent, strong improvements over vanilla RL; it is on par with vanilla hierarchical RL on single layouts while being more broadly applicable to multiple layouts, and it is on par with trainable HL path-planning baselines except on a parking task with difficult non-holonomic dynamics, where it shows marked improvements.
no code implementations • 1 Sep 2021 • Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup
Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments.
no code implementations • NeurIPS 2021 • David Kuric, Herke van Hoof
Hierarchical methods have the potential to allow reinforcement learning to scale to larger environments.
no code implementations • 22 Mar 2021 • Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof
We study this problem in the setting with two conflicting reward functions learned from different sources.
1 code implementation • NeurIPS 2021 • Wouter Kool, Herke van Hoof, Joaquim Gromicho, Max Welling
Routing problems are a class of combinatorial problems with many practical applications.
no code implementations • 16 Feb 2021 • Qi Wang, Herke van Hoof
Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications.
no code implementations • 1 Jan 2021 • Yijie Zhang, Herke van Hoof
In policy search methods for reinforcement learning (RL), exploration is often performed by injecting noise either in action space at each step independently or in parameter space over each full trajectory.
1 code implementation • NeurIPS 2020 • Tim Bakker, Herke van Hoof, Max Welling
In today's clinical practice, magnetic resonance imaging (MRI) is routinely accelerated through subsampling of the associated Fourier domain.
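The subsampling this abstract refers to can be illustrated with a toy zero-filled reconstruction: acquire only a subset of Fourier-domain (k-space) rows, zero-fill the rest, and invert the FFT. This is a minimal NumPy sketch with illustrative names; the paper's contribution is learning *which* measurements to acquire, which this sketch does not do (it samples rows uniformly at random).

```python
import numpy as np

def zero_filled_recon(image, keep_fraction, rng):
    """Keep a random subset of k-space (Fourier) rows, zero-fill
    the rest, and reconstruct with an inverse 2D FFT."""
    kspace = np.fft.fft2(image)
    n_rows = image.shape[0]
    n_keep = int(round(keep_fraction * n_rows))
    kept = rng.choice(n_rows, size=n_keep, replace=False)
    mask = np.zeros(n_rows, dtype=bool)
    mask[kept] = True
    kspace[~mask, :] = 0.0          # zero-fill the unmeasured rows
    return np.abs(np.fft.ifft2(kspace))

rng = np.random.default_rng(0)
image = rng.random((32, 32))
recon = zero_filled_recon(image, keep_fraction=0.25, rng=rng)
```

With `keep_fraction=1.0` the reconstruction recovers the input exactly; lower fractions trade acquisition time for aliasing artifacts, which is the gap learned acquisition policies aim to close.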
no code implementations • ICML 2020 • Qi Wang, Herke van Hoof
Neural processes (NPs) constitute a family of variational approximate models for stochastic processes with promising properties in computational efficiency and uncertainty quantification.
no code implementations • 3 Jul 2020 • Joris Mollinga, Herke van Hoof
Air traffic control is becoming an increasingly complex task as the number of aircraft grows.
2 code implementations • NeurIPS 2020 • Elise van der Pol, Daniel E. Worrall, Herke van Hoof, Frans A. Oliehoek, Max Welling
MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP.
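Equivariance here means that transforming the state and then applying the network gives the same result as applying the network and then transforming the output. A toy check for a single linear layer, symmetrized over a two-element group, is sketched below; the representations and names are illustrative and this is not the paper's equivariant-basis construction.

```python
import numpy as np

# Input representation: swap the two state features.
# Output representation: swap the two action logits.
P_in = np.array([[0.0, 1.0], [1.0, 0.0]])
P_out = np.array([[0.0, 1.0], [1.0, 0.0]])

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2))

# Symmetrize over the group {identity, flip}: averaging the weight
# matrix over the group action makes it equivariant, i.e.
# W_eq @ P_in == P_out @ W_eq.
W_eq = 0.5 * (W + P_out @ W @ P_in)
```

Transforming the input and then applying `W_eq` now matches applying `W_eq` and then transforming the output, which is the defining property such networks enforce by construction.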
1 code implementation • 18 Mar 2020 • Tessa van der Heiden, Florian Mirus, Herke van Hoof
In contrast to self-empowerment, a robot employing our approach strives for the empowerment of people in its environment, so they are not disturbed by the robot's presence and motion.
1 code implementation • ICLR 2020 • Wouter Kool, Herke van Hoof, Max Welling
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples.
1 code implementation • 23 Oct 2019 • Sanjay Thakur, Herke van Hoof, Gunshi Gupta, David Meger
PAC-Bayes is a generalized framework that is more resistant to overfitting and yields performance bounds that hold with arbitrarily high probability, even for unjustified extrapolations.
no code implementations • 15 Jun 2019 • Sandeep Manjanna, Herke van Hoof, Gregory Dudek
In this paper, we present a search algorithm that generates efficient trajectories that optimize the rate at which probability mass is covered by a searcher.
no code implementations • ICLR Workshop drlStructPred 2019 • Wouter Kool, Herke van Hoof, Max Welling
REINFORCE can be used to train models in structured prediction settings to directly optimize the test-time objective.
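As a reminder of the underlying estimator, a REINFORCE (score-function) gradient for a simple categorical policy can be sketched as follows. This is a generic textbook sketch with a mean-reward baseline and names of our choosing, not the paper's structured-prediction setup or its proposed baseline.

```python
import numpy as np

def reinforce_gradient(logits, reward_fn, n_samples, rng):
    """Monte Carlo estimate of d E[R(a)] / d logits for a
    categorical policy, using a mean-reward baseline."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    actions = rng.choice(len(probs), size=n_samples, p=probs)
    rewards = np.array([reward_fn(a) for a in actions], dtype=float)
    baseline = rewards.mean()
    grad = np.zeros_like(probs)
    for a, r in zip(actions, rewards):
        d_log_pi = -probs.copy()
        d_log_pi[a] += 1.0          # gradient of log pi(a) w.r.t. logits
        grad += (r - baseline) * d_log_pi
    return grad / n_samples

rng = np.random.default_rng(0)
grad = reinforce_gradient(
    logits=np.zeros(3),
    reward_fn=lambda a: 1.0 if a == 0 else 0.0,
    n_samples=2000,
    rng=rng,
)
```

With reward concentrated on action 0, the estimated gradient pushes the corresponding logit up relative to the others; the baseline only reduces variance and leaves the estimator's expectation unchanged.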
3 code implementations • 14 Mar 2019 • Wouter Kool, Herke van Hoof, Max Welling
We show how to implicitly apply this 'Gumbel-Top-$k$' trick on a factorized distribution over sequences, allowing exact samples to be drawn without replacement using a Stochastic Beam Search.
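The per-item version of the Gumbel-Top-$k$ trick is easy to state: perturb each log-probability with independent Gumbel(0, 1) noise and keep the indices of the $k$ largest perturbed values; these are an exact sample without replacement from the categorical distribution. A minimal NumPy sketch (function name ours; the paper's contribution is applying this implicitly to sequences via Stochastic Beam Search):

```python
import numpy as np

def gumbel_top_k(log_probs, k, rng):
    """Sample k categories without replacement by perturbing
    log-probabilities with Gumbel noise and taking the top-k."""
    gumbels = rng.gumbel(size=len(log_probs))
    perturbed = log_probs + gumbels
    return np.argsort(perturbed)[-k:][::-1]   # indices, best first

rng = np.random.default_rng(0)
log_probs = np.log(np.array([0.5, 0.3, 0.15, 0.05]))
sample = gumbel_top_k(log_probs, k=2, rng=rng)
```

Setting $k=1$ recovers the ordinary Gumbel-max trick for drawing a single categorical sample.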
1 code implementation • 13 Mar 2019 • Sanjay Thakur, Herke van Hoof, Juan Camilo Gamboa Higuera, Doina Precup, David Meger
Learned controllers such as neural networks typically lack a notion of uncertainty that would allow diagnosing a mismatch between training and testing conditions and potentially intervening.
1 code implementation • 4 Dec 2018 • Lucas Caccia, Herke van Hoof, Aaron Courville, Joelle Pineau
In this work, we show that one can adapt deep generative models for this task by unravelling lidar scans into a 2D point map.
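The "unravelling" the abstract mentions maps each lidar return to a cell of a 2D grid indexed by elevation ring and azimuth bin, storing its range. A rough NumPy sketch under our own assumptions (Cartesian input points, uniform angular binning; the paper's exact projection may differ):

```python
import numpy as np

def to_range_image(points, n_rings, n_az):
    """Unravel an (N, 3) Cartesian lidar point cloud into a 2D
    range image indexed by (elevation ring, azimuth bin)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    az = np.arctan2(y, x)                                   # [-pi, pi]
    el = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))
    az_idx = ((az + np.pi) / (2 * np.pi) * n_az).astype(int) % n_az
    el_norm = (el - el.min()) / max(el.max() - el.min(), 1e-9)
    ring_idx = np.minimum((el_norm * n_rings).astype(int), n_rings - 1)
    img = np.zeros((n_rings, n_az))
    img[ring_idx, az_idx] = r                               # store range
    return img

rng = np.random.default_rng(0)
points = rng.standard_normal((100, 3))
img = to_range_image(points, n_rings=16, n_az=64)
```

Once the sweep is in this 2D form, standard image-based deep generative models can be applied to it directly.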
1 code implementation • EMNLP 2018 • Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie Chi Kit Cheung
In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels.
Ranked #9 on Extractive Text Summarization on CNN / Daily Mail
11 code implementations • ICLR 2019 • Wouter Kool, Herke van Hoof, Max Welling
The recently presented idea to learn heuristics for combinatorial optimization problems is promising as it can save costly development.
47 code implementations • ICML 2018 • Scott Fujimoto, Herke van Hoof, David Meger
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies.
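The core remedy this paper (TD3) proposes is clipped double Q-learning: form the bootstrap target from the minimum of two independently trained critics, which curbs the overestimation bias of a single learned Q-function. A minimal sketch of that target computation (array names and shapes are illustrative):

```python
import numpy as np

def clipped_double_q_target(rewards, dones, next_q1, next_q2, gamma=0.99):
    """TD target using the minimum of two critic estimates of the
    next state-action value (clipped double Q-learning)."""
    min_next_q = np.minimum(next_q1, next_q2)
    return rewards + gamma * (1.0 - dones) * min_next_q

targets = clipped_double_q_target(
    rewards=np.array([1.0, 0.0]),
    dones=np.array([0.0, 1.0]),        # second transition is terminal
    next_q1=np.array([10.0, 5.0]),
    next_q2=np.array([8.0, 7.0]),
)
# targets[0] = 1 + 0.99 * min(10, 8) = 8.92; targets[1] = 0 (terminal)
```

Taking the minimum makes the target a pessimistic estimate, so approximation errors tend to cancel rather than compound through bootstrapping.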
no code implementations • ICLR 2018 • Matthew J. A. Smith, Herke van Hoof, Joelle Pineau
In this work we develop a novel policy gradient method for the automatic learning of policies with options.
no code implementations • 10 Nov 2016 • Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama
A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored.