1 code implementation • 8 Aug 2024 • Masoud Mansoury, Bamshad Mobasher, Herke van Hoof
In this paper, we study exposure bias in a class of well-known contextual bandit algorithms known as Linear Cascading Bandits.
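As a concrete illustration of the setting (a minimal sketch, not the paper's algorithm): a CascadeLinUCB-style bandit recommends a ranked list, observes the position of a click, and updates a ridge-regression estimate from the items the user examined. The dimensions, exploration weight, and placeholder data below are illustrative.

```python
import numpy as np

# Minimal sketch of one round of a CascadeLinUCB-style recommender
# (illustrative, not the paper's exact algorithm).
d, K, alpha = 8, 5, 1.0            # feature dim, list length, exploration weight
A = np.eye(d)                       # ridge-regression Gram matrix
b = np.zeros(d)                     # feature-weighted click counts
items = np.random.randn(100, d)     # item feature vectors (placeholder data)

theta = np.linalg.solve(A, b)       # current weight estimate
ucb = items @ theta + alpha * np.sqrt(
    np.einsum('id,dk,ik->i', items, np.linalg.inv(A), items))
ranked = np.argsort(-ucb)[:K]       # recommend the top-K items by UCB

click_pos = 2                       # suppose the user clicks the 3rd item
for pos, i in enumerate(ranked[:click_pos + 1]):
    x = items[i]
    A += np.outer(x, x)             # items above the click count as examined
    b += x * (1.0 if pos == click_pos else 0.0)
```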
1 code implementation • 29 Apr 2024 • Jin Huang, Harrie Oosterhuis, Masoud Mansoury, Herke van Hoof, Maarten de Rijke
Debiasing methods aim to mitigate the effect of selection bias on the evaluation and optimization of recommender systems (RSs).
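A common member of this family is inverse propensity scoring (IPS), which reweights each logged interaction by the inverse of its probability of being observed; the snippet below is a generic sketch, not necessarily the estimator studied in the paper.

```python
import numpy as np

# Generic IPS sketch: selection bias is corrected by dividing each observed
# outcome by the propensity (probability) with which it was observed.
def ips_estimate(observed_rewards, propensities):
    return np.mean(observed_rewards / np.clip(propensities, 1e-6, None))
```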
no code implementations • 22 Mar 2024 • Guillermo Infante, David Kuric, Anders Jonsson, Vicenç Gómez, Herke van Hoof
Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems.
1 code implementation • 3 Nov 2023 • Blazej Manczak, Jan Viebahn, Herke van Hoof
While a purely rule-based policy is still used for all agents at the highest level in this study, at the intermediate level the policy is trained using different state-of-the-art RL algorithms.
Hierarchical Reinforcement Learning • Reinforcement Learning
1 code implementation • 11 Sep 2023 • Tim Bakker, Herke van Hoof, Max Welling
In this work, we propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem with an Attentive Conditional Neural Process model.
1 code implementation • 7 Feb 2023 • Robert Loftin, Mustafa Mert Çelikok, Herke van Hoof, Samuel Kaski, Frans A. Oliehoek
A natural solution concept in such settings is the Stackelberg equilibrium, in which the "leader" agent selects the strategy that maximizes its own payoff, given that the "follower" agent will choose its best response to this strategy.
Deep Reinforcement Learning • Multi-agent Reinforcement Learning
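For intuition, the pure-strategy Stackelberg equilibrium of a small two-player matrix game can be found by enumeration; the payoff matrices below are made up for illustration.

```python
import numpy as np

# Toy pure-strategy Stackelberg equilibrium by enumeration
# (payoff numbers are made up for illustration).
leader_payoff = np.array([[3, 1],    # L[a, b]: leader's payoff when the
                          [4, 0]])   # leader plays a and the follower plays b
follower_payoff = np.array([[2, 5],
                            [1, 3]])

best = None
for a in range(2):
    b = int(np.argmax(follower_payoff[a]))   # follower best-responds to a
    if best is None or leader_payoff[a, b] > leader_payoff[best]:
        best = (a, b)
print("Stackelberg outcome (leader, follower):", best)
```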
1 code implementation • 22 Dec 2022 • David Kuric, Herke van Hoof
Subsequently, we formulate the desiderata for reusable options and use these to frame the problem of learning options as a gradient-based meta-learning problem.
no code implementations • 4 Sep 2022 • Masoud Mansoury, Bamshad Mobasher, Herke van Hoof
This is especially problematic when bias is amplified over time: a few items (e.g., popular ones) are repeatedly over-represented in recommendation lists, and users' interactions with those items further amplify the bias towards them, resulting in a feedback loop.
no code implementations • 20 Aug 2022 • Erik Jenner, Herke van Hoof, Adam Gleave
In reinforcement learning, different reward functions can be equivalent in terms of the optimal policies they induce.
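One classic source of such equivalence is potential-based reward shaping, which alters the reward without changing the set of optimal policies; the sketch below is this standard construction, not the paper's full characterization.

```python
# Potential-based shaping: r'(s, a, s') = r(s, a, s') + gamma * phi(s') - phi(s)
# preserves the optimal policies for any potential function phi
# (a classic example of reward equivalence, not this paper's full result).
gamma = 0.99

def phi(state):
    return float(sum(state))          # any function of the state works

def shaped_reward(r, state, next_state):
    return r + gamma * phi(next_state) - phi(state)
```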
no code implementations • 13 Jul 2022 • Mukul Gagrani, Corrado Rainone, Yang Yang, Harris Teague, Wonseok Jeon, Herke van Hoof, Weiliang Will Zeng, Piero Zappi, Christopher Lott, Roberto Bondesan
Recent works on machine learning for combinatorial optimization have shown that learning-based approaches can outperform heuristic methods in terms of speed and performance.
no code implementations • 8 Mar 2022 • Charul Giri, Ole-Christoffer Granmo, Herke van Hoof, Christian D. Blakely
Hex is a turn-based two-player connection game with a high branching factor, making the game arbitrarily complex with increasing board sizes.
no code implementations • 7 Mar 2022 • Alexander Long, Alan Blair, Herke van Hoof
We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample-efficient and computationally efficient.
Ranked #14 on Atari Games 100k (Atari 100k)
1 code implementation • 7 Mar 2022 • Tessa van der Heiden, Herke van Hoof, Efstratios Gavves, Christoph Salge
We consider multi-agent reinforcement learning (MARL) for cooperative communication and coordination tasks.
Multi-agent Reinforcement Learning • Reinforcement Learning
1 code implementation • 28 Jan 2022 • Niklas Höpner, Ilaria Tiddi, Herke van Hoof
Enabling reinforcement learning (RL) agents to leverage a knowledge base while learning from experience promises to advance RL in knowledge intensive domains.
1 code implementation • ICLR 2022 • Elise van der Pol, Herke van Hoof, Frans A. Oliehoek, Max Welling
This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems.
no code implementations • 29 Sep 2021 • Jan Wöhlke, Felix Schmitt, Herke van Hoof
Combining the benefits of planning and learning values, we propose the Value Refinement Network (VRN), an architecture that locally refines a plan in a (simpler) state space abstraction, represented by a pre-computed value function, with respect to the full agent state.
no code implementations • 23 Sep 2021 • Jan Wöhlke, Felix Schmitt, Herke van Hoof
In simulated robotic navigation tasks, VI-RL yields consistent, strong improvements over vanilla RL; it is on par with vanilla hierarchical RL on single layouts while being more broadly applicable to multiple layouts, and it matches trainable high-level (HL) path-planning baselines except on a parking task with difficult non-holonomic dynamics, where it shows marked improvements.
no code implementations • 1 Sep 2021 • Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup
Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments.
no code implementations • NeurIPS 2021 • David Kuric, Herke van Hoof
Hierarchical methods have the potential to allow reinforcement learning to scale to larger environments.
no code implementations • 22 Mar 2021 • Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof
We study this problem in a setting with two conflicting reward functions learned from different sources.
2 code implementations • NeurIPS 2021 • Wouter Kool, Herke van Hoof, Joaquim Gromicho, Max Welling
Routing problems are a class of combinatorial problems with many practical applications.
no code implementations • 16 Feb 2021 • Qi Wang, Herke van Hoof
Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications.
no code implementations • 1 Jan 2021 • Yijie Zhang, Herke van Hoof
In policy search methods for reinforcement learning (RL), exploration is often performed by injecting noise either in action space at each step independently or in parameter space over each full trajectory.
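The contrast between the two schemes is easy to see with a linear policy; the shapes and noise scales below are illustrative.

```python
import numpy as np

# Contrast of the two exploration schemes for a linear policy a = W s.
W = np.zeros((2, 4))                     # policy parameters
s = np.random.randn(4)                   # a state

# (1) Action-space noise: perturb each action independently per step.
a = W @ s + 0.1 * np.random.randn(2)

# (2) Parameter-space noise: perturb the weights once per trajectory,
#     then act deterministically with the perturbed policy.
W_noisy = W + 0.1 * np.random.randn(*W.shape)
a = W_noisy @ s
```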
2 code implementations • NeurIPS 2020 • Tim Bakker, Herke van Hoof, Max Welling
In today's clinical practice, magnetic resonance imaging (MRI) is routinely accelerated through subsampling of the associated Fourier domain.
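In code, this subsampling amounts to masking the Fourier domain (k-space) and reconstructing from the measured fraction; below is a zero-filled numpy sketch in which the image, sizes, and random mask are all illustrative.

```python
import numpy as np

# Accelerated MRI sketch: keep only a fraction of k-space columns,
# then reconstruct by zero-filling the unmeasured ones.
image = np.random.rand(128, 128)               # stand-in for an MR image
kspace = np.fft.fftshift(np.fft.fft2(image))   # fully sampled k-space

mask = np.zeros(128, dtype=bool)
mask[np.random.choice(128, size=32, replace=False)] = True  # 4x acceleration
undersampled = kspace * mask[None, :]          # drop unmeasured columns

recon = np.abs(np.fft.ifft2(np.fft.ifftshift(undersampled)))
```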
no code implementations • ICML 2020 • Qi Wang, Herke van Hoof
Neural processes (NPs) constitute a family of variational approximate models for stochastic processes with promising properties in computational efficiency and uncertainty quantification.
no code implementations • 3 Jul 2020 • Joris Mollinga, Herke van Hoof
Air traffic control is becoming an increasingly complex task due to the growing number of aircraft.
2 code implementations • NeurIPS 2020 • Elise van der Pol, Daniel E. Worrall, Herke van Hoof, Frans A. Oliehoek, Max Welling
MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP.
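Equivariance here is a testable property: transforming the state should transform the policy's output in a matching way. A minimal sketch for a mirror symmetry, assuming a hypothetical `policy_net` with [left, right] action logits:

```python
import torch

# Testing equivariance for a mirror symmetry: feeding the mirrored state
# should produce mirrored (permuted) action logits. `policy_net` and the
# [left, right] action layout are hypothetical stand-ins.
def check_equivariance(policy_net, state):
    logits = policy_net(state)              # logits for [left, right]
    mirrored_logits = policy_net(-state)    # mirror, e.g., a CartPole state
    # Equivariance: acting in the mirrored world = mirroring the action.
    return torch.allclose(logits, mirrored_logits.flip(-1), atol=1e-5)
```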
1 code implementation • 18 Mar 2020 • Tessa van der Heiden, Florian Mirus, Herke van Hoof
In contrast to self-empowerment, a robot employing our approach strives for the empowerment of people in its environment, so they are not disturbed by the robot's presence and motion.
1 code implementation • ICLR 2020 • Wouter Kool, Herke van Hoof, Max Welling
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples.
1 code implementation • 23 Oct 2019 • Sanjay Thakur, Herke van Hoof, Gunshi Gupta, David Meger
PAC-Bayes is a generalized framework that is more resistant to overfitting and yields performance bounds that hold with arbitrarily high probability, even under unjustified extrapolation.
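One standard form of such a bound (a McAllester-style statement, given here for context and not quoted from the paper): with probability at least $1-\delta$ over an i.i.d. sample of size $n$, simultaneously for all posteriors $\rho$ over hypotheses,

$$\mathbb{E}_{h \sim \rho}[L(h)] \le \mathbb{E}_{h \sim \rho}[\hat{L}(h)] + \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln(2\sqrt{n}/\delta)}{2n}},$$

where $\pi$ is a prior fixed before seeing the data, $L$ is the true risk, and $\hat{L}$ the empirical risk.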
no code implementations • 15 Jun 2019 • Sandeep Manjanna, Herke van Hoof, Gregory Dudek
In this paper, we present a search algorithm that generates efficient trajectories that optimize the rate at which probability mass is covered by a searcher.
no code implementations • ICLR Workshop drlStructPred 2019 • Wouter Kool, Herke van Hoof, Max Welling
REINFORCE can be used to train models in structured prediction settings to directly optimize the test-time objective.
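The underlying estimator is the score-function identity $\nabla_\theta \mathbb{E}[f(y)] = \mathbb{E}[(f(y)-b)\,\nabla_\theta \log p_\theta(y)]$; a minimal sketch of the corresponding surrogate loss (generic, not the paper's specific baseline):

```python
import torch

# Generic REINFORCE surrogate: minimizing this loss backpropagates the
# score-function gradient (reward - baseline) * grad log p(y).
# The scalar baseline is an illustrative choice, not the paper's estimator.
def reinforce_loss(log_prob, reward, baseline=0.0):
    return -(reward - baseline) * log_prob
```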
4 code implementations • 14 Mar 2019 • Wouter Kool, Herke van Hoof, Max Welling
We show how to implicitly apply this 'Gumbel-Top-$k$' trick to a factorized distribution over sequences, allowing exact samples to be drawn without replacement using Stochastic Beam Search.
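The basic trick itself fits in a few lines: perturb log-probabilities with i.i.d. Gumbel noise and take the top $k$, which yields $k$ distinct items distributed as sequential sampling without replacement. (Stochastic Beam Search extends this to sequence models; the sketch below covers only the basic trick.)

```python
import numpy as np

# Gumbel-Top-k: adding i.i.d. Gumbel noise to log-probabilities and taking
# the k largest perturbed values yields k distinct indices, distributed as
# sequential sampling without replacement.
def gumbel_top_k(log_probs, k, rng=None):
    rng = rng or np.random.default_rng()
    gumbels = -np.log(-np.log(rng.uniform(size=log_probs.shape)))
    return np.argsort(-(log_probs + gumbels))[:k]

log_p = np.log(np.array([0.5, 0.3, 0.2]))
print(gumbel_top_k(log_p, k=2))  # two distinct indices, never a duplicate
```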
1 code implementation • 13 Mar 2019 • Sanjay Thakur, Herke van Hoof, Juan Camilo Gamboa Higuera, Doina Precup, David Meger
Learned controllers such as neural networks typically lack a notion of uncertainty that would allow them to diagnose an offset between training and testing conditions and potentially intervene.
1 code implementation • 4 Dec 2018 • Lucas Caccia, Herke van Hoof, Aaron Courville, Joelle Pineau
In this work, we show that one can adapt deep generative models for this task by unravelling lidar scans into a 2D point map.
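The unravelling step can be sketched as indexing each lidar return by its elevation ring and azimuth bin and storing range as a pixel value; the 64-ring by 1024-bin sensor geometry below is an assumption for illustration.

```python
import numpy as np

# "Unravelling" a lidar scan into a 2D map: index each return by its
# elevation ring and azimuth bin, storing range as the pixel value.
# The 64-ring x 1024-bin geometry is an illustrative assumption.
def scan_to_grid(ranges, ring_idx, azimuth, n_rings=64, n_bins=1024):
    grid = np.zeros((n_rings, n_bins))
    cols = (azimuth / (2 * np.pi) * n_bins).astype(int) % n_bins
    grid[ring_idx, cols] = ranges
    return grid
```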
1 code implementation • EMNLP 2018 • Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie Chi Kit Cheung
In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels.
Ranked #10 on Extractive Text Summarization on CNN / Daily Mail
15 code implementations • ICLR 2019 • Wouter Kool, Herke van Hoof, Max Welling
The recently presented idea to learn heuristics for combinatorial optimization problems is promising, as it can save costly development of handcrafted heuristics.
67 code implementations • ICML 2018 • Scott Fujimoto, Herke van Hoof, David Meger
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies.
Ranked #2 on OpenAI Gym on Ant-v4
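The paper's remedy (TD3) clips the value target by taking the minimum over two target critics and smooths the target policy with clipped noise; a minimal sketch of that target computation, with hyperparameters set to commonly used illustrative values:

```python
import torch

# Sketch of TD3's clipped double-Q target: the minimum over two target
# critics counteracts the overestimation described above; the added
# clipped noise is the paper's target-policy smoothing.
def td3_target(r, s_next, done, actor_t, critic1_t, critic2_t,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    a_next = actor_t(s_next)
    noise = (torch.randn_like(a_next) * noise_std).clamp(-noise_clip, noise_clip)
    a_next = (a_next + noise).clamp(-1.0, 1.0)
    q_next = torch.min(critic1_t(s_next, a_next), critic2_t(s_next, a_next))
    return r + gamma * (1.0 - done) * q_next
```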
no code implementations • ICLR 2018 • Matthew J. A. Smith, Herke van Hoof, Joelle Pineau
In this work we develop a novel policy gradient method for the automatic learning of policies with options.
no code implementations • 10 Nov 2016 • Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama
A naive application of unsupervised dimensionality reduction methods, such as principal component analysis, to the context variables is insufficient, as task-relevant input may be ignored.