no code implementations • NeurIPS 2023 • Johannes Kirschner, Seyed Alireza Bakhtiari, Kushagra Chandak, Volodymyr Tkachuk, Csaba Szepesvári
A long line of works characterizes the sample complexity of regret minimization in sequential decision-making by min-max programs.
no code implementations • 8 Feb 2023 • Volodymyr Tkachuk, Seyed Alireza Bakhtiari, Johannes Kirschner, Matej Jusup, Ilija Bogunovic, Csaba Szepesvári
A practical challenge in reinforcement learning is posed by combinatorial action spaces, which make planning computationally demanding.
no code implementations • 7 Feb 2023 • Johannes Kirschner, Tor Lattimore, Andreas Krause
Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models.
no code implementations • 19 Dec 2022 • Xiang Li, Viraj Mehta, Johannes Kirschner, Ian Char, Willie Neiswanger, Jeff Schneider, Andreas Krause, Ilija Bogunovic
Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces.
no code implementations • NeurIPS 2023 • Zichen Zhang, Johannes Kirschner, Junxi Zhang, Francesco Zanini, Alex Ayoub, Masood Dehghan, Dale Schuurmans
A default assumption in reinforcement learning (RL) and optimal control is that observations arrive at discrete time points on a fixed clock cycle.
no code implementations • 26 Mar 2022 • Johannes Kirschner, Mojmír Mutný, Andreas Krause, Jaime Coello de Portugal, Nicole Hiller, Jochem Snuverink
Tuning machine parameters of particle accelerators is a repetitive and time-consuming task that is challenging to automate.
no code implementations • 25 May 2021 • Johannes Kirschner, Andreas Krause
We consider Bayesian optimization in settings where observations can be adversarially biased, for example by an uncontrolled hidden confounder.
no code implementations • 21 Jan 2021 • Marc Jourdan, Mojmír Mutný, Johannes Kirschner, Andreas Krause
Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
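The defining feature of semi-bandit feedback is that the learner observes one noisy reward per chosen arm rather than a single aggregate reward. A minimal sketch of this feedback model (the arm means and noise level are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms = 6
true_means = rng.uniform(0.0, 1.0, size=n_arms)  # hypothetical per-arm means

def play(chosen_set):
    """Semi-bandit feedback: a noisy reward is observed for EACH chosen arm."""
    arms = list(chosen_set)
    noise = rng.normal(0.0, 0.1, size=len(arms))
    return true_means[arms] + noise  # one component per arm, not just the sum

feedback = play({0, 2, 5})
print(len(feedback))  # one observation per arm in the chosen set
```

Under full-bandit feedback the agent would instead see only `feedback.sum()`, which is what makes the semi-bandit setting statistically easier.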
no code implementations • 11 Nov 2020 • Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári
We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.
no code implementations • 25 Feb 2020 • Johannes Kirschner, Tor Lattimore, Andreas Krause
Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well-known bandit models, including linear, combinatorial and dueling bandits.
no code implementations • 20 Feb 2020 • Johannes Kirschner, Ilija Bogunovic, Stefanie Jegelka, Andreas Krause
Robustness to distributional shift is the goal of distributionally robust optimization, which seeks a solution to an optimization problem that is worst-case robust under a specified distributional shift of an uncontrolled covariate.
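The distributionally robust objective can be illustrated with a toy finite problem; the losses and the uncertainty set of covariate distributions below are made up for illustration, not taken from the paper:

```python
import numpy as np

# Hypothetical losses: rows = candidate solutions, cols = covariate values.
losses = np.array([[1.0, 4.0, 2.0],
                   [2.0, 2.5, 2.5]])

# Uncertainty set: a few plausible distributions over the covariate,
# representing the allowed distributional shift.
dists = np.array([[0.6, 0.2, 0.2],
                  [0.2, 0.6, 0.2],
                  [0.2, 0.2, 0.6]])

# Distributionally robust choice: minimize the worst-case expected loss
# over all distributions in the uncertainty set.
expected = dists @ losses.T          # expected loss of each solution per distribution
worst_case = expected.max(axis=0)    # worst-case expectation per solution
robust_choice = int(worst_case.argmin())
print(robust_choice)
```

Note that the first solution is better under the first distribution, but the second solution has the smaller worst-case expected loss, so the robust criterion selects it.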
1 code implementation • NeurIPS 2019 • Johannes Kirschner, Andreas Krause
We introduce a stochastic contextual bandit model where at each time step the environment chooses a distribution over a context set and samples the context from this distribution.
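The two-stage context generation in this model (environment picks a distribution, then samples the context from it) can be sketched as follows; the context set and the Dirichlet choice of distribution are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(1)
contexts = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # finite context set

def environment_step():
    # The environment first chooses a distribution over the context set ...
    alpha = rng.uniform(0.5, 2.0, size=len(contexts))
    dist = rng.dirichlet(alpha)
    # ... and then samples the realized context from that distribution.
    idx = rng.choice(len(contexts), p=dist)
    return dist, contexts[idx]

dist, ctx = environment_step()
print(dist.sum(), ctx.shape)
```

Whether the learner observes `dist`, only `ctx`, or both leads to different information regimes in this model.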
2 code implementations • 8 Feb 2019 • Johannes Kirschner, Mojmír Mutný, Nicole Hiller, Rasmus Ischebeck, Andreas Krause
In order to scale the method and keep its benefits, we propose an algorithm (LineBO) that restricts the problem to a sequence of iteratively chosen one-dimensional sub-problems that can be solved efficiently.
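The core idea of restricting a high-dimensional problem to a sequence of one-dimensional sub-problems can be sketched with random line directions and a grid search along each line; this is a simplified illustration under assumed choices (random directions, grid line search, a toy quadratic objective), not the LineBO algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # Stand-in black-box objective (maximized at x = 0.3 in every coordinate).
    return -np.sum((x - 0.3) ** 2)

x = np.zeros(5)                       # current iterate in a 5-dim space
for _ in range(50):
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d)            # random direction defining the 1-D line
    ts = np.linspace(-1.0, 1.0, 101)  # solve the 1-D sub-problem on a grid
    vals = [f(x + t * d) for t in ts]
    x = x + ts[int(np.argmax(vals))] * d  # move to the best point on the line

print(np.round(x, 2))
```

Because the grid contains t = 0, each iteration is monotone non-decreasing in f; the actual method replaces the grid search with an efficient 1-D Bayesian optimization sub-routine and safer direction choices.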
1 code implementation • ICLR 2019 • Nikolay Nikolov, Johannes Kirschner, Felix Berkenkamp, Andreas Krause
Efficient exploration remains a major challenge for reinforcement learning.
no code implementations • 29 Jan 2018 • Johannes Kirschner, Andreas Krause
In the stochastic bandit problem, the goal is to maximize an unknown function via a sequence of noisy evaluations.
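A standard way to trade off noisy evaluations against maximizing the unknown function is an optimistic index policy; the sketch below uses classic UCB1-style indices with made-up arm means and noise, purely to illustrate the setting (it is not the algorithm proposed in the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
means = np.array([0.2, 0.5, 0.8])       # unknown to the learner
counts = np.zeros(3)
sums = np.zeros(3)

for t in range(1, 2001):
    if t <= 3:
        arm = t - 1                     # initialize: pull each arm once
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))       # optimism in the face of uncertainty
    reward = means[arm] + rng.normal(0.0, 0.1)  # noisy evaluation
    counts[arm] += 1
    sums[arm] += reward

print(int(np.argmax(counts)))
```

Over time the exploration bonus shrinks for well-sampled arms, so the policy concentrates its pulls on the arm with the highest mean.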