no code implementations • 8 Jan 2024 • Gokul Swamy, Christoph Dann, Rahul Kidambi, Zhiwei Steven Wu, Alekh Agarwal

Our approach is maximalist in that it provably handles non-Markovian, intransitive, and stochastic preferences while being robust to the compounding errors that plague offline approaches to sequential prediction.

no code implementations • 5 Jun 2023 • Aldo Pacchiano, Christoph Dann, Claudio Gentile

We consider model selection for sequential decision making in stochastic environments with bandit feedback, where a meta-learner has at its disposal a pool of base learners, and decides on the fly which action to take based on the policies recommended by each base learner.

no code implementations • 20 Feb 2023 • Christoph Dann, Chen-Yu Wei, Julian Zimmert

Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently.

no code implementations • 18 Feb 2023 • Christoph Dann, Chen-Yu Wei, Julian Zimmert

Then we show that under known transitions, we can further obtain a first-order regret bound in the adversarial regime by leveraging the log-barrier regularizer.

no code implementations • 3 Feb 2023 • Christoph Dann, Yishay Mansour, Mehryar Mohri, Jon Schneider, Balasubramanian Sivan

We then use that to show, modulo mild normalization assumptions, that there exists an $\ell_\infty$-approachability algorithm whose convergence is independent of the dimension of the original vectorial payoff.

no code implementations • 31 Jan 2023 • Jonathan N. Lee, Alekh Agarwal, Christoph Dann, Tong Zhang

POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability.

no code implementations • 17 Oct 2022 • Christoph Dann, Chen-Yu Wei, Julian Zimmert

Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP) with all non-positive rewards.

no code implementations • NeurIPS 2021 • Christoph Dann, Mehryar Mohri, Tong Zhang, Julian Zimmert

Thompson Sampling is one of the most effective methods for contextual bandits and has been generalized to posterior sampling for certain MDP settings.

no code implementations • 29 Jun 2022 • Aldo Pacchiano, Christoph Dann, Claudio Gentile

We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees.

no code implementations • 19 Jun 2022 • Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.

1 code implementation • 21 Feb 2022 • Mariya Toneva, Jennifer Williams, Anand Bollu, Christoph Dann, Leila Wehbe

It is then natural to ask: "Is the activity in these different brain zones caused by the stimulus properties in the same way?"

no code implementations • 7 Oct 2021 • Chen-Yu Wei, Christoph Dann, Julian Zimmert

We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward.

no code implementations • NeurIPS 2021 • Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

Our results show that optimistic algorithms cannot achieve the information-theoretic lower bounds even in deterministic MDPs unless there is a unique optimal policy.

no code implementations • NeurIPS 2021 • Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

In this work, we consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy.

no code implementations • NeurIPS 2021 • Pranjal Awasthi, Christoph Dann, Claudio Gentile, Ayush Sekhari, Zhilei Wang

We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are stochastically generated from a class of functions on which we make no assumptions whatsoever.

no code implementations • 24 Dec 2020 • Aldo Pacchiano, Christoph Dann, Claudio Gentile, Peter Bartlett

Finally, unlike recent efforts in model selection for linear stochastic bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment, rather than a stochastic one.

no code implementations • NeurIPS 2020 • Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

We study episodic reinforcement learning in Markov decision processes when the agent receives additional feedback per step in the form of several transition observations.

no code implementations • 5 Nov 2019 • Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill

While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications.
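
To make the contrast between expected return and CVaR concrete, here is a minimal sketch that estimates both from a sample of returns. It assumes the common lower-tail convention, where CVaR at level alpha is the average of the worst alpha-fraction of outcomes. The function name `empirical_cvar` and the sample data are hypothetical and are not taken from the paper.

```python
import math


def empirical_cvar(returns, alpha):
    """Average of the worst ceil(alpha * n) returns (lower-tail convention).

    Illustrative estimator only; not the method of the listed paper.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)                      # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))    # size of the lower tail
    return sum(ordered[:k]) / k


# Hypothetical returns: mostly good, with one rare catastrophic outcome.
returns = [10.0, 12.0, -50.0, 11.0, 9.0, 13.0, 8.0, 12.0, 10.0, 11.0]
mean_return = sum(returns) / len(returns)
cvar_20 = empirical_cvar(returns, 0.2)
print(mean_return, cvar_20)
```

On this sample the mean is positive while the CVaR at level 0.2 is strongly negative, which is the kind of tail risk an expected-return objective hides.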

no code implementations • 7 Nov 2018 • Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill

The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration.

no code implementations • ICML 2018 • Philip Thomas, Christoph Dann, Emma Brunskill

When creating a machine learning system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.

no code implementations • NeurIPS 2018 • Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

We study the computational tractability of PAC reinforcement learning with rich observations.

no code implementations • 9 Jun 2017 • Philip S. Thomas, Christoph Dann, Emma Brunskill

When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.

1 code implementation • NeurIPS 2017 • Christoph Dann, Tor Lattimore, Emma Brunskill

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare.

no code implementations • 21 Feb 2017 • Karan Goel, Christoph Dann, Emma Brunskill

Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return.

no code implementations • 21 Nov 2016 • Christoph Dann, Katja Hofmann, Sebastian Nowozin

The study of memory as information that flows from the past to the current action opens avenues to understand and improve successful reinforcement learning algorithms.

3 code implementations • 5 Nov 2015 • Andrew Gordon Wilson, Christoph Dann, Hannes Nickisch

This multi-level circulant approximation allows one to unify the orthogonal computational benefits of fast Kronecker and Toeplitz approaches, and is significantly faster than either approach in isolation; 2) local kernel interpolation and inducing points to allow for arbitrarily located data inputs, and $O(1)$ test time predictions; 3) exploiting block-Toeplitz Toeplitz-block structure (BTTB), which enables fast inference and learning when multidimensional Kronecker structure is not present; and 4) projections of the input space to flexibly model correlated inputs and high dimensional data.

no code implementations • NeurIPS 2015 • Christoph Dann, Emma Brunskill

In this paper, we derive an upper PAC bound $\tilde O(\frac{|\mathcal S|^2 |\mathcal A| H^2}{\epsilon^2} \ln\frac 1 \delta)$ and a lower PAC bound $\tilde \Omega(\frac{|\mathcal S| |\mathcal A| H^2}{\epsilon^2} \ln \frac 1 {\delta + c})$ that match up to log-terms and an additional linear dependency on the number of states $|\mathcal S|$.
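
As a rough illustration of the gap between the two bounds, the sketch below evaluates only the leading terms written in the snippet above. It drops the constants and logarithmic factors hidden by the tilde notation, and it treats the constant `c` as a caller-supplied value because the snippet does not specify it. The function names and the numbers are hypothetical.

```python
import math


def upper_leading_term(S, A, H, eps, delta):
    """|S|^2 |A| H^2 / eps^2 * ln(1/delta), without hidden constants or logs."""
    return (S ** 2 * A * H ** 2 / eps ** 2) * math.log(1.0 / delta)


def lower_leading_term(S, A, H, eps, delta, c):
    """|S| |A| H^2 / eps^2 * ln(1/(delta + c)); c is an assumed input."""
    return (S * A * H ** 2 / eps ** 2) * math.log(1.0 / (delta + c))


# Hypothetical problem size; c = 0 isolates the dependence on |S|.
S, A, H, eps, delta = 10, 4, 5, 0.1, 0.05
upper = upper_leading_term(S, A, H, eps, delta)
lower = lower_leading_term(S, A, H, eps, delta, c=0.0)
ratio = upper / lower
print(upper, lower, ratio)
```

With `c = 0` the logarithmic factors cancel and the ratio equals the number of states, which is the additional linear dependency on $|\mathcal S|$ mentioned above.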

no code implementations • NeurIPS 2015 • Andrew Gordon Wilson, Christoph Dann, Christopher G. Lucas, Eric P. Xing

Bayesian nonparametric models, such as Gaussian processes, provide a compelling framework for automatic statistical modelling: these models have a high degree of flexibility, and automatically calibrated complexity.

no code implementations • 22 Jul 2015 • Amit Adam, Christoph Dann, Omer Yair, Shai Mazor, Sebastian Nowozin

We propose a computational model for shape, illumination and albedo inference in a pulsed time-of-flight (TOF) camera.

Papers With Code is a free resource with all data licensed under CC-BY-SA.