Search Results for author: Christoph Dann

Found 29 papers, 3 papers with code

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

no code implementations 8 Jan 2024 Gokul Swamy, Christoph Dann, Rahul Kidambi, Zhiwei Steven Wu, Alekh Agarwal

Our approach is maximalist in that it provably handles non-Markovian, intransitive, and stochastic preferences while being robust to the compounding errors that plague offline approaches to sequential prediction.

Continuous Control reinforcement-learning
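As background for the title's wordplay (a standard formalization from the social-choice literature, written in our notation rather than necessarily the paper's): a minimax winner is a policy preferred to every competitor at least as often as not under the pairwise preference model $\mathcal{P}$, i.e. $\pi^\star \in \arg\max_{\pi} \min_{\pi'} \mathbb{E}_{\xi \sim \pi,\, \xi' \sim \pi'}\big[\mathcal{P}(\xi \succ \xi')\big]$, a solution concept that remains well defined even when preferences are intransitive and hence not representable by a single reward function.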

Data-Driven Online Model Selection With Regret Guarantees

no code implementations 5 Jun 2023 Aldo Pacchiano, Christoph Dann, Claudio Gentile

We consider model selection for sequential decision making in stochastic environments with bandit feedback, where a meta-learner has at its disposal a pool of base learners, and decides on the fly which action to take based on the policies recommended by each base learner.

Decision Making Model Selection
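A minimal sketch of the meta-learner/base-learner interface described above (the selection rule shown is an illustrative optimistic score, not the data-driven scheme analyzed in the paper; class and method names are ours):

    import math

    class BaseLearner:
        def recommend(self, context):
            # Return the action this base learner would play in this context.
            raise NotImplementedError

        def update(self, context, action, reward):
            # Observe the bandit feedback for the action actually played.
            pass

    class MetaLearner:
        def __init__(self, base_learners):
            self.base = base_learners
            self.pulls = [0] * len(base_learners)
            self.total = [0.0] * len(base_learners)

        def act(self, context, t):
            # Illustrative UCB-style score over base learners; the paper's
            # meta-learner uses data-driven regret estimates instead.
            def score(i):
                if self.pulls[i] == 0:
                    return float("inf")
                return self.total[i] / self.pulls[i] + math.sqrt(2 * math.log(t + 1) / self.pulls[i])
            i = max(range(len(self.base)), key=score)
            return i, self.base[i].recommend(context)

        def feedback(self, i, context, action, reward):
            # Only the chosen base learner sees the sample.
            self.pulls[i] += 1
            self.total[i] += reward
            self.base[i].update(context, action, reward)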

A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

no code implementations 20 Feb 2023 Christoph Dann, Chen-Yu Wei, Julian Zimmert

Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently.

Multi-Armed Bandits

Best of Both Worlds Policy Optimization

no code implementations 18 Feb 2023 Christoph Dann, Chen-Yu Wei, Julian Zimmert

Then we show that under known transitions, we can further obtain a first-order regret bound in the adversarial regime by leveraging the log-barrier regularizer.
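For reference, the log-barrier regularizer mentioned here is, in standard FTRL notation (our notation, purely for illustration), $\psi(\pi) = \frac{1}{\eta} \sum_{a} \ln \frac{1}{\pi(a)}$ over the simplex of action probabilities, with update $\pi_{t+1} = \arg\min_{\pi \in \Delta(\mathcal{A})} \big\langle \pi, \sum_{s \le t} \hat{\ell}_s \big\rangle + \psi(\pi)$; its curvature grows near the boundary of the simplex, a property commonly exploited to obtain first-order (small-loss) regret bounds.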

Pseudonorm Approachability and Applications to Regret Minimization

no code implementations 3 Feb 2023 Christoph Dann, Yishay Mansour, Mehryar Mohri, Jon Schneider, Balasubramanian Sivan

We then use that to show, modulo mild normalization assumptions, that there exists an $\ell_\infty$-approachability algorithm whose convergence is independent of the dimension of the original vectorial payoff.
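As background (the standard Blackwell notion, stated in our notation): a closed convex set $\mathcal{S} \subseteq \mathbb{R}^d$ is approachable with respect to a norm $\|\cdot\|$ if the learner has a strategy guaranteeing $\mathrm{dist}_{\|\cdot\|}\big(\frac{1}{T}\sum_{t=1}^{T} u(x_t, y_t), \mathcal{S}\big) \to 0$ against every adversary, where $u(x_t, y_t)$ is the vector payoff at round $t$; the quoted result concerns the rate of this convergence when distances are measured in $\ell_\infty$.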

Learning in POMDPs is Sample-Efficient with Hindsight Observability

no code implementations 31 Jan 2023 Jonathan N. Lee, Alekh Agarwal, Christoph Dann, Tong Zhang

POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability.

Decision Making Scheduling

A Unified Algorithm for Stochastic Path Problems

no code implementations 17 Oct 2022 Christoph Dann, Chen-Yu Wei, Julian Zimmert

Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP) with all non-positive rewards.

Best of Both Worlds Model Selection

no code implementations 29 Jun 2022 Aldo Pacchiano, Christoph Dann, Claudio Gentile

We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees.

Model Selection

Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

no code implementations 19 Jun 2022 Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.

Reinforcement Learning (RL)
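A minimal sketch of the epsilon-greedy (myopic) exploration rule being analyzed, with a generic value-function approximator (the q.predict interface is hypothetical):

    import random

    def epsilon_greedy_action(q, state, actions, epsilon):
        # With probability epsilon, explore uniformly at random;
        # otherwise act greedily with respect to the current Q-estimate.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: q.predict(state, a))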

Same Cause; Different Effects in the Brain

1 code implementation 21 Feb 2022 Mariya Toneva, Jennifer Williams, Anand Bollu, Christoph Dann, Leila Wehbe

It is then natural to ask: "Is the activity in these different brain zones caused by the stimulus properties in the same way?"

A Model Selection Approach for Corruption Robust Reinforcement Learning

no code implementations 7 Oct 2021 Chen-Yu Wei, Christoph Dann, Julian Zimmert

We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward.

Model Selection Multi-Armed Bandits +3

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

no code implementations NeurIPS 2021 Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

Our results show that optimistic algorithms cannot achieve the information-theoretic lower bounds even in deterministic MDPs unless there is a unique optimal policy.

Reinforcement Learning (RL)

Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations

no code implementations NeurIPS 2021 Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

In this work, we consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy.

Reinforcement Learning (RL)

Neural Active Learning with Performance Guarantees

no code implementations NeurIPS 2021 Pranjal Awasthi, Christoph Dann, Claudio Gentile, Ayush Sekhari, Zhilei Wang

We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are stochastically generated from a class of functions on which we make no assumptions whatsoever.

Active Learning Model Selection

Regret Bound Balancing and Elimination for Model Selection in Bandits and RL

no code implementations 24 Dec 2020 Aldo Pacchiano, Christoph Dann, Claudio Gentile, Peter Bartlett

Finally, unlike recent efforts in model selection for linear stochastic bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment, rather than a stochastic one.

Model Selection
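A rough sketch of the balancing-and-elimination idea (illustrative only: the confidence widths and the exact misspecification test in the paper differ):

    def choose_learner(active, pulls, bounds):
        # Balancing: play the learner whose claimed cumulative regret so far
        # is smallest, so that all candidate regret bounds stay roughly equal.
        return min(active, key=lambda i: bounds[i](pulls[i]))

    def eliminate(active, pulls, rewards, bounds, width):
        # Drop learners whose observed average reward cannot be reconciled
        # with their claimed regret bound relative to the best learner so far.
        means = {i: rewards[i] / max(pulls[i], 1) for i in active}
        best = max(means[i] - width(pulls[i]) for i in active)
        keep = set()
        for i in active:
            slack = bounds[i](pulls[i]) / max(pulls[i], 1) + width(pulls[i])
            if means[i] + slack >= best:
                keep.add(i)
        return keep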

Reinforcement Learning with Feedback Graphs

no code implementations NeurIPS 2020 Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

We study episodic reinforcement learning in Markov decision processes when the agent receives additional feedback per step in the form of several transition observations.

Reinforcement Learning (RL)

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

no code implementations 5 Nov 2019 Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill

While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications.
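For a return random variable $Z$ and risk level $\alpha \in (0, 1]$, the conditional value at risk can be written (standard definition, in our notation) in the Rockafellar-Uryasev form $\mathrm{CVaR}_\alpha(Z) = \sup_{\nu \in \mathbb{R}} \big\{ \nu - \frac{1}{\alpha} \mathbb{E}\big[(\nu - Z)_+\big] \big\}$, which for continuous return distributions equals $\mathbb{E}\big[Z \mid Z \le \mathrm{VaR}_\alpha(Z)\big]$: the expected return over the worst $\alpha$-fraction of outcomes.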

Policy Certificates: Towards Accountable Reinforcement Learning

no code implementations 7 Nov 2018 Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill

The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration.

Reinforcement Learning (RL)

Decoupling Gradient-Like Learning Rules from Representations

no code implementations ICML 2018 Philip Thomas, Christoph Dann, Emma Brunskill

When creating a machine learning system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.

BIG-bench Machine Learning

Decoupling Learning Rules from Representations

no code implementations 9 Jun 2017 Philip S. Thomas, Christoph Dann, Emma Brunskill

When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

1 code implementation NeurIPS 2017 Christoph Dann, Tor Lattimore, Emma Brunskill

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare.

Reinforcement Learning (RL)
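Roughly, the Uniform-PAC criterion introduced here requires that, with probability at least $1 - \delta$, simultaneously for every accuracy $\epsilon > 0$ the number of episodes in which the algorithm's policy is more than $\epsilon$-suboptimal is bounded by a polynomial in $|\mathcal S|$, $|\mathcal A|$, $H$, $1/\epsilon$ and $\log(1/\delta)$; a single guarantee of this form implies both $(\epsilon, \delta)$-PAC bounds and high-probability regret bounds (this paraphrase omits the paper's exact constants and parameterization).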

Sample Efficient Policy Search for Optimal Stopping Domains

no code implementations 21 Feb 2017 Karan Goel, Christoph Dann, Emma Brunskill

Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return.
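A tiny illustration of an optimal stopping policy, here a fixed-threshold rule on i.i.d. offers evaluated by Monte Carlo (purely illustrative; the offer distribution, horizon, and rule are hypothetical and unrelated to the paper's policy-search method):

    import random

    def run_episode(threshold, horizon=20):
        # Stop at the first offer above the threshold; take the last offer
        # if the horizon is reached. Offers are Uniform(0, 1) for illustration.
        for t in range(horizon):
            offer = random.random()
            if offer >= threshold or t == horizon - 1:
                return offer

    def estimate_return(threshold, episodes=10000):
        return sum(run_episode(threshold) for _ in range(episodes)) / episodes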

Memory Lens: How Much Memory Does an Agent Use?

no code implementations 21 Nov 2016 Christoph Dann, Katja Hofmann, Sebastian Nowozin

The study of memory as information that flows from the past to the current action opens avenues to understand and improve successful reinforcement learning algorithms.

Reinforcement Learning (RL)

Thoughts on Massively Scalable Gaussian Processes

3 code implementations 5 Nov 2015 Andrew Gordon Wilson, Christoph Dann, Hannes Nickisch

This multi-level circulant approximation allows one to unify the orthogonal computational benefits of fast Kronecker and Toeplitz approaches, and is significantly faster than either approach in isolation; 2) local kernel interpolation and inducing points to allow for arbitrarily located data inputs, and $O(1)$ test time predictions; 3) exploiting block-Toeplitz Toeplitz-block structure (BTTB), which enables fast inference and learning when multidimensional Kronecker structure is not present; and 4) projections of the input space to flexibly model correlated inputs and high dimensional data.

Gaussian Processes
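The computational point about Kronecker structure can be illustrated with generic linear algebra (not the paper's code): when a product kernel is evaluated on a Cartesian grid, the covariance factors as $K_1 \otimes K_2$, and matrix-vector products never require forming the full matrix.

    import numpy as np

    def kron_mvp(K1, K2, x):
        # (K1 kron K2) @ x via the identity (A kron B) vec(X) = vec(B X A^T),
        # using the column-stacking (Fortran-order) convention for vec.
        m, n = K1.shape[0], K2.shape[0]
        X = x.reshape(n, m, order="F")
        return (K2 @ X @ K1.T).reshape(m * n, order="F")

    # Sanity check against the dense Kronecker product.
    rng = np.random.default_rng(0)
    K1, K2 = rng.standard_normal((3, 3)), rng.standard_normal((4, 4))
    x = rng.standard_normal(12)
    assert np.allclose(kron_mvp(K1, K2, x), np.kron(K1, K2) @ x)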

Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

no code implementations NeurIPS 2015 Christoph Dann, Emma Brunskill

In this paper, we derive an upper PAC bound $\tilde O(\frac{|\mathcal S|^2 |\mathcal A| H^2}{\epsilon^2} \ln\frac 1 \delta)$ and a lower PAC bound $\tilde \Omega(\frac{|\mathcal S| |\mathcal A| H^2}{\epsilon^2} \ln \frac 1 {\delta + c})$ that match up to log-terms and an additional linear dependency on the number of states $|\mathcal S|$.

Reinforcement Learning (RL)

The Human Kernel

no code implementations NeurIPS 2015 Andrew Gordon Wilson, Christoph Dann, Christopher G. Lucas, Eric P. Xing

Bayesian nonparametric models, such as Gaussian processes, provide a compelling framework for automatic statistical modelling: these models have a high degree of flexibility, and automatically calibrated complexity.

Gaussian Processes

Bayesian Time-of-Flight for Realtime Shape, Illumination and Albedo

no code implementations 22 Jul 2015 Amit Adam, Christoph Dann, Omer Yair, Shai Mazor, Sebastian Nowozin

We propose a computational model for shape, illumination and albedo inference in a pulsed time-of-flight (TOF) camera.
