Search Results for author: Daniil Tiapkin

Found 12 papers, 3 papers with code

Incentivized Learning in Principal-Agent Bandit Games

no code implementations • 6 Mar 2024 • Antoine Scheid, Daniil Tiapkin, Etienne Boursier, Aymeric Capitaine, El Mahdi El Mhamdi, Eric Moulines, Michael I. Jordan, Alain Durmus

This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent.

Demonstration-Regularized RL

no code implementations • 26 Oct 2023 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavior cloning.

Reinforcement Learning (RL)
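The KL-regularized objective described in the abstract has a simple closed form in the one-step (bandit-like) case, which illustrates the mechanism. A minimal sketch, assuming a finite action set; the reference policy `pi_bc`, the reward values, and the regularization weight `lam` are all illustrative, not from the paper:

```python
import numpy as np

def kl_regularized_policy(rewards, pi_bc, lam=1.0):
    """Closed-form maximizer of <pi, r> - lam * KL(pi || pi_bc) over the
    simplex: pi is proportional to pi_bc * exp(r / lam). This is the
    one-step analogue of KL-regularizing an RL policy toward a
    behavior-cloned reference policy."""
    logits = np.log(pi_bc) + np.asarray(rewards) / lam
    logits -= logits.max()  # shift for numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()

r = np.array([1.0, 0.0, 0.0])
pi_bc = np.array([0.1, 0.6, 0.3])  # hypothetical behavior-cloned policy
pi = kl_regularized_policy(r, pi_bc, lam=0.5)
```

As `lam` grows, the solution collapses onto `pi_bc`, which is the sense in which the expert demonstrations regularize the learned policy.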

Finite-Sample Analysis of the Temporal Difference Learning

no code implementations • 22 Oct 2023 • Sergey Samsonov, Daniil Tiapkin, Alexey Naumov, Eric Moulines

In this paper we consider the problem of obtaining sharp bounds for the performance of temporal difference (TD) methods with linear functional approximation for policy evaluation in discounted Markov Decision Processes.
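The setting analyzed in the paper can be illustrated with plain TD(0) under linear function approximation on a toy chain. This is a generic sketch of the method being analyzed, not the paper's bounds; the two-state chain, one-hot features, step size, and step count are all made up for illustration:

```python
import numpy as np

def td0_linear(features, transitions, rewards, gamma=0.9, alpha=0.05, steps=20000, seed=0):
    """TD(0) policy evaluation with linear function approximation.

    features:    (S, d) matrix whose row s is the feature vector phi(s)
    transitions: (S, S) transition matrix of the policy-induced chain
    rewards:     (S,) expected one-step reward in each state
    """
    rng = np.random.default_rng(seed)
    S, d = features.shape
    theta = np.zeros(d)
    s = 0
    for _ in range(steps):
        s_next = rng.choice(S, p=transitions[s])
        # TD error: r(s) + gamma * V(s') - V(s), with V(s) = phi(s) @ theta
        delta = rewards[s] + gamma * features[s_next] @ theta - features[s] @ theta
        theta += alpha * delta * features[s]
        s = s_next
    return theta

# Toy 2-state chain with one-hot features, so theta estimates the value
# function directly; here the true values are V = (5.5, 4.5).
P = np.array([[0.5, 0.5], [0.5, 0.5]])
r = np.array([1.0, 0.0])
theta = td0_linear(np.eye(2), P, r)
```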

Generative Flow Networks as Entropy-Regularized RL

1 code implementation • 19 Oct 2023 • Daniil Tiapkin, Nikita Morozov, Alexey Naumov, Dmitry Vetrov

We demonstrate how the task of learning a generative flow network can be efficiently redefined as an entropy-regularized RL problem with a specific reward and regularizer structure.
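The core identity behind this reduction can be sketched on a toy example: with log-reward at terminal states, the log-flow of a state is a soft (logsumexp) Bellman backup, i.e. the soft value function of an entropy-regularized RL problem, and the induced forward policy samples terminal states proportionally to their reward. A minimal sketch, assuming for simplicity a tree-structured state space (so each state has a unique path from the root); the graph and rewards are invented:

```python
import numpy as np

# Tiny tree: root -> a, b; a -> x, y; b -> z, w (x, y, z, w terminal)
children = {"root": ["a", "b"], "a": ["x", "y"], "b": ["z", "w"]}
reward = {"x": 1.0, "y": 2.0, "z": 3.0, "w": 2.0}

def log_flow(s):
    # Soft Bellman backup: log F(s) = logsumexp over children, with
    # log F(terminal) = log R(terminal) -- the soft value function of
    # the corresponding entropy-regularized RL problem.
    if s in reward:
        return np.log(reward[s])
    return np.logaddexp.reduce([log_flow(c) for c in children[s]])

def terminal_probs():
    # Forward policy P(child | s) proportional to F(child); for a tree,
    # each terminal state is then reached with probability R / Z.
    probs = {t: 0.0 for t in reward}
    def walk(s, p):
        if s in reward:
            probs[s] += p
            return
        logs = np.array([log_flow(c) for c in children[s]])
        w = np.exp(logs - np.logaddexp.reduce(logs))
        for c, pc in zip(children[s], w):
            walk(c, p * pc)
    walk("root", 1.0)
    return probs

probs = terminal_probs()
```

Here the total reward is 8, so the sampler should hit `x, y, z, w` with probabilities 1/8, 2/8, 3/8, 2/8.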

Fast Rates for Maximum Entropy Exploration

1 code implementation • 14 Mar 2023 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

Finally, we apply developed regularization techniques to reduce sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2SA/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.

Reinforcement Learning (RL)
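The objective mentioned in the abstract, visitation entropy, can be computed exactly for a fixed policy in a small tabular MDP. A minimal sketch of the quantity (not of the paper's algorithm); the chain, horizon, and policy are illustrative:

```python
import numpy as np

def visitation_entropy(P_pi, H, s0=0):
    """Entropy of the average state-visitation distribution of a policy
    over an H-step episode -- the objective maximized in maximum-entropy
    exploration. P_pi is the (S, S) transition matrix under the policy."""
    S = P_pi.shape[0]
    d = np.zeros(S)
    d[s0] = 1.0
    visit = np.zeros(S)
    for _ in range(H):
        visit += d          # accumulate the state distribution at each step
        d = d @ P_pi        # propagate one step under the policy
    visit /= H              # average visitation distribution
    p = visit[visit > 0]
    return -np.sum(p * np.log(p))

# A uniform random walk on 4 states mixes to uniform immediately, so the
# visitation entropy is close to the maximum log(4).
P = np.full((4, 4), 0.25)
H_ent = visitation_entropy(P, H=20)
```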

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

1 code implementation • 28 Sep 2022 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states and $A$ actions.

Reinforcement Learning (RL)

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

no code implementations • 16 May 2022 • Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard

We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision processes: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits.

Multi-Armed Bandits
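Bayes-UCBVI extends the bandit algorithm Bayes-UCB cited in the abstract, which picks the arm whose posterior has the largest upper quantile. A minimal NumPy-only sketch of that bandit version for Bernoulli arms, approximating the Beta-posterior quantile by Monte Carlo rather than an exact inverse CDF; the arm means, horizon, and sample counts are illustrative:

```python
import numpy as np

def bayes_ucb_bernoulli(means, horizon=2000, n_mc=500, seed=0):
    """Bayes-UCB for Bernoulli bandits: pull the arm whose Beta(1+S, 1+F)
    posterior has the largest (1 - 1/t) quantile, here approximated by
    Monte Carlo sampling from the posterior."""
    rng = np.random.default_rng(seed)
    K = len(means)
    succ, fail = np.ones(K), np.ones(K)  # Beta(1, 1) priors
    pulls = np.zeros(K, dtype=int)
    for t in range(1, horizon + 1):
        q = 1.0 - 1.0 / t
        # Upper posterior quantile for each arm (the "optimism" index)
        ucb = [np.quantile(rng.beta(succ[k], fail[k], n_mc), q) for k in range(K)]
        k = int(np.argmax(ucb))
        r = rng.random() < means[k]
        succ[k] += r
        fail[k] += 1 - r
        pulls[k] += 1
    return pulls

pulls = bayes_ucb_bernoulli([0.2, 0.5, 0.8], horizon=2000)
```

Over time the (1 - 1/t) quantile concentrates for well-sampled arms, so exploration of suboptimal arms tapers off without any explicit bonus term.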

Primal-Dual Stochastic Mirror Descent for MDPs

no code implementations • 27 Feb 2021 • Daniil Tiapkin, Alexander Gasnikov

We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
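The basic building block of such methods, a mirror descent step over the probability simplex with the entropic regularizer (multiplicative weights), can be sketched on a toy linear objective. A generic sketch of the primitive, not the paper's primal-dual scheme; the objective vector, step size, and iteration count are invented:

```python
import numpy as np

def entropic_mirror_descent(grad, x0, eta=0.1, steps=500):
    """Mirror descent over the simplex with the entropic mirror map:
    x_{t+1} proportional to x_t * exp(-eta * grad(x_t)). Returns the
    average iterate, the standard output for convex problems."""
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        x = x * np.exp(-eta * (g - g.max()))  # shift: invariant after renormalizing
        x /= x.sum()
        avg += x
    return avg / steps

# Minimize a linear objective <c, x> over the simplex; the optimum puts
# all mass on the smallest coordinate of c.
c = np.array([0.9, 0.1, 0.5])
x = entropic_mirror_descent(lambda x: c, np.ones(3) / 3, eta=0.2, steps=1000)
```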

Improved Complexity Bounds in Wasserstein Barycenter Problem

no code implementations • 9 Oct 2020 • Darina Dvinskikh, Daniil Tiapkin

In this paper, we focus on computational aspects of the Wasserstein barycenter problem.

Optimization and Control
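In one dimension the Wasserstein-2 barycenter has a closed form that makes the object concrete: its quantile function is the weighted average of the input quantile functions. A minimal sketch of that special case (the general multi-dimensional problem studied in such papers has no such formula); the Gaussian inputs and grid size are illustrative:

```python
import numpy as np

def w2_barycenter_1d(samples_list, weights=None, n_quantiles=99):
    """W2 barycenter of 1-D empirical measures via the closed form:
    the barycenter's quantile function is the weighted average of the
    input quantile functions."""
    m = len(samples_list)
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights)
    qs = np.linspace(0.01, 0.99, n_quantiles)  # interior grid, avoids extreme order statistics
    quantiles = np.stack([np.quantile(s, qs) for s in samples_list])
    return w @ quantiles  # quantile values of the barycenter on the grid

# The barycenter of N(-2, 1) and N(2, 1) is N(0, 1).
rng = np.random.default_rng(0)
a = rng.normal(-2, 1, 20000)
b = rng.normal(2, 1, 20000)
bary = w2_barycenter_1d([a, b])
```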

Stochastic Saddle-Point Optimization for Wasserstein Barycenters

no code implementations • 11 Jun 2020 • Daniil Tiapkin, Alexander Gasnikov, Pavel Dvurechensky

This leads to a complicated stochastic optimization problem where the objective is given as an expectation of a function given as a solution to a random optimization problem.

Stochastic Optimization
