Search Results for author: Denis Belomestny

Found 13 papers, 2 papers with code

Demonstration-Regularized RL

no code implementations • 26 Oct 2023 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavior cloning.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms

no code implementations • 6 Apr 2023 • Denis Belomestny, Pierre Menard, Alexey Naumov, Daniil Tiapkin, Michal Valko

These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum.

Multi-Armed Bandits Thompson Sampling

Paper
Add Code

Theoretical guarantees for neural control variates in MCMC

no code implementations • 3 Apr 2023 • Denis Belomestny, Artur Goldman, Alexey Naumov, Sergey Samsonov

In this paper, we propose a variance reduction approach for Markov chains based on additive control variates and the minimization of an appropriate estimate for the asymptotic variance.

Paper
Add Code

Fast Rates for Maximum Entropy Exploration

1 code implementation • 14 Mar 2023 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

Finally, we apply developed regularization techniques to reduce sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2SA/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.

Reinforcement Learning (RL)

Paper
Code

Primal-dual regression approach for Markov decision processes with general state and action space

no code implementations • 1 Oct 2022 • Denis Belomestny, John Schoenmakers

As a result, our method allows for the construction of tight upper and lower biased approximations of the value functions, and, provides tight approximations to the optimal policy.

regression

Paper
Add Code

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

1 code implementation • 28 Sep 2022 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization

no code implementations • 14 Jun 2022 • Maxim Kaledin, Alexander Golubev, Denis Belomestny

Policy-gradient methods in Reinforcement Learning(RL) are very universal and widely applied in practice but their performance suffers from the high variance of the gradient estimate.

Policy Gradient Methods Reinforcement Learning (RL)

Paper
Add Code

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

no code implementations • 16 May 2022 • Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard

We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision process: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits.

Multi-Armed Bandits

Paper
Add Code

A Reproducing Kernel Hilbert Space approach to singular local stochastic volatility McKean-Vlasov models

no code implementations • 2 Mar 2022 • Christian Bayer, Denis Belomestny, Oleg Butkovsky, John Schoenmakers

Motivated by the challenges related to the calibration of financial models, we consider the problem of numerically solving a singular McKean-Vlasov equation $$ d X_t= \sigma(t, X_t) X_t \frac{\sqrt v_t}{\sqrt {E[v_t|X_t]}}dW_t, $$ where $W$ is a Brownian motion and $v$ is an adapted diffusion process.

Paper
Add Code

From optimal martingales to randomized dual optimal stopping

no code implementations • 2 Feb 2021 • Denis Belomestny, John Schoenmakers

As a main feature, in a possibly large family of optimal martingales the algorithm efficiently selects a martingale that is as close as possible to the Doob martingale.

Probability Optimization and Control Computational Finance 91G60, 65C05, 60G40

Paper
Add Code

Rates of convergence for density estimation with generative adversarial networks

no code implementations • 30 Jan 2021 • Nikita Puchkin, Sergey Samsonov, Denis Belomestny, Eric Moulines, Alexey Naumov

In this work we undertake a thorough study of the non-asymptotic properties of the vanilla generative adversarial networks (GANs).

Density Estimation

Paper
Add Code

Reinforced optimal control

no code implementations • 24 Nov 2020 • Christian Bayer, Denis Belomestny, Paul Hager, Paolo Pigato, John Schoenmakers, Vladimir Spokoiny

Least squares Monte Carlo methods are a popular numerical approximation method for solving stochastic control problems.

Math regression

Paper
Add Code

Optimal stopping via reinforced regression

no code implementations • 7 Aug 2018 • Denis Belomestny, John Schoenmakers, Vladimir Spokoiny, Bakhyt Zharkynbay

In this note we propose a new approach towards solving numerically optimal stopping problems via reinforced regression based Monte Carlo algorithms.

regression

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.