Search Results for author: Subhojyoti Mukherjee

Found 11 papers, 0 papers with code

Efficient and Interpretable Bandit Algorithms

no code implementations23 Oct 2023 Subhojyoti Mukherjee, Ruihao Zhu, Branislav Kveton

We propose CODE, a bandit algorithm based on a Constrained Optimal DEsign, that is interpretable and maximally reduces the uncertainty.

SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits

no code implementations29 Jan 2023 Subhojyoti Mukherjee, Qiaomin Xie, Josiah Hanna, Robert Nowak

In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits.

Experimental Design
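Policy evaluation in linear bandits typically reduces to regressing observed rewards on arm features and plugging the estimate into the target policy's value. Below is a minimal, generic sketch of that plug-in estimator with made-up data and dimensions; it is not the SPEED algorithm itself, which additionally optimizes *which* arms to pull under heteroscedastic noise:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 500
theta_star = np.array([1.0, -0.5, 0.3])   # hidden reward parameter (illustrative)

# Logged data: feature vectors of the pulled arms and their noisy rewards
X = rng.normal(size=(n, d))
y = X @ theta_star + rng.normal(scale=0.1, size=n)

# Ridge estimate of theta from the collected data
lam = 1e-3
theta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Plug-in estimate of the target policy's value: predicted reward of its arms
target_features = rng.normal(size=(10, d))  # arms the target policy would pull
v_hat = target_features @ theta_hat
```

The quality of `theta_hat` (and hence `v_hat`) depends entirely on which feature directions the logged data covers, which is exactly what an optimal-design data-collection strategy controls.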

Safety Aware Changepoint Detection for Piecewise i.i.d. Bandits

no code implementations27 May 2022 Subhojyoti Mukherjee

We provide regret bounds for our algorithms and show that the bounds are comparable to their counterparts from the safe bandit and piecewise i.i.d. bandit settings.

ReVar: Strengthening Policy Evaluation via Reduced Variance Sampling

no code implementations9 Mar 2022 Subhojyoti Mukherjee, Josiah P. Hanna, Robert Nowak

This paper studies the problem of data collection for policy evaluation in Markov decision processes (MDPs).
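The variance-reduction principle behind such data-collection schemes is easiest to see in the bandit special case: to estimate several arm means with minimal total variance, pull each arm in proportion to its reward standard deviation (Neyman allocation). A minimal sketch with illustrative standard deviations, assumed known here (ReVar itself estimates them online and extends the idea to MDPs):

```python
import numpy as np

# Per-arm reward standard deviations (illustrative; assumed known here,
# whereas in practice they must be estimated from data)
sigmas = np.array([0.1, 0.5, 1.0, 2.0])
budget = 1000

# Neyman-style allocation: pulls proportional to each arm's std deviation,
# which minimizes the total variance of the resulting mean estimates
alloc = np.round(budget * sigmas / sigmas.sum()).astype(int)
```

Uniform sampling would spend a quarter of the budget on the nearly deterministic first arm; the variance-aware allocation shifts those pulls to the noisy arms where they reduce estimation error the most.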

Nearly Optimal Algorithms for Level Set Estimation

no code implementations2 Nov 2021 Blake Mason, Romain Camilleri, Subhojyoti Mukherjee, Kevin Jamieson, Robert Nowak, Lalit Jain

The threshold value $\alpha$ can either be \emph{explicit} and provided a priori, or \emph{implicit} and defined relative to the optimal function value, i.e. $\alpha = (1-\epsilon)f(x_\ast)$ for a given $\epsilon > 0$, where $f(x_\ast)$ is the maximal function value and is unknown.

Experimental Design

Chernoff Sampling for Active Testing and Extension to Active Regression

no code implementations15 Dec 2020 Subhojyoti Mukherjee, Ardhendu Tripathy, Robert Nowak

Active learning can reduce the number of samples needed to perform a hypothesis test and to estimate the parameters of a model.

Active Learning, Experimental Design +1

Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d Bandits

no code implementations30 May 2019 Subhojyoti Mukherjee, Odalric-Ambrym Maillard

The second strategy, ImpCPD, makes use of the knowledge of $T$ to achieve the order-optimal regret bound of $\min\big\lbrace O(\sum\limits_{i=1}^{K} \sum\limits_{g=1}^{G}\frac{\log(T/H_{1,g})}{\Delta^{opt}_{i,g}}), O(\sqrt{GT})\big\rbrace$ (where $H_{1,g}$ is the problem complexity), thereby closing an important gap with respect to the lower bound in a specific challenging setting.

Multi-Armed Bandits
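The detect-and-restart idea underlying piecewise-i.i.d. bandit algorithms can be illustrated with a textbook CUSUM mean-shift detector. This is a simpler test than the ones analyzed in the paper, and the baseline mean, drift, and threshold below are illustrative:

```python
import numpy as np

def cusum_detect(stream, mean0, drift=0.05, threshold=2.0):
    """Return the index at which a mean shift away from mean0 is flagged."""
    g_pos = g_neg = 0.0
    for t, x in enumerate(stream):
        g_pos = max(0.0, g_pos + (x - mean0 - drift))  # upward-shift statistic
        g_neg = max(0.0, g_neg + (mean0 - x - drift))  # downward-shift statistic
        if g_pos > threshold or g_neg > threshold:
            return t
    return None

# Synthetic reward stream whose mean jumps from 0.2 to 0.8 at step 100
rng = np.random.default_rng(4)
stream = np.concatenate([rng.normal(0.2, 0.05, 100),
                         rng.normal(0.8, 0.05, 100)])
change_at = cusum_detect(stream, mean0=0.2)
```

A piecewise-i.i.d. bandit algorithm runs a detector like this on each arm's reward stream and resets its statistics once a change is flagged, trading detection delay against false alarms via the drift and threshold parameters.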

A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting

no code implementations18 Oct 2018 Samarth Gupta, Shreyas Chaudhari, Subhojyoti Mukherjee, Gauri Joshi, Osman Yağan

We consider a finite-armed structured bandit problem in which mean rewards of different arms are known functions of a common hidden parameter $\theta^*$.

Thompson Sampling
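The translation idea can be sketched as follows: because every arm's mean is a known function of a shared parameter, each pull informs all arms through a pooled estimate of that parameter. A toy grid-search sketch of this (the mean functions, the parameter value, and the UCB-style bonus are illustrative, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(1)
theta_star = 0.7                                  # hidden shared parameter
mu = lambda th: np.array([th, th ** 2, 1 - th])   # known per-arm mean functions

counts = np.ones(3)
sums = mu(theta_star) + rng.normal(scale=0.1, size=3)  # one pull of each arm

grid = np.linspace(0.0, 1.0, 201)
for t in range(3, 1000):
    emp = sums / counts
    # Pooled estimate of theta: weighted least-squares fit over the grid,
    # so every arm's samples constrain the single shared parameter
    losses = [np.sum(counts * (mu(th) - emp) ** 2) for th in grid]
    th_hat = grid[int(np.argmin(losses))]
    # Act on the estimated parameter, with a small exploration bonus
    ucb = mu(th_hat) + np.sqrt(2.0 * np.log(t) / counts)
    a = int(np.argmax(ucb))
    sums[a] += mu(theta_star)[a] + rng.normal(scale=0.1)
    counts[a] += 1
```

Because a pull of any arm sharpens the estimate of the common parameter, suboptimal arms can be ruled out with far fewer direct samples than an unstructured bandit algorithm would need.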

Efficient-UCBV: An Almost Optimal Algorithm using Variance Estimates

no code implementations9 Nov 2017 Subhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam, Balaraman Ravindran

We propose a novel variant of the UCB algorithm (referred to as Efficient-UCB-Variance (EUCBV)) for minimizing cumulative regret in the stochastic multi-armed bandit (MAB) setting.

Thompson Sampling
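Variance-aware UCB indices replace the worst-case exploration bonus with one driven by each arm's empirical variance, so low-variance arms are dismissed faster. A minimal UCB-V-style sketch on made-up Bernoulli arms (this is the classic variance-based index, not EUCBV's exact elimination rule):

```python
import numpy as np

rng = np.random.default_rng(2)
means = np.array([0.4, 0.5, 0.7])      # illustrative Bernoulli arm means
K, T = len(means), 5000

counts = np.ones(K)
s1 = (rng.random(K) < means).astype(float)  # running sum of rewards
s2 = s1.copy()                              # running sum of squared rewards

for t in range(K, T):
    mu_hat = s1 / counts
    var_hat = s2 / counts - mu_hat ** 2     # empirical per-arm variance
    # Variance-aware bonus: tighter for arms with low observed variance
    bonus = np.sqrt(2.0 * var_hat * np.log(t) / counts) + 3.0 * np.log(t) / counts
    a = int(np.argmax(mu_hat + bonus))
    r = float(rng.random() < means[a])
    s1[a] += r
    s2[a] += r * r
    counts[a] += 1
```

The second, $\log(t)/n$ term in the bonus guards against the empirical variance being an underestimate early on, which is what makes variance-based indices safe to use.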

Thresholding Bandits with Augmented UCB

no code implementations7 Apr 2017 Subhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam, Balaraman Ravindran

In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold.
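In a fixed-budget thresholding bandit, the sampling rule should concentrate pulls on arms whose means are hardest to place relative to the threshold. A minimal APT-style sketch (a related anytime index, not AugUCB's variance-based elimination scheme; the arms, threshold, and budget are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
means = np.array([0.2, 0.45, 0.55, 0.8])   # illustrative Bernoulli arm means
tau, budget = 0.5, 4000

counts = np.ones(len(means))
sums = (rng.random(len(means)) < means).astype(float)

for _ in range(len(means), budget):
    mu_hat = sums / counts
    # Pull the arm whose position relative to tau is least resolved:
    # small empirical gap or few samples -> small index -> pulled next
    index = np.sqrt(counts) * np.abs(mu_hat - tau)
    a = int(np.argmin(index))
    sums[a] += float(rng.random() < means[a])
    counts[a] += 1

above_threshold = (sums / counts) > tau    # final classification of the arms
```

Arms far from the threshold (0.2 and 0.8) are resolved after a handful of pulls, so almost the entire budget flows to the two borderline arms at 0.45 and 0.55.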
