no code implementations • 30 Jun 2023 • Samarth Gupta, Saurabh Amin
How can we computationally handle both nonlinear ODE constraints and parameter uncertainties in a generic stochastic optimization problem for resource allocation?
no code implementations • 3 Aug 2022 • Samarth Gupta, Daniel N. Hill, Lexing Ying, Inderjit Dhillon
Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model.
no code implementations • 5 Oct 2021 • Haoya Li, Samarth Gupta, Hsiang-Fu Yu, Lexing Ying, Inderjit Dhillon
This paper proposes an approximate Newton method for the policy gradient algorithm with entropy regularization.
no code implementations • 10 Sep 2021 • Samarth Gupta, Gauri Joshi, Osman Yağan
In this paper, we consider the problem of best-arm identification in multi-armed bandits in the fixed-confidence setting, where the goal is to identify, with probability $1-\delta$ for some $\delta>0$, the arm with the highest mean reward from the set of arms $\mathcal{K}$ using the minimum possible number of samples.
no code implementations • 14 Dec 2020 • Yae Jee Cho, Samarth Gupta, Gauri Joshi, Osman Yağan
Due to communication constraints and intermittent client availability in federated learning, only a subset of clients can participate in each training round.
no code implementations • 30 Oct 2020 • Samarth Gupta, Saurabh Amin
We also estimate the adversarial accuracy of our ECOC-based classifiers in a white-box setting.
2 code implementations • 6 Nov 2019 • Samarth Gupta, Shreyas Chaudhari, Gauri Joshi, Osman Yağan
We consider a multi-armed bandit framework where the rewards obtained by pulling different arms are correlated.
no code implementations • 18 Oct 2018 • Samarth Gupta, Shreyas Chaudhari, Subhojyoti Mukherjee, Gauri Joshi, Osman Yağan
We consider a finite-armed structured bandit problem in which mean rewards of different arms are known functions of a common hidden parameter $\theta^*$.
2 code implementations • 17 Aug 2018 • Samarth Gupta, Gauri Joshi, Osman Yağan
As a result, there are regimes where our algorithm achieves a $\mathcal{O}(1)$ regret as opposed to the typical logarithmic regret scaling of multi-armed bandit algorithms.
no code implementations • 16 Aug 2018 • Samarth Gupta, Gauri Joshi, Osman Yağan
At each time step, we choose one of the $K$ possible functions $g_1, \ldots, g_K$ and observe the corresponding sample $g_i(X)$.
no code implementations • 8 Sep 2016 • Samarth Gupta, Sharayu Moharir
We propose a Markovian request model to capture the time-correlation in user requests and show that our model is consistent with the observations of existing empirical studies.
Networking and Internet Architecture