no code implementations • 13 Oct 2023 • Debangshu Banerjee, Aditya Gopalan
Parametric, feature-based reward models are employed by a variety of algorithms in decision-making settings such as bandits and Markov decision processes (MDPs).
1 code implementation • 31 May 2023 • Shubham Ugare, Tarun Suresh, Debangshu Banerjee, Gagandeep Singh, Sasa Misailovic
We experimentally demonstrate the effectiveness of our approach, showing up to a 3x certification speedup over applying randomized smoothing to the approximate model from scratch.
2 code implementations • 4 Apr 2023 • Shubham Ugare, Debangshu Banerjee, Sasa Misailovic, Gagandeep Singh
Complete verification of deep neural networks (DNNs) can exactly determine whether the DNN satisfies a desired trustworthy property (e.g., robustness, fairness) on an infinite set of inputs.
no code implementations • 31 Jan 2023 • Debangshu Banerjee, Avaljot Singh, Gagandeep Singh
In recent years numerous methods have been developed to formally verify the robustness of deep neural networks (DNNs).
no code implementations • 9 Jan 2023 • Debangshu Banerjee, Aditya Gopalan
As noted in \cite{lattimore2020bandit}, characterizing the minimax regret of linear bandits over a wide variety of action spaces remains an open problem.
no code implementations • 7 Jan 2023 • Debangshu Banerjee
Given random variables $X_1, \dots, X_N$ with joint distribution $\mu$, we use the martingale method to show that any Lipschitz function $f$ of these random variables is sub-Gaussian.
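The martingale method referenced in this abstract typically proceeds via a Doob martingale and the Azuma–Hoeffding inequality; the sketch below is a generic reconstruction of that standard route (assuming bounded differences with constants $c_i$, and stated for the independent case), not the paper's exact statement.

```latex
% Doob martingale associated with f(X_1,\dots,X_N):
M_i = \mathbb{E}\!\left[ f(X_1,\dots,X_N) \,\middle|\, X_1,\dots,X_i \right],
\qquad M_0 = \mathbb{E}[f], \quad M_N = f.

% If f has bounded differences, i.e.
% |f(\dots,x_i,\dots) - f(\dots,x_i',\dots)| \le c_i for each coordinate i,
% then the martingale increments satisfy |M_i - M_{i-1}| \le c_i, and
% Azuma--Hoeffding gives the sub-Gaussian tail bound
\Pr\bigl( \lvert f - \mathbb{E}[f] \rvert \ge t \bigr)
\le 2 \exp\!\left( - \frac{2 t^2}{\sum_{i=1}^{N} c_i^2} \right).
```

For dependent coordinates under a general joint law $\mu$, the same telescoping decomposition applies, but the increment bounds then depend on how much conditioning on $X_1,\dots,X_i$ shifts the conditional law of the remaining coordinates.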
no code implementations • 23 Jul 2022 • Debangshu Banerjee, Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan
Furthermore, while the previous result is shown to hold only in the asymptotic regime (as $n \to \infty$), our result for these "locally rich" action spaces is any-time.
no code implementations • 19 Jan 2022 • Debangshu Banerjee, Kavita Wagh
This algorithm tracks the Projected Bellman Algorithm and is therefore different from the class of residual algorithms.
no code implementations • 11 Aug 2020 • Debangshu Banerjee
Instead, we use reinforcement learning through self-play, with neural-network approximations, to bypass the problems of a high branching factor and of maintaining large tables for state-action evaluations.