Search Results for author: Gaurav Mahajan

Found 13 papers, 0 papers with code

Learning Hidden Markov Models Using Conditional Samples

no code implementations · 28 Feb 2023 · Sham M. Kakade, Akshay Krishnamurthy, Gaurav Mahajan, Cyril Zhang

In this paper, we depart from the standard setup of i.i.d. sample access and instead consider an interactive access model, in which the algorithm can query for samples from the conditional distributions of the HMM.
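A minimal sketch of one way such an oracle could look, assuming a tabular HMM with transition matrix T, emission matrix E, and initial distribution pi (all names illustrative, not from the paper): the learner submits an observed prefix and receives one sample of the next observation, computed by forward filtering.

```python
import numpy as np

class HMMConditionalOracle:
    """Illustrative conditional-sample oracle for a tabular HMM (sketch only)."""

    def __init__(self, T, E, pi, rng=None):
        self.T = np.asarray(T, dtype=float)    # T[i, j] = P(h' = j | h = i)
        self.E = np.asarray(E, dtype=float)    # E[i, o] = P(obs = o | h = i)
        self.pi = np.asarray(pi, dtype=float)  # initial hidden-state distribution
        self.rng = rng or np.random.default_rng()

    def sample_next(self, prefix):
        """Draw one observation from P(o_{t+1} | o_1, ..., o_t) by forward filtering."""
        belief = self.pi.copy()
        for o in prefix:
            belief = belief * self.E[:, o]   # condition on the observed symbol
            belief = belief / belief.sum()   # normalize the posterior
            belief = belief @ self.T         # advance the hidden chain one step
        p_next = belief @ self.E             # induced distribution over the next symbol
        return int(self.rng.choice(len(p_next), p=p_next))
```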

Time Series, Time Series Analysis

Exponential Hardness of Reinforcement Learning with Linear Function Approximation

no code implementations · 25 Feb 2023 · Daniel Kane, Sihan Liu, Shachar Lovett, Gaurav Mahajan, Csaba Szepesvári, Gellért Weisz

The rewards in this game are chosen such that if the learner achieves large reward, then the learner's actions can be used to simulate solving a variant of 3-SAT in which (a) each variable appears in a bounded number of clauses, and (b) any instance with no satisfying assignment also has no assignment that satisfies more than a $(1-\epsilon)$-fraction of clauses.
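For context, condition (b) mirrors the standard gap version of 3-SAT, whose hardness for some constant $\epsilon$ follows from the PCP theorem; a hedged statement with illustrative notation, not quoted from the paper:

```latex
\textsc{Gap-3SAT}_{\epsilon}: \text{ given a 3-CNF formula } \varphi, \text{ decide whether }
\varphi \text{ is satisfiable, or every assignment satisfies fewer than a }
(1-\epsilon)\text{-fraction of its clauses.}
```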

Learning Theory, reinforcement-learning (+1 more)

Do PAC-Learners Learn the Marginal Distribution?

no code implementations · 13 Feb 2023 · Max Hopkins, Daniel M. Kane, Shachar Lovett, Gaurav Mahajan

We study a foundational variant of Valiant's and Vapnik and Chervonenkis's Probably Approximately Correct (PAC) learning in which the adversary is restricted to a known family of marginal distributions $\mathscr{P}$.
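A hedged sketch of the restricted setting, filling in the standard PAC template around the known family $\mathscr{P}$ (the quantifier order is the usual one; the paper's exact definition may differ):

```latex
\forall P \in \mathscr{P},\ \forall h^{*} \in \mathcal{H}: \text{ given } m(\epsilon, \delta)
\text{ i.i.d. samples } (x, h^{*}(x)) \text{ with } x \sim P, \text{ the learner outputs }
\hat{h} \text{ such that } \Pr_{x \sim P}\bigl[\hat{h}(x) \neq h^{*}(x)\bigr] \le \epsilon
\text{ with probability at least } 1 - \delta.
```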

PAC learning

Convergence of online $k$-means

no code implementations · 22 Feb 2022 · Sanjoy Dasgupta, Gaurav Mahajan, Geelon So

We prove asymptotic convergence for a general class of $k$-means algorithms performed over streaming data from a distribution: the centers asymptotically converge to the set of stationary points of the $k$-means cost function.
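A minimal sketch of one member of this class, assuming MacQueen-style updates with a 1/n step size (one common schedule, not necessarily the full class analyzed in the paper):

```python
import numpy as np
from itertools import islice

def online_kmeans(stream, k):
    """One pass of online k-means: move the nearest center toward each new point."""
    it = iter(stream)
    centers = np.array(list(islice(it, k)), dtype=float)  # seed with the first k points
    counts = np.ones(k)
    for x in it:
        x = np.asarray(x, dtype=float)
        j = int(np.argmin(np.linalg.norm(centers - x, axis=1)))  # nearest center
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]  # decaying step keeps a running mean
    return centers
```

With the $1/n_j$ schedule, each center is exactly the running mean of the points assigned to it so far.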

Computational-Statistical Gaps in Reinforcement Learning

no code implementations · 11 Feb 2022 · Daniel Kane, Sihan Liu, Shachar Lovett, Gaurav Mahajan

In this work, we make progress on this open problem by presenting the first computational lower bound for RL with linear function approximation: unless NP = RP, no randomized polynomial-time algorithm exists for deterministic-transition MDPs with a constant number of actions and linear optimal value functions.

reinforcement-learning, Reinforcement Learning (RL)

Learning what to remember

no code implementations · 11 Jan 2022 · Robi Bhattacharjee, Gaurav Mahajan

We consider a lifelong learning scenario in which a learner faces a never-ending and arbitrary stream of facts and must decide which ones to retain in its limited memory.
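A deliberately simple baseline for this setting is to retain a uniform random subset of the stream; reservoir sampling does so with a fixed memory budget. This is a baseline sketch, not the paper's retention policy:

```python
import random

def reservoir(stream, capacity, seed=0):
    """Keep a uniform random sample of `capacity` items from an arbitrary stream."""
    rng = random.Random(seed)
    memory = []
    for t, fact in enumerate(stream):
        if len(memory) < capacity:
            memory.append(fact)
        else:
            j = rng.randrange(t + 1)  # item t survives with probability capacity/(t+1)
            if j < capacity:
                memory[j] = fact
    return memory
```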

Realizable Learning is All You Need

no code implementations · 8 Nov 2021 · Max Hopkins, Daniel M. Kane, Shachar Lovett, Gaurav Mahajan

The equivalence of realizable and agnostic learnability is a fundamental phenomenon in learning theory.
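For reference, the two guarantees being equated, stated in standard form (not quoted from the paper), both holding with probability at least $1 - \delta$ over the sample:

```latex
\text{Realizable: if } \exists\, h^{*} \in \mathcal{H} \text{ with } \mathrm{err}_D(h^{*}) = 0,
\text{ then } \mathrm{err}_D(\hat{h}) \le \epsilon; \qquad
\text{Agnostic: } \mathrm{err}_D(\hat{h}) \le \min_{h \in \mathcal{H}} \mathrm{err}_D(h) + \epsilon.
```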

Learning Theory, PAC learning

Bilinear Classes: A Structural Framework for Provable Generalization in RL

no code implementations · 19 Mar 2021 · Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.
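A sketch of the Linear $Q^{*}/V^{*}$ condition as described above, with illustrative feature names $\phi, \psi$ (the feature maps are known; the parameter vectors $w^{*}, v^{*}$ are unknown):

```latex
Q^{*}(s, a) = \langle \phi(s, a),\, w^{*} \rangle, \qquad
V^{*}(s) = \langle \psi(s),\, v^{*} \rangle.
```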

Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

no code implementations · NeurIPS 2020 · Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$.
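One standard way to formalize the approximation error in this statement, assuming an $\ell_\infty$ notion (an assumption on my part, not a quote from the paper):

```latex
\delta \;=\; \min_{f \in \mathcal{F}} \; \max_{s, a} \; \bigl| f(s, a) - Q^{*}(s, a) \bigr|.
```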

Q-Learning

Point Location and Active Learning: Learning Halfspaces Almost Optimally

no code implementations · 23 Apr 2020 · Max Hopkins, Daniel M. Kane, Shachar Lovett, Gaurav Mahajan

Given a finite set $X \subset \mathbb{R}^d$ and a binary linear classifier $c: \mathbb{R}^d \to \{0, 1\}$, how many queries of the form $c(x)$ are required to learn the label of every point in $X$?
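A one-dimensional toy version of this question: for points on a line under a threshold classifier, adaptive queries recover every label with $O(\log |X|)$ calls to $c$ rather than $|X|$. This illustrates why adaptivity helps; it is not the paper's $d$-dimensional algorithm:

```python
def labels_via_binary_search(points, classify):
    """Label every point on a line with O(log n) queries to a threshold classifier."""
    xs = sorted(points)
    lo, hi = 0, len(xs)  # invariant: first index labeled 1 lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if classify(xs[mid]) == 0:  # one query per iteration
            lo = mid + 1
        else:
            hi = mid
    return {x: (0 if i < lo else 1) for i, x in enumerate(xs)}

# e.g. labels_via_binary_search([0.1, 0.5, 0.9], lambda x: int(x >= 0.4))
# -> {0.1: 0, 0.5: 1, 0.9: 1}, using two queries instead of three
```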

Active Learning, Position

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

no code implementations · 17 Feb 2020 · Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

In conjunction with the lower bound of [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.

Q-Learning

Noise-tolerant, Reliable Active Classification with Comparison Queries

no code implementations · 15 Jan 2020 · Max Hopkins, Daniel Kane, Shachar Lovett, Gaurav Mahajan

With the explosion of massive, widely available unlabeled data in recent years, finding label- and time-efficient, robust learning algorithms has become ever more important in theory and in practice.

Active Learning, Classification (+1 more)

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

no code implementations · 1 Aug 2019 · Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan

Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces.
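For reference, the standard policy gradient identity that this family of methods builds on (the advantage form shown here is one common statement, not quoted from the paper):

```latex
\nabla_{\theta} J(\theta)
= \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
\bigl[ \nabla_{\theta} \log \pi_\theta(a \mid s) \, A^{\pi_\theta}(s, a) \bigr],
```

where $d^{\pi_\theta}$ is the (discounted) state visitation distribution and $A^{\pi_\theta}$ the advantage function.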

Policy Gradient Methods
