Search Results for author: Rahul Kidambi

Found 23 papers, 10 papers with code

The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares

1 code implementation · NeurIPS 2019 · Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

First, this work shows that even if the time horizon $T$ (i.e., the number of iterations SGD is run for) is known in advance, SGD's final iterate behavior with any polynomially decaying learning rate scheme is highly sub-optimal compared to the minimax rate (by a condition number factor in the strongly convex case and a factor of $\sqrt{T}$ in the non-strongly convex case).

Stochastic Optimization
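
To make the contrast concrete: a minimal sketch, in plain Python, of the geometric step-decay schedule the paper analyzes next to the classical polynomial decay; the horizon, phase count, and constants are illustrative assumptions, not the paper's tuned values.

```python
def step_decay_lr(t, lr0=0.1, horizon=10_000, num_phases=10):
    """Geometric step decay: halve the learning rate after each of
    num_phases equal-length phases of the horizon."""
    phase = int(t * num_phases / horizon)   # phase index in [0, num_phases)
    return lr0 * 0.5 ** phase

def poly_decay_lr(t, lr0=0.1, power=0.5):
    """Polynomial decay lr0 / t^power, the classical
    stochastic-approximation prescription."""
    return lr0 / (1.0 + t) ** power

# Step decay holds the rate constant within a phase and then drops it
# geometrically; polynomial decay shrinks smoothly at every step.
for t in [0, 2_500, 5_000, 9_999]:
    print(t, step_decay_lr(t), poly_decay_lr(t))
```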

Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification

1 code implementation · 12 Oct 2016 · Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

In particular, this work provides a sharp analysis of: (1) mini-batching, a method of averaging many samples of a stochastic gradient to both reduce the variance of the stochastic gradient estimate and parallelize SGD, and (2) tail-averaging, a method of averaging the final few iterates of SGD to decrease the variance in SGD's final iterate.

regression
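
A minimal numpy sketch of the two techniques the abstract names, mini-batching and tail-averaging, on a synthetic least-squares problem; the batch size, step size, and tail fraction are illustrative assumptions rather than the paper's prescribed settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20_000, 10
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.1 * rng.normal(size=n)   # noisy, well-specified least squares

def minibatch_tail_averaged_sgd(batch_size=32, lr=0.05, steps=2_000, tail_frac=0.5):
    x = np.zeros(d)
    iterates = []
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch_size)
        # Mini-batching: average batch_size stochastic gradients, which
        # reduces gradient variance and is embarrassingly parallel.
        grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch_size
        x = x - lr * grad
        iterates.append(x)
    # Tail-averaging: average only the final fraction of the iterates
    # to decrease the variance of the returned solution.
    tail = iterates[int(steps * (1 - tail_frac)):]
    return np.mean(tail, axis=0)

x_hat = minibatch_tail_averaged_sgd()
print("parameter error:", np.linalg.norm(x_hat - x_star))
```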

Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

1 code implementation · NeurIPS 2021 · Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

Instead, the learner is presented with a static offline dataset of state-action-next state transition triples from a potentially less proficient behavior policy.

Continuous Control · Imitation Learning

Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage

1 code implementation · NeurIPS 2021 · Jonathan Daniel Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

Instead, the learner is presented with a static offline dataset of state-action-next state triples from a potentially less proficient behavior policy.

Continuous Control · Imitation Learning

Making Paper Reviewing Robust to Bid Manipulation Attacks

1 code implementation · 9 Feb 2021 · Ruihan Wu, Chuan Guo, Felix Wu, Rahul Kidambi, Laurens van der Maaten, Kilian Q. Weinberger

We develop a novel approach for paper bidding and assignment that is much more robust against such attacks.

Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy

1 code implementation · 15 Feb 2021 · Rajat Sen, Alexander Rakhlin, Lexing Ying, Rahul Kidambi, Dean Foster, Daniel Hill, Inderjit Dhillon

We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|\mathcal{F}|T)})$, where $A$ is the total number of arms and $\mathcal{F}$ is the class containing the regression function, while only requiring $\tilde{O}(A)$ computation per time step.

Computational Efficiency · Extreme Multi-Label Classification +2

MobILE: Model-Based Imitation Learning From Observation Alone

1 code implementation · NeurIPS 2021 · Rahul Kidambi, Jonathan Chang, Wen Sun

This paper studies Imitation Learning from Observations alone (ILFO) where the learner is presented with expert demonstrations that consist only of states visited by an expert (without access to actions taken by the expert).

Imitation Learning · OpenAI Gym

Enhancing Group Fairness in Online Settings Using Oblique Decision Forests

2 code implementations · 17 Oct 2023 · Somnath Basu Roy Chowdhury, Nicholas Monath, Ahmad Beirami, Rahul Kidambi, Avinava Dubey, Amr Ahmed, Snigdha Chaturvedi

In the online setting, where the algorithm has access to a single instance at a time, estimating the group fairness objective requires additional storage and significantly more computation (e.g., forward/backward passes) than the task-specific objective at every time step.

Fairness
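
To see where the extra storage and computation come from, here is a generic running estimator of a demographic-parity gap, updated one instance at a time. This is only an illustrative sketch (the class, group labels, and two-group setup are assumptions), not the paper's oblique-decision-forest method.

```python
from collections import defaultdict

class OnlineParityGap:
    """Running demographic-parity estimate: the gap between groups'
    positive-prediction rates, maintained one instance at a time.
    Note the per-group counters, state that the task-specific loss
    alone would not need."""
    def __init__(self):
        self.count = defaultdict(int)   # instances seen per group
        self.pos = defaultdict(int)     # positive predictions per group

    def update(self, group, prediction):
        self.count[group] += 1
        self.pos[group] += int(prediction == 1)

    def gap(self, g0, g1):
        r0 = self.pos[g0] / max(self.count[g0], 1)
        r1 = self.pos[g1] / max(self.count[g1], 1)
        return abs(r0 - r1)

tracker = OnlineParityGap()
for group, pred in [("a", 1), ("b", 0), ("a", 1), ("b", 1)]:
    tracker.update(group, pred)
print(tracker.gap("a", "b"))   # rates 1.0 vs 0.5 -> gap 0.5
```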

Leverage Score Sampling for Faster Accelerated Regression and ERM

no code implementations · 22 Nov 2017 · Naman Agarwal, Sham Kakade, Rahul Kidambi, Yin Tat Lee, Praneeth Netrapalli, Aaron Sidford

Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a vector $b \in\mathbb{R}^{n}$, we show how to compute an $\epsilon$-approximate solution to the regression problem $ \min_{x\in\mathbb{R}^{d}}\frac{1}{2} \|\mathbf{A} x - b\|_{2}^{2} $ in time $ \tilde{O} ((n+\sqrt{d\cdot\kappa_{\text{sum}}})\cdot s\cdot\log\epsilon^{-1}) $ where $\kappa_{\text{sum}}=\mathrm{tr}\left(\mathbf{A}^{\top}\mathbf{A}\right)/\lambda_{\min}(\mathbf{A}^{\top}\mathbf{A})$ and $s$ is the maximum number of non-zero entries in a row of $\mathbf{A}$.

regression
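
The running time above hinges on sampling rows by statistical leverage. Below is a minimal numpy sketch of computing exact leverage scores via a thin QR factorization and sampling rows proportionally; it illustrates the sampling primitive only, not the paper's accelerated solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1_000, 20
A = rng.normal(size=(n, d))

# Leverage score of row i: l_i = a_i^T (A^T A)^{-1} a_i. Computed stably
# from a thin QR factorization A = QR as the squared row norms of Q;
# the scores always sum to d.
Q, _ = np.linalg.qr(A, mode='reduced')
leverage = np.sum(Q ** 2, axis=1)
assert np.isclose(leverage.sum(), d)

# Sample m rows with probability proportional to leverage and reweight,
# so S^T S is an unbiased (and typically sharp) estimate of A^T A.
m = 200
p = leverage / leverage.sum()
idx = rng.choice(n, size=m, p=p)
S = A[idx] / np.sqrt(m * p[idx])[:, None]
print(np.linalg.norm(S.T @ S - A.T @ A, 2) / np.linalg.norm(A.T @ A, 2))
```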

Efficient Estimation of Generalization Error and Bias-Variance Components of Ensembles

no code implementations · 15 Nov 2017 · Dhruv Mahajan, Vivek Gupta, S. Sathiya Keerthi, Sellamanickam Sundararajan, Shravan Narayanamurthy, Rahul Kidambi

We also demonstrate their usefulness in making design choices, such as the number of classifiers in the ensemble and the size of the training-data subset needed to achieve a target generalization error.

A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

no code implementations · 25 Oct 2017 · Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, Aaron Sidford

This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares.

Accelerating Stochastic Gradient Descent For Least Squares Regression

no code implementations · 26 Apr 2017 · Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g., Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014.

regression · Stochastic Optimization
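
For the abstract's "fast gradient methods," here is a minimal sketch of stochastic heavy ball on a synthetic least-squares problem. The step size and momentum are illustrative, and this is the generic momentum method whose instability motivates the paper, not the accelerated algorithm the paper proposes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5_000, 10
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.1 * rng.normal(size=n)

def heavy_ball_sgd(lr=0.005, momentum=0.9, steps=5_000):
    x, v = np.zeros(d), np.zeros(d)
    for _ in range(steps):
        i = rng.integers(0, n)
        grad = A[i] * (A[i] @ x - b[i])   # single-sample stochastic gradient
        v = momentum * v - lr * grad      # heavy-ball momentum buffer
        x = x + v
    return x

print("parameter error:", np.linalg.norm(heavy_ball_sgd() - x_star))
```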

Submodular Hamming Metrics

no code implementations · NeurIPS 2015 · Jennifer Gillenwater, Rishabh Iyer, Bethany Lusch, Rahul Kidambi, Jeff Bilmes

We show that there is a largely unexplored class of functions (positive polymatroids) that can define proper discrete metrics over pairs of binary vectors and that are fairly tractable to optimize over.

Clustering
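
A concrete instance of the construction in the abstract, as a sketch: applying a monotone submodular function to the symmetric difference of two binary vectors gives a proper metric that generalizes Hamming distance. The choice f(S) = sqrt(total weight of S), with strictly positive weights, is one simple positive polymatroid, used here purely for illustration.

```python
import numpy as np

def submodular_hamming(x, y, w):
    """d(x, y) = f(x XOR y) with f(S) = sqrt(sum of weights in S).
    f is a positive polymatroid (monotone, submodular, f(empty set) = 0),
    so d satisfies the triangle inequality; with w > 0 it is a proper
    metric, and with f(S) = w(S) it reduces to weighted Hamming distance."""
    diff = np.logical_xor(x, y)
    return float(np.sqrt(w[diff].sum()))

x = np.array([1, 0, 1, 1, 0], dtype=bool)
y = np.array([1, 1, 0, 1, 0], dtype=bool)
w = np.ones(5)
print(submodular_hamming(x, y, w))   # two coordinates differ -> sqrt(2)
```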

A Structured Prediction Approach for Missing Value Imputation

no code implementations · 9 Nov 2013 · Rahul Kidambi, Vinod Nair, Sundararajan Sellamanickam, S. Sathiya Keerthi

In this paper we propose a structured output approach for missing value imputation that also incorporates domain constraints.

Imputation · Structured Prediction

Rethinking learning rate schedules for stochastic optimization

no code implementations · ICLR 2019 · Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

One plausible explanation is that non-convex neural network training procedures are better suited to fundamentally different learning rate schedules, such as the "cut the learning rate every constant number of epochs" method (which more closely resembles an exponentially decaying schedule). This widely used schedule stands in stark contrast to the polynomial decay schemes prescribed in the stochastic approximation literature, which are indeed shown to be (worst-case) optimal for classes of convex optimization problems.

Stochastic Optimization

Optimism is All You Need: Model-Based Imitation Learning From Observation Alone

no code implementations · ICLR Workshop SSL-RL 2021 · Rahul Kidambi, Jonathan Daniel Chang, Wen Sun

This paper studies Imitation Learning from Observations alone (ILFO) where the learner is presented with expert demonstrations that only consist of states encountered by an expert (without access to actions taken by the expert).

Imitation Learning · OpenAI Gym

Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion

no code implementations · 22 Apr 2022 · Adam Block, Rahul Kidambi, Daniel N. Hill, Thorsten Joachims, Inderjit S. Dhillon

A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user behavior can lead to suboptimal query suggestions.

counterfactual · Information Retrieval +2

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

no code implementations · 8 Jan 2024 · Gokul Swamy, Christoph Dann, Rahul Kidambi, Zhiwei Steven Wu, Alekh Agarwal

Our approach is maximalist in that it provably handles non-Markovian, intransitive, and stochastic preferences while being robust to the compounding errors that plague offline approaches to sequential prediction.

Continuous Control · reinforcement-learning

Auctions with LLM Summaries

no code implementations · 11 Apr 2024 · Kumar Avinava Dubey, Zhe Feng, Rahul Kidambi, Aranyak Mehta, Di Wang

We study an auction setting in which bidders bid for placement of their content within a summary generated by a large language model (LLM), e.g., an ad auction in which the display is a summary paragraph of multiple ads.

Language Modelling · Large Language Model +1
