no code implementations • 10 Jul 2024 • Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham Kakade

Training language models becomes increasingly expensive with scale, prompting numerous attempts to improve optimization efficiency.

no code implementations • 3 Jul 2024 • Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach

Length generalization refers to the ability to extrapolate from short training sequences to long test sequences and is a challenge for current large language models.

no code implementations • 25 Jun 2024 • Depen Morwani, Itai Shapira, Nikhil Vyas, Eran Malach, Sham Kakade, Lucas Janson

Shampoo, a second-order optimization algorithm which uses a Kronecker product preconditioner, has recently garnered increasing attention from the machine learning community.

2 code implementations • 17 Jun 2024 • Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, HANLIN ZHANG, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G. Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar

We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models.

1 code implementation • 15 Jun 2024 • David Brandfonbrener, HANLIN ZHANG, Andreas Kirsch, Jonathan Richard Schwarz, Sham Kakade

Selecting high-quality data for pre-training is crucial in shaping the downstream task performance of language models.

1 code implementation • 27 Feb 2024 • Zhenting Qi, HANLIN ZHANG, Eric Xing, Sham Kakade, Himabindu Lakkaraju

Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation.

1 code implementation • 22 Feb 2024 • Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.

1 code implementation • 7 Dec 2023 • HANLIN ZHANG, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade

Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs).

no code implementations • 13 Nov 2023 • Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham Kakade

Understanding the internal representations learned by neural networks is a cornerstone challenge in the science of machine learning.

no code implementations • 26 Oct 2023 • Sohrab Andaz, Carson Eisenach, Dhruv Madeka, Kari Torkkola, Randy Jia, Dean Foster, Sham Kakade

In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term as a quantity-over-time arrivals model (QOT).

2 code implementations • 11 Oct 2023 • Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, KaiFeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain

Furthermore, we observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.

no code implementations • 7 Sep 2023 • Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning.

no code implementations • 18 Jul 2023 • Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade

Inspired by recent work in Natural Language Processing (NLP) where "scaling up" has resulted in increasingly more capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games.

no code implementations • 14 Jun 2023 • Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak

The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by finite batch sizes ("SGD noise").

1 code implementation • NeurIPS 2023 • Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain, Ali Farhadi

Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations.

no code implementations • 18 May 2023 • Krishna Pillutla, Vincent Roulet, Sham Kakade, Zaid Harchaoui

Gauss-Newton methods and their stochastic version have been widely used in machine learning and signal processing.

no code implementations • 21 Feb 2023 • Nikhil Vyas, Sham Kakade, Boaz Barak

There is a growing concern that learned conditional generative models may output samples that are substantially similar to some copyrighted data $C$ that was in their training set.

no code implementations • 1 Sep 2022 • Surbhi Goel, Sham Kakade, Adam Tauman Kalai, Cyril Zhang

For example, on parity problems, the NN learns as well as Gaussian elimination, an efficient algorithm that can be succinctly described.

no code implementations • 18 Jul 2022 • Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

There is mounting evidence of emergent phenomena in the capabilities of deep learning methods as we scale up datasets, model sizes, and training times.

3 code implementations • 26 May 2022 • Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, KaiFeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi

The flexibility within the learned Matryoshka Representations offer: (a) up to 14x smaller embedding size for ImageNet-1K classification at the same level of accuracy; (b) up to 14x real-world speed-ups for large-scale retrieval on ImageNet-1K and 4K; and (c) up to 2% accuracy improvements for long-tail few-shot classification, all while being as robust as the original representations.

Ranked #25 on Image Classification on ObjectNet (using extra training data)

no code implementations • 8 Mar 2022 • Juan C. Perdomo, Akshay Krishnamurthy, Peter Bartlett, Sham Kakade

Offline policy evaluation is a fundamental statistical problem in reinforcement learning that involves estimating the value function of some decision-making policy given data collected by a potentially different policy.

no code implementations • 28 Feb 2022 • Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy

Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.

1 code implementation • ICLR 2022 • Jens Tuyls, Shunyu Yao, Sham Kakade, Karthik Narasimhan

Text adventure games present unique challenges to reinforcement learning methods due to their combinatorially large action spaces and sparse rewards.

no code implementations • ICLR 2022 • Jordan T. Ash, Cyril Zhang, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade

Intrinsic rewards play a central role in handling the exploration-exploitation trade-off when designing sequential decision-making algorithms, in both foundational theory and state-of-the-art deep reinforcement learning.

no code implementations • 19 Oct 2021 • Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang

Self-attention, an architectural motif designed to model long-range interactions in sequential data, has driven numerous recent breakthroughs in natural language processing and beyond.

no code implementations • 12 Oct 2021 • Yonathan Efroni, Sham Kakade, Akshay Krishnamurthy, Cyril Zhang

However, in practice, we often encounter systems in which a large set of state variables evolve exogenously and independently of the control inputs; such systems are only partially controllable.

1 code implementation • 30 Jun 2021 • Motoya Ohnishi, Isao Ishikawa, Kendall Lowrey, Masahiro Ikeda, Sham Kakade, Yoshinobu Kawahara

In this work, we present a novel paradigm of controlling nonlinear systems via the minimization of the Koopman spectrum cost: a cost over the Koopman operator of the controlled dynamics.

1 code implementation • NeurIPS 2021 • Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade

There is an increasing need for effective active learning algorithms that are compatible with deep neural networks.

1 code implementation • NeurIPS 2021 • Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi

We further quantitatively measure the quality of our codes by applying it to the efficient image retrieval as well as out-of-distribution (OOD) detection problems.

1 code implementation • NeurIPS 2021 • Xiyang Liu, Weihao Kong, Sham Kakade, Sewoong Oh

In statistical learning and analysis from shared data, which is increasingly widely adopted in platforms such as federated learning and meta-learning, there are two major concerns: privacy and robustness.

no code implementations • NeurIPS 2020 • Ruosong Wang, Simon S. Du, Lin Yang, Sham Kakade

In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound which exhibits a polynomial dependence on the horizon --- a conjecture which is consistent with all known sample complexity upper bounds.

no code implementations • 12 Oct 2020 • Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong

A common practice in meta-learning is to perform a train-validation split (\emph{train-val method}) where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.

1 code implementation • NeurIPS 2020 • Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies.

1 code implementation • NeurIPS 2020 • Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun

This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space.

no code implementations • NeurIPS 2020 • Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun

In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common practice to make parametric assumptions where values or policies are functions of some low dimensional feature space.

no code implementations • NeurIPS 2020 • Weihao Kong, Raghav Somani, Sham Kakade, Sewoong Oh

Together, this approach is robust against outliers and achieves a graceful statistical trade-off; the lack of $\Omega(k^{1/2})$-size tasks can be compensated for with smaller tasks, which can now be as small as $O(\log k)$.

no code implementations • ICLR 2021 • Preetum Nakkiran, Prayaag Venkat, Sham Kakade, Tengyu Ma

Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such the sample size and model size.

1 code implementation • ICML 2020 • Colin Wei, Sham Kakade, Tengyu Ma

This implicit regularization effect is analogous to the effect of stochasticity in small mini-batch stochastic gradient descent.

no code implementations • ICML 2020 • Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi

We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters.

no code implementations • ICML 2020 • Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh

In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data.

1 code implementation • ICML 2020 • Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi

Sparsity in Deep Neural Networks (DNNs) is studied extensively with the focus of maximizing prediction accuracy given an overall parameter budget.

6 code implementations • NeurIPS 2019 • Aravind Rajeswaran, Chelsea Finn, Sham Kakade, Sergey Levine

By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer.

no code implementations • 10 Jun 2019 • Alekh Agarwal, Sham Kakade, Lin F. Yang

In this work, we study the effectiveness of the most natural plug-in approach to model-based planning: we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP.

Model-based Reinforcement Learning
reinforcement-learning
**+1**

no code implementations • ICLR Workshop LLD 2019 • Chelsea Finn, Aravind Rajeswaran, Sham Kakade, Sergey Levine

Meta-learning views this problem as learning a prior over model parameters that is amenable for fast adaptation on a new task, but typically assumes the set of tasks are available together as a batch.

no code implementations • ICLR 2019 • Kendall Lowrey, Aravind Rajeswaran, Sham Kakade, Emanuel Todorov, Igor Mordatch

We study how local trajectory optimization can cope with approximation errors in the value function, and can stabilize and accelerate value function learning.

no code implementations • 23 Sep 2018 • Sham Kakade, Jason D. Lee

The Cheap Gradient Principle (Griewank 2008) --- the computational cost of computing the gradient of a scalar-valued function is nearly the same (often within a factor of $5$) as that of simply computing the function itself --- is of central importance in optimization; it allows us to quickly obtain (high dimensional) gradients of scalar loss functions which are subsequently used in black box gradient-based optimization procedures.

1 code implementation • 20 Apr 2018 • Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, Jason D. Lee

This work considers the question: what convergence guarantees does the stochastic subgradient method have in the absence of smoothness and convexity?

no code implementations • ICLR 2018 • Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M. Bayen, Sham Kakade, Igor Mordatch, Pieter Abbeel

To mitigate this issue, we derive a bias-free action-dependent baseline for variance reduction which fully exploits the structural form of the stochastic policy itself and does not make any additional assumptions about the MDP.

no code implementations • 26 Feb 2018 • Sham Kakade, Mengdi Wang, Lin F. Yang

There is a technical issue in the analysis that is not easily fixable.

no code implementations • 22 Nov 2017 • Naman Agarwal, Sham Kakade, Rahul Kidambi, Yin Tat Lee, Praneeth Netrapalli, Aaron Sidford

Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a vector $b \in\mathbb{R}^{d}$, we show how to compute an $\epsilon$-approximate solution to the regression problem $ \min_{x\in\mathbb{R}^{d}}\frac{1}{2} \|\mathbf{A} x - b\|_{2}^{2} $ in time $ \tilde{O} ((n+\sqrt{d\cdot\kappa_{\text{sum}}})\cdot s\cdot\log\epsilon^{-1}) $ where $\kappa_{\text{sum}}=\mathrm{tr}\left(\mathbf{A}^{\top}\mathbf{A}\right)/\lambda_{\min}(\mathbf{A}^{T}\mathbf{A})$ and $s$ is the maximum number of non-zero entries in a row of $\mathbf{A}$.

no code implementations • NeurIPS 2017 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree.

1 code implementation • NeurIPS 2017 • Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, Sham Kakade

This work shows that policies with simple linear and RBF parameterizations can be trained to solve a variety of continuous control tasks, including the OpenAI gym benchmarks.

no code implementations • 8 Dec 2016 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

For a Hidden Markov Model with $n$ hidden states, $I$ is bounded by $\log n$, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length $O(\log n/\epsilon)$ windows of observations achieves this error, provided the length of the sequence is $d^{\Omega(\log n/\epsilon)}$, where $d$ is the size of the observation alphabet.

2 code implementations • 29 Nov 2016 • John Thickstun, Zaid Harchaoui, Sham Kakade

This paper introduces a new large-scale music dataset, MusicNet, to serve as a source of supervision and evaluation of machine learning methods for music research.

Ranked #6 on Music Transcription on MusicNet

no code implementations • NeurIPS 2015 • Kamalika Chaudhuri, Sham Kakade, Praneeth Netrapalli, Sujay Sanghavi

Provided certain conditions hold on the model class, we provide a two-stage active learning algorithm for this problem.

no code implementations • 13 Feb 2015 • David Belanger, Sham Kakade

Finally, the Kalman filter updates can be seen as a linear recurrent neural network.

no code implementations • 13 Nov 2014 • Qingqing Huang, Rong Ge, Sham Kakade, Munther Dahleh

Consider a stationary discrete random process with alphabet size d, which is assumed to be the output process of an unknown stationary Hidden Markov Model (HMM).

no code implementations • NeurIPS 2013 • Animashree Anandkumar, Daniel Hsu, Majid Janzamin, Sham Kakade

This set of higher-order expansion conditions allow for overcomplete models, and require the existence of a perfect matching from latent topics to higher order observed words.

no code implementations • NeurIPS 2010 • Alex Strehl, John Langford, Sham Kakade, Lihong Li

We provide a sound and consistent foundation for the use of \emph{nonrandom} exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.