no code implementations • ICML 2020 • Dylan Foster, Max Simchowitz

We consider the problem of online control in a known linear dynamical system subject to adversarial noise.

1 code implementation • 1 Jul 2024 • Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels.
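The core idea can be sketched in a few lines (a toy illustration only; the token values, number of noise levels, and linear corruption schedule below are invented for this example and are not the paper's implementation):

```python
import random

def noise_tokens(tokens, num_levels=10, seed=0):
    """Corrupt each token with its own independently drawn noise level,
    unlike standard diffusion training, where a single level is shared by
    the whole sequence. Returns (noisy value, level) pairs."""
    random.seed(seed)
    noised = []
    for x in tokens:
        k = random.randrange(num_levels)      # independent level per token
        alpha = 1.0 - k / num_levels          # toy linear schedule
        eps = random.gauss(0.0, 1.0)          # Gaussian corruption
        noised.append((alpha * x + (1.0 - alpha) * eps, k))
    return noised

pairs = noise_tokens([0.5, -1.2, 3.0])
```

A denoiser trained on such sequences can then, for instance, keep past tokens nearly clean while future tokens remain heavily noised.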

no code implementations • 17 Oct 2023 • Adam Block, Dylan J. Foster, Akshay Krishnamurthy, Max Simchowitz, Cyril Zhang

This work studies training instabilities of behavior cloning with deep neural networks.

1 code implementation • 2 Oct 2023 • Lirui Wang, Kaiqing Zhang, Allan Zhou, Max Simchowitz, Russ Tedrake

We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment, with good performance on nearly all training tasks at test time.

no code implementations • 12 Jul 2023 • Max Simchowitz, Abhishek Gupta, Kaiqing Zhang

Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$: $\mathbb{E}[z \mid x, y ]=\langle f_{\star}(x), g_{\star}(y)\rangle_{H}$, we aim to extrapolate to a test distribution domain that is $\textit{not}$ covered in training, i.e., achieving bilinear combinatorial extrapolation.

no code implementations • 16 May 2023 • Daniel Pfrommer, Max Simchowitz, Tyler Westenbroek, Nikolai Matni, Stephen Tu

A common pipeline in learning-based control is to iteratively estimate a model of system dynamics and apply a trajectory optimization algorithm, e.g. $\mathtt{iLQR}$, on the learned model to minimize a target cost.

1 code implementation • 27 Apr 2023 • Aviv Netanyahu, Abhishek Gupta, Max Simchowitz, Kaiqing Zhang, Pulkit Agrawal

Machine learning systems, especially with overparameterized deep neural networks, can generalize to novel test instances drawn from the same distribution as the training data.

no code implementations • 27 Feb 2023 • Max Simchowitz, Anurag Ajay, Pulkit Agrawal, Akshay Krishnamurthy

We show that, when the class $F$ is "simpler" than $G$ (measured, e.g., in terms of its metric entropy), our predictor is more resilient to heterogeneous covariate shifts in which the shift in $\mathbf{x}$ is much greater than that in $\mathbf{y}$.

no code implementations • 10 Feb 2023 • Adam Block, Alexander Rakhlin, Max Simchowitz

Smoothed online learning has emerged as a popular framework to mitigate the substantial loss in statistical and computational complexity that arises when one moves from classical to adversarial learning.

no code implementations • NeurIPS 2023 • Adam Block, Max Simchowitz, Russ Tedrake

The problem of piecewise affine (PWA) regression and planning is of foundational importance to the study of online learning, control, and robotics, where it provides a theoretically and empirically tractable setting to study systems undergoing sharp changes in the dynamics.

no code implementations • 25 May 2022 • Adam Block, Max Simchowitz

Due to the drastic gap in complexity between sequential and batch statistical learning, recent work has studied a smoothed sequential learning setting, where Nature is constrained to select contexts with density bounded by $1/\sigma$ with respect to a known measure $\mu$.
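Concretely, the smoothness constraint just bounds the likelihood ratio of Nature's context distribution against the base measure. A toy sampler that saturates the constraint (the uniform base measure on $[0, 1]$ and the interval construction are made up for illustration):

```python
import random

def sample_sigma_smoothed(sigma, rng=random):
    """Sample from a toy sigma-smoothed distribution on [0, 1]: uniform on
    [0, sigma], so its density is 1/sigma there and 0 elsewhere -- the
    largest likelihood ratio permitted against the uniform base measure mu."""
    return rng.uniform(0.0, sigma)

random.seed(0)
xs = [sample_sigma_smoothed(0.25) for _ in range(1000)]
```

Any such distribution can concentrate at most a factor $1/\sigma$ more mass on a region than $\mu$ does, which is what rules out fully adversarial context sequences.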

no code implementations • 23 Feb 2022 • Jack Umenberger, Max Simchowitz, Juan C. Perdomo, Kaiqing Zhang, Russ Tedrake

In this paper, we provide a new perspective on this challenging problem based on the notion of $\textit{informativity}$, which intuitively requires that all components of a filter's internal state are representative of the true state of the underlying dynamical system.

no code implementations • NeurIPS 2021 • Edgar Minasyan, Paula Gradu, Max Simchowitz, Elad Hazan

On the positive side, we give an efficient algorithm that attains a sublinear regret bound against the class of Disturbance Response policies up to the aforementioned system variability term.

no code implementations • 2 Feb 2022 • H. J. Terry Suh, Max Simchowitz, Kaiqing Zhang, Russ Tedrake

Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients.
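The contrast between the two estimators can be seen on a toy stochastic objective $F(\theta) = \mathbb{E}[(\theta + w)^2]$ with true gradient $2\theta$ (the objective and noise scale are chosen for this sketch and are not a claim about the paper's experiments):

```python
import random

# Toy stochastic objective F(theta) = E[(theta + w)^2], w ~ N(0, 0.1^2);
# its true gradient is 2 * theta.

def zeroth_order_grad(theta, n=5000, delta=0.1, seed=0):
    """Two-point finite-difference estimate: uses only function evaluations."""
    random.seed(seed)
    total = 0.0
    for _ in range(n):
        w = random.gauss(0.0, 0.1)
        total += ((theta + delta + w) ** 2 - (theta - delta + w) ** 2) / (2 * delta)
    return total / n

def first_order_grad(theta, n=5000, seed=0):
    """Pathwise estimate: differentiate through the simulator itself."""
    random.seed(seed)
    total = 0.0
    for _ in range(n):
        w = random.gauss(0.0, 0.1)
        total += 2.0 * (theta + w)   # d/dtheta of (theta + w)^2
    return total / n
```

Both estimators are unbiased here; the paper's point is that their relative quality depends on properties of the underlying objective, such as smoothness.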

no code implementations • 26 Jan 2022 • Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

We first develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP with sample complexity scaling as $\widetilde{\mathcal{O}}(d^2 H^5/\epsilon^2)$.

no code implementations • 7 Dec 2021 • Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance -- is a core question in sequential decision-making.

no code implementations • NeurIPS 2021 • Juan C. Perdomo, Jack Umenberger, Max Simchowitz

Stabilizing an unknown control system is one of the most fundamental problems in control systems engineering.

no code implementations • 5 Aug 2021 • Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson

We show this is not possible -- there exists a fundamental tradeoff between achieving low regret and identifying an $\epsilon$-optimal policy at the instance-optimal rate.

no code implementations • NeurIPS 2021 • Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire

We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most $\tilde{\mathcal{O}}(H^2 \epsilon)$ from TS with a well specified prior, where $\epsilon$ is the total-variation distance between priors and $H$ is the learning horizon.
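For intuition, here is a Beta-Bernoulli Thompson sampling sketch run with a deliberately perturbed prior (the two-arm bandit instance and the specific priors are invented for illustration; the paper's result concerns general priors over episodic environments):

```python
import random

def thompson_bernoulli(true_means, prior=(1.0, 1.0), horizon=2000, seed=0):
    """Beta-Bernoulli Thompson sampling; `prior` is the (possibly
    misspecified) Beta(a, b) prior shared across arms. Toy sketch only."""
    random.seed(seed)
    n_arms = len(true_means)
    a = [prior[0]] * n_arms
    b = [prior[1]] * n_arms
    reward = 0
    for _ in range(horizon):
        samples = [random.betavariate(a[i], b[i]) for i in range(n_arms)]
        arm = samples.index(max(samples))            # play the sampled best arm
        r = 1 if random.random() < true_means[arm] else 0
        a[arm] += r                                   # posterior update
        b[arm] += 1 - r
        reward += r
    return reward

well = thompson_bernoulli([0.3, 0.6], prior=(1.0, 1.0))
mis = thompson_bernoulli([0.3, 0.6], prior=(2.0, 1.0))
```

In this toy run, the mildly misspecified prior accrues nearly the same reward, which is the qualitative behavior the total-variation bound formalizes.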

no code implementations • 27 Mar 2021 • Tyler Westenbroek, Max Simchowitz, Michael I. Jordan, S. Shankar Sastry

Crucially, this guarantee requires that state costs applied to the planning problems are in a certain sense "compatible" with the global geometry of the system, and a simple counter-example demonstrates the necessity of this condition.

no code implementations • 19 Mar 2021 • Juan C. Perdomo, Max Simchowitz, Alekh Agarwal, Peter Bartlett

We study the problem of adaptive control of the linear quadratic regulator for systems in very high, or even infinite dimension.

no code implementations • 28 Feb 2021 • Max Simchowitz, Aleksandrs Slivkins

How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$?

no code implementations • 10 Feb 2021 • Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson

Along the way, we establish that certainty equivalence decision making is instance- and task-optimal, and obtain the first algorithm for the linear quadratic regulator problem which is instance-optimal.

no code implementations • NeurIPS 2020 • Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford

We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.

no code implementations • NeurIPS 2020 • Max Simchowitz

Recent literature has made much progress in understanding \emph{online LQR}: a modern learning-theoretic take on the classical control problem in which a learner attempts to optimally control an unknown linear dynamical system with fully observed state, perturbed by i.i.d. noise.

1 code implementation • NeurIPS 2020 • Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

We propose an algorithm for tabular episodic reinforcement learning with constraints.

1 code implementation • ICML 2020 • Esther Rolf, Max Simchowitz, Sarah Dean, Lydia T. Liu, Daniel Björkegren, Moritz Hardt, Joshua Blumenstock

Our theoretical results characterize the optimal strategies in this class, bound the Pareto errors due to inaccuracies in the scores, and show an equivalence between optimal strategies and a rich class of fairness-constrained profit-maximizing policies.

no code implementations • 29 Feb 2020 • Dylan J. Foster, Max Simchowitz

We introduce a new algorithm for online linear-quadratic control in a known system subject to adversarial disturbances.

no code implementations • ICML 2020 • Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu

We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2A\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration and returns $\epsilon$-suboptimal policies for an arbitrary number of reward functions.

no code implementations • ICML 2020 • Max Simchowitz, Dylan J. Foster

Our upper bound is attained by a simple variant of $\textit{certainty equivalent control}$, where the learner selects control inputs according to the optimal controller for their estimate of the system while injecting exploratory random noise.

no code implementations • 25 Jan 2020 • Max Simchowitz, Karan Singh, Elad Hazan

We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states, known as non-stochastic control.

no code implementations • 20 Nov 2019 • Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of stochastic bandits.

no code implementations • 6 Nov 2019 • Mark Braverman, Elad Hazan, Max Simchowitz, Blake Woodworth

We investigate the computational complexity of several basic linear algebra primitives, including largest eigenvector computation and linear regression, in the computational model that allows access to the data via a matrix-vector product oracle.

no code implementations • NeurIPS 2019 • Max Simchowitz, Kevin Jamieson

This paper establishes that optimistic algorithms attain gap-dependent and non-asymptotic logarithmic regret for episodic MDPs.

1 code implementation • 2 Feb 2019 • Max Simchowitz, Ross Boczar, Benjamin Recht

We analyze a simple prefiltered variation of the least squares estimator for the problem of estimation with biased, semi-parametric noise, an error model studied more broadly in causal statistics and active learning.

no code implementations • 27 Sep 2018 • Esther Rolf, David Fridovich-Keil, Max Simchowitz, Benjamin Recht, Claire Tomlin

We study an adaptive source seeking problem, in which a mobile robot must identify the strongest emitter(s) of a signal in an environment with background emissions.

no code implementations • 29 Aug 2018 • Lydia T. Liu, Max Simchowitz, Moritz Hardt

We show that under reasonable conditions, the deviation from satisfying group calibration is upper bounded by the excess risk of the learned score relative to the Bayes optimal score function.

no code implementations • 14 Aug 2018 • Max Simchowitz, Kevin Jamieson, Jordan W. Suchow, Thomas L. Griffiths

In this paper, we introduce the first principled adaptive-sampling procedure for learning a convex function in the $L_\infty$ norm, a problem that arises often in the behavioral and social sciences.

no code implementations • 24 Jul 2018 • Max Simchowitz

Minimizing a convex, quadratic objective of the form $f_{\mathbf{A},\mathbf{b}}(x) := \frac{1}{2}x^\top \mathbf{A} x - \langle \mathbf{b}, x \rangle$ for $\mathbf{A} \succ 0 $ is a fundamental problem in machine learning and optimization.
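A minimal numerical illustration of this problem (plain gradient descent with a $1/\lambda_{\max}(\mathbf{A})$ step size on a hand-picked $2 \times 2$ instance; the example is not from the paper, which studies oracle complexity lower bounds for this class of objectives):

```python
import numpy as np

# Minimize f(x) = 0.5 * x^T A x - <b, x> for A > 0 via gradient descent.
# The unique minimizer is x* = A^{-1} b, and the gradient is A x - b.
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite (toy instance)
b = np.array([1.0, -1.0])

x = np.zeros(2)
step = 1.0 / np.linalg.eigvalsh(A).max()  # step size 1/L with L = lambda_max(A)
for _ in range(500):
    x -= step * (A @ x - b)               # gradient step

x_star = np.linalg.solve(A, b)            # closed-form minimizer for comparison
```

Each iteration touches the matrix only through the product $\mathbf{A}x$, which is exactly the matrix-vector oracle model the lower bounds are stated in.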

no code implementations • 4 Apr 2018 • Max Simchowitz, Ahmed El Alaoui, Benjamin Recht

We show that for every $\mathtt{gap} \in (0, 1/2]$, there exists a distribution over matrices $\mathbf{M}$ for which 1) $\mathrm{gap}_r(\mathbf{M}) = \Omega(\mathtt{gap})$ (where $\mathrm{gap}_r(\mathbf{M})$ is the normalized gap between the $r$-th and $(r+1)$-st largest-magnitude eigenvalues of $\mathbf{M}$), and 2) any algorithm $\mathsf{Alg}$ which takes fewer than $\mathrm{const} \times \frac{r \log d}{\sqrt{\mathtt{gap}}}$ queries fails (with overwhelming probability) to identify a matrix $\widehat{\mathsf{V}} \in \mathbb{R}^{d \times r}$ with orthonormal columns for which $\langle \widehat{\mathsf{V}}, \mathbf{M} \widehat{\mathsf{V}}\rangle \ge (1 - \mathrm{const} \times \mathtt{gap})\sum_{i=1}^r \lambda_i(\mathbf{M})$.

3 code implementations • ICML 2018 • Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, Moritz Hardt

Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time.

no code implementations • 22 Feb 2018 • Max Simchowitz, Horia Mania, Stephen Tu, Michael I. Jordan, Benjamin Recht

We prove that the ordinary least-squares (OLS) estimator attains nearly minimax optimal performance for the identification of linear dynamical systems from a single observed trajectory.
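A self-contained sketch of the estimator on a toy stable system (the $2 \times 2$ dynamics matrix, horizon, and unit noise scale are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])   # stable toy system

# Roll out a single trajectory x_{t+1} = A x_t + w_t, w_t ~ N(0, I).
T = 5000
X = np.zeros((T + 1, 2))
for t in range(T):
    X[t + 1] = A_true @ X[t] + rng.standard_normal(2)

# OLS: regress next states on current states. lstsq solves
# X[:-1] @ B = X[1:] in least squares, so B = A^T and A_hat = B^T.
B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
A_hat = B.T
```

Note the regressors $x_t$ are correlated across time since they come from one trajectory, which is precisely what makes the analysis of this estimator nontrivial.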

no code implementations • 4 Jan 2018 • Reinhard Heckel, Max Simchowitz, Kannan Ramchandran, Martin J. Wainwright

Accordingly, we study the problem of finding approximate rankings from pairwise comparisons.

no code implementations • 20 Oct 2017 • Jason D. Lee, Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, Benjamin Recht

We establish that first-order methods avoid saddle points for almost all initializations.

no code implementations • 14 Apr 2017 • Max Simchowitz, Ahmed El Alaoui, Benjamin Recht

We prove a \emph{query complexity} lower bound on rank-one principal component analysis (PCA).

no code implementations • 16 Feb 2017 • Max Simchowitz, Kevin Jamieson, Benjamin Recht

Moreover, our lower bounds zero-in on the number of times each \emph{individual} arm needs to be pulled, uncovering new phenomena which are drowned out in the aggregate sample complexity.

no code implementations • 9 Mar 2016 • Max Simchowitz, Kevin Jamieson, Benjamin Recht

This paper studies the Best-of-K Bandit game: At each time the player chooses a subset S among all N-choose-K possible options and observes reward max(X(i) : i in S) where X is a random vector drawn from a joint distribution.
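One round of the game can be simulated directly (independent Bernoulli arms are assumed here for simplicity; the general game allows correlated $X$, which is the hard case the paper studies):

```python
import random

def play_best_of_k(means, subset):
    """One round of Best-of-K with independent Bernoulli arms: pull the
    subset S and observe only max(X(i) : i in S)."""
    draws = [1 if random.random() < means[i] else 0 for i in subset]
    return max(draws)

# For independent Bernoulli arms, E[max over S] = 1 - prod_{i in S}(1 - mu_i).
random.seed(0)
means = [0.2, 0.5, 0.7, 0.1]
subset = [1, 2]                          # choose K = 2 of the N = 4 arms
expected = 1 - (1 - 0.5) * (1 - 0.7)     # = 0.85
emp = sum(play_best_of_k(means, subset) for _ in range(20000)) / 20000
```

The bandit feedback is only the max, not the individual coordinates, so the learner cannot directly tell which arm in the subset produced the reward.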

no code implementations • 16 Feb 2016 • Jason D. Lee, Max Simchowitz, Michael I. Jordan, Benjamin Recht

We show that gradient descent converges to a local minimizer, almost surely with random initialization.

Papers With Code is a free resource with all data licensed under CC-BY-SA.