Search Results for author: Max Simchowitz

Found 48 papers, 7 papers with code

Logarithmic Regret for Online Control with Adversarial Noise

no code implementations ICML 2020 Dylan Foster, Max Simchowitz

We consider the problem of online control in a known linear dynamical system subject to adversarial noise.

LEMMA

Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

1 code implementation 1 Jul 2024 Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels.

Decision Making

Robot Fleet Learning via Policy Merging

1 code implementation 2 Oct 2023 Lirui Wang, Kaiqing Zhang, Allan Zhou, Max Simchowitz, Russ Tedrake

We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment, with good performance on nearly all training tasks at test time.

Robot Manipulation

Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective

no code implementations 12 Jul 2023 Max Simchowitz, Abhishek Gupta, Kaiqing Zhang

Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$: $\mathbb{E}[z \mid x, y ]=\langle f_{\star}(x), g_{\star}(y)\rangle_{H}$, we aim to extrapolate to a test distribution domain that is $\textit{not}$ covered in training, i.e., achieving bilinear combinatorial extrapolation.

Matrix Completion

The Power of Learned Locally Linear Models for Nonlinear Policy Optimization

no code implementations 16 May 2023 Daniel Pfrommer, Max Simchowitz, Tyler Westenbroek, Nikolai Matni, Stephen Tu

A common pipeline in learning-based control is to iteratively estimate a model of system dynamics, and apply a trajectory optimization algorithm (e.g., $\mathtt{iLQR}$) on the learned model to minimize a target cost.

Learning to Extrapolate: A Transductive Approach

1 code implementation 27 Apr 2023 Aviv Netanyahu, Abhishek Gupta, Max Simchowitz, Kaiqing Zhang, Pulkit Agrawal

Machine learning systems, especially with overparameterized deep neural networks, can generalize to novel test instances drawn from the same distribution as the training data.

Imitation Learning

Statistical Learning under Heterogeneous Distribution Shift

no code implementations 27 Feb 2023 Max Simchowitz, Anurag Ajay, Pulkit Agrawal, Akshay Krishnamurthy

We show that, when the class $F$ is "simpler" than $G$ (measured, e.g., in terms of its metric entropy), our predictor is more resilient to heterogeneous covariate shifts in which the shift in $\mathbf{x}$ is much greater than that in $\mathbf{y}$.

Oracle-Efficient Smoothed Online Learning for Piecewise Continuous Decision Making

no code implementations 10 Feb 2023 Adam Block, Alexander Rakhlin, Max Simchowitz

Smoothed online learning has emerged as a popular framework to mitigate the substantial loss in statistical and computational complexity that arises when one moves from classical to adversarial learning.

Decision Making Econometrics

Smoothed Online Learning for Prediction in Piecewise Affine Systems

no code implementations NeurIPS 2023 Adam Block, Max Simchowitz, Russ Tedrake

The problem of piecewise affine (PWA) regression and planning is of foundational importance to the study of online learning, control, and robotics, where it provides a theoretically and empirically tractable setting to study systems undergoing sharp changes in the dynamics.

Efficient and Near-Optimal Smoothed Online Learning for Generalized Linear Functions

no code implementations 25 May 2022 Adam Block, Max Simchowitz

Due to the drastic gap in complexity between sequential and batch statistical learning, recent work has studied a smoothed sequential learning setting, where Nature is constrained to select contexts with density bounded by $1/\sigma$ with respect to a known measure $\mu$.

Globally Convergent Policy Search over Dynamic Filters for Output Estimation

no code implementations 23 Feb 2022 Jack Umenberger, Max Simchowitz, Juan C. Perdomo, Kaiqing Zhang, Russ Tedrake

In this paper, we provide a new perspective on this challenging problem based on the notion of $\textit{informativity}$, which intuitively requires that all components of a filter's internal state are representative of the true state of the underlying dynamical system.

Online Control of Unknown Time-Varying Dynamical Systems

no code implementations NeurIPS 2021 Edgar Minasyan, Paula Gradu, Max Simchowitz, Elad Hazan

On the positive side, we give an efficient algorithm that attains a sublinear regret bound against the class of Disturbance Response policies up to the aforementioned system variability term.

Do Differentiable Simulators Give Better Policy Gradients?

no code implementations 2 Feb 2022 H. J. Terry Suh, Max Simchowitz, Kaiqing Zhang, Russ Tedrake

Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients.

Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes

no code implementations 26 Jan 2022 Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

We first develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP with sample complexity scaling as $\widetilde{\mathcal{O}}(d^2 H^5/\epsilon^2)$.

Reinforcement Learning (RL)

First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach

no code implementations 7 Dec 2021 Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance -- is a core question in sequential decision-making.

Decision Making reinforcement-learning +1

Stabilizing Dynamical Systems via Policy Gradient Methods

no code implementations NeurIPS 2021 Juan C. Perdomo, Jack Umenberger, Max Simchowitz

Stabilizing an unknown control system is one of the most fundamental problems in control systems engineering.

Policy Gradient Methods

Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

no code implementations 5 Aug 2021 Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson

We show this is not possible -- there exists a fundamental tradeoff between achieving low regret and identifying an $\epsilon$-optimal policy at the instance-optimal rate.

reinforcement-learning Reinforcement Learning (RL)

Bayesian decision-making under misspecified priors with applications to meta-learning

no code implementations NeurIPS 2021 Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire

We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most $\tilde{\mathcal{O}}(H^2 \epsilon)$ from TS with a well specified prior, where $\epsilon$ is the total-variation distance between priors and $H$ is the learning horizon.

Decision Making Meta-Learning +2

On the Stability of Nonlinear Receding Horizon Control: A Geometric Perspective

no code implementations 27 Mar 2021 Tyler Westenbroek, Max Simchowitz, Michael I. Jordan, S. Shankar Sastry

Crucially, this guarantee requires that state costs applied to the planning problems are in a certain sense 'compatible' with the global geometry of the system, and a simple counter-example demonstrates the necessity of this condition.

Towards a Dimension-Free Understanding of Adaptive Linear Control

no code implementations 19 Mar 2021 Juan C. Perdomo, Max Simchowitz, Alekh Agarwal, Peter Bartlett

We study the problem of adaptive control of the linear quadratic regulator for systems in very high, or even infinite dimension.

Exploration and Incentives in Reinforcement Learning

no code implementations 28 Feb 2021 Max Simchowitz, Aleksandrs Slivkins

How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$?

reinforcement-learning Reinforcement Learning (RL)

Task-Optimal Exploration in Linear Dynamical Systems

no code implementations 10 Feb 2021 Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson

Along the way, we establish that certainty equivalence decision making is instance- and task-optimal, and obtain the first algorithm for the linear quadratic regulator problem which is instance-optimal.

Decision Making

Learning the Linear Quadratic Regulator from Nonlinear Observations

no code implementations NeurIPS 2020 Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford

We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.

Continuous Control Decoder

Making Non-Stochastic Control (Almost) as Easy as Stochastic

no code implementations NeurIPS 2020 Max Simchowitz

Recent literature has made much progress in understanding $\textit{online LQR}$: a modern learning-theoretic take on the classical control problem in which a learner attempts to optimally control an unknown linear dynamical system with fully observed state, perturbed by i.i.d.

Balancing Competing Objectives with Noisy Data: Score-Based Classifiers for Welfare-Aware Machine Learning

1 code implementation ICML 2020 Esther Rolf, Max Simchowitz, Sarah Dean, Lydia T. Liu, Daniel Björkegren, Moritz Hardt, Joshua Blumenstock

Our theoretical results characterize the optimal strategies in this class, bound the Pareto errors due to inaccuracies in the scores, and show an equivalence between optimal strategies and a rich class of fairness-constrained profit-maximizing policies.

BIG-bench Machine Learning Fairness

Logarithmic Regret for Adversarial Online Control

no code implementations 29 Feb 2020 Dylan J. Foster, Max Simchowitz

We introduce a new algorithm for online linear-quadratic control in a known system subject to adversarial disturbances.

Reward-Free Exploration for Reinforcement Learning

no code implementations ICML 2020 Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu

We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2A\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration and returns $\epsilon$-suboptimal policies for an arbitrary number of reward functions.

reinforcement-learning Reinforcement Learning (RL)

Naive Exploration is Optimal for Online LQR

no code implementations ICML 2020 Max Simchowitz, Dylan J. Foster

Our upper bound is attained by a simple variant of $\textit{certainty equivalent control}$, where the learner selects control inputs according to the optimal controller for their estimate of the system while injecting exploratory random noise.

Improper Learning for Non-Stochastic Control

no code implementations 25 Jan 2020 Max Simchowitz, Karan Singh, Elad Hazan

We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states, known as non-stochastic control.

Corruption-robust exploration in episodic reinforcement learning

no code implementations 20 Nov 2019 Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system extending recent results for the special case of stochastic bandits.

Multi-Armed Bandits reinforcement-learning +1

The gradient complexity of linear regression

no code implementations 6 Nov 2019 Mark Braverman, Elad Hazan, Max Simchowitz, Blake Woodworth

We investigate the computational complexity of several basic linear algebra primitives, including largest eigenvector computation and linear regression, in the computational model that allows access to the data via a matrix-vector product oracle.

regression

Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

no code implementations NeurIPS 2019 Max Simchowitz, Kevin Jamieson

This paper establishes that optimistic algorithms attain gap-dependent and non-asymptotic logarithmic regret for episodic MDPs.

Learning Linear Dynamical Systems with Semi-Parametric Least Squares

1 code implementation 2 Feb 2019 Max Simchowitz, Ross Boczar, Benjamin Recht

We analyze a simple prefiltered variation of the least squares estimator for the problem of estimation with biased, semi-parametric noise, an error model studied more broadly in causal statistics and active learning.

Active Learning

A Successive-Elimination Approach to Adaptive Robotic Sensing

no code implementations 27 Sep 2018 Esther Rolf, David Fridovich-Keil, Max Simchowitz, Benjamin Recht, Claire Tomlin

We study an adaptive source seeking problem, in which a mobile robot must identify the strongest emitter(s) of a signal in an environment with background emissions.

Trajectory Planning

The implicit fairness criterion of unconstrained learning

no code implementations 29 Aug 2018 Lydia T. Liu, Max Simchowitz, Moritz Hardt

We show that under reasonable conditions, the deviation from satisfying group calibration is upper bounded by the excess risk of the learned score relative to the Bayes optimal score function.

BIG-bench Machine Learning Fairness

Adaptive Sampling for Convex Regression

no code implementations 14 Aug 2018 Max Simchowitz, Kevin Jamieson, Jordan W. Suchow, Thomas L. Griffiths

In this paper, we introduce the first principled adaptive-sampling procedure for learning a convex function in the $L_\infty$ norm, a problem that arises often in the behavioral and social sciences.

regression

On the Randomized Complexity of Minimizing a Convex Quadratic Function

no code implementations 24 Jul 2018 Max Simchowitz

Minimizing a convex, quadratic objective of the form $f_{\mathbf{A},\mathbf{b}}(x) := \frac{1}{2}x^\top \mathbf{A} x - \langle \mathbf{b}, x \rangle$ for $\mathbf{A} \succ 0 $ is a fundamental problem in machine learning and optimization.
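
As a brief worked step (not part of the listed abstract), and assuming $\mathbf{A}$ is symmetric as is conventional when writing $\mathbf{A} \succ 0$, setting the gradient to zero gives the unique minimizer and the optimal value:

$$\nabla f_{\mathbf{A},\mathbf{b}}(x) = \mathbf{A}x - \mathbf{b} = 0 \;\Longrightarrow\; x_{\star} = \mathbf{A}^{-1}\mathbf{b}, \qquad f_{\mathbf{A},\mathbf{b}}(x_{\star}) = -\tfrac{1}{2}\,\mathbf{b}^\top \mathbf{A}^{-1}\mathbf{b}.$$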

Tight Query Complexity Lower Bounds for PCA via Finite Sample Deformed Wigner Law

no code implementations 4 Apr 2018 Max Simchowitz, Ahmed El Alaoui, Benjamin Recht

We show that for every $\mathtt{gap} \in (0, 1/2]$, there exists a distribution over matrices $\mathbf{M}$ for which 1) $\mathrm{gap}_r(\mathbf{M}) = \Omega(\mathtt{gap})$ (where $\mathrm{gap}_r(\mathbf{M})$ is the normalized gap between the $r$-th and $(r+1)$-st largest-magnitude eigenvalues of $\mathbf{M}$), and 2) any algorithm $\mathsf{Alg}$ which takes fewer than $\mathrm{const} \times \frac{r \log d}{\sqrt{\mathtt{gap}}}$ queries fails (with overwhelming probability) to identify a matrix $\widehat{\mathsf{V}} \in \mathbb{R}^{d \times r}$ with orthonormal columns for which $\langle \widehat{\mathsf{V}}, \mathbf{M} \widehat{\mathsf{V}}\rangle \ge (1 - \mathrm{const} \times \mathtt{gap})\sum_{i=1}^r \lambda_i(\mathbf{M})$.

Delayed Impact of Fair Machine Learning

3 code implementations ICML 2018 Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, Moritz Hardt

Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time.

BIG-bench Machine Learning Fairness

Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification

no code implementations 22 Feb 2018 Max Simchowitz, Horia Mania, Stephen Tu, Michael I. Jordan, Benjamin Recht

We prove that the ordinary least-squares (OLS) estimator attains nearly minimax optimal performance for the identification of linear dynamical systems from a single observed trajectory.

Time Series Time Series Analysis
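
To make the setting of the entry above concrete, here is a minimal sketch of least-squares identification from a single trajectory. It assumes a scalar system $x_{t+1} = a x_t + w_t$ with Gaussian noise; the function names, the value $a = 0.9$, and the trajectory length are illustrative choices, not taken from the paper, which treats general linear dynamical systems.

```python
import random


def simulate_trajectory(a, steps, rng, noise_std=1.0):
    """Roll out x_{t+1} = a * x_t + w_t from x_0 = 0 with Gaussian noise w_t."""
    xs = [0.0]
    for _ in range(steps):
        xs.append(a * xs[-1] + rng.gauss(0.0, noise_std))
    return xs


def ols_estimate(xs):
    """Least-squares fit of a in x_{t+1} ~ a * x_t using one trajectory."""
    num = sum(x * x_next for x, x_next in zip(xs[:-1], xs[1:]))
    den = sum(x * x for x in xs[:-1])
    return num / den


# Hypothetical example values: a stable scalar system observed for 5000 steps.
rng = random.Random(0)
trajectory = simulate_trajectory(a=0.9, steps=5000, rng=rng)
a_hat = ols_estimate(trajectory)
print(f"true a = 0.9, OLS estimate = {a_hat:.3f}")
```

Consecutive samples in the trajectory are dependent, which is why error guarantees for this estimator do not follow from standard i.i.d. regression arguments.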

Approximate Ranking from Pairwise Comparisons

no code implementations 4 Jan 2018 Reinhard Heckel, Max Simchowitz, Kannan Ramchandran, Martin J. Wainwright

Accordingly, we study the problem of finding approximate rankings from pairwise comparisons.

First-order Methods Almost Always Avoid Saddle Points

no code implementations 20 Oct 2017 Jason D. Lee, Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, Benjamin Recht

We establish that first-order methods avoid saddle points for almost all initializations.

The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime

no code implementations 16 Feb 2017 Max Simchowitz, Kevin Jamieson, Benjamin Recht

Moreover, our lower bounds zero in on the number of times each $\textit{individual}$ arm needs to be pulled, uncovering new phenomena which are drowned out in the aggregate sample complexity.

Best-of-K Bandits

no code implementations 9 Mar 2016 Max Simchowitz, Kevin Jamieson, Benjamin Recht

This paper studies the Best-of-K Bandit game: At each time the player chooses a subset S among all N-choose-K possible options and observes reward max(X(i) : i in S) where X is a random vector drawn from a joint distribution.
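
The reward rule described above can be sketched as follows. This is an illustration only: it assumes independent Bernoulli coordinates (the abstract allows a general joint distribution), and the brute-force Monte Carlo search over subsets sees the full vector X each round, so it is not a bandit algorithm and not the paper's method. All names and numbers are hypothetical.

```python
import random
from itertools import combinations


def play_round(subset, x):
    """Reward for one round: the max of the drawn vector over the chosen subset."""
    return max(x[i] for i in subset)


def draw_bernoulli_vector(means, rng):
    """Assumption: independent Bernoulli coordinates with the given means."""
    return [1 if rng.random() < p else 0 for p in means]


def best_subset_by_simulation(means, k, rounds, rng):
    """Estimate each size-k subset's expected reward by simulation; return the best."""
    subsets = list(combinations(range(len(means)), k))
    totals = {s: 0 for s in subsets}
    for _ in range(rounds):
        x = draw_bernoulli_vector(means, rng)
        for s in subsets:
            totals[s] += play_round(s, x)
    return max(subsets, key=lambda s: totals[s])


# Hypothetical instance with N = 4 options and K = 2.
rng = random.Random(0)
best = best_subset_by_simulation([0.9, 0.1, 0.8, 0.05], k=2, rounds=2000, rng=rng)
print("best subset:", best)
```

In the actual game the player observes only the max over the chosen subset, which is what makes identifying a good subset nontrivial.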

Gradient Descent Converges to Minimizers

no code implementations 16 Feb 2016 Jason D. Lee, Max Simchowitz, Michael I. Jordan, Benjamin Recht

We show that gradient descent converges to a local minimizer, almost surely with random initialization.
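
A small numerical illustration of this statement, under assumptions of our own choosing: the test function $f(x, y) = (x^2 - 1)^2 + y^2$ has a strict saddle at the origin and minimizers at $(\pm 1, 0)$, and the step size, iteration count, and initialization box below are illustrative values rather than anything from the paper.

```python
import random


def grad(x, y):
    """Gradient of f(x, y) = (x**2 - 1)**2 + y**2."""
    return 4.0 * x * (x * x - 1.0), 2.0 * y


def gradient_descent(x, y, step=0.02, iters=2000):
    """Plain gradient descent with a fixed step size."""
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


# Run from 20 random initial points in the box [-1.5, 1.5]^2.
rng = random.Random(0)
finals = []
for _ in range(20):
    x0, y0 = rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)
    finals.append(gradient_descent(x0, y0))

for x, y in finals[:3]:
    print(f"converged to ({x:.6f}, {y:.6f})")
```

Starting exactly on the line $x = 0$ sends the iterates to the saddle instead, but that line has measure zero, so a random initialization avoids it with probability one.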
