Search Results for author: Michael I. Jordan

Found 203 papers, 46 papers with code

Perseus: A Simple High-Order Regularization Method for Variational Inequalities

no code implementations • 6 May 2022 • Tianyi Lin, Michael I. Jordan

Monteiro and Svaiter (2012) proposed another second-order method, which achieved an improved rate of $O(\epsilon^{-2/3}\log(1/\epsilon))$ but required a nontrivial binary-search procedure as an inner loop.

Uncertainty Sets for Image Classifiers using Conformal Prediction

2 code implementations • ICLR 2021 • Anastasios Angelopoulos, Stephen Bates, Jitendra Malik, Michael I. Jordan

Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings.

Learning from eXtreme Bandit Feedback

no code implementations • 27 Sep 2020 • Romain Lopez, Inderjit S. Dhillon, Michael I. Jordan

In POXM, the selected actions for the sIS estimator are the top-p actions of the logging policy, where p is adjusted from the data and is significantly smaller than the size of the action space.
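The top-p selection described in this snippet is straightforward to sketch. The helper below is illustrative only: the function name and the toy probabilities are hypothetical, not from the paper.

```python
import numpy as np

def top_p_actions(logging_policy_probs, p):
    """Return the indices of the p most probable actions under the
    logging policy (hypothetical helper illustrating the snippet above)."""
    return np.argsort(logging_policy_probs)[::-1][:p]

# Example: action space of size 6, restrict the estimator to the top-3 actions.
probs = np.array([0.05, 0.30, 0.10, 0.25, 0.20, 0.10])
print(top_p_actions(probs, 3))  # → [1 3 4]
```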

Extreme Multi-Label Classification • Multi-Label Classification • +1

ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm

no code implementations • 28 Aug 2020 • Chris Junchi Li, Wenlong Mou, Martin J. Wainwright, Michael I. Jordan

The theory and practice of stochastic optimization has focused on stochastic gradient descent (SGD) in recent years, retaining the basic first-order stochastic nature of SGD while aiming to improve it via mechanisms such as averaging, momentum, and variance reduction.

Stochastic Optimization

On Localized Discrepancy for Domain Adaptation

no code implementations • 14 Aug 2020 • Yuchen Zhang, Mingsheng Long, Jian-Min Wang, Michael I. Jordan

Finally, we further extend the localized discrepancies to achieve super transfer, and derive generalization bounds that can be even more sample-efficient on the source domain.

Generalization Bounds • Unsupervised Domain Adaptation

Transferable Calibration with Lower Bias and Variance in Domain Adaptation

no code implementations • NeurIPS 2020 • Ximei Wang, Mingsheng Long, Jian-Min Wang, Michael I. Jordan

In this paper, we delve into the open problem of Calibration in DA, which is extremely challenging due to the coexistence of domain shift and the lack of target labels.

Decision Making • Domain Adaptation

Optimal Robust Linear Regression in Nearly Linear Time

no code implementations • 16 Jul 2020 • Yeshwanth Cherapanamjeri, Efe Aras, Nilesh Tripuraneni, Michael I. Jordan, Nicolas Flammarion, Peter L. Bartlett

We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$ (with $X \in \mathbb{R}^d$ and $\epsilon$ independent), in which an $\eta$ fraction of the samples have been adversarially corrupted.
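The generative model quoted above is easy to simulate. The sketch below (with illustrative constants, not the paper's estimator) shows how ordinary least squares degrades once an $\eta$ fraction of responses is adversarially corrupted:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eta = 1000, 5, 0.1  # illustrative sample size, dimension, corruption fraction

w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)       # Y = <X, w*> + eps

w_clean = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS on clean data: accurate

# An adversary corrupts an eta fraction of the responses.
idx = rng.choice(n, size=int(eta * n), replace=False)
y[idx] += 100.0
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS on corrupted data: badly off

print(np.linalg.norm(w_clean - w_star), np.linalg.norm(w_ols - w_star))
```

Robust estimators such as the one studied in the paper aim to recover $w^*$ at near the clean-data error despite the corruptions.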

Manifold Learning via Manifold Deflation

no code implementations • 7 Jul 2020 • Daniel Ting, Michael I. Jordan

Nonlinear dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.

Dimensionality Reduction

Accelerated Message Passing for Entropy-Regularized MAP Inference

no code implementations • ICML 2020 • Jonathan N. Lee, Aldo Pacchiano, Peter Bartlett, Michael I. Jordan

Maximum a posteriori (MAP) inference in discrete-valued Markov random fields is a fundamental problem in machine learning that involves identifying the most likely configuration of random variables given a distribution.

On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification

no code implementations • 22 Jun 2020 • Tianyi Lin, Zeyu Zheng, Elynn Y. Chen, Marco Cuturi, Michael I. Jordan

Yet, the behavior of minimum Wasserstein estimators is poorly understood, notably in high-dimensional regimes or under model misspecification.

On the Theory of Transfer Learning: The Importance of Task Diversity

no code implementations • NeurIPS 2020 • Nilesh Tripuraneni, Michael I. Jordan, Chi Jin

Formally, we consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$, where each $f_j$ is a task-specific function in $\mathcal{F}$ and $h$ is the shared representation in $\mathcal{H}$.

Representation Learning • Transfer Learning

Active Learning for Nonlinear System Identification with Guarantees

no code implementations • 18 Jun 2020 • Horia Mania, Michael I. Jordan, Benjamin Recht

While the identification of nonlinear dynamical systems is a fundamental building block of model-based reinforcement learning and feedback control, its sample complexity is only understood for systems that either have discrete states and actions or for systems that can be identified from data generated by i.i.d.

Active Learning • Model-based Reinforcement Learning • +1

Projection Robust Wasserstein Distance and Riemannian Optimization

no code implementations • NeurIPS 2020 • Tianyi Lin, Chenyou Fan, Nhat Ho, Marco Cuturi, Michael I. Jordan

Projection robust Wasserstein (PRW) distance, or Wasserstein projection pursuit (WPP), is a robust variant of the Wasserstein distance.

Riemannian optimization

Instability, Computational Efficiency and Statistical Accuracy

no code implementations • 22 May 2020 • Nhat Ho, Koulik Khamaru, Raaz Dwivedi, Martin J. Wainwright, Michael I. Jordan, Bin Yu

Many statistical estimators are defined as the fixed point of a data-dependent operator, with estimators based on minimizing a cost function being an important special case.

Lower bounds in multiple testing: A framework based on derandomized proxies

no code implementations • 7 May 2020 • Max Rabinovich, Michael I. Jordan, Martin J. Wainwright

A line of more recent work in multiple testing has begun to investigate the tradeoffs between the FDR and FNR and to provide lower bounds on the performance of procedures that depend on the model structure.

VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback

no code implementations • 19 Apr 2020 • Kirthevasan Kandasamy, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

To that end, we first define three notions of regret for the welfare, the individual utilities of each agent and that of the mechanism.

On Learning Rates and Schrödinger Operators

no code implementations • 15 Apr 2020 • Bin Shi, Weijie J. Su, Michael I. Jordan

In this paper, we present a general theoretical analysis of the effect of the learning rate in stochastic gradient descent (SGD).

On dissipative symplectic integration with applications to gradient-based optimization

no code implementations • 15 Apr 2020 • Guilherme França, Michael I. Jordan, René Vidal

More specifically, we show that a generalization of symplectic integrators to nonconservative and in particular dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error.

On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration

no code implementations • 9 Apr 2020 • Wenlong Mou, Chris Junchi Li, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

When the matrix $\bar{A}$ is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.

Identifying and Correcting Bias from Time- and Severity-Dependent Reporting Rates in the Estimation of the COVID-19 Case Fatality Rate

1 code implementation • 19 Mar 2020 • Anastasios Nikolas Angelopoulos, Reese Pathak, Rohit Varma, Michael I. Jordan

As we are in the middle of an active outbreak, estimating this measure will necessarily involve correcting for time- and severity-dependent reporting of cases, and time-lags in observed patient outcomes.

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

no code implementations • 16 Mar 2020 • Koulik Khamaru, Ashwin Pananjady, Feng Ruan, Martin J. Wainwright, Michael I. Jordan

We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model.

Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information

1 code implementation • 12 Mar 2020 • Esther Rolf, Michael I. Jordan, Benjamin Recht

Observational data are often accompanied by natural structural indices, such as time stamps or geographic locations, which are meaningful to prediction tasks but are often discarded.

Robustness Guarantees for Mode Estimation with an Application to Bandits

no code implementations • 5 Mar 2020 • Aldo Pacchiano, Heinrich Jiang, Michael I. Jordan

Mode estimation is a classical problem in statistics with a wide range of applications in machine learning.

Multi-Armed Bandits

Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives

no code implementations • 28 Feb 2020 • Michael Muehlebach, Michael I. Jordan

We analyze the convergence rate of various momentum-based optimization algorithms from a dynamical systems point of view.

Provable Meta-Learning of Linear Representations

1 code implementation • 26 Feb 2020 • Nilesh Tripuraneni, Chi Jin, Michael I. Jordan

In this paper, we focus on the problem of multi-task linear regression -- in which multiple linear regression models share a common, low-dimensional linear representation.

Meta-Learning • Representation Learning

Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games

no code implementations • ICML 2020 • Tianyi Lin, Zhengyuan Zhou, Panayotis Mertikopoulos, Michael I. Jordan

In this paper, we consider multi-agent learning via online gradient descent in a class of games called $\lambda$-cocoercive games, a fairly broad class of games that admits many Nash equilibria and that properly includes unconstrained strongly monotone games.

online learning

On Thompson Sampling with Langevin Algorithms

no code implementations • ICML 2020 • Eric Mazumdar, Aldo Pacchiano, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan

The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.

Robust Optimization for Fairness with Noisy Protected Groups

1 code implementation • NeurIPS 2020 • Serena Wang, Wenshuo Guo, Harikrishna Narasimhan, Andrew Cotter, Maya Gupta, Michael I. Jordan

Second, we introduce two new approaches using robust optimization that, unlike the naive approach of only relying on $\hat{G}$, are guaranteed to satisfy fairness criteria on the true protected groups $G$ while minimizing a training objective.

Fairness

Decision-Making with Auto-Encoding Variational Bayes

2 code implementations • NeurIPS 2020 • Romain Lopez, Pierre Boyeau, Nir Yosef, Michael I. Jordan, Jeffrey Regier

To make decisions based on a model fit with auto-encoding variational Bayes (AEVB), practitioners often let the variational distribution serve as a surrogate for the posterior distribution.

Decision Making • Two-sample testing

Fixed-Support Wasserstein Barycenters: Computational Hardness and Fast Algorithm

no code implementations • NeurIPS 2020 • Tianyi Lin, Nhat Ho, Xi Chen, Marco Cuturi, Michael I. Jordan

We study the fixed-support Wasserstein barycenter problem (FS-WBP), which consists in computing the Wasserstein barycenter of $m$ discrete probability measures supported on a finite metric space of size $n$.

Continuous-time Lower Bounds for Gradient-based Algorithms

no code implementations • ICML 2020 • Michael Muehlebach, Michael I. Jordan

This article derives lower bounds on the convergence rate of continuous-time gradient-based optimization algorithms.

Optimization and Control • Systems and Control

Near-Optimal Algorithms for Minimax Optimization

no code implementations • 5 Feb 2020 • Tianyi Lin, Chi Jin, Michael I. Jordan

This paper presents the first algorithm with $\tilde{O}(\sqrt{\kappa_{\mathbf x}\kappa_{\mathbf y}})$ gradient complexity, matching the lower bound up to logarithmic factors.

Variance Reduction with Sparse Gradients

no code implementations • ICLR 2020 • Melih Elibol, Lihua Lei, Michael I. Jordan

Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients to reduce the variance of stochastic gradients.
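The large-batch/small-batch mixture mentioned in this snippet can be sketched with a generic SVRG loop on a toy least-squares problem. All names and constants below are illustrative; this is not the paper's sparse-gradient variant.

```python
import numpy as np

def svrg(grad_i, x0, n, step, epochs, inner_steps, rng):
    """Minimal generic SVRG loop: a full-batch "anchor" gradient is combined
    with cheap single-sample gradients to form variance-reduced updates."""
    x = x0.copy()
    for _ in range(epochs):
        x_anchor = x.copy()
        full_grad = np.mean([grad_i(x_anchor, i) for i in range(n)], axis=0)
        for _ in range(inner_steps):
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_anchor, i) + full_grad  # variance-reduced
            x -= step * v
    return x

# Toy least-squares instance: per-sample gradient of 0.5 * (a_i . x - b_i)^2.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
b = A @ np.array([1.0, -2.0, 0.5])
grad = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = svrg(grad, np.zeros(3), 50, 0.05, 30, 100, rng)
print(x_hat)  # approaches [1, -2, 0.5]
```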

Image Classification

Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

no code implementations • 11 Dec 2019 • Wenlong Mou, Nhat Ho, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior.

The Power of Batching in Multiple Hypothesis Testing

no code implementations • 11 Oct 2019 • Tijana Zrnic, Daniel L. Jiang, Aaditya Ramdas, Michael I. Jordan

One important partition of algorithms for controlling the false discovery rate (FDR) in multiple testing is into offline and online algorithms.

Two-sample testing

On the Complexity of Approximating Multimarginal Optimal Transport

no code implementations • 30 Sep 2019 • Tianyi Lin, Nhat Ho, Marco Cuturi, Michael I. Jordan

This provides a first \textit{near-linear time} complexity bound guarantee for approximating the MOT problem and matches the best known complexity bound for the Sinkhorn algorithm in the classical OT setting when $m = 2$.

Towards Understanding the Transferability of Deep Representations

no code implementations • 26 Sep 2019 • Hong Liu, Mingsheng Long, Jian-Min Wang, Michael I. Jordan

3) The feasibility of transferability is related to the similarity of both input and label.

High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm

no code implementations • 28 Aug 2019 • Wenlong Mou, Yi-An Ma, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities.

How Does Learning Rate Decay Help Modern Neural Networks?

no code implementations • ICLR 2020 • Kaichao You, Mingsheng Long, Jian-Min Wang, Michael I. Jordan

Despite the popularity of these common beliefs, experiments suggest that they are insufficient in explaining the general effectiveness of lrDecay in training modern neural networks that are deep, wide, and nonconvex.

A Higher-Order Swiss Army Infinitesimal Jackknife

1 code implementation • 28 Jul 2019 • Ryan Giordano, Michael I. Jordan, Tamara Broderick

The first-order approximation is known as the "infinitesimal jackknife" in the statistics literature and has been the subject of recent interest in machine learning for approximate CV.

Bayesian Robustness: A Nonasymptotic Viewpoint

no code implementations • 27 Jul 2019 • Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan

We study the problem of robustly estimating the posterior distribution for the setting where observed data can be contaminated with potentially adversarial outliers.

Provably Efficient Reinforcement Learning with Linear Function Approximation

1 code implementation • 11 Jul 2019 • Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan

Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy.

reinforcement-learning

Convergence Rates for Gaussian Mixtures of Experts

no code implementations • 9 Jul 2019 • Nhat Ho, Chiao-Yu Yang, Michael I. Jordan

We provide a theoretical treatment of over-specified Gaussian mixtures of experts with covariate-free gating networks.

Policy-Gradient Algorithms Have No Guarantees of Convergence in Linear Quadratic Games

no code implementations • 8 Jul 2019 • Eric Mazumdar, Lillian J. Ratliff, Michael I. Jordan, S. Shankar Sastry

In such games the state and action spaces are continuous and global Nash equilibria can be found by solving coupled Riccati equations.

reinforcement-learning

Stochastic Gradient and Langevin Processes

no code implementations • ICML 2020 • Xiang Cheng, Dong Yin, Peter L. Bartlett, Michael I. Jordan

We prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation.

Competing Bandits in Matching Markets

no code implementations • 12 Jun 2019 • Lydia T. Liu, Horia Mania, Michael I. Jordan

Stable matching, a classical model for two-sided markets, has long been studied with little consideration for how each side's preferences are learned.

Multi-Armed Bandits

Learning to Score Behaviors for Guided Policy Optimization

1 code implementation • ICML 2020 • Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.

Efficient Exploration • Imitation Learning • +1

ML-LOO: Detecting Adversarial Examples with Feature Attribution

no code implementations • 8 Jun 2019 • Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, Michael I. Jordan

Furthermore, we extend our method to include multi-layer feature attributions in order to tackle the attacks with mixed confidence levels.

On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems

no code implementations • ICML 2020 • Tianyi Lin, Chi Jin, Michael I. Jordan

We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$ and $\mathcal{Y}$ is a convex and bounded set.
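For readers new to the minimax setup, a plain gradient descent-ascent loop on a simple convex-concave toy instance (chosen for illustration; the paper analyzes the harder nonconvex-concave regime with a two-timescale step-size choice) looks like this:

```python
# Gradient descent-ascent (GDA) on the toy convex-concave instance
# f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, whose unique saddle point is (0, 0).
def gda(x, y, step_x, step_y, iters):
    for _ in range(iters):
        gx = x + y  # df/dx
        gy = x - y  # df/dy
        x, y = x - step_x * gx, y + step_y * gy  # descend in x, ascend in y
    return x, y

x, y = gda(2.0, -1.5, 0.1, 0.05, 2000)
print(x, y)  # both approach 0
```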

Generalized Momentum-Based Methods: A Hamiltonian Perspective

no code implementations • 2 Jun 2019 • Jelena Diakonikolas, Michael I. Jordan

We take a Hamiltonian-based perspective to generalize Nesterov's accelerated gradient descent and Polyak's heavy ball method to a broad class of momentum methods in the setting of (possibly) constrained minimization in Euclidean and non-Euclidean normed vector spaces.

On the Efficiency of Entropic Regularized Algorithms for Optimal Transport

no code implementations • 1 Jun 2019 • Tianyi Lin, Nhat Ho, Michael I. Jordan

We prove that APDAMD achieves the complexity bound of $\widetilde{O}(n^2\sqrt{\delta}\varepsilon^{-1})$ in which $\delta>0$ stands for the regularity of $\phi$.

Langevin Monte Carlo without smoothness

no code implementations • 30 May 2019 • Niladri S. Chatterji, Jelena Diakonikolas, Michael I. Jordan, Peter L. Bartlett

Langevin Monte Carlo (LMC) is an iterative algorithm used to generate samples from a distribution that is known only up to a normalizing constant.

Fast Algorithms for Computational Optimal Transport and Wasserstein Barycenter

no code implementations • 23 May 2019 • Wenshuo Guo, Nhat Ho, Michael I. Jordan

First, we introduce the \emph{accelerated primal-dual randomized coordinate descent} (APDRCD) algorithm for computing the OT distance.

A Dynamical Systems Perspective on Nesterov Acceleration

no code implementations • 17 May 2019 • Michael Muehlebach, Michael I. Jordan

We present a dynamical system framework for understanding Nesterov's accelerated gradient method.

A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements

3 code implementations • 6 May 2019 • Romain Lopez, Achille Nazaret, Maxime Langevin, Jules Samaran, Jeffrey Regier, Michael I. Jordan, Nir Yosef

Building upon domain adaptation work, we propose gimVI, a deep generative model for the integration of spatial transcriptomic data and scRNA-seq data that can be used to impute missing genes.

Domain Adaptation • Imputation

Neural Rendering Model: Joint Generation and Prediction for Semi-Supervised Learning

no code implementations • ICLR 2019 • Nhat Ho, Tan Nguyen, Ankit B. Patel, Anima Anandkumar, Michael I. Jordan, Richard G. Baraniuk

The conjugate prior yields a new regularizer for learning based on the paths rendered in the generative model for training CNNs: the Rendering Path Normalization (RPN).

Neural Rendering

On Structured Filtering-Clustering: Global Error Bound and Optimal First-Order Algorithms

no code implementations • 16 Apr 2019 • Nhat Ho, Tianyi Lin, Michael I. Jordan

We also conduct experiments on real datasets and the numerical results demonstrate the effectiveness of our algorithms.

Bridging Theory and Algorithm for Domain Adaptation

5 code implementations • 11 Apr 2019 • Yuchen Zhang, Tianle Liu, Mingsheng Long, Michael I. Jordan

We introduce Margin Disparity Discrepancy, a novel measurement with rigorous generalization bounds, tailored to the distribution comparison with the asymmetric margin loss, and to the minimax optimization for easier training.

Domain Adaptation • Generalization Bounds

On the Adaptivity of Stochastic Gradient-Based Optimization

no code implementations • 9 Apr 2019 • Lihua Lei, Michael I. Jordan

Stochastic-gradient-based optimization has been a core enabling methodology in applications to large-scale problems in machine learning and related areas.

HopSkipJumpAttack: A Query-Efficient Decision-Based Attack

3 code implementations • 3 Apr 2019 • Jianbo Chen, Michael I. Jordan, Martin J. Wainwright

We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary.

Adversarial Attack

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points

no code implementations • 13 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

More recent theory has shown that GD and SGD can avoid saddle points, but the dependence on dimension in these analyses is polynomial.

Acceleration via Symplectic Discretization of High-Resolution Differential Equations

no code implementations • NeurIPS 2019 • Bin Shi, Simon S. Du, Weijie J. Su, Michael I. Jordan

We study first-order optimization methods obtained by discretizing ordinary differential equations (ODEs) corresponding to Nesterov's accelerated gradient methods (NAGs) and Polyak's heavy-ball method.

A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm

no code implementations • 11 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

In this note, we derive concentration inequalities for random vectors with subGaussian norm (a generalization of both subGaussian random vectors and norm bounded random vectors), which are tight up to logarithmic factors.

LS-Tree: Model Interpretation When the Data Are Linguistic

no code implementations • 11 Feb 2019 • Jianbo Chen, Michael I. Jordan

We study the problem of interpreting trained classification models in the setting of linguistic data sets.

General Classification

Cost-Effective Incentive Allocation via Structured Counterfactual Inference

no code implementations • 7 Feb 2019 • Romain Lopez, Chenchen Li, Xiang Yan, Junwu Xiong, Michael I. Jordan, Yuan Qi, Le Song

We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tries to learn a policy for allocating strategic financial incentives to customers and observes only bandit feedback.

Counterfactual Inference • Domain Adaptation

Is There an Analog of Nesterov Acceleration for MCMC?

no code implementations • 4 Feb 2019 • Yi-An Ma, Niladri Chatterji, Xiang Cheng, Nicolas Flammarion, Peter Bartlett, Michael I. Jordan

We formulate gradient-based Markov chain Monte Carlo (MCMC) sampling as optimization on the space of probability measures, with Kullback-Leibler (KL) divergence as the objective functional.

Quantitative Weak Convergence for Discrete Stochastic Processes

no code implementations • 3 Feb 2019 • Xiang Cheng, Peter L. Bartlett, Michael I. Jordan

In this paper, we prove quantitative convergence in $W_2$ for a family of Langevin-like stochastic processes that includes stochastic gradient descent and related gradient-based algorithms.

What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization?

1 code implementation • ICML 2020 • Chi Jin, Praneeth Netrapalli, Michael I. Jordan

Minimax optimization has found extensive applications in modern machine learning, in settings such as generative adversarial networks (GANs), adversarial training and multi-agent reinforcement learning.

Multi-agent Reinforcement Learning

Sharp Analysis of Expectation-Maximization for Weakly Identifiable Models

no code implementations • 1 Feb 2019 • Raaz Dwivedi, Nhat Ho, Koulik Khamaru, Martin J. Wainwright, Michael I. Jordan, Bin Yu

We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on $n$ i.i.d.

Theoretically Principled Trade-off between Robustness and Accuracy

5 code implementations • 24 Jan 2019 • Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan

We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples.

Adversarial Attack • Adversarial Defense • +2

On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms

no code implementations • 19 Jan 2019 • Tianyi Lin, Nhat Ho, Michael I. Jordan

We show that a greedy variant of the classical Sinkhorn algorithm, known as the \emph{Greenkhorn algorithm}, can be improved to $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$, improving on the best known complexity bound of $\widetilde{\mathcal{O}}(n^2\varepsilon^{-3})$.

Data Structures and Algorithms

On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games

no code implementations • 3 Jan 2019 • Eric V. Mazumdar, Michael I. Jordan, S. Shankar Sastry

We propose local symplectic surgery, a two-timescale procedure for finding local Nash equilibria in two-player zero-sum games.

Asynchronous Online Testing of Multiple Hypotheses

2 code implementations • 12 Dec 2018 • Tijana Zrnic, Aaditya Ramdas, Michael I. Jordan

We consider the problem of asynchronous online testing, aimed at providing control of the false discovery rate (FDR) during a continual stream of data collection and testing, where each test may be a sequential test that can start and stop at arbitrary times.

Gen-Oja: Simple & Efficient Algorithm for Streaming Generalized Eigenvector Computation

no code implementations • NeurIPS 2018 • Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.

Theoretical guarantees for EM under misspecified Gaussian mixture models

no code implementations • NeurIPS 2018 • Raaz Dwivedi, Nhật Hồ, Koulik Khamaru, Martin J. Wainwright, Michael I. Jordan

We provide two classes of theoretical guarantees: first, we characterize the bias introduced due to the misspecification; and second, we prove that population EM converges at a geometric rate to the model projection under a suitable initialization condition.

Gen-Oja: A Two-time-scale approach for Streaming CCA

no code implementations • 20 Nov 2018 • Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.

Sampling Can Be Faster Than Optimization

no code implementations • 20 Nov 2018 • Yi-An Ma, Yuansi Chen, Chi Jin, Nicolas Flammarion, Michael I. Jordan

Optimization algorithms and Monte Carlo sampling algorithms have provided the computational foundations for the rapid growth in applications of statistical machine learning in recent years.

A Bayesian Perspective of Convolutional Neural Networks through a Deconvolutional Generative Model

no code implementations • 1 Nov 2018 • Tan Nguyen, Nhat Ho, Ankit Patel, Anima Anandkumar, Michael I. Jordan, Richard G. Baraniuk

This conjugate prior yields a new regularizer based on paths rendered in the generative model for training CNNs: the Rendering Path Normalization (RPN).

Probabilistic Multilevel Clustering via Composite Transportation Distance

no code implementations • 29 Oct 2018 • Nhat Ho, Viet Huynh, Dinh Phung, Michael I. Jordan

We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence.

Understanding the Acceleration Phenomenon via High-Resolution Differential Equations

no code implementations • 21 Oct 2018 • Bin Shi, Simon S. Du, Michael I. Jordan, Weijie J. Su

We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak's heavy-ball method, but they allow the identification of a term that we refer to as "gradient correction" that is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods.

Evaluating Sensitivity to the Stick-Breaking Prior in Bayesian Nonparametrics

4 code implementations • 15 Oct 2018 • Runjing Liu, Ryan Giordano, Michael I. Jordan, Tamara Broderick

Bayesian models based on the Dirichlet process and other stick-breaking priors have been proposed as core ingredients for clustering, topic modeling, and other unsupervised learning tasks.

Methodology

Rao-Blackwellized Stochastic Gradients for Discrete Distributions

1 code implementation • 10 Oct 2018 • Runjing Liu, Jeffrey Regier, Nilesh Tripuraneni, Michael I. Jordan, Jon McAuliffe

We wish to compute the gradient of an expectation over a finite or countably infinite sample space having $K \leq \infty$ categories.

General Classification

Singularity, Misspecification, and the Convergence Rate of EM

no code implementations • 1 Oct 2018 • Raaz Dwivedi, Nhat Ho, Koulik Khamaru, Michael I. Jordan, Martin J. Wainwright, Bin Yu

A line of recent work has analyzed the behavior of the Expectation-Maximization (EM) algorithm in the well-specified setting, in which the population likelihood is locally strongly concave around its maximizing argument.

Is Q-learning Provably Efficient?

no code implementations • NeurIPS 2018 • Chi Jin, Zeyuan Allen-Zhu, Sebastien Bubeck, Michael I. Jordan

We prove that, in an episodic MDP setting, Q-learning with UCB exploration achieves regret $\tilde{O}(\sqrt{H^3 SAT})$, where $S$ and $A$ are the numbers of states and actions, $H$ is the number of steps per episode, and $T$ is the total number of steps.
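The flavor of Q-learning with UCB exploration can be sketched in a toy episodic MDP. Everything below (the two-state environment, the bonus constant, the horizon) is illustrative rather than taken from the paper.

```python
import numpy as np

# Toy sketch of episodic Q-learning with a Hoeffding-style UCB bonus, in the
# spirit of the regret bound quoted above. The two-state MDP, the bonus
# constant c, and all other settings are illustrative, not from the paper.
rng = np.random.default_rng(0)
S, A, H, episodes = 2, 2, 2, 5000
c = 0.5

Q = np.full((H, S, A), float(H))  # optimistic initialization
N = np.zeros((H, S, A))

def env_step(s, a):
    # Action 1 usually moves to state 1, and landing in state 1 pays reward 1.
    s_next = int(rng.random() < (0.9 if a == 1 else 0.1))
    return s_next, float(s_next == 1)

for _ in range(episodes):
    s = 0
    for h in range(H):
        a = int(np.argmax(Q[h, s]))          # greedy w.r.t. the optimistic Q
        s_next, r = env_step(s, a)
        N[h, s, a] += 1
        t = N[h, s, a]
        alpha = (H + 1) / (H + t)            # step size from the UCB-style analysis
        bonus = c * np.sqrt(H**3 * np.log(S * A * H * episodes) / t)
        v_next = Q[h + 1, s_next].max() if h + 1 < H else 0.0
        Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * (r + v_next + bonus)
        s = s_next

print(N[0, 0])  # the better action (action 1) accumulates most of the visits
```

As the bonus shrinks with the visit count, the optimistic values of suboptimal actions decay, and exploration concentrates on the higher-reward action.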

Q-Learning

Fundamental limits of detection in the spiked Wigner model

no code implementations • 25 Jun 2018 • Ahmed El Alaoui, Florent Krzakala, Michael I. Jordan

We study the fundamental limits of detecting the presence of an additive rank-one perturbation, or spike, to a Wigner matrix.

A Swiss Army Infinitesimal Jackknife

3 code implementations • 1 Jun 2018 • Ryan Giordano, Will Stephenson, Runjing Liu, Michael I. Jordan, Tamara Broderick

This linear approximation is sometimes known as the "infinitesimal jackknife" in the statistics literature, where it is mostly used as a theoretical tool to prove asymptotic results.

Methodology

Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient

1 code implementation • 1 Jun 2018 • Tianyi Lin, Chenyou Fan, Mengdi Wang, Michael I. Jordan

Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming.

reinforcement-learning

Information Constraints on Auto-Encoding Variational Bayes

no code implementations • NeurIPS 2018 • Romain Lopez, Jeffrey Regier, Michael I. Jordan, Nir Yosef

We show how to apply this method to a range of problems, including the problems of learning invariant representations and learning interpretable representations.

Sharp convergence rates for Langevin dynamics in the nonconvex setting

no code implementations4 May 2018 Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael. I. Jordan

We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball.
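As a minimal sketch of the sampling setup, the following implements the unadjusted Langevin algorithm for a simple 1-D target $p^*(x) \propto \exp(-U(x))$; the paper's analysis concerns far more general (partly nonconvex) potentials, so this is only an illustration of the iteration itself.

```python
import numpy as np

def ula(grad_U, x0, step, n_steps, seed=0):
    """Unadjusted Langevin algorithm:
    x <- x - step * grad_U(x) + sqrt(2*step) * N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    samples = np.empty(n_steps)
    for i in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2 * step) * rng.standard_normal()
        samples[i] = x
    return samples

# Target U(x) = x^2 / 2, i.e. a standard Gaussian; grad_U(x) = x.
samples = ula(grad_U=lambda x: x, x0=3.0, step=0.05, n_steps=20000)
```

After a burn-in period the iterates are approximately distributed as the target, up to a discretization bias that shrinks with the step size.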

On the Local Minima of the Empirical Risk

no code implementations NeurIPS 2018 Chi Jin, Lydia T. Liu, Rong Ge, Michael. I. Jordan

Our objective is to find the $\epsilon$-approximate local minima of the underlying function $F$ while avoiding the shallow local minima---arising because of the tolerance $\nu$---which exist only in $f$.

Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning

no code implementations28 Feb 2018 Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael. I. Jordan, Joseph E. Gonzalez, Sergey Levine

By enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm, we improve value estimation, which, in turn, reduces the sample complexity of learning.

Continuous Control reinforcement-learning

Averaging Stochastic Gradient Descent on Riemannian Manifolds

no code implementations26 Feb 2018 Nilesh Tripuraneni, Nicolas Flammarion, Francis Bach, Michael. I. Jordan

We consider the minimization of a function defined on a Riemannian manifold $\mathcal{M}$ accessible only through unbiased estimates of its gradients.

Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification

no code implementations22 Feb 2018 Max Simchowitz, Horia Mania, Stephen Tu, Michael. I. Jordan, Benjamin Recht

We prove that the ordinary least-squares (OLS) estimator attains nearly minimax optimal performance for the identification of linear dynamical systems from a single observed trajectory.

Time Series
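The OLS estimator from this setting is simple to state in code: regress each next state on the current state over a single trajectory. The sketch below uses a synthetic stable system of my own choosing, purely to illustrate the estimator, not the paper's minimax analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
A_true = 0.5 * np.eye(d) + 0.1 * rng.standard_normal((d, d))  # stable-ish system
T = 5000
X = np.zeros((T + 1, d))
for t in range(T):
    X[t + 1] = A_true @ X[t] + 0.1 * rng.standard_normal(d)   # x_{t+1} = A x_t + noise

# OLS: A_hat = argmin_A sum_t ||x_{t+1} - A x_t||^2, from one trajectory.
M, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)  # solves X[:-1] @ M ~ X[1:]
A_hat = M.T                                         # so M = A^T
err = np.linalg.norm(A_hat - A_true)
```

With a stable system and a long enough trajectory, the estimation error is small despite the strong temporal dependence between samples, which is the phenomenon the paper quantifies sharply.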

On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo

no code implementations ICML 2018 Niladri S. Chatterji, Nicolas Flammarion, Yi-An Ma, Peter L. Bartlett, Michael. I. Jordan

We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion.

RLlib: Abstractions for Distributed Reinforcement Learning

3 code implementations ICML 2018 Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael. I. Jordan, Ion Stoica

Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation.

reinforcement-learning

Ray: A Distributed Framework for Emerging AI Applications

4 code implementations16 Dec 2017 Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael. I. Jordan, Ion Stoica

To meet the performance requirements, Ray employs a distributed scheduler and a distributed and fault-tolerant store to manage the system's control state.

reinforcement-learning

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent

no code implementations28 Nov 2017 Chi Jin, Praneeth Netrapalli, Michael. I. Jordan

Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves faster convergence rate than gradient descent (GD) in the convex setting.

Stochastic Cubic Regularization for Fast Nonconvex Optimization

no code implementations NeurIPS 2018 Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, Michael. I. Jordan

This paper proposes a stochastic variant of a classic algorithm---the cubic-regularized Newton method [Nesterov and Polyak 2006].

First-order Methods Almost Always Avoid Saddle Points

no code implementations20 Oct 2017 Jason D. Lee, Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael. I. Jordan, Benjamin Recht

We establish that first-order methods avoid saddle points for almost all initializations.

Online control of the false discovery rate with decaying memory

1 code implementation NeurIPS 2017 Aaditya Ramdas, Fanny Yang, Martin J. Wainwright, Michael. I. Jordan

In the online multiple testing problem, p-values corresponding to different null hypotheses are observed one by one, and the decision of whether or not to reject the current hypothesis must be made immediately, after which the next p-value is observed.

Unity
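The online protocol described above can be illustrated with a deliberately simple alpha-spending rule: each hypothesis gets a pre-committed slice of the budget and is decided immediately. This is far less powerful than the paper's decaying-memory FDR procedures; it only shows the "decide before seeing the next p-value" constraint.

```python
import math

def online_alpha_spending(p_values, alpha=0.05):
    """Decide reject/accept for each p-value as it arrives, spending
    alpha_t = alpha * 6 / (pi^2 * t^2), which sums to alpha over t.
    A simple online rule; GAI-style procedures are more powerful."""
    decisions = []
    for t, p in enumerate(p_values, start=1):
        alpha_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
        decisions.append(p <= alpha_t)
    return decisions

decisions = online_alpha_spending([1e-5, 0.04, 1e-6, 0.5])
```

Note how the per-test threshold decays quadratically: a moderately small p-value (0.04) arriving second is not rejected, while very small p-values still are.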

DAGGER: A sequential algorithm for FDR control on DAGs

1 code implementation29 Sep 2017 Aaditya Ramdas, Jianbo Chen, Martin J. Wainwright, Michael. I. Jordan

We propose a linear-time, single-pass, top-down algorithm for multiple testing on directed acyclic graphs (DAGs), where nodes represent hypotheses and edges specify a partial ordering in which hypotheses must be tested.

Model Selection

Covariances, Robustness, and Variational Bayes

4 code implementations8 Sep 2017 Ryan Giordano, Tamara Broderick, Michael. I. Jordan

The estimates for MFVB posterior covariances rely on a result from the classical Bayesian robustness literature relating derivatives of posterior expectations to posterior covariances and include the Laplace approximation as a special case.

Methodology

Underdamped Langevin MCMC: A non-asymptotic analysis

no code implementations12 Jul 2017 Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael. I. Jordan

We study the underdamped Langevin diffusion when the log of the target distribution is smooth and strongly concave.

Kernel Feature Selection via Conditional Covariance Minimization

1 code implementation NeurIPS 2017 Jianbo Chen, Mitchell Stern, Martin J. Wainwright, Michael. I. Jordan

We propose a method for feature selection that employs kernel-based measures of independence to find a subset of covariates that is maximally predictive of the response.

Dimensionality Reduction
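As a hypothetical simplification of kernel-based dependence measures for feature selection, the sketch below scores each covariate by a biased HSIC estimate against the response. The paper instead minimizes a conditional covariance criterion over feature subsets; this only illustrates how a kernel independence measure separates relevant from irrelevant features.

```python
import numpy as np

def rbf_gram(x, h=1.0):
    """RBF Gram matrix for a 1-D sample."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * h ** 2))

def hsic(x, y, h=1.0):
    """Biased HSIC estimate: trace(K H L H) / n^2, H = I - 11^T/n."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_gram(x, h), rbf_gram(y, h)
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
n = 200
x_rel = rng.standard_normal(n)            # relevant covariate
x_irr = rng.standard_normal(n)            # irrelevant covariate
y = np.sin(x_rel) + 0.1 * rng.standard_normal(n)
score_rel, score_irr = hsic(x_rel, y), hsic(x_irr, y)
```

The relevant covariate gets a markedly larger dependence score even though its relationship to the response is nonlinear, which is the point of using kernel measures rather than linear correlation.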

Fast Black-box Variational Inference through Stochastic Trust-Region Optimization

no code implementations NeurIPS 2017 Jeffrey Regier, Michael. I. Jordan, Jon McAuliffe

We introduce TrustVI, a fast second-order algorithm for black-box variational inference based on trust-region optimization and the reparameterization trick.

Variational Inference

Gradient Descent Can Take Exponential Time to Escape Saddle Points

no code implementations NeurIPS 2017 Simon S. Du, Chi Jin, Jason D. Lee, Michael. I. Jordan, Barnabas Poczos, Aarti Singh

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape.

Conditional Adversarial Domain Adaptation

3 code implementations NeurIPS 2018 Mingsheng Long, Zhangjie Cao, Jian-Min Wang, Michael. I. Jordan

Adversarial learning has been embedded into deep networks to learn disentangled and transferable representations for domain adaptation.

Domain Adaptation General Classification

A unified treatment of multiple testing with prior knowledge using the p-filter

no code implementations18 Mar 2017 Aaditya Ramdas, Rina Foygel Barber, Martin J. Wainwright, Michael. I. Jordan

There is a significant literature on methods for incorporating knowledge into multiple testing procedures so as to improve their power and precision.

Real-Time Machine Learning: The Missing Pieces

2 code implementations11 Mar 2017 Robert Nishihara, Philipp Moritz, Stephanie Wang, Alexey Tumanov, William Paul, Johann Schleier-Smith, Richard Liaw, Mehrdad Niknami, Michael. I. Jordan, Ion Stoica

Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making.

Decision Making

How to Escape Saddle Points Efficiently

no code implementations ICML 2017 Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael. I. Jordan

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free").
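A minimal sketch of the perturbation idea: run gradient descent, and whenever the gradient is tiny (a possible saddle), add a small random perturbation. The thresholds and radii here are arbitrary illustrative constants; the paper sets them from smoothness and Hessian-Lipschitz parameters.

```python
import numpy as np

def perturbed_gd(f, grad, x0, lr=0.1, n_iters=500,
                 g_thresh=1e-3, radius=0.1, seed=0):
    """Gradient descent with random perturbations at near-stationary points.
    Returns the best iterate found (by function value)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    best = x.copy()
    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) < g_thresh:
            x = x + radius * rng.standard_normal(x.shape)  # escape kick
        else:
            x = x - lr * g
        if f(x) < f(best):
            best = x.copy()
    return best

# f(x, y) = (x^2 - 1)^2 + y^2: saddle at (0, 0), minima at (+-1, 0).
f = lambda p: (p[0] ** 2 - 1) ** 2 + p[1] ** 2
grad = lambda p: np.array([4 * p[0] * (p[0] ** 2 - 1), 2 * p[1]])
x_best = perturbed_gd(f, grad, x0=[0.0, 0.0])
```

Started exactly at the saddle, plain gradient descent would never move; the perturbation breaks the symmetry and the iterates then descend to one of the two minima.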

Less than a Single Pass: Stochastically Controlled Stochastic Gradient Method

no code implementations12 Sep 2016 Lihua Lei, Michael. I. Jordan

We develop and analyze a procedure for gradient-based optimization that we refer to as stochastically controlled stochastic gradient (SCSG).

Communication-Efficient Distributed Statistical Inference

no code implementations25 May 2016 Michael. I. Jordan, Jason D. Lee, Yun Yang

CSL provides a communication-efficient surrogate to the global likelihood that can be used for low-dimensional estimation, high-dimensional regularized estimation and Bayesian inference.

Bayesian Inference

Deep Transfer Learning with Joint Adaptation Networks

3 code implementations ICML 2017 Mingsheng Long, Han Zhu, Jian-Min Wang, Michael. I. Jordan

Deep networks have been successfully applied to learn transferable features for adapting models from a source domain to a different target domain.

Multi-Source Unsupervised Domain Adaptation Transfer Learning

Function-Specific Mixing Times and Concentration Away from Equilibrium

no code implementations6 May 2016 Maxim Rabinovich, Aaditya Ramdas, Michael. I. Jordan, Martin J. Wainwright

These results show that it is possible for empirical expectations of functions to concentrate long before the underlying chain has mixed in the classical sense, and we show that the concentration rates we achieve are optimal up to constants.

On kernel methods for covariates that are rankings

no code implementations25 Mar 2016 Horia Mania, Aaditya Ramdas, Martin J. Wainwright, Michael. I. Jordan, Benjamin Recht

This paper studies the use of reproducing kernel Hilbert space methods for learning from permutation-valued features.

A Variational Perspective on Accelerated Methods in Optimization

no code implementations14 Mar 2016 Andre Wibisono, Ashia C. Wilson, Michael. I. Jordan

We show that there is a Lagrangian functional that we call the \emph{Bregman Lagrangian} which generates a large class of accelerated methods in continuous time, including (but not limited to) accelerated gradient descent, its non-Euclidean extension, and accelerated higher-order gradient methods.
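The Bregman Lagrangian has, as I recall the paper's notation (treat the exact scaling details as an approximation rather than a quotation), the form:

```latex
\mathcal{L}(X, V, t) \;=\; e^{\alpha_t + \gamma_t}
  \left( D_h\!\left(X + e^{-\alpha_t} V,\; X\right) \;-\; e^{\beta_t} f(X) \right),
\qquad
D_h(y, x) \;=\; h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle,
```

where $D_h$ is the Bregman divergence of a convex distance-generating function $h$. Under "ideal scaling" conditions of the form $\dot\beta_t \le e^{\alpha_t}$ and $\dot\gamma_t = e^{\alpha_t}$, curves solving the Euler-Lagrange equations achieve the convergence rate $f(X_t) - f(x^*) = O(e^{-\beta_t})$, and different choices of the scaling functions recover the different accelerated methods mentioned in the abstract.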

Asymptotic behavior of $\ell_p$-based Laplacian regularization in semi-supervised learning

no code implementations2 Mar 2016 Ahmed El Alaoui, Xiang Cheng, Aaditya Ramdas, Martin J. Wainwright, Michael. I. Jordan

Together, these properties show that $p = d+1$ is an optimal choice, yielding a function estimate $\hat{f}$ that is both smooth and non-degenerate, while remaining maximally sensitive to $P$.

Gradient Descent Converges to Minimizers

no code implementations16 Feb 2016 Jason D. Lee, Max Simchowitz, Michael. I. Jordan, Benjamin Recht

We show that gradient descent converges to a local minimizer, almost surely with random initialization.

Unsupervised Domain Adaptation with Residual Transfer Networks

1 code implementation NeurIPS 2016 Mingsheng Long, Han Zhu, Jian-Min Wang, Michael. I. Jordan

In this paper, we propose a new approach to domain adaptation in deep networks that can jointly learn adaptive classifiers and transferable features from labeled data in the source domain and unlabeled data in the target domain.

Unsupervised Domain Adaptation

A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation

no code implementations10 Feb 2016 Qiang Liu, Jason D. Lee, Michael. I. Jordan

We derive a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein's identity with the reproducing kernel Hilbert space theory.
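A minimal 1-D sketch of such a discrepancy, assuming an RBF kernel and a standard Gaussian target (so the score $\nabla \log p(x) = -x$ is available in closed form). This follows the usual kernelized Stein discrepancy V-statistic construction; it is an illustration, not the paper's exact estimator.

```python
import numpy as np

def ksd_gaussian_target(x, h=1.0):
    """V-statistic KSD estimate between a 1-D sample x and a standard
    Gaussian target, using an RBF kernel k(x,y) = exp(-(x-y)^2 / (2 h^2))."""
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]
    K = np.exp(-d ** 2 / (2 * h ** 2))
    s = -x                                   # score of N(0,1): d/dx log p = -x
    dxK = -d / h ** 2 * K                    # dk/dx_i
    dyK = d / h ** 2 * K                     # dk/dx_j
    dxdyK = (1.0 / h ** 2 - d ** 2 / h ** 4) * K
    # Stein kernel: u(x,y) = s(x)k s(y) + s(x) dk/dy + s(y) dk/dx + d2k/dxdy.
    u = (s[:, None] * s[None, :] * K
         + s[:, None] * dyK + s[None, :] * dxK + dxdyK)
    return u.mean()

rng = np.random.default_rng(0)
good = ksd_gaussian_target(rng.standard_normal(500))        # matches target
bad = ksd_gaussian_target(rng.standard_normal(500) + 2.0)   # shifted sample
```

Because the Stein identity makes the discrepancy zero in expectation exactly when the sample distribution matches the target, the shifted sample scores much higher, which is what makes the statistic usable for goodness-of-fit testing.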

L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework

2 code implementations13 Dec 2015 Virginia Smith, Simone Forte, Michael. I. Jordan, Martin Jaggi

Despite the importance of sparsity in many large-scale applications, there are few methods for distributed optimization of sparsity-inducing objectives.

Distributed Optimization

Distributed Optimization with Arbitrary Local Solvers

1 code implementation13 Dec 2015 Chenxin Ma, Jakub Konečný, Martin Jaggi, Virginia Smith, Michael. I. Jordan, Peter Richtárik, Martin Takáč

To this end, we present a framework for distributed optimization that both allows the flexibility of arbitrary solvers to be used on each (single) machine locally, and yet maintains competitive performance against other state-of-the-art special-purpose distributed methods.

Distributed Optimization

Learning Halfspaces and Neural Networks with Random Initialization

no code implementations25 Nov 2015 Yuchen Zhang, Jason D. Lee, Martin J. Wainwright, Michael. I. Jordan

For loss functions that are $L$-Lipschitz continuous, we present algorithms to learn halfspaces and multi-layer neural networks that achieve arbitrarily small excess risk $\epsilon>0$.

SparkNet: Training Deep Networks in Spark

1 code implementation19 Nov 2015 Philipp Moritz, Robert Nishihara, Ion Stoica, Michael. I. Jordan

We introduce SparkNet, a framework for training deep networks in Spark.

$\ell_1$-regularized Neural Networks are Improperly Learnable in Polynomial Time

no code implementations13 Oct 2015 Yuchen Zhang, Jason D. Lee, Michael. I. Jordan

The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in $(1/\epsilon,\log(1/\delta), F(k, L))$, where $F(k, L)$ is a function depending on $(k, L)$ and on the activation function, independent of the number of neurons.

A Linearly-Convergent Stochastic L-BFGS Algorithm

1 code implementation9 Aug 2015 Philipp Moritz, Robert Nishihara, Michael. I. Jordan

We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions.

Perturbed Iterate Analysis for Asynchronous Stochastic Optimization

no code implementations24 Jul 2015 Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, Michael. I. Jordan

We demonstrate experimentally on a 16-core machine that the sparse and parallel version of SVRG is in some cases more than four orders of magnitude faster than the standard SVRG algorithm.

Stochastic Optimization

Parallel Correlation Clustering on Big Graphs

no code implementations NeurIPS 2015 Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael. I. Jordan

We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and achieve nearly linear speedups, provably.

On the accuracy of self-normalized log-linear models

no code implementations NeurIPS 2015 Jacob Andreas, Maxim Rabinovich, Dan Klein, Michael. I. Jordan

Calculation of the log-normalizer is a major computational obstacle in applications of log-linear models with large output spaces.

Generalization Bounds

Variational consensus Monte Carlo

no code implementations NeurIPS 2015 Maxim Rabinovich, Elaine Angelino, Michael. I. Jordan

Practitioners of Bayesian statistics have long depended on Markov chain Monte Carlo (MCMC) to obtain samples from intractable posterior distributions.

On the Computational Complexity of High-Dimensional Bayesian Variable Selection

no code implementations29 May 2015 Yun Yang, Martin J. Wainwright, Michael. I. Jordan

We study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints.

Variable Selection

Trust Region Policy Optimization

20 code implementations19 Feb 2015 John Schulman, Sergey Levine, Philipp Moritz, Michael. I. Jordan, Pieter Abbeel

We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement.

Atari Games Policy Gradient Methods

Adding vs. Averaging in Distributed Primal-Dual Optimization

1 code implementation12 Feb 2015 Chenxin Ma, Virginia Smith, Martin Jaggi, Michael. I. Jordan, Peter Richtárik, Martin Takáč

Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck.

Distributed Optimization

Learning Transferable Features with Deep Adaptation Networks

4 code implementations10 Feb 2015 Mingsheng Long, Yue Cao, Jian-Min Wang, Michael. I. Jordan

Recent studies reveal that a deep neural network can learn transferable features which generalize well to novel tasks for domain adaptation.

Domain Adaptation Image Classification

A General Analysis of the Convergence of ADMM

no code implementations6 Feb 2015 Robert Nishihara, Laurent Lessard, Benjamin Recht, Andrew Packard, Michael. I. Jordan

We provide a new proof of the linear convergence of the alternating direction method of multipliers (ADMM) when one of the objective terms is strongly convex.

Optimization and Control Numerical Analysis
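For concreteness, here is a standard ADMM instance, lasso, where one objective term (the quadratic) is strongly convex when the design has full column rank. This is the textbook splitting, included as an illustration of the method the paper analyzes, not as the paper's own example.

```python
import numpy as np

def admm_lasso(A, b, lam=1.0, rho=1.0, n_iters=200):
    """ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z.
    x-step: ridge solve; z-step: soft-thresholding; u: scaled dual."""
    n, d = A.shape
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(d))   # factor once, reuse
    x = np.zeros(d)
    z = np.zeros(d)
    u = np.zeros(d)
    for _ in range(n_iters):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)
        u = u + x - z
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20)
x_true[:3] = [3.0, -2.0, 1.5]
b = A @ x_true + 0.01 * rng.standard_normal(100)
x_hat = admm_lasso(A, b, lam=1.0)
```

The iteration alternates a smooth subproblem and a proximal step while the dual variable enforces the consensus constraint; the paper's contribution is a sharp characterization of how fast this loop contracts.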

Distributed Estimation of Generalized Matrix Rank: Efficient Algorithms and Lower Bounds

no code implementations5 Feb 2015 Yuchen Zhang, Martin J. Wainwright, Michael. I. Jordan

We study the following generalized matrix rank estimation problem: given an $n \times n$ matrix and a constant $c \geq 0$, estimate the number of eigenvalues that are greater than $c$.
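The quantity being estimated is easy to state exactly in the centralized setting; the paper's subject is approximating it when the matrix is split across machines with limited communication. A direct (non-distributed) computation for a symmetric matrix:

```python
import numpy as np

def rank_above(M, c):
    """Exact count of eigenvalues of a symmetric matrix that exceed c.
    (The paper studies approximating this under communication limits.)"""
    return int(np.sum(np.linalg.eigvalsh(M) > c))

M = np.diag([5.0, 3.0, 1.0, 0.5])
count = rank_above(M, c=2.0)   # eigenvalues 5 and 3 exceed 2
```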

TuPAQ: An Efficient Planner for Large-scale Predictive Analytic Queries

no code implementations31 Jan 2015 Evan R. Sparks, Ameet Talwalkar, Michael J. Franklin, Michael. I. Jordan, Tim Kraska

The proliferation of massive datasets, combined with the development of sophisticated analytical techniques, has enabled a wide variety of novel applications such as improved product recommendations, automatic image tagging, and improved speech-driven interfaces.

Communication-Efficient Distributed Dual Coordinate Ascent

no code implementations NeurIPS 2014 Martin Jaggi, Virginia Smith, Martin Takáč, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael. I. Jordan

Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning.

Distributed Optimization

On the Convergence Rate of Decomposable Submodular Function Minimization

no code implementations NeurIPS 2014 Robert Nishihara, Stefanie Jegelka, Michael. I. Jordan

Submodular functions describe a variety of discrete problems in machine learning, signal processing, and computer vision.

Optimality guarantees for distributed statistical estimation

no code implementations5 May 2014 John C. Duchi, Michael. I. Jordan, Martin J. Wainwright, Yuchen Zhang

Large data sets often require performing distributed statistical estimation, with a full data set split across multiple machines and limited communication between machines.

Particle Gibbs with Ancestor Sampling

no code implementations3 Jan 2014 Fredrik Lindsten, Michael. I. Jordan, Thomas B. Schön

Particle Markov chain Monte Carlo (PMCMC) is a systematic way of combining the two main tools used for Monte Carlo statistical inference: sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC).

Optimal rates for zero-order convex optimization: the power of two function evaluations

no code implementations7 Dec 2013 John C. Duchi, Michael. I. Jordan, Martin J. Wainwright, Andre Wibisono

We consider derivative-free algorithms for stochastic and non-stochastic convex optimization problems that use only function values rather than gradients.
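The two-function-evaluation idea can be sketched with the classical two-point gradient estimator: probe the function at $x \pm \delta u$ along a random direction $u$ and form a directional difference. This is a standard construction consistent with the setting the paper analyzes, shown here on an assumed quadratic test function.

```python
import numpy as np

def two_point_grad(f, x, delta=1e-4, rng=None):
    """Two-evaluation gradient estimate:
    g = d * (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u,
    with u uniform on the unit sphere (E[u u^T] = I/d makes it unbiased
    for the gradient of a smoothed surrogate of f)."""
    rng = rng or np.random.default_rng()
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return d * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

f = lambda x: 0.5 * np.sum(x ** 2)          # gradient of f is x itself
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 3.0])
g_avg = np.mean([two_point_grad(f, x, rng=rng) for _ in range(5000)],
                axis=0)
```

Each estimate is noisy, but averaging recovers the true gradient; the paper's point is that two evaluations per iteration suffice for near-optimal convergence rates.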

A Comparative Framework for Preconditioned Lasso Algorithms

no code implementations NeurIPS 2013 Fabian L. Wauthier, Nebojsa Jojic, Michael. I. Jordan

The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated.

Estimation, Optimization, and Parallelism when Data is Sparse

no code implementations NeurIPS 2013 John Duchi, Michael. I. Jordan, Brendan Mcmahan

We study stochastic optimization problems when the \emph{data} is sparse, which is in a sense dual to the current understanding of high-dimensional statistical learning and optimization.

Stochastic Optimization

Information-theoretic lower bounds for distributed statistical estimation with communication constraints

no code implementations NeurIPS 2013 Yuchen Zhang, John Duchi, Michael. I. Jordan, Martin J. Wainwright

We establish minimax risk lower bounds for distributed statistical estimation given a budget $B$ of the total number of bits that may be communicated.

General Classification

Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation

no code implementations NeurIPS 2013 John Duchi, Martin J. Wainwright, Michael. I. Jordan

We provide a detailed study of the estimation of probability distributions---discrete and continuous---in a stringent setting in which data is kept private even from the statistician.

Survey Sampling

MLI: An API for Distributed Machine Learning

no code implementations21 Oct 2013 Evan R. Sparks, Ameet Talwalkar, Virginia Smith, Jey Kottalam, Xinghao Pan, Joseph Gonzalez, Michael J. Franklin, Michael. I. Jordan, Tim Kraska

MLI is an Application Programming Interface designed to address the challenges of building Machine Learning algorithms in a distributed setting based on data-centric computing.

On statistics, computation and scalability

no code implementations30 Sep 2013 Michael. I. Jordan

How should statistical procedures be designed so as to be scalable computationally to the massive datasets that are increasingly the norm?

Mixed Membership Models for Time Series

no code implementations13 Sep 2013 Emily B. Fox, Michael. I. Jordan

Although much of the literature on mixed membership models considers the setting in which exchangeable collections of data are associated with each member of a set of entities, it is equally natural to consider problems in which an entire time series is viewed as an entity and the goal is to characterize the time series in terms of a set of underlying dynamic attributes or "dynamic regimes".

Time Series Time Series Analysis

Optimistic Concurrency Control for Distributed Unsupervised Learning

no code implementations NeurIPS 2013 Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, Michael. I. Jordan

Research on distributed machine learning algorithms has focused primarily on one of two extremes - algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints.

Streaming Variational Bayes

2 code implementations NeurIPS 2013 Tamara Broderick, Nicholas Boyd, Andre Wibisono, Ashia C. Wilson, Michael. I. Jordan

We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior.

Variational Inference

Distributed Low-rank Subspace Segmentation

no code implementations20 Apr 2013 Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael. I. Jordan

Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data.

Event Detection Face Recognition +2

Loopy Belief Propagation for Approximate Inference: An Empirical Study

1 code implementation23 Jan 2013 Kevin Murphy, Yair Weiss, Michael. I. Jordan

Recently, researchers have demonstrated that loopy belief propagation - the use of Pearl's polytree algorithm in a Bayesian network with loops - can give excellent results on error-correcting codes. The most dramatic instance of this is the near-Shannon-limit performance of Turbo Codes, codes whose decoding algorithm is equivalent to loopy belief propagation in a chain-structured Bayesian network.
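A minimal sketch of loopy belief propagation on the smallest loopy graph, a 3-cycle of binary variables, with illustrative potentials of my own choosing (an attractive edge potential and one biased node potential). Messages are iterated to a hoped-for fixed point and beliefs are read off in the usual sum-product way.

```python
import numpy as np

# Pairwise MRF on a 3-cycle: attractive edge potential psi, and a node
# potential phi[0] biasing variable 0 toward state 0.
psi = np.array([[2.0, 1.0], [1.0, 2.0]])
phi = {0: np.array([3.0, 1.0]),
       1: np.array([1.0, 1.0]),
       2: np.array([1.0, 1.0])}
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# m[(i, j)] is the message from i to j; initialize uniform.
m = {(i, j): np.ones(2) for i in neighbors for j in neighbors[i]}

for _ in range(50):                 # iterate updates (convergence not guaranteed
    new = {}                        # in general on loopy graphs)
    for (i, j) in m:
        incoming = phi[i].copy()
        for k in neighbors[i]:
            if k != j:
                incoming *= m[(k, i)]
        msg = psi.T @ incoming      # marginalize over x_i (psi is symmetric)
        new[(i, j)] = msg / msg.sum()
    m = new

beliefs = {}
for i in neighbors:
    b = phi[i].copy()
    for k in neighbors[i]:
        b *= m[(k, i)]
    beliefs[i] = b / b.sum()
```

Despite the loop, the iteration settles here, and the bias on variable 0 propagates through the attractive couplings so that every node's belief favors state 0, which is the qualitative behavior the empirical study examines at scale.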

Variational MCMC

no code implementations10 Jan 2013 Nando de Freitas, Pedro Hojen-Sorensen, Michael. I. Jordan, Stuart Russell

One of these algorithms is a mixture of two MCMC kernels: a random walk Metropolis kernel and a block Metropolis-Hastings (MH) kernel with a variational approximation as proposal distribution.

Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

no code implementations NeurIPS 2012 Andre Wibisono, Martin J. Wainwright, Michael. I. Jordan, John C. Duchi

We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, analyzing their finite-sample convergence rates.

Stochastic Optimization

Ancestor Sampling for Particle Gibbs

no code implementations NeurIPS 2012 Fredrik Lindsten, Thomas Schön, Michael. I. Jordan

We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS).

Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

no code implementations NeurIPS 2012 Ke Jiang, Brian Kulis, Michael. I. Jordan

Links between probabilistic and non-probabilistic learning algorithms can arise by performing small-variance asymptotics, i.e., letting the variance of particular distributions in a graphical model go to zero.

Nested Hierarchical Dirichlet Processes

no code implementations25 Oct 2012 John Paisley, Chong Wang, David M. Blei, Michael. I. Jordan

We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling.

Variational Inference

Privacy Aware Learning

no code implementations NeurIPS 2012 John C. Duchi, Michael. I. Jordan, Martin J. Wainwright

We study statistical risk minimization problems under a privacy model in which the data is kept confidential even from the learner.

Active Learning for Crowd-Sourced Databases

no code implementations17 Sep 2012 Barzan Mozafari, Purnamrita Sarkar, Michael J. Franklin, Michael. I. Jordan, Samuel Madden

Based on this observation, we present two new active learning algorithms to combine humans and algorithms together in a crowd-sourced database.

Active Learning

The asymptotics of ranking algorithms

no code implementations7 Apr 2012 John C. Duchi, Lester Mackey, Michael. I. Jordan

With these negative results as motivation, we present a new approach to supervised ranking based on aggregation of partial preferences, and we develop $U$-statistic-based empirical risk minimization procedures.

Bayesian Bias Mitigation for Crowdsourcing

no code implementations NeurIPS 2011 Fabian L. Wauthier, Michael. I. Jordan

This approach can account for more complex bias patterns that arise in ambiguous or hard labeling tasks and allows us to merge data curation and learning into a single computation.

Active Learning

Divide-and-Conquer Matrix Factorization

no code implementations NeurIPS 2011 Lester W. Mackey, Michael. I. Jordan, Ameet Talwalkar

This work introduces Divide-Factor-Combine (DFC), a parallel divide-and-conquer framework for noisy matrix factorization.

Collaborative Filtering

Combinatorial clustering and the beta negative binomial process

no code implementations8 Nov 2011 Tamara Broderick, Lester Mackey, John Paisley, Michael. I. Jordan

We show that the NBP is conjugate to the beta process, and we characterize the posterior distribution under the beta-negative binomial process (BNBP) and hierarchical models based on the BNBP (the HBNBP).

Object Recognition Semantic Segmentation

Distributed Matrix Completion and Robust Factorization

no code implementations5 Jul 2011 Lester Mackey, Ameet Talwalkar, Michael. I. Jordan

If learning methods are to scale to the massive sizes of modern datasets, it is essential for the field of machine learning to embrace parallel and distributed computing.

Collaborative Filtering Distributed Computing +1

Cluster Forests

no code implementations14 Apr 2011 Donghui Yan, Aiyou Chen, Michael. I. Jordan

The search for good local clusterings is guided by a cluster quality measure kappa.

Clustering Ensemble

Random Conic Pursuit for Semidefinite Programming

no code implementations NeurIPS 2010 Ariel Kleiner, Ali Rahimi, Michael. I. Jordan

We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone.

Unsupervised Kernel Dimension Reduction

no code implementations NeurIPS 2010 Meihong Wang, Fei Sha, Michael. I. Jordan

In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses.

Classification Dimensionality Reduction +1

Variational Inference over Combinatorial Spaces

no code implementations NeurIPS 2010 Alexandre Bouchard-Côté, Michael. I. Jordan

Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems (Karzanov et al., 1991; Jerrum et al., 2001; Wilson, 2004), theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods.

Variational Inference

Heavy-Tailed Process Priors for Selective Shrinkage

no code implementations NeurIPS 2010 Fabian L. Wauthier, Michael. I. Jordan

Heavy-tailed distributions are often used to enhance the robustness of regression and classification methods to outliers in output space.

Gaussian Processes General Classification

Asymptotically Optimal Regularization in Smooth Parametric Models

no code implementations NeurIPS 2009 Percy S. Liang, Guillaume Bouchard, Francis R. Bach, Michael. I. Jordan

Many types of regularization schemes have been employed in statistical learning, each one motivated by some assumption about the problem domain.

Multi-Task Learning

Sharing Features among Dynamical Systems with Beta Processes

no code implementations NeurIPS 2009 Emily Fox, Michael. I. Jordan, Erik B. Sudderth, Alan S. Willsky

We propose a Bayesian nonparametric approach to relating multiple time series via a set of latent, dynamical behaviors.

Time Series

Nonparametric Latent Feature Models for Link Prediction

no code implementations NeurIPS 2009 Kurt Miller, Michael. I. Jordan, Thomas L. Griffiths

As the availability and importance of relational data -- such as the friendships summarized on a social networking website -- increases, it becomes increasingly important to have good models for such data.

Link Prediction

A sticky HDP-HMM with application to speaker diarization

no code implementations15 May 2009 Emily B. Fox, Erik B. Sudderth, Michael. I. Jordan, Alan S. Willsky

To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer.

Speaker Diarization