Search Results for author: Mark Schmidt

Found 54 papers, 21 papers with code

Enhancing Policy Gradient with the Polyak Step-Size Adaption

no code implementations 11 Apr 2024 Yunxiang Li, Rui Yuan, Chen Fan, Mark Schmidt, Samuel Horváth, Robert M. Gower, Martin Takáč

Policy gradient is a widely utilized and foundational algorithm in the field of reinforcement learning (RL).

Reinforcement Learning (RL)
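
For context, the classical Polyak step-size divides the current suboptimality by the squared gradient norm; the paper adapts this idea to policy gradient, which is not reproduced here. A minimal sketch of the classical rule for plain gradient descent (function and variable names are illustrative, and $f^*$ must be known or estimated):

    import numpy as np

    def polyak_gd(f, grad_f, x0, f_star=0.0, iters=100, eps=1e-12):
        """Gradient descent with the classical Polyak step-size:
        step_t = (f(x_t) - f*) / ||grad f(x_t)||^2."""
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            g = grad_f(x)
            step = (f(x) - f_star) / (np.dot(g, g) + eps)  # Polyak step-size
            x = x - step * g
        return x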

Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

no code implementations 3 Apr 2024 Aaron Mishkin, Mert Pilanci, Mark Schmidt

This improvement is comparable to a square-root of the condition number in the worst case and addresses criticism that guarantees for stochastic acceleration could be worse than those for SGD.

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

no code implementations 29 Feb 2024 Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti

We show that the heavy-tailed class imbalance found in language modeling tasks leads to difficulties in the optimization dynamics.

Language Modelling

Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm

no code implementations 3 Jul 2023 Amrutha Varshini Ramesh, Aaron Mishkin, Mark Schmidt, Yihan Zhou, Jonathan Wilder Lavington, Jennifer She

We show that bound- and summation-constrained steepest descent in the L1-norm guarantees more progress per iteration than previous rules and can be computed in only $O(n \log n)$ time.

Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Łojasiewicz Condition

no code implementations 2 Apr 2023 Chen Fan, Christos Thrampoulidis, Mark Schmidt

Modern machine learning models are often over-parameterized and as a result they can interpolate the training data.

Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning

1 code implementation 20 Feb 2023 Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

Riemannian submanifold optimization with momentum is computationally challenging because, to ensure that the iterates remain on the submanifold, we often need to solve difficult differential equations.

Target-based Surrogates for Stochastic Optimization

1 code implementation 6 Feb 2023 Jonathan Wilder Lavington, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Nicolas Le Roux

Our target optimization framework uses the (expensive) gradient computation to construct surrogate functions in a \emph{target space} (e.g., the logits output by a linear model for classification) that can be minimized efficiently.

Imitation Learning · Stochastic Optimization

Improved Policy Optimization for Online Imitation Learning

1 code implementation 29 Jul 2022 Jonathan Wilder Lavington, Sharan Vaswani, Mark Schmidt

Specifically, if the class of policies is sufficiently expressive to contain the expert policy, we prove that DAGGER achieves constant regret.

Imitation Learning

Structured second-order methods via natural gradient descent

no code implementations 22 Jul 2021 Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

In this paper, we propose new structured second-order methods and structured adaptive-gradient methods obtained by performing natural-gradient descent on structured parameter spaces.

Second-order methods

SVRG Meets AdaGrad: Painless Variance Reduction

no code implementations 18 Feb 2021 Benjamin Dubois-Taine, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Simon Lacoste-Julien

Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate.
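
As background, variance-reduced methods such as SVRG replace the plain stochastic gradient with an estimate anchored at a periodic full-gradient snapshot. A minimal sketch for a finite sum $f(x) = \frac{1}{n}\sum_i f_i(x)$ (this is the basic SVRG loop, not the AdaGrad variant proposed in the paper; names are illustrative):

    import numpy as np

    def svrg(grads, x0, step=0.1, epochs=10, inner=None, rng=None):
        """Basic SVRG: g = grad_i(x) - grad_i(x_snap) + full_grad(x_snap)."""
        rng = rng or np.random.default_rng(0)
        n = len(grads)
        inner = inner or n
        x = np.asarray(x0, dtype=float)
        for _ in range(epochs):
            x_snap = x.copy()
            full = sum(g(x_snap) for g in grads) / n   # snapshot full gradient
            for _ in range(inner):
                i = rng.integers(n)
                g = grads[i](x) - grads[i](x_snap) + full
                x = x - step * g
        return x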

Tractable structured natural gradient descent using local parameterizations

no code implementations 15 Feb 2021 Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

Natural-gradient descent (NGD) on structured parameter spaces (e.g., low-rank covariances) is computationally challenging due to difficult Fisher-matrix computations.

Variational Inference
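
For reference, natural-gradient descent preconditions the ordinary gradient with the inverse Fisher information matrix, which is exactly the computation that becomes difficult for structured parameter spaces:

    \[
    \lambda_{t+1} = \lambda_t - \eta\, F(\lambda_t)^{-1} \nabla_\lambda \mathcal{L}(\lambda_t),
    \qquad
    F(\lambda) = \mathbb{E}_{q_\lambda}\!\left[ \nabla_\lambda \log q_\lambda(z)\, \nabla_\lambda \log q_\lambda(z)^{\top} \right].
    \]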

The pathway elaboration method for mean first passage time estimation in large continuous-time Markov chains with applications to nucleic acid kinetics

no code implementations 11 Jan 2021 Sedigheh Zolaktaf, Frits Dannenberg, Mark Schmidt, Anne Condon, Erik Winfree

We then compare the performance of pathway elaboration with the stochastic simulation algorithm (SSA) for MFPT estimation on 237 of the reactions for which SSA is feasible.

Robust Asymmetric Learning in POMDPs

1 code implementation 31 Dec 2020 Andrew Warrington, J. Wilder Lavington, Adam Ścibior, Mark Schmidt, Frank Wood

Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes.

Imitation Learning

Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent

no code implementations 2 Nov 2020 Frederik Kunstner, Raunak Kumar, Mark Schmidt

In this work we first show that for the common setting of exponential family distributions, viewing EM as a mirror descent algorithm leads to convergence rates in Kullback-Leibler (KL) divergence.

Variance-Reduced Methods for Machine Learning

no code implementations 2 Oct 2020 Robert M. Gower, Mark Schmidt, Francis Bach, Peter Richtarik

Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago.

BIG-bench Machine Learning · Stochastic Optimization
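
As a reminder of the setting, SGD for a finite sum $f(x) = \frac{1}{n}\sum_{i=1}^n f_i(x)$ steps along a single sampled gradient,

    \[
    x_{k+1} = x_k - \gamma_k \nabla f_{i_k}(x_k), \qquad i_k \sim \mathrm{Uniform}\{1,\dots,n\},
    \]

and variance-reduced methods modify this estimate so that its variance vanishes as the iterates approach a solution.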

Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)

no code implementations 28 Sep 2020 Sharan Vaswani, Issam H. Laradji, Frederik Kunstner, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien

Under an interpolation assumption, we prove that AMSGrad with a constant step-size and momentum can converge to the minimizer at the faster $O(1/T)$ rate for smooth, convex functions.

Binary Classification
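
The "line-search" in the title refers to choosing the step-size adaptively rather than fixing it in advance. A hedged sketch of a backtracking Armijo line-search evaluated on a sampled mini-batch loss (a generic construction, not necessarily the exact procedure analyzed in the paper):

    import numpy as np

    def armijo_step(f_batch, grad_batch, x, step0=1.0, c=0.5, beta=0.8, max_tries=50):
        """Shrink the step until the sampled loss decreases sufficiently:
        f_b(x - step * g) <= f_b(x) - c * step * ||g||^2."""
        g = grad_batch(x)
        gnorm2 = float(np.dot(g, g))
        fx = f_batch(x)
        step = step0
        for _ in range(max_tries):
            if f_batch(x - step * g) <= fx - c * step * gnorm2:
                break
            step *= beta
        return x - step * g, step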

Handling the Positive-Definite Constraint in the Bayesian Learning Rule

1 code implementation ICML 2020 Wu Lin, Mark Schmidt, Mohammad Emtiyaz Khan

The Bayesian learning rule is a natural-gradient variational inference method, which not only contains many existing learning algorithms as special cases but also enables the design of new algorithms.

valid · Variational Inference

Stein's Lemma for the Reparameterization Trick with Exponential Family Mixtures

1 code implementation 29 Oct 2019 Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt

Our generalization enables us to establish a connection between Stein's lemma and the reparameterization trick to derive gradients of expectations of a large class of functions under weak assumptions.

LEMMA

Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

1 code implementation 11 Oct 2019 Si Yi Meng, Sharan Vaswani, Issam Laradji, Mark Schmidt, Simon Lacoste-Julien

Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size.

Binary Classification · Second-order methods

xRAC: Execution and Access Control for Restricted Application Containers on Managed Hosts

1 code implementation 8 Jul 2019 Frederik Hauser, Mark Schmidt, Michael Menth

If the user is permitted to use the RAC on a managed host, launching the RAC is authorized and access to protected network resources may be given, e.g., to internal networks, servers, or the Internet.

Networking and Internet Architecture · Cryptography and Security

Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations

1 code implementation 7 Jun 2019 Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt

Natural-gradient methods enable fast and simple algorithms for variational inference, but due to computational difficulties, their use is mostly limited to \emph{minimal} exponential-family (EF) approximations.

Bayesian Inference · Variational Inference

Efficient Deep Gaussian Process Models for Variable-Sized Input

1 code implementation 16 May 2019 Issam H. Laradji, Mark Schmidt, Vladimir Pavlovic, Minyoung Kim

The key advantage is that the combination of GP and DRF leads to a tractable model that can both handle a variable-sized input as well as learn deep long-range dependency structures of the data.

Gaussian Processes · Uncertainty Quantification

Distributed Maximization of Submodular plus Diversity Functions for Multi-label Feature Selection on Huge Datasets

no code implementations 20 Mar 2019 Mehrdad Ghadiri, Mark Schmidt

In this paper, we consider this problem as an optimization problem that seeks to maximize the sum of a sum-sum diversity function and a non-negative monotone submodular function.

Data Summarization · Feature Selection · +1

SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient

2 code implementations NeurIPS 2018 Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan

Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution.

Variational Inference

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron

no code implementations 16 Oct 2018 Sharan Vaswani, Francis Bach, Mark Schmidt

Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov acceleration matches the convergence rate of the deterministic accelerated method for both convex and strongly-convex functions.

Combining Bayesian Optimization and Lipschitz Optimization

no code implementations 10 Oct 2018 Mohamed Osama Ahmed, Sharan Vaswani, Mark Schmidt

Indeed, in a particular setting, we prove that using the Lipschitz information yields the same or a better bound on the regret compared to using Bayesian optimization on its own.

Bayesian Optimization · Thompson Sampling

A Less Biased Evaluation of Out-of-distribution Sample Detectors

3 code implementations 13 Sep 2018 Alireza Shafaei, Mark Schmidt, James J. Little

What makes this problem different from a typical supervised learning setting is that the distribution of outliers used in training may not be the same as the distribution of outliers encountered in the application.

Image Classification

Where are the Blobs: Counting by Localization with Point Supervision

3 code implementations ECCV 2018 Issam H. Laradji, Negar Rostamzadeh, Pedro O. Pinheiro, David Vazquez, Mark Schmidt

However, we propose a detection-based method that does not need to estimate the size and shape of the objects and that outperforms regression-based methods.

Object · Object Counting · +1

Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

1 code implementation 23 Dec 2017 Julie Nutini, Issam Laradji, Mark Schmidt

Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure.

Optimization and Control · 90C06

Online Learning Rate Adaptation with Hypergradient Descent

3 code implementations ICLR 2018 Atilim Gunes Baydin, Robert Cornish, David Martinez Rubio, Mark Schmidt, Frank Wood

We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice.

Hyperparameter Optimization · Stochastic Optimization
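
For context, hypergradient descent adapts the learning rate online by taking a gradient step on the loss with respect to the learning rate itself, which reduces to the inner product of consecutive gradients. A minimal sketch for plain gradient descent (illustrative names; the paper also derives variants for SGD with momentum and Adam, not shown):

    import numpy as np

    def gd_hypergradient(grad, x0, alpha0=0.01, beta=1e-4, iters=100):
        """Gradient descent whose step-size is itself adapted:
        alpha <- alpha + beta * <g_t, g_{t-1}>."""
        x = np.asarray(x0, dtype=float)
        alpha, g_prev = alpha0, np.zeros_like(x)
        for _ in range(iters):
            g = grad(x)
            alpha = alpha + beta * float(np.dot(g, g_prev))  # hypergradient update
            x = x - alpha * g
            g_prev = g
        return x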

Horde of Bandits using Gaussian Markov Random Fields

no code implementations 7 Mar 2017 Sharan Vaswani, Mark Schmidt, Laks. V. S. Lakshmanan

The gang of bandits (GOB) model \cite{cesa2013gang} is a recent contextual bandits framework that shares information between a set of bandit problems, related by a known (possibly noisy) graph.

Clustering · Multi-Armed Bandits · +2

Model-Independent Online Learning for Influence Maximization

no code implementations ICML 2017 Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, Mark Schmidt

We consider influence maximization (IM) in social networks, which is the problem of maximizing the number of users that become aware of a product by selecting a set of "seed" users to expose the product to.

Fast Patch-based Style Transfer of Arbitrary Style

6 code implementations 13 Dec 2016 Tian Qi Chen, Mark Schmidt

This results in a procedure for artistic style transfer that is efficient but also allows arbitrary content and style images.

Image Generation · Style Transfer

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition

no code implementations 16 Aug 2016 Hamed Karimi, Julie Nutini, Mark Schmidt

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent.
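
Concretely, the Polyak-Łojasiewicz (PL) condition asks that for some $\mu > 0$,

    \[
    \tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu \left( f(x) - f^* \right) \quad \text{for all } x,
    \]

and for an $L$-smooth $f$ this yields the linear rate $f(x_k) - f^* \le (1 - \mu/L)^k \left( f(x_0) - f^* \right)$ for gradient descent with step-size $1/L$, without requiring convexity.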

Play and Learn: Using Video Games to Train Computer Vision Models

no code implementations 5 Aug 2016 Alireza Shafaei, James J. Little, Mark Schmidt

We present experiments assessing the effectiveness on real-world data of systems trained on synthetic RGB images that are extracted from a video game.

Depth Estimation · Domain Adaptation · +3

Stop Wasting My Gradients: Practical SVRG

no code implementations NeurIPS 2015 Reza Harikandeh, Mohamed Osama Ahmed, Alim Virani, Mark Schmidt, Jakub Konečný, Scott Sallinen

We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods.

Stop Wasting My Gradients: Practical SVRG

no code implementations 5 Nov 2015 Reza Babanezhad, Mohamed Osama Ahmed, Alim Virani, Mark Schmidt, Jakub Konečný, Scott Sallinen

We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods.

Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection

no code implementations 1 Jun 2015 Julie Nutini, Mark Schmidt, Issam H. Laradji, Michael Friedlander, Hoyt Koepke

There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J.
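
For context, the Gauss-Southwell rule is the greedy alternative to random coordinate selection: update the coordinate whose partial derivative is largest in magnitude. A minimal sketch (illustrative names, assuming a common coordinate-wise Lipschitz constant L):

    import numpy as np

    def gauss_southwell_cd(grad, x0, L=1.0, iters=200):
        """Coordinate descent with greedy (Gauss-Southwell) selection."""
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            g = grad(x)
            i = int(np.argmax(np.abs(g)))   # Gauss-Southwell rule
            x[i] -= g[i] / L                # coordinate-wise gradient step
        return x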

Influence Maximization with Bandits

no code implementations 27 Feb 2015 Sharan Vaswani, Laks. V. S. Lakshmanan, Mark Schmidt

We consider the problem of \emph{influence maximization}, the problem of maximizing the number of people that become aware of a product by finding the `best' set of `seed' users to expose the product to.

Hierarchical Maximum-Margin Clustering

no code implementations 6 Feb 2015 Guang-Tong Zhou, Sung Ju Hwang, Mark Schmidt, Leonid Sigal, Greg Mori

We present a hierarchical maximum-margin clustering method for unsupervised data analysis.

Clustering

Convex Optimization for Big Data

no code implementations 4 Nov 2014 Volkan Cevher, Stephen Becker, Mark Schmidt

This article reviews recent advances in convex optimization algorithms for Big Data, which aim to reduce the computational, storage, and communications bottlenecks.

Minimizing Finite Sums with the Stochastic Average Gradient

2 code implementations 10 Sep 2013 Mark Schmidt, Nicolas Le Roux, Francis Bach

Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations.
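
As background, SAG keeps the most recent gradient of each $f_i$ in memory and steps along the average of the stored gradients, refreshing one entry per iteration. A minimal sketch for $f(x) = \frac{1}{n}\sum_i f_i(x)$ (illustrative names, not the authors' implementation):

    import numpy as np

    def sag(grads, x0, step=0.1, iters=1000, rng=None):
        """Stochastic Average Gradient: refresh one stored gradient per step."""
        rng = rng or np.random.default_rng(0)
        n = len(grads)
        x = np.asarray(x0, dtype=float)
        memory = np.zeros((n, x.size))       # last seen gradient of each f_i
        avg = np.zeros_like(x)
        for _ in range(iters):
            i = rng.integers(n)
            g = grads[i](x)
            avg += (g - memory[i]) / n       # keep the running average in sync
            memory[i] = g
            x = x - step * avg
        return x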

A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

no code implementations NeurIPS 2012 Nicolas L. Roux, Mark Schmidt, Francis R. Bach

We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex.

BIG-bench Machine Learning

Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization

no code implementations NeurIPS 2011 Mark Schmidt, Nicolas L. Roux, Francis R. Bach

We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the second term.
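
For reference, the exact proximal-gradient iteration for minimizing $f(x) + h(x)$, with $f$ smooth and $h$ non-smooth, is

    \[
    x_{k+1} = \operatorname{prox}_{\gamma h}\!\bigl( x_k - \gamma \nabla f(x_k) \bigr),
    \qquad
    \operatorname{prox}_{\gamma h}(y) = \arg\min_x \tfrac{1}{2}\|x - y\|^2 + \gamma\, h(x),
    \]

and the question studied here is how errors in computing $\nabla f(x_k)$ or the proximity operator affect the convergence rate.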
