Search Results for author: Manfred K. Warmuth

Found 32 papers, 3 papers with code

LocoProp: Enhancing BackProp via Local Loss Optimization

no code implementations · 11 Jun 2021 · Ehsan Amid, Rohan Anil, Manfred K. Warmuth

We start by motivating the problem as minimizing a squared loss between the pre-activations of each layer and a local target, plus a regularizer term on the weights.
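The layer-local objective described here can be sketched in a few lines. Everything below (shapes, the learning rate `lr`, the regularization weight `lam`) is illustrative, not the paper's actual formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# One layer's weights, its input activations, and a local pre-activation target.
W = rng.standard_normal((3, 5))
h = rng.standard_normal(5)          # layer input
target = rng.standard_normal(3)     # local target for the pre-activation W @ h
W0 = W.copy()                       # initial weights, anchoring the regularizer
lam, lr = 0.1, 0.02

def local_loss(W):
    # Squared loss to the local target plus a proximity regularizer on the weights.
    return np.sum((W @ h - target) ** 2) + lam * np.sum((W - W0) ** 2)

# A few local gradient steps on this layer alone (no global backprop).
for _ in range(100):
    grad = 2 * np.outer(W @ h - target, h) + 2 * lam * (W - W0)
    W -= lr * grad
```

Each layer can run such local steps independently, which is what makes the objective "local".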

Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond

no code implementations · 3 Apr 2021 · Negin Majidi, Ehsan Amid, Hossein Talebi, Manfred K. Warmuth

Many learning tasks in machine learning can be viewed as taking a gradient step towards minimizing the average loss of a batch of examples in each training iteration.
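As a rough illustration of reweighting examples with an exponentiated-gradient step (a sketch under one reading of the abstract, not the paper's exact rule), high-loss examples, which are more likely to be mislabeled, get multiplicatively downweighted:

```python
import numpy as np

# Per-example losses for a batch; example 2's large loss mimics a noisy label.
losses = np.array([0.2, 0.3, 5.0, 0.25])
eta = 1.0                                      # EG learning rate (illustrative)

w = np.full(len(losses), 1.0 / len(losses))    # start from uniform example weights
w = w * np.exp(-eta * losses)                  # multiplicative (EG-style) update
w = w / w.sum()                                # renormalize to a distribution
```

The reweighted batch loss `(w * losses).sum()` would then replace the plain average in the gradient step.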

A case where a spindly two-layer linear network whips any neural network with a fully connected input layer

no code implementations · 16 Oct 2020 · Manfred K. Warmuth, Wojciech Kotłowski, Ehsan Amid

It was conjectured that no neural network of any structure, with arbitrary differentiable transfer functions at the nodes, can learn the following problem sample-efficiently when trained with gradient descent: the instances are the rows of a $d$-dimensional Hadamard matrix and the target is one of the features, i.e., very sparse.
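The problem instance itself is easy to construct; a minimal sketch (the Sylvester construction and the chosen feature index are illustrative):

```python
import numpy as np

def hadamard(d):
    # Sylvester construction; d must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H

d = 8
X = hadamard(d)   # instances: the rows of the d x d Hadamard matrix
y = X[:, 3]       # target: a single feature (column), i.e. a 1-sparse linear function
```

The rows are mutually orthogonal ($HH^\top = dI$), which is what makes the sparse target hard for dense architectures in the conjecture.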

Reparameterizing Mirror Descent as Gradient Descent

no code implementations · NeurIPS 2020 · Ehsan Amid, Manfred K. Warmuth

We present a general framework for casting a mirror descent update as a gradient descent update on a different set of parameters.
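One classic instance such a framework covers, assuming the standard unnormalized exponentiated-gradient (EGU) example: EGU on $w$ agrees, to first order in the step size, with plain gradient descent on $u$ under the reparameterization $w = u^2/4$. A numeric sketch (gradient `g` and weights chosen arbitrarily):

```python
import numpy as np

g = np.array([0.5, -1.2, 2.0])   # gradient of the loss w.r.t. w (fixed, for illustration)
w = np.array([0.3, 0.5, 0.2])
eta = 1e-3                       # small step: the match is first-order in eta

# Unnormalized exponentiated-gradient (mirror descent) update on w.
w_eg = w * np.exp(-eta * g)

# Gradient descent on u, where w = u**2 / 4; by the chain rule dL/du = g * u / 2.
u = 2 * np.sqrt(w)
u_gd = u - eta * g * u / 2
w_gd = u_gd ** 2 / 4             # map back: w_gd = w * (1 - eta*g/2)**2 ≈ w * exp(-eta*g)
```

The two updates differ only at order $\eta^2$, matching the continuous-time equivalence.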

TriMap: Large-scale Dimensionality Reduction Using Triplets

1 code implementation · 1 Oct 2019 · Ehsan Amid, Manfred K. Warmuth

We empirically show the excellent performance of TriMap on a large variety of datasets in terms of the quality of the embedding as well as the runtime.

Dimensionality Reduction

An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint

no code implementations · 11 Sep 2019 · Ehsan Amid, Manfred K. Warmuth

We show that Krasulina's update corresponds to a projected gradient descent step on the Stiefel manifold of the orthonormal $k$-frames, while Oja's update amounts to a gradient descent step using the unprojected gradient.
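For intuition, an Oja-style update for the top component ($k = 1$) can be sketched as follows, with the explicit renormalization playing the role of the projection back to the (unit-sphere) manifold; all constants here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data whose covariance has a dominant direction along e_0.
n, d = 2000, 5
X = rng.standard_normal((n, d))
X[:, 0] *= 5.0                      # make the first coordinate dominant

w = rng.standard_normal(d)
w /= np.linalg.norm(w)
eta = 0.01
for x in X:
    w += eta * x * (x @ w)          # unprojected gradient step on the Rayleigh quotient
    w /= np.linalg.norm(w)          # pull the iterate back to unit length

alignment = abs(w[0])               # |cosine| with the true top eigenvector e_0
```

Krasulina's variant instead subtracts the component of the gradient along the current iterate, which is the projected-gradient view the abstract refers to.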

Unbiased estimators for random design regression

no code implementations · 8 Jul 2019 · Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

We use them to show that for any input distribution and $\epsilon>0$ there is a random design consisting of $O(d\log d+ d/\epsilon)$ points from which an unbiased estimator can be constructed whose square loss over the entire distribution is with high probability bounded by $1+\epsilon$ times the loss of the optimum.

Robust Bi-Tempered Logistic Loss Based on Bregman Divergences

5 code implementations · NeurIPS 2019 · Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren

We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high temperature generalization.
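The tempered exponential and logarithm commonly used in this line of work can be sketched as follows (a sketch assuming the standard tempered definitions; check the paper for its exact conventions):

```python
import numpy as np

def log_t(x, t):
    # Tempered logarithm; recovers log(x) in the limit t -> 1.
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    # Tempered exponential, inverse of log_t on its domain; recovers exp(x) as t -> 1.
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

x = np.array([0.1, 0.5, 2.0])
t = 0.7
roundtrip = exp_t(log_t(x, t), t)   # inverts exactly on positive inputs
```

With a temperature above 1, `exp_t` has polynomial rather than exponential tails, which is the source of the heavy-tailed, outlier-robust behavior.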

Adaptive scale-invariant online algorithms for learning linear models

no code implementations · 20 Feb 2019 · Michał Kempka, Wojciech Kotłowski, Manfred K. Warmuth

We consider online learning with linear models, where the algorithm predicts on sequentially revealed instances (feature vectors), and is compared against the best linear function (comparator) in hindsight.

Divergence-Based Motivation for Online EM and Combining Hidden Variable Models

no code implementations · 11 Feb 2019 · Ehsan Amid, Manfred K. Warmuth

Expectation-Maximization (EM) is a prominent approach for parameter estimation of hidden (aka latent) variable models.
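A minimal EM loop for a toy two-component, unit-variance Gaussian mixture (all constants illustrative) shows the E- and M-steps the snippet refers to:

```python
import numpy as np

rng = np.random.default_rng(5)
# 1-D data from two hidden components (true means -2 and +2, equal mixing weights).
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 300)])

mu = np.array([-0.5, 0.5])          # crude initial means
pi = np.array([0.5, 0.5])           # mixing weights
for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2)   # unit variances assumed
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights and means from the responsibilities.
    pi = resp.mean(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
```

An online variant would interleave these steps one example (or mini-batch) at a time rather than over the full dataset.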

Latent Variable Models

Unlabeled sample compression schemes and corner peelings for ample and maximum classes

no code implementations · 5 Dec 2018 · Jérémie Chalopin, Victor Chepoi, Shay Moran, Manfred K. Warmuth

On the positive side we present a new construction of an unlabeled sample compression scheme for maximum classes.

Correcting the bias in least squares regression with volume-rescaled sampling

no code implementations · 4 Oct 2018 · Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

Without any assumptions on the noise, the linear least squares solution for any i.i.d.

Reverse iterative volume sampling for linear regression

no code implementations · 6 Jun 2018 · Michał Dereziński, Manfred K. Warmuth

We can afford to obtain the responses for only a small subset of the points, which are then used to construct linear predictions for all points in the dataset.

Online Non-Additive Path Learning under Full and Partial Information

no code implementations · 18 Apr 2018 · Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Holakou Rahmanian, Manfred K. Warmuth

We study the problem of online path learning with non-additive gains, which is a central problem appearing in several applications, including ensemble structured prediction.

Structured Prediction

Speech Recognition: Keyword Spotting Through Image Recognition

no code implementations · 10 Mar 2018 · Sanjay Krishna Gouda, Salil Kanetkar, David Harrison, Manfred K. Warmuth

The problem of identifying voice commands has always been a challenge due to the presence of noise and variability in speed, pitch, etc.

Image Classification Keyword Spotting +1

A more globally accurate dimensionality reduction method using triplets

1 code implementation · 1 Mar 2018 · Ehsan Amid, Manfred K. Warmuth

We first show that the commonly used dimensionality reduction (DR) methods such as t-SNE and LargeVis poorly capture the global structure of the data in the low dimensional embedding.

Dimensionality Reduction

Leveraged volume sampling for linear regression

no code implementations · NeurIPS 2018 · Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

We then develop a new rescaled variant of volume sampling that produces an unbiased estimate which avoids this bad behavior and has at least as good a tail bound as leverage score sampling: sample size $k=O(d\log d + d/\epsilon)$ suffices to guarantee total loss at most $1+\epsilon$ times the minimum with high probability.
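Leverage score sampling, the baseline mentioned here, can be sketched as follows (the QR route to the scores and all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 3
X = rng.standard_normal((n, d))

# Statistical leverage scores: the diagonal of the hat matrix X (X^T X)^{-1} X^T,
# computed stably via a thin QR factorization.
Q, _ = np.linalg.qr(X)
scores = np.sum(Q ** 2, axis=1)     # each score in [0, 1]; they sum to d

# Sample k rows i.i.d. with replacement, proportionally to leverage.
k = 20
probs = scores / scores.sum()
idx = rng.choice(n, size=k, replace=True, p=probs)
```

The rescaled volume sampling in the paper is a different (non-i.i.d., determinantal) distribution over subsets; this sketch only illustrates the leverage-score baseline it is compared against.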

Point Processes

Subsampling for Ridge Regression via Regularized Volume Sampling

no code implementations · 14 Oct 2017 · Michał Dereziński, Manfred K. Warmuth

However, when labels are expensive, we are forced to select only a small subset of vectors $\mathbf{x}_i$ for which we obtain the labels $y_i$.

Online Dynamic Programming

no code implementations · NeurIPS 2017 · Holakou Rahmanian, Manfred K. Warmuth

We consider the problem of repeatedly solving a variant of the same dynamic programming problem in successive trials.

Unbiased estimates for linear regression via volume sampling

no code implementations · NeurIPS 2017 · Michał Dereziński, Manfred K. Warmuth

The pseudo-inverse plays an important part in solving the linear least squares problem, where we try to predict a label for each column of $X$.
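Following the column convention in the snippet (columns of $X$ are the input points), the pseudo-inverse route to the least squares solution can be sketched as follows; all sizes and the noise level are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 5, 40
X = rng.standard_normal((d, n))                    # columns of X are the input points
w_true = rng.standard_normal(d)
y = X.T @ w_true + 0.1 * rng.standard_normal(n)    # one noisy label per column

# Least squares solution via the pseudo-inverse of X^T.
w = np.linalg.pinv(X.T) @ y

# For comparison: numpy's dedicated least squares solver gives the same answer.
w_lstsq, *_ = np.linalg.lstsq(X.T, y, rcond=None)
```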

Two-temperature logistic regression based on the Tsallis divergence

no code implementations · 19 May 2017 · Ehsan Amid, Manfred K. Warmuth, Sriram Srinivasan

We explain this by showing that $t_1 < 1$ caps the surrogate loss and $t_2 > 1$ makes the predictive distribution have a heavy tail.

Low-dimensional Data Embedding via Robust Ranking

no code implementations · 30 Nov 2016 · Ehsan Amid, Nikos Vlassis, Manfred K. Warmuth

We describe a new method called t-ETE for finding a low-dimensional embedding of a set of objects in Euclidean space.

PCA with Gaussian perturbations

no code implementations · 16 Jun 2015 · Wojciech Kotłowski, Manfred K. Warmuth

We develop a simple algorithm that needs $O(kn^2)$ time per trial and whose regret is off by only a small factor of $O(n^{1/4})$.

Labeled compression schemes for extremal classes

no code implementations · 30 May 2015 · Shay Moran, Manfred K. Warmuth

We consider a generalization of maximum classes called extremal classes.

The limits of squared Euclidean distance regularization

no code implementations · NeurIPS 2014 · Michal Derezinski, Manfred K. Warmuth

We conjecture that our hardness results hold for any training algorithm that is based on squared Euclidean distance regularization (i.e., back-propagation with the weight-decay heuristic).

A Bayesian Probability Calculus for Density Matrices

no code implementations · 9 Aug 2014 · Manfred K. Warmuth, Dima Kuzmin

Finite probability distributions are a special case where the density matrix is restricted to be diagonal.

On-line PCA with Optimal Regrets

no code implementations · 17 Jun 2013 · Jiazhong Nie, Wojciech Kotlowski, Manfred K. Warmuth

Furthermore, we show that when considering regret bounds as function of a loss budget, EG remains optimal and strictly outperforms GD.

Putting Bayes to sleep

no code implementations · NeurIPS 2012 · Dmitry Adamskiy, Manfred K. Warmuth, Wouter M. Koolen

If the nature of the data is changing over time in that different models predict well on different segments of the data, then adaptivity is typically achieved by mixing into the weights in each round a bit of the initial prior (kind of like a weak restart).
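The device of mixing a bit of the initial prior back into the weights each round can be sketched as a Fixed-Share-style update (a sketch; `alpha`, `eta`, and the random losses are illustrative, not the paper's setting):

```python
import numpy as np

n_experts = 4
prior = np.full(n_experts, 1.0 / n_experts)   # initial prior over experts/models
w = prior.copy()
alpha, eta = 0.05, 1.0                        # mixing rate and learning rate

rng = np.random.default_rng(4)
for _ in range(30):
    losses = rng.random(n_experts)            # this round's per-expert losses
    w = w * np.exp(-eta * losses)             # Bayesian / exponential-weights update
    w = w / w.sum()
    w = (1 - alpha) * w + alpha * prior       # mix a bit of the initial prior back in
```

After mixing, every weight stays at least `alpha / n_experts`, so an expert that was "asleep" (predicted badly for a while) can be revived quickly when it starts predicting well again.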

Learning Eigenvectors for Free

no code implementations · NeurIPS 2011 · Wouter M. Koolen, Wojciech Kotlowski, Manfred K. Warmuth

In this extension, the alphabet of $n$ outcomes is replaced by the set of all dyads, i.e., outer products $\mathbf{u}\mathbf{u}^\top$ where $\mathbf{u}$ is a unit-length vector in $\mathbb{R}^n$.
