Search Results for author: Manfred K. Warmuth

Found 40 papers, 4 papers with code

Learning Eigenvectors for Free

no code implementations NeurIPS 2011 Wouter M. Koolen, Wojciech Kotlowski, Manfred K. Warmuth

In this extension, the alphabet of $n$ outcomes is replaced by the set of all dyads, i.e. outer products $\mathbf{u}\mathbf{u}^\top$ where $\mathbf{u}$ is a unit-length vector in $\mathbb{R}^n$.
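
As a quick illustration of the dyad outcomes mentioned in the excerpt (not code from the paper), the sketch below builds a unit vector and its outer product; a dyad is a rank-one, trace-one positive semidefinite matrix, which is what replaces a single discrete outcome in this setting.

```python
import numpy as np

# Illustrative only: a dyad u u^T built from a unit-length vector u in R^n.
n = 4
u = np.random.randn(n)
u /= np.linalg.norm(u)                 # normalize to unit length

dyad = np.outer(u, u)                  # the outcome u u^T

# A dyad is rank one, symmetric, and has trace 1 (since ||u|| = 1).
print(np.linalg.matrix_rank(dyad))     # 1
print(np.isclose(np.trace(dyad), 1))   # True
```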

Putting Bayes to sleep

no code implementations NeurIPS 2012 Dmitry Adamskiy, Manfred K. Warmuth, Wouter M. Koolen

If the nature of the data changes over time, so that different models predict well on different segments of the data, then adaptivity is typically achieved by mixing a bit of the initial prior into the weights in each round (akin to a weak restart).
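
The "mix a bit of the initial prior into the weights" mechanism can be sketched in a few lines. This is a generic illustration of that idea (fixed-share-style mixing), not the specific algorithm analyzed in the paper; the mixing rate `alpha` and the loss-based update are placeholders.

```python
import numpy as np

def mix_with_prior(weights, prior, alpha):
    """Blend a small amount of the initial prior back into the weights each
    round, which lets previously good models be 'woken up' again later."""
    return (1.0 - alpha) * weights + alpha * prior

# Toy round: a Bayes-like multiplicative update followed by the mixing step.
prior = np.ones(5) / 5                        # uniform initial prior over 5 models
weights = prior.copy()
losses = np.array([0.2, 0.9, 0.5, 0.1, 0.7])  # placeholder per-model losses

weights *= np.exp(-losses)                    # exponential-weights update
weights /= weights.sum()
weights = mix_with_prior(weights, prior, alpha=0.01)
```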

On-line PCA with Optimal Regrets

no code implementations17 Jun 2013 Jiazhong Nie, Wojciech Kotlowski, Manfred K. Warmuth

Furthermore, we show that when considering regret bounds as function of a loss budget, EG remains optimal and strictly outperforms GD.

A Bayesian Probability Calculus for Density Matrices

no code implementations9 Aug 2014 Manfred K. Warmuth, Dima Kuzmin

Finite probability distributions are a special case where the density matrix is restricted to be diagonal.
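
To make the special case concrete (an illustration, not code from the paper): a finite probability distribution corresponds to a density matrix whose eigenvectors are the standard basis and whose eigenvalues are the probabilities, i.e. a diagonal matrix with the probability vector on its diagonal.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # a finite probability distribution
rho = np.diag(p)                # the corresponding diagonal density matrix

# Density matrix requirements: symmetric, positive semidefinite, trace 1.
assert np.allclose(rho, rho.T)
assert np.all(np.linalg.eigvalsh(rho) >= 0)
assert np.isclose(np.trace(rho), 1.0)
```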

The limits of squared Euclidean distance regularization

no code implementations NeurIPS 2014 Michal Derezinski, Manfred K. Warmuth

We conjecture that our hardness results hold for any training algorithm that is based on the squared Euclidean distance regularization (i.e. Back-propagation with the Weight Decay heuristic).

Labeled compression schemes for extremal classes

no code implementations30 May 2015 Shay Moran, Manfred K. Warmuth

We consider a generalization of maximum classes called extremal classes.

PCA with Gaussian perturbations

no code implementations16 Jun 2015 Wojciech Kotłowski, Manfred K. Warmuth

We develop a simple algorithm that needs $O(kn^2)$ time per trial whose regret is off by only a small factor of $O(n^{1/4})$.

Low-dimensional Data Embedding via Robust Ranking

no code implementations30 Nov 2016 Ehsan Amid, Nikos Vlassis, Manfred K. Warmuth

We describe a new method called t-ETE for finding a low-dimensional embedding of a set of objects in Euclidean space.

Unbiased estimates for linear regression via volume sampling

no code implementations NeurIPS 2017 Michał Dereziński, Manfred K. Warmuth

The pseudoinverse plays an important role in solving the linear least squares problem, where we try to predict a label for each column of $X$.

regression
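
As background for the excerpt above, here is the standard least-squares solution via the pseudoinverse (a generic illustration; the paper's convention of treating the columns of $X$ as the points, and its volume-sampling estimator, are not reproduced here).

```python
import numpy as np

# Toy least-squares problem: n points in d dimensions, rows of X are points.
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = np.linalg.pinv(X) @ y        # pseudoinverse solution of min_w ||X w - y||^2
print(np.allclose(w, np.linalg.lstsq(X, y, rcond=None)[0]))  # True
```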

Two-temperature logistic regression based on the Tsallis divergence

no code implementations19 May 2017 Ehsan Amid, Manfred K. Warmuth, Sriram Srinivasan

We explain this by showing that $t_1 < 1$ caps the surrogate loss and $t_2 > 1$ makes the predictive distribution have a heavy tail.

regression, Vocal Bursts Valence Prediction
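
A quick numerical illustration of the two claims in the excerpt, using one common convention for the tempered logarithm and exponential (the paper's exact parameterization may differ slightly): with $t_1 < 1$ the tempered log-loss $-\log_{t_1}(p)$ stays below $1/(1-t_1)$ even as $p \to 0$, and with $t_2 > 1$ the tempered exponential decays polynomially rather than exponentially, giving a heavy-tailed predictive distribution.

```python
import numpy as np

def log_t(x, t):
    """Tempered logarithm; reduces to log(x) as t -> 1."""
    return np.log(x) if t == 1.0 else (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    """Tempered exponential; reduces to exp(x) as t -> 1."""
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

t1, t2 = 0.5, 2.0
p = np.array([1e-1, 1e-3, 1e-6, 1e-12])
print(-log_t(p, t1))                              # approaches 1/(1 - t1) = 2: the loss is capped
print(exp_t(-np.array([1.0, 10.0, 100.0]), t2))   # ~ 1/(1 + x): heavy (polynomial) tail
```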

Online Dynamic Programming

no code implementations NeurIPS 2017 Holakou Rahmanian, Manfred K. Warmuth

We consider the problem of repeatedly solving a variant of the same dynamic programming problem in successive trials.

Subsampling for Ridge Regression via Regularized Volume Sampling

no code implementations14 Oct 2017 Michał Dereziński, Manfred K. Warmuth

However, when labels are expensive, we are forced to select only a small subset of vectors $\mathbf{x}_i$ for which we obtain the labels $y_i$.

regression

Leveraged volume sampling for linear regression

no code implementations NeurIPS 2018 Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

We then develop a new rescaled variant of volume sampling that produces an unbiased estimate which avoids this bad behavior and has at least as good a tail bound as leverage score sampling: sample size $k=O(d\log d + d/\epsilon)$ suffices to guarantee total loss at most $1+\epsilon$ times the minimum with high probability.

Point Processes, regression
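
For context, this is a minimal sketch of plain leverage score sampling, the baseline the excerpt compares against; the paper's rescaled volume sampling procedure itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 10, 60
X = rng.normal(size=(n, d))

# Leverage score of row i: l_i = x_i^T (X^T X)^{-1} x_i (diagonal of the hat matrix).
H = X @ np.linalg.solve(X.T @ X, X.T)
lev = np.diag(H)                          # the leverage scores sum to d

probs = lev / lev.sum()
idx = rng.choice(n, size=k, replace=True, p=probs)   # sample k rows
weights = 1.0 / (k * probs[idx])          # importance weights for the subsampled regression
```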

A more globally accurate dimensionality reduction method using triplets

1 code implementation1 Mar 2018 Ehsan Amid, Manfred K. Warmuth

We first show that the commonly used dimensionality reduction (DR) methods such as t-SNE and LargeVis poorly capture the global structure of the data in the low dimensional embedding.

Dimensionality Reduction

Speech Recognition: Keyword Spotting Through Image Recognition

no code implementations10 Mar 2018 Sanjay Krishna Gouda, Salil Kanetkar, David Harrison, Manfred K. Warmuth

The problem of identifying voice commands has always been a challenge due to the presence of noise and variability in speed, pitch, etc.

Image Classification, Keyword Spotting +2

Online Non-Additive Path Learning under Full and Partial Information

no code implementations18 Apr 2018 Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Holakou Rahmanian, Manfred K. Warmuth

We study the problem of online path learning with non-additive gains, which is a central problem appearing in several applications, including ensemble structured prediction.

Structured Prediction

Reverse iterative volume sampling for linear regression

no code implementations6 Jun 2018 Michał Dereziński, Manfred K. Warmuth

We can only afford to attain the responses for a small subset of the points that are then used to construct linear predictions for all points in the dataset.

BIG-bench Machine Learning, regression

Divergence-Based Motivation for Online EM and Combining Hidden Variable Models

no code implementations11 Feb 2019 Ehsan Amid, Manfred K. Warmuth

Expectation-Maximization (EM) is a prominent approach for parameter estimation of hidden (aka latent) variable models.

Adaptive scale-invariant online algorithms for learning linear models

no code implementations20 Feb 2019 Michał Kempka, Wojciech Kotłowski, Manfred K. Warmuth

We consider online learning with linear models, where the algorithm predicts on sequentially revealed instances (feature vectors), and is compared against the best linear function (comparator) in hindsight.

Robust Bi-Tempered Logistic Loss Based on Bregman Divergences

6 code implementations NeurIPS 2019 Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren

We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high temperature generalization.
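
A minimal sketch of the "temperature in the exponential" idea: the tempered exponential below generalizes exp, and the softmax is replaced by normalizing exp_t of shifted activations so that the outputs sum to one. The bisection normalizer here is for illustration only; the paper describes its own normalization algorithm and also tempers the loss itself.

```python
import numpy as np

def exp_t(x, t):
    # Tempered exponential (same convention as in the earlier sketch).
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

def tempered_softmax(activations, t, n_iter=60):
    """Normalize exp_t(a_i - lam) so the outputs sum to 1, finding the
    normalizer lam by bisection. Illustrative sketch for t > 1."""
    lo = np.max(activations)                          # sum >= 1 here since exp_t(0) = 1
    hi = lo + 1.0
    while exp_t(activations - hi, t).sum() > 1.0:     # grow the bracket until sum < 1
        hi += hi - lo
    for _ in range(n_iter):                           # bisection on the normalizer
        mid = 0.5 * (lo + hi)
        if exp_t(activations - mid, t).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    return exp_t(activations - 0.5 * (lo + hi), t)

probs = tempered_softmax(np.array([1.0, 2.0, 0.5]), t=1.5)
print(probs, probs.sum())                             # heavy-tailed outputs summing to ~1
```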

Unbiased estimators for random design regression

no code implementations8 Jul 2019 Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

We use them to show that for any input distribution and $\epsilon>0$ there is a random design consisting of $O(d\log d+ d/\epsilon)$ points from which an unbiased estimator can be constructed whose expected square loss over the entire distribution is bounded by $1+\epsilon$ times the loss of the optimum.

regression

An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint

no code implementations11 Sep 2019 Ehsan Amid, Manfred K. Warmuth

We show that Krasulina's update corresponds to a projected gradient descent step on the Stiefel manifold of the orthonormal $k$-frames, while Oja's update amounts to a gradient descent step using the unprojected gradient.
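
For readers unfamiliar with the two updates, here are the classical single-component ($k=1$) forms of Oja's and Krasulina's rules on a stream of centered data points; the paper works with the general $k$-PCA versions and an implicit form, which are not reproduced here.

```python
import numpy as np

def oja_step(w, x, lr):
    """Oja's rule (k = 1): gradient step followed by renormalization to the sphere."""
    w = w + lr * (x @ w) * x
    return w / np.linalg.norm(w)

def krasulina_step(w, x, lr):
    """Krasulina's rule (k = 1): the step subtracts the component along w,
    so no explicit normalization is enforced."""
    xw = x @ w
    return w + lr * (xw * x - (xw ** 2 / (w @ w)) * w)

rng = np.random.default_rng(0)
C = np.diag([3.0, 1.0, 0.5])                      # covariance with top eigenvector e_1
w0 = rng.normal(size=3)
w_oja, w_kra = w0.copy(), w0.copy()
for _ in range(2000):
    x = rng.multivariate_normal(np.zeros(3), C)
    w_oja = oja_step(w_oja, x, lr=0.01)
    w_kra = krasulina_step(w_kra, x, lr=0.01)
print(w_oja, w_kra / np.linalg.norm(w_kra))       # both align with e_1 (up to sign)
```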

TriMap: Large-scale Dimensionality Reduction Using Triplets

1 code implementation1 Oct 2019 Ehsan Amid, Manfred K. Warmuth

We empirically show the excellent performance of TriMap on a large variety of datasets in terms of the quality of the embedding as well as the runtime.

Dimensionality Reduction

Reparameterizing Mirror Descent as Gradient Descent

no code implementations NeurIPS 2020 Ehsan Amid, Manfred K. Warmuth

We present a general framework for casting a mirror descent update as a gradient descent update on a different set of parameters.
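
A toy illustration of the kind of correspondence the paper studies (not its general framework): gradient descent on a squared reparameterization $w = u \odot u$ behaves, for small step sizes, like a multiplicative (unnormalized exponentiated gradient) update on $w$. The exact scaling used in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
g = rng.normal(size=d)              # a fixed gradient w.r.t. w, for illustration only
w = np.full(d, 0.5)
u = np.sqrt(w)                      # reparameterize w = u * u
lr = 1e-3

w_egu = w * np.exp(-4 * lr * g)     # unnormalized exponentiated gradient step on w
u_new = u - lr * g * (2 * u)        # plain gradient descent step on u (chain rule: dL/du = 2u * dL/dw)
w_gd = u_new * u_new

# The two updates coincide up to O(lr^2); the paper makes this kind of
# correspondence precise well beyond this toy one-step example.
print(np.max(np.abs(w_egu - w_gd)))
```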

A case where a spindly two-layer linear network whips any neural network with a fully connected input layer

no code implementations16 Oct 2020 Manfred K. Warmuth, Wojciech Kotłowski, Ehsan Amid

It was conjectured that any neural network of any structure and arbitrary differentiable transfer functions at the nodes cannot learn the following problem sample efficiently when trained with gradient descent: The instances are the rows of a $d$-dimensional Hadamard matrix and the target is one of the features, i.e. very sparse.
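
The learning problem described in the excerpt is easy to write down; here is a hedged sketch of the dataset construction (the choice of target column is a placeholder).

```python
import numpy as np
from scipy.linalg import hadamard

d = 64                            # must be a power of two for scipy's construction
X = hadamard(d).astype(float)     # instances: the rows of a d x d Hadamard matrix (+-1 entries)
target_feature = 3                # placeholder choice of which feature is the target
y = X[:, target_feature]          # target = a single input feature, i.e. a 1-sparse linear function

# The paper's "spindly" two-layer linear network gives each input coordinate its own
# two-weight chain, so it only has to learn to pick out one coordinate of x.
print(X.shape, y[:8])
```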

Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond

no code implementations3 Apr 2021 Negin Majidi, Ehsan Amid, Hossein Talebi, Manfred K. Warmuth

Many learning tasks in machine learning can be viewed as taking a gradient step towards minimizing the average loss of a batch of examples in each training iteration.
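
A minimal sketch of the example-reweighting idea suggested by the title and excerpt: per-example importance weights are updated multiplicatively (an exponentiated-gradient-style step) so that persistently high-loss, possibly mislabeled examples are down-weighted. The learning rate and exact update below are placeholders, not the paper's specification.

```python
import numpy as np

def eg_reweight(example_weights, example_losses, eta=0.1):
    """One multiplicative reweighting step followed by renormalization."""
    w = example_weights * np.exp(-eta * example_losses)
    return w / w.sum()

# Toy batch: example 2 has a consistently large loss (e.g. a noisy label).
weights = np.ones(4) / 4
for _ in range(20):
    losses = np.array([0.3, 0.2, 2.5, 0.4])      # placeholder per-example losses
    weights = eg_reweight(weights, losses)
# The weighted gradient would then be sum_i weights[i] * grad_i instead of the plain average.
print(weights)                                   # weight on example 2 shrinks toward 0
```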

LocoProp: Enhancing BackProp via Local Loss Optimization

1 code implementation11 Jun 2021 Ehsan Amid, Rohan Anil, Manfred K. Warmuth

Second-order methods have shown state-of-the-art performance for optimizing deep neural networks.

Second-order methods

Step-size Adaptation Using Exponentiated Gradient Updates

no code implementations31 Jan 2022 Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

In this paper, we update the step-size scale and the gain variables with exponentiated gradient updates instead.
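
One way to picture "updating a step-size scale with an exponentiated gradient update" is the hedged sketch below, which uses a hypothetical adaptation signal (the correlation between successive gradients); the paper's actual gain variables and update rule are not reproduced here.

```python
import numpy as np

# Hypothetical illustration: adapt a scalar step-size scale s multiplicatively.
rng = np.random.default_rng(0)
w = rng.normal(size=10)
s, base_lr, meta_lr = 1.0, 0.1, 0.05
prev_g = np.zeros_like(w)

for _ in range(100):
    g = 2 * w                                    # gradient of the toy loss ||w||^2
    signal = np.dot(g, prev_g) / (np.linalg.norm(g) * np.linalg.norm(prev_g) + 1e-12)
    s *= np.exp(meta_lr * signal)                # exponentiated (multiplicative) update of the scale
    w -= s * base_lr * g
    prev_g = g
print(s, np.linalg.norm(w))
```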

Learning from Randomly Initialized Neural Network Features

no code implementations13 Feb 2022 Ehsan Amid, Rohan Anil, Wojciech Kotłowski, Manfred K. Warmuth

We present the surprising result that randomly initialized neural networks are good feature extractors in expectation.
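
The claim above can be checked with a few lines: freeze a randomly initialized network, use its hidden activations as features, and fit only a linear head on top (a generic sketch, not the paper's experimental setup).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, hidden = 500, 20, 256

# Toy nonlinear regression data.
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

# Frozen, randomly initialized one-hidden-layer "network" used only as a feature map.
W = rng.normal(size=(d, hidden)) / np.sqrt(d)
b = rng.normal(size=hidden)
features = np.maximum(X @ W + b, 0.0)          # ReLU activations, never trained

# Train only a linear (ridge) head on the random features.
lam = 1e-2
head = np.linalg.solve(features.T @ features + lam * np.eye(hidden), features.T @ y)
pred = features @ head
print(np.mean((pred - y) ** 2), np.var(y))     # fit error vs. the variance of y
```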

A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks

no code implementations26 May 2023 Jacob Abernethy, Alekh Agarwal, Teodor V. Marinov, Manfred K. Warmuth

We study the phenomenon of in-context learning (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization.

In-Context Learning, Retrieval

Optimal Transport with Tempered Exponential Measures

no code implementations7 Sep 2023 Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, "à-la-Kantorovich", which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regularized optimal transport, "à-la-Sinkhorn-Cuturi", which admits near-linear approximation algorithms but leads to maximally un-sparse plans.

The Tempered Hilbert Simplex Distance and Its Application To Non-linear Embeddings of TEMs

no code implementations22 Nov 2023 Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densities.

Tempered Calculus for ML: Application to Hyperbolic Model Embedding

no code implementations6 Feb 2024 Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen, Manfred K. Warmuth

Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc.

Noise misleads rotation invariant algorithms on sparse targets

no code implementations5 Mar 2024 Manfred K. Warmuth, Wojciech Kotłowski, Matt Jones, Ehsan Amid

It is well known that the class of rotation invariant algorithms is suboptimal even for learning sparse linear problems when the number of examples is below the "dimension" of the problem.
