Search Results for author: Manfred K. Warmuth

Found 40 papers, 4 papers with code

Noise misleads rotation invariant algorithms on sparse targets

no code implementations • 5 Mar 2024 • Manfred K. Warmuth, Wojciech Kotłowski, Matt Jones, Ehsan Amid

It is well known that the class of rotation invariant algorithms is suboptimal even for learning sparse linear problems when the number of examples is below the "dimension" of the problem.

Tempered Calculus for ML: Application to Hyperbolic Model Embedding

no code implementations • 6 Feb 2024 • Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen, Manfred K. Warmuth

Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc.

The Tempered Hilbert Simplex Distance and Its Application To Non-linear Embeddings of TEMs

no code implementations • 22 Nov 2023 • Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densities.

Optimal Transport with Tempered Exponential Measures

no code implementations • 7 Sep 2023 • Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, "à-la-Kantorovich", which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regularized optimal transport, "à-la-Sinkhorn-Cuturi", which admits near-linear-time approximation algorithms but leads to maximally dense plans.
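The entropic-regularized side of this contrast is the standard Sinkhorn iteration, which can be sketched in a few lines of pure Python (the histograms and cost matrix below are illustrative assumptions, not taken from the paper):

```python
import math

def sinkhorn(a, b, C, eps=0.1, n_iter=200):
    """Entropic-regularized optimal transport between histograms a and b
    with cost matrix C. Returns a plan P whose row sums approximate a
    and whose column sums match b."""
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # alternating scaling: u <- a / (K v), then v <- b / (K^T u)
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

a = [0.5, 0.5]
b = [0.25, 0.75]
C = [[0.0, 1.0], [1.0, 0.0]]
P = sinkhorn(a, b, C)
```

Note how every entry of the returned plan is strictly positive: this is the "maximally un-sparse" behavior of entropic regularization that the paper contrasts with the sparse Kantorovich plans.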

A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks

no code implementations • 26 May 2023 • Jacob Abernethy, Alekh Agarwal, Teodor V. Marinov, Manfred K. Warmuth

We study the phenomenon of in-context learning (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization.

In-Context Learning • Retrieval

Learning from Randomly Initialized Neural Network Features

no code implementations • 13 Feb 2022 • Ehsan Amid, Rohan Anil, Wojciech Kotłowski, Manfred K. Warmuth

We present the surprising result that randomly initialized neural networks are good feature extractors in expectation.

Step-size Adaptation Using Exponentiated Gradient Updates

no code implementations • 31 Jan 2022 • Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

In this paper, we update the step-size scale and the gain variables with exponentiated gradient updates instead.
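The flavor of a multiplicative step-size update can be illustrated with a hypergradient-style signal (the inner product of successive gradients). This toy sketch is my own illustration of why an exponentiated update is attractive here (the step size stays positive by construction); it is not the paper's algorithm:

```python
import math

def eg_step_size(eta, beta, g_prev, g_cur):
    """Exponentiated-gradient-style step-size update: grow eta when
    successive gradients align, shrink it when they disagree.
    eta remains positive no matter what the signal is."""
    align = sum(p * c for p, c in zip(g_prev, g_cur))
    return eta * math.exp(beta * align)

eta = 0.1
# gradients pointing the same way -> step size grows
eta_up = eg_step_size(eta, 0.5, [1.0, 0.0], [1.0, 0.0])
# gradients flipping sign -> step size shrinks, but never goes negative
eta_down = eg_step_size(eta, 0.5, [1.0, 0.0], [-1.0, 0.0])
```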

LocoProp: Enhancing BackProp via Local Loss Optimization

1 code implementation • 11 Jun 2021 • Ehsan Amid, Rohan Anil, Manfred K. Warmuth

Second-order methods have shown state-of-the-art performance for optimizing deep neural networks.

Second-order methods

Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond

no code implementations • 3 Apr 2021 • Negin Majidi, Ehsan Amid, Hossein Talebi, Manfred K. Warmuth

Many learning tasks in machine learning can be viewed as taking a gradient step towards minimizing the average loss of a batch of examples in each training iteration.
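The exponentiated-gradient reweighting idea can be sketched in a few lines: maintain a distribution over the examples in the batch, multiplicatively shrink the weight of examples with large loss, and renormalize. This is an illustrative sketch with a made-up learning rate, not the paper's exact procedure:

```python
import math

def eg_reweight(weights, losses, eta=1.0):
    """One exponentiated-gradient step on per-example weights:
    multiply each weight by exp(-eta * loss), then renormalize to sum to 1.
    Examples with persistently high loss (e.g. mislabeled ones under
    label noise) are downweighted over time."""
    w = [wi * math.exp(-eta * li) for wi, li in zip(weights, losses)]
    z = sum(w)
    return [wi / z for wi in w]

# three examples, the last with a suspiciously large loss (possible label noise)
weights = [1 / 3, 1 / 3, 1 / 3]
weights = eg_reweight(weights, [0.1, 0.1, 2.0])
```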

A case where a spindly two-layer linear network whips any neural network with a fully connected input layer

no code implementations • 16 Oct 2020 • Manfred K. Warmuth, Wojciech Kotłowski, Ehsan Amid

It was conjectured that no neural network of any structure, with arbitrary differentiable transfer functions at the nodes, can learn the following problem sample-efficiently when trained with gradient descent: the instances are the rows of a $d$-dimensional Hadamard matrix and the target is one of the features, i.e., very sparse.
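The learning problem behind this conjecture is concrete enough to write down: a Sylvester-construction Hadamard matrix supplies the instances, and a single column supplies the 1-sparse linear target. A small sketch:

```python
def hadamard(d):
    """Sylvester construction of a d x d Hadamard matrix (d a power of two).
    Rows have +/-1 entries and are mutually orthogonal."""
    H = [[1]]
    while len(H) < d:
        # H_{2n} = [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

d = 8
H = hadamard(d)
X = H                       # instances: the d rows of the Hadamard matrix
y = [row[3] for row in H]   # target: feature 3, i.e. a 1-sparse linear target
```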

Reparameterizing Mirror Descent as Gradient Descent

no code implementations • NeurIPS 2020 • Ehsan Amid, Manfred K. Warmuth

We present a general framework for casting a mirror descent update as a gradient descent update on a different set of parameters.
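A classic instance of this correspondence: unnormalized exponentiated gradient on $w$ matches, in the small-step limit, plain gradient descent on $u$ under the reparameterization $w = u^2/4$ (the squared Jacobian $(u/2)^2$ then equals $w$, the mirror-descent metric). A numeric sketch of the first-order agreement (my own illustration, not code from the paper):

```python
import math

def eg_step(w, g, eta):
    # unnormalized exponentiated-gradient (mirror descent) update on w
    return w * math.exp(-eta * g)

def gd_reparam_step(w, g, eta):
    # gradient descent on u where w = u**2 / 4;
    # chain rule gives dL/du = (u/2) * dL/dw
    u = 2.0 * math.sqrt(w)
    u -= eta * (u / 2.0) * g
    return u * u / 4.0

w, g, eta = 0.5, 1.0, 1e-3
w_eg = eg_step(w, g, eta)          # mirror descent update
w_gd = gd_reparam_step(w, g, eta)  # gradient descent on reparameterized u
```

For a small step size the two updates agree to first order; the discrepancy is $O(\eta^2)$.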

TriMap: Large-scale Dimensionality Reduction Using Triplets

1 code implementation • 1 Oct 2019 • Ehsan Amid, Manfred K. Warmuth

We empirically show the excellent performance of TriMap on a large variety of datasets in terms of the quality of the embedding as well as the runtime.

Dimensionality Reduction

An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint

no code implementations • 11 Sep 2019 • Ehsan Amid, Manfred K. Warmuth

We show that Krasulina's update corresponds to a projected gradient descent step on the Stiefel manifold of the orthonormal $k$-frames, while Oja's update amounts to a gradient descent step using the unprojected gradient.
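The geometric distinction in this abstract is easy to verify numerically for $k=1$: Krasulina's update direction is the gradient with its component along $w$ projected out, so it is orthogonal to the current iterate, while Oja's direction is the raw, unprojected gradient. A pure-Python check:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def oja_direction(w, x):
    # Oja: unprojected gradient direction  x (x^T w)
    s = dot(x, w)
    return [s * xi for xi in x]

def krasulina_direction(w, x):
    # Krasulina: project out the component along w:
    #   x (x^T w) - ((x^T w)^2 / ||w||^2) w
    s = dot(x, w)
    c = s * s / dot(w, w)
    return [s * xi - c * wi for xi, wi in zip(x, w)]

w = [1.0, 2.0, 0.5]
x = [0.3, -1.0, 2.0]
```

By construction, `dot(w, krasulina_direction(w, x))` vanishes (the update moves tangentially to the current iterate), whereas Oja's direction generally has a nonzero component along `w`.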

Unbiased estimators for random design regression

no code implementations • 8 Jul 2019 • Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

We use them to show that for any input distribution and $\epsilon>0$ there is a random design consisting of $O(d\log d+ d/\epsilon)$ points from which an unbiased estimator can be constructed whose expected square loss over the entire distribution is bounded by $1+\epsilon$ times the loss of the optimum.


Robust Bi-Tempered Logistic Loss Based on Bregman Divergences

6 code implementations • NeurIPS 2019 • Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren

We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high-temperature generalization.
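The temperature-generalized exponential underlying this loss is $\exp_t(x) = [1+(1-t)x]_+^{1/(1-t)}$, with $\log_t$ its inverse; both reduce to the ordinary exp/log as $t \to 1$. A minimal sketch of the two functions (the full bi-tempered loss also needs a normalized tempered softmax, omitted here):

```python
import math

def exp_t(x, t):
    """Tempered exponential: [1 + (1 - t) * x]_+ ** (1 / (1 - t)).
    Reduces to exp(x) at t = 1; has bounded support below for t < 1
    and a heavy tail for t > 1."""
    if t == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - t) * x
    return max(base, 0.0) ** (1.0 / (1.0 - t))

def log_t(x, t):
    """Tempered logarithm, the inverse of exp_t on its positive range."""
    if t == 1.0:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)
```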

Adaptive scale-invariant online algorithms for learning linear models

no code implementations • 20 Feb 2019 • Michał Kempka, Wojciech Kotłowski, Manfred K. Warmuth

We consider online learning with linear models, where the algorithm predicts on sequentially revealed instances (feature vectors), and is compared against the best linear function (comparator) in hindsight.

Divergence-Based Motivation for Online EM and Combining Hidden Variable Models

no code implementations • 11 Feb 2019 • Ehsan Amid, Manfred K. Warmuth

Expectation-Maximization (EM) is a prominent approach for parameter estimation of hidden (aka latent) variable models.

Reverse iterative volume sampling for linear regression

no code implementations • 6 Jun 2018 • Michał Dereziński, Manfred K. Warmuth

We can only afford to attain the responses for a small subset of the points that are then used to construct linear predictions for all points in the dataset.

BIG-bench Machine Learning • regression

Online Non-Additive Path Learning under Full and Partial Information

no code implementations • 18 Apr 2018 • Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Holakou Rahmanian, Manfred K. Warmuth

We study the problem of online path learning with non-additive gains, which is a central problem appearing in several applications, including ensemble structured prediction.

Structured Prediction

Speech Recognition: Keyword Spotting Through Image Recognition

no code implementations • 10 Mar 2018 • Sanjay Krishna Gouda, Salil Kanetkar, David Harrison, Manfred K. Warmuth

The problem of identifying voice commands has always been a challenge due to the presence of noise and variability in speed, pitch, etc.

Image Classification • Keyword Spotting +2

A more globally accurate dimensionality reduction method using triplets

1 code implementation • 1 Mar 2018 • Ehsan Amid, Manfred K. Warmuth

We first show that the commonly used dimensionality reduction (DR) methods such as t-SNE and LargeVis poorly capture the global structure of the data in the low dimensional embedding.

Dimensionality Reduction

Leveraged volume sampling for linear regression

no code implementations • NeurIPS 2018 • Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

We then develop a new rescaled variant of volume sampling that produces an unbiased estimate which avoids this bad behavior and has at least as good a tail bound as leverage score sampling: sample size $k=O(d\log d + d/\epsilon)$ suffices to guarantee total loss at most $1+\epsilon$ times the minimum with high probability.

Point Processes • regression

Subsampling for Ridge Regression via Regularized Volume Sampling

no code implementations • 14 Oct 2017 • Michał Dereziński, Manfred K. Warmuth

However, when labels are expensive, we are forced to select only a small subset of vectors $\mathbf{x}_i$ for which we obtain the labels $y_i$.


Online Dynamic Programming

no code implementations • NeurIPS 2017 • Holakou Rahmanian, Manfred K. Warmuth

We consider the problem of repeatedly solving a variant of the same dynamic programming problem in successive trials.

Two-temperature logistic regression based on the Tsallis divergence

no code implementations • 19 May 2017 • Ehsan Amid, Manfred K. Warmuth, Sriram Srinivasan

We explain this by showing that $t_1 < 1$ caps the surrogate loss and $t_2 > 1$ makes the predictive distribution have a heavy tail.

regression • Vocal Bursts Valence Prediction

Unbiased estimates for linear regression via volume sampling

no code implementations • NeurIPS 2017 • Michał Dereziński, Manfred K. Warmuth

The pseudoinverse plays an important part in solving the linear least squares problem, where we try to predict a label for each column of $X$.
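The headline unbiasedness result can be checked by brute force on a tiny instance: draw a size-$d$ subset $S$ of points with probability proportional to $\det(X_S)^2$, solve exactly on $S$, and the expectation over subsets recovers the full least-squares solution. A sketch with data points as rows (the paper's $X$ has points as columns) and made-up numbers:

```python
from itertools import combinations

# tiny 3 x 2 design matrix (rows are points) and labels
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 4.0]

def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    w0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    w1 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return [w0, w1]

# full least-squares solution w* = (X^T X)^{-1} X^T y
XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]
w_star = solve2(XtX, Xty)

# volume sampling over all size-2 row subsets: P(S) proportional to det(X_S)^2
dets2, w_S = [], []
for S in combinations(range(3), 2):
    A = [X[i] for i in S]
    b = [y[i] for i in S]
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    dets2.append(d * d)
    w_S.append(solve2(A, b))
Z = sum(dets2)
# expectation of the subset solutions under volume sampling
w_avg = [sum(p / Z * w[i] for p, w in zip(dets2, w_S)) for i in range(2)]
```

On this instance the volume-sampled expectation `w_avg` matches `w_star` exactly, as the unbiasedness theorem predicts.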


Low-dimensional Data Embedding via Robust Ranking

no code implementations • 30 Nov 2016 • Ehsan Amid, Nikos Vlassis, Manfred K. Warmuth

We describe a new method called t-ETE for finding a low-dimensional embedding of a set of objects in Euclidean space.

PCA with Gaussian perturbations

no code implementations • 16 Jun 2015 • Wojciech Kotłowski, Manfred K. Warmuth

We develop a simple algorithm that needs $O(kn^2)$ time per trial and whose regret is off by a small factor of $O(n^{1/4})$.

Labeled compression schemes for extremal classes

no code implementations • 30 May 2015 • Shay Moran, Manfred K. Warmuth

We consider a generalization of maximum classes called extremal classes.

The limits of squared Euclidean distance regularization

no code implementations • NeurIPS 2014 • Michal Derezinski, Manfred K. Warmuth

We conjecture that our hardness results hold for any training algorithm that is based on the squared Euclidean distance regularization (i.e., back-propagation with the weight-decay heuristic).

A Bayesian Probability Calculus for Density Matrices

no code implementations • 9 Aug 2014 • Manfred K. Warmuth, Dima Kuzmin

Finite probability distributions are a special case where the density matrix is restricted to be diagonal.

On-line PCA with Optimal Regrets

no code implementations • 17 Jun 2013 • Jiazhong Nie, Wojciech Kotlowski, Manfred K. Warmuth

Furthermore, we show that when considering regret bounds as function of a loss budget, EG remains optimal and strictly outperforms GD.

Putting Bayes to sleep

no code implementations • NeurIPS 2012 • Dmitry Adamskiy, Manfred K. Warmuth, Wouter M. Koolen

When the nature of the data changes over time, so that different models predict well on different segments, adaptivity is typically achieved by mixing a bit of the initial prior into the weights in each round (a kind of weak restart).

Learning Eigenvectors for Free

no code implementations • NeurIPS 2011 • Wouter M. Koolen, Wojciech Kotlowski, Manfred K. Warmuth

In this extension, the alphabet of $n$ outcomes is replaced by the set of all dyads, i.e., outer products $\mathbf{u}\mathbf{u}^\top$ where $\mathbf{u}$ is a unit-length vector in $\mathbb{R}^n$.
