
no code implementations • 15 Sep 2022 • Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

In this work, we propose a novel approach for layerwise representation learning of a trained neural network.

no code implementations • 13 Feb 2022 • Ehsan Amid, Rohan Anil, Wojciech Kotłowski, Manfred K. Warmuth

We present the surprising result that randomly initialized neural networks are good feature extractors in expectation.

no code implementations • 31 Jan 2022 • Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

In this paper, we update the step-size scale and the gain variables with exponentiated gradient updates instead.
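The snippet replaces additive step-size adaptation with exponentiated gradient (EG) updates. Here is a minimal sketch of why a multiplicative update suits a positive quantity. It assumes a single step-size scale and a made-up hypergradient value, neither taken from the paper. A plain gradient step can push the scale below zero, while the EG step only rescales it:

```python
import math


def gd_update(scale: float, grad: float, lr: float) -> float:
    """Additive gradient descent step; nothing keeps the result positive."""
    return scale - lr * grad


def eg_update(scale: float, grad: float, lr: float) -> float:
    """Unnormalized exponentiated gradient step; stays positive whenever scale > 0."""
    return scale * math.exp(-lr * grad)


scale = 0.1      # hypothetical current step-size scale
hypergrad = 5.0  # hypothetical gradient of the loss with respect to the scale
print("GD:", gd_update(scale, hypergrad, lr=0.1))  # negative, i.e. an invalid step size
print("EG:", eg_update(scale, hypergrad, lr=0.1))  # shrinks multiplicatively, still positive
```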

1 code implementation • 11 Jun 2021 • Ehsan Amid, Rohan Anil, Manfred K. Warmuth

Second-order methods have shown state-of-the-art performance for optimizing deep neural networks.

no code implementations • 3 Apr 2021 • Negin Majidi, Ehsan Amid, Hossein Talebi, Manfred K. Warmuth

Many learning tasks in machine learning can be viewed as taking a gradient step towards minimizing the average loss of a batch of examples in each training iteration.

no code implementations • 21 Nov 2020 • Hossein Talebi, Ehsan Amid, Peyman Milanfar, Manfred K. Warmuth

Training a model on these pairwise preferences is a common deep learning approach.

no code implementations • 16 Oct 2020 • Manfred K. Warmuth, Wojciech Kotłowski, Ehsan Amid

It was conjectured that no neural network, with any structure and arbitrary differentiable transfer functions at the nodes, can learn the following problem sample-efficiently when trained with gradient descent: the instances are the rows of a $d$-dimensional Hadamard matrix and the target is one of the features, i.e., very sparse.
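The conjectured hard problem is easy to generate. This minimal sketch assumes $d$ is a power of two, so Sylvester's doubling construction applies. The target index `j` is an arbitrary, hypothetical choice:

```python
def sylvester_hadamard(k: int) -> list[list[int]]:
    """Return the 2^k x 2^k Hadamard matrix from Sylvester's construction H_2n = [[H, H], [H, -H]]."""
    h = [[1]]
    for _ in range(k):
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


H = sylvester_hadamard(3)        # d = 8: the 8 instances are the rows of H
j = 5                            # hypothetical index of the relevant feature
labels = [row[j] for row in H]   # the target is just feature j (a 1-sparse target)
for row, label in zip(H, labels):
    print(row, label)
```

Because the rows are mutually orthogonal, no instance reveals much about the labels of the others. This is what makes the sample-efficiency question nontrivial.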

no code implementations • NeurIPS 2020 • Ehsan Amid, Manfred K. Warmuth

We present a general framework for casting a mirror descent update as a gradient descent update on a different set of parameters.
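One concrete instance of such a reparameterization is assumed here for illustration. In continuous time, the unnormalized exponentiated gradient (EGU) flow $\dot{w} = -w \odot \nabla L(w)$ coincides with plain gradient flow on $u$ when $w = u \odot u / 4$. The reason is that $\dot{u} = -(u/2) \odot \nabla L(w)$ gives $\dot{w} = (u/2) \odot \dot{u} = -w \odot \nabla L(w)$. The sketch below discretizes both flows with a small step size on a hypothetical quadratic loss, so the two trajectories should stay close:

```python
import math

c = [0.5, 2.0, 1.0]  # hypothetical targets of a toy quadratic loss


def grad_w(w):
    """Gradient of L(w) = 0.5 * sum_i (w_i - c_i)^2 with respect to w."""
    return [wi - ci for wi, ci in zip(w, c)]


def egu_step(w, lr):
    """Unnormalized exponentiated gradient step (mirror descent with the relative entropy)."""
    return [wi * math.exp(-lr * gi) for wi, gi in zip(w, grad_w(w))]


def gd_on_u_step(u, lr):
    """Plain gradient descent on u, where w = u*u/4, so dL/du_i = (u_i / 2) * dL/dw_i."""
    w = [ui * ui / 4.0 for ui in u]
    return [ui - lr * (ui / 2.0) * gi for ui, gi in zip(u, grad_w(w))]


w0 = [1.0, 0.2, 3.0]
lr, steps = 1e-3, 1000                  # total "time" lr * steps = 1
w = list(w0)
u = [2.0 * math.sqrt(wi) for wi in w0]  # chosen so that u*u/4 == w0
for _ in range(steps):
    w = egu_step(w, lr)
    u = gd_on_u_step(u, lr)
w_from_u = [ui * ui / 4.0 for ui in u]
print(w)
print(w_from_u)
```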

1 code implementation • 1 Oct 2019 • Ehsan Amid, Manfred K. Warmuth

We empirically show the excellent performance of TriMap on a large variety of datasets in terms of the quality of the embedding as well as the runtime.

no code implementations • 11 Sep 2019 • Ehsan Amid, Manfred K. Warmuth

We show that Krasulina's update corresponds to a projected gradient descent step on the Stiefel manifold of the orthonormal $k$-frames, while Oja's update amounts to a gradient descent step using the unprojected gradient.
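For $k = 1$ the Stiefel manifold is just the unit sphere. The two updates then differ only in how they treat the component of the stochastic gradient $y\,x$ (with $y = w^\top x$) that lies along $w$:

* Krasulina's step projects that component out, without renormalizing.
* Oja's step keeps the raw gradient and renormalizes afterwards.

Here is a minimal sketch on hypothetical 2-D data whose top eigenvector is known to be $(1, 0)$:

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def krasulina_step(w, x, lr):
    """Step along y*x with its component along w removed (tangent-space projection)."""
    y = dot(w, x)
    coef = y * y / dot(w, w)
    return [wi + lr * (y * xi - coef * wi) for wi, xi in zip(w, x)]


def oja_step(w, x, lr):
    """Step along the unprojected gradient y*x, then renormalize to unit length."""
    y = dot(w, x)
    v = [wi + lr * y * xi for wi, xi in zip(w, x)]
    norm = math.sqrt(dot(v, v))
    return [vi / norm for vi in v]


def cosine_to_e1(w):
    """|cos| of the angle between w and the true top eigenvector (1, 0)."""
    return abs(w[0]) / math.sqrt(dot(w, w))


random.seed(0)
w_k = [1.0, 1.0]
w_o = [1.0 / math.sqrt(2.0)] * 2
for _ in range(10000):
    # Toy data with covariance diag(9, 1).
    x = [3.0 * random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
    w_k = krasulina_step(w_k, x, 0.005)
    w_o = oja_step(w_o, x, 0.005)
print(cosine_to_e1(w_k), cosine_to_e1(w_o))
```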

no code implementations • 8 Jul 2019 • Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

We use them to show that for any input distribution and $\epsilon>0$ there is a random design consisting of $O(d\log d+ d/\epsilon)$ points from which an unbiased estimator can be constructed whose expected square loss over the entire distribution is bounded by $1+\epsilon$ times the loss of the optimum.

6 code implementations • NeurIPS 2019 • Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren

We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high temperature generalization.
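The tempered exponential $\exp_t(x) = [1 + (1-t)x]_+^{1/(1-t)}$ reduces to $\exp$ at $t = 1$ and decays only polynomially for $t > 1$. Unlike the ordinary softmax, the tempered version has no closed-form normalizer. The sketch below therefore finds the shift $\lambda$ with $\sum_i \exp_t(a_i - \lambda) = 1$ by bisection. Bisection is an assumption made here for simplicity, not necessarily how the published implementations compute the normalizer:

```python
import math


def exp_t(x: float, t: float) -> float:
    """Tempered exponential [1 + (1 - t) x]_+^{1/(1 - t)}; equals exp(x) at t = 1."""
    if t == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - t) * x
    if base <= 0.0:
        # t < 1: clipped to zero; t > 1: x lies at or beyond the pole of exp_t.
        return 0.0 if t < 1.0 else math.inf
    return base ** (1.0 / (1.0 - t))


def tempered_softmax(activations, t, iters=100):
    """p_i = exp_t(a_i - lam), with lam found by bisection so that sum(p) = 1."""
    def total(lam):
        return sum(exp_t(a - lam, t) for a in activations)

    lo = max(activations)  # total(lo) >= 1, since the largest term is exp_t(0) = 1
    hi, step = lo, 1.0
    while total(hi) > 1.0:  # total is decreasing in lam; move right until it drops to <= 1
        hi += step
        step *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [exp_t(a - lam, t) for a in activations]


print(tempered_softmax([2.0, 1.0, -3.0], t=1.5))
print(tempered_softmax([2.0, 1.0, -3.0], t=1.0))  # ordinary softmax
```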

no code implementations • 20 Feb 2019 • Michał Kempka, Wojciech Kotłowski, Manfred K. Warmuth

We consider online learning with linear models, where the algorithm predicts on sequentially revealed instances (feature vectors), and is compared against the best linear function (comparator) in hindsight.

no code implementations • 11 Feb 2019 • Ehsan Amid, Manfred K. Warmuth

Expectation-Maximization (EM) is a prominent approach for parameter estimation of hidden (aka latent) variable models.

no code implementations • 4 Feb 2019 • Michał Dereziński, Kenneth L. Clarkson, Michael W. Mahoney, Manfred K. Warmuth

In the process, we develop a new algorithm for a joint sampling distribution called volume sampling, and we propose a new i.i.d.

no code implementations • 5 Dec 2018 • Jérémie Chalopin, Victor Chepoi, Shay Moran, Manfred K. Warmuth

On the positive side we present a new construction of an unlabeled sample compression scheme for maximum classes.

no code implementations • 4 Oct 2018 • Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

Without any assumptions on the noise, the linear least squares solution for any i.i.d.

no code implementations • 6 Jun 2018 • Michał Dereziński, Manfred K. Warmuth

We can only afford to attain the responses for a small subset of the points that are then used to construct linear predictions for all points in the dataset.

no code implementations • 18 Apr 2018 • Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Holakou Rahmanian, Manfred K. Warmuth

We study the problem of online path learning with non-additive gains, which is a central problem appearing in several applications, including ensemble structured prediction.

no code implementations • 10 Mar 2018 • Sanjay Krishna Gouda, Salil Kanetkar, David Harrison, Manfred K. Warmuth

The problem of identifying voice commands has always been a challenge due to the presence of noise and variability in speed, pitch, etc.

1 code implementation • 1 Mar 2018 • Ehsan Amid, Manfred K. Warmuth

We first show that the commonly used dimensionality reduction (DR) methods such as t-SNE and LargeVis poorly capture the global structure of the data in the low dimensional embedding.

no code implementations • NeurIPS 2018 • Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

We then develop a new rescaled variant of volume sampling that produces an unbiased estimate which avoids this bad behavior and has at least as good a tail bound as leverage score sampling: sample size $k=O(d\log d + d/\epsilon)$ suffices to guarantee total loss at most $1+\epsilon$ times the minimum with high probability.
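For comparison, here is a sketch of the leverage score sampling baseline named in the snippet, not the paper's rescaled volume sampling. It assumes points $x_i \in \mathbb{R}^2$ stored as rows and a hypothetical toy dataset. Each point is drawn i.i.d. with probability $p_i = \ell_i / d$, where $\ell_i = x_i^\top (X^\top X)^{-1} x_i$. Its squared loss is then reweighted by $1/(k\,p_i)$:

```python
import random


def leverage_scores(xs):
    """l_i = x_i^T (X^T X)^{-1} x_i for points x_i in R^2 (the rows of X)."""
    s00 = sum(x[0] * x[0] for x in xs)
    s01 = sum(x[0] * x[1] for x in xs)
    s11 = sum(x[1] * x[1] for x in xs)
    det = s00 * s11 - s01 * s01
    i00, i01, i11 = s11 / det, -s01 / det, s00 / det  # entries of (X^T X)^{-1}
    return [i00 * x[0] ** 2 + 2.0 * i01 * x[0] * x[1] + i11 * x[1] ** 2 for x in xs]


def leverage_sampled_ls(xs, ys, k, rng):
    """Sample k points i.i.d. with p_i = l_i / d, then solve the 1/(k p_i)-reweighted least squares."""
    d = 2
    probs = [l / d for l in leverage_scores(xs)]
    idx = rng.choices(range(len(xs)), weights=probs, k=k)
    s00 = s01 = s11 = b0 = b1 = 0.0
    for i in idx:
        wgt = 1.0 / (k * probs[i])
        (x0, x1), y = xs[i], ys[i]
        s00 += wgt * x0 * x0
        s01 += wgt * x0 * x1
        s11 += wgt * x1 * x1
        b0 += wgt * x0 * y
        b1 += wgt * x1 * y
    det = s00 * s11 - s01 * s01
    return ((b0 * s11 - s01 * b1) / det, (s00 * b1 - s01 * b0) / det)


# Hypothetical toy data; only the sampled y_i would actually need to be queried.
xs = [(1.0, 0.5), (0.2, 2.0), (-1.0, 1.0), (3.0, -0.5), (0.5, 0.1), (-2.0, -1.5)]
ys = [1.0, 2.0, 0.0, -1.0, 0.3, -0.8]
print(leverage_scores(xs))
print(leverage_sampled_ls(xs, ys, k=20, rng=random.Random(0)))
```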

no code implementations • 14 Oct 2017 • Michał Dereziński, Manfred K. Warmuth

However, when labels are expensive, we are forced to select only a small subset of vectors $\mathbf{x}_i$ for which we obtain the labels $y_i$.

no code implementations • NeurIPS 2017 • Holakou Rahmanian, Manfred K. Warmuth

We consider the problem of repeatedly solving a variant of the same dynamic programming problem in successive trials.

no code implementations • NeurIPS 2017 • Michał Dereziński, Manfred K. Warmuth

The pseudoinverse plays an important part in solving the linear least squares problem, where we try to predict a label for each column of $X$.
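Here is a small check of the unbiasedness property behind this line of work. It assumes $d = 2$ and a hypothetical dataset of four points (the columns of $X$). Suppose a subset $S$ of exactly $d$ points is drawn with probability proportional to $\det(X_S)^2$ (volume sampling). Then the expected solution of the sampled $d \times d$ system equals the full least squares (pseudoinverse) solution. Enumerating all subsets makes the expectation exact:

```python
from itertools import combinations

# Hypothetical toy data: points x_i in R^2 (the columns of X) with labels y_i.
xs = [(1.0, 0.5), (0.2, 2.0), (-1.0, 1.0), (3.0, -0.5)]
ys = [1.0, 2.0, 0.0, -1.0]


def solve2(a, b, c, d, r1, r2):
    """Solve [[a, b], [c, d]] @ w = (r1, r2) by Cramer's rule."""
    det = a * d - b * c
    return ((r1 * d - b * r2) / det, (a * r2 - c * r1) / det)


# Full least squares solution w* = (X X^T)^{-1} X y, i.e. the pseudoinverse solution.
s00 = sum(x[0] * x[0] for x in xs)
s01 = sum(x[0] * x[1] for x in xs)
s11 = sum(x[1] * x[1] for x in xs)
b0 = sum(x[0] * y for x, y in zip(xs, ys))
b1 = sum(x[1] * y for x, y in zip(xs, ys))
w_star = solve2(s00, s01, s01, s11, b0, b1)

# Size-2 volume sampling: P(S) is proportional to det(X_S)^2.
total, e0, e1 = 0.0, 0.0, 0.0
for i, j in combinations(range(len(xs)), 2):
    (a, b), (c, d) = xs[i], xs[j]
    vol2 = (a * d - b * c) ** 2
    if vol2 == 0.0:
        continue  # degenerate pairs have probability zero
    w_s = solve2(a, b, c, d, ys[i], ys[j])  # exact fit through the two sampled points
    total += vol2
    e0 += vol2 * w_s[0]
    e1 += vol2 * w_s[1]

print(w_star, (e0 / total, e1 / total))
```

The normalizer `total` also equals $\det(XX^\top)$ by the Cauchy–Binet formula.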

no code implementations • 19 May 2017 • Ehsan Amid, Manfred K. Warmuth, Sriram Srinivasan

We explain this by showing that $t_1 < 1$ caps the surrogate loss and $t_2 >1$ makes the predictive distribution have a heavy tail.
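The following simplified illustration of the capping effect assumes the loss is driven by $-\log_{t_1}$ of the probability assigned to the correct class. The full loss in the paper may contain further terms, which are omitted here. With the tempered logarithm $\log_t(x) = (x^{1-t} - 1)/(1-t)$ and $t_1 < 1$, this quantity never exceeds $1/(1-t_1)$. The ordinary log loss, by contrast, grows without bound as $p \to 0$:

```python
import math


def log_t(x: float, t: float) -> float:
    """Tempered logarithm (x^{1-t} - 1) / (1 - t) for x > 0; equals log(x) at t = 1."""
    if t == 1.0:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)


t1 = 0.5  # hypothetical temperature; the cap is 1 / (1 - t1) = 2
for p in [0.5, 1e-3, 1e-12]:
    # p: probability the model assigns to the correct label
    print(p, -log_t(p, 1.0), -log_t(p, t1))
```

A mislabeled example with $p \approx 0$ therefore contributes at most a constant. This is the intuition behind robustness to label noise.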

no code implementations • 30 Nov 2016 • Ehsan Amid, Nikos Vlassis, Manfred K. Warmuth

We describe a new method called t-ETE for finding a low-dimensional embedding of a set of objects in Euclidean space.

no code implementations • 16 Jun 2015 • Wojciech Kotłowski, Manfred K. Warmuth

We develop a simple algorithm that needs $O(kn^2)$ time per trial and whose regret is off by only a small factor of $O(n^{1/4})$.

no code implementations • 30 May 2015 • Shay Moran, Manfred K. Warmuth

We consider a generalization of maximum classes called extremal classes.

no code implementations • NeurIPS 2014 • Michal Derezinski, Manfred K. Warmuth

We conjecture that our hardness results hold for any training algorithm that is based on the squared Euclidean distance regularization (i.e., backpropagation with the weight decay heuristic).

no code implementations • 9 Aug 2014 • Manfred K. Warmuth, Dima Kuzmin

Finite probability distributions are a special case where the density matrix is restricted to be diagonal.
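Here is a minimal sketch of this special case. It is restricted to $2 \times 2$ real symmetric density matrices, so the eigenvalues have a closed form. The eigenvalues of any density matrix form a probability vector. For a diagonal density matrix, the von Neumann entropy $-\operatorname{tr}(\rho \log \rho)$ equals the Shannon entropy of the diagonal:

```python
import math


def eigvals_sym2(a: float, b: float, c: float):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    rad = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return (mean + rad, mean - rad)


def von_neumann_entropy(a: float, b: float, c: float) -> float:
    """-tr(rho log rho) for rho = [[a, b], [b, c]], computed from its eigenvalues."""
    return -sum(l * math.log(l) for l in eigvals_sym2(a, b, c) if l > 0)


def shannon_entropy(p) -> float:
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


# Diagonal case: rho = diag(0.3, 0.7) is just the distribution (0.3, 0.7).
print(von_neumann_entropy(0.3, 0.0, 0.7), shannon_entropy([0.3, 0.7]))
# Non-diagonal case: the eigenvalues still form a probability vector.
print(eigvals_sym2(0.5, 0.3, 0.5))
```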

no code implementations • 17 Jun 2013 • Jiazhong Nie, Wojciech Kotlowski, Manfred K. Warmuth

Furthermore, we show that when considering regret bounds as function of a loss budget, EG remains optimal and strictly outperforms GD.

no code implementations • NeurIPS 2012 • Dmitry Adamskiy, Manfred K. Warmuth, Wouter M. Koolen

If the nature of the data is changing over time in that different models predict well on different segments of the data, then adaptivity is typically achieved by mixing into the weights in each round a bit of the initial prior (kind of like a weak restart).
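The standard scheme described here is Fixed Share toward the initial prior. It is the baseline the snippet refers to, not the paper's own proposal. It can be sketched over $n$ experts as follows:

1. Apply an exponential weights update.
2. Mix back in a fraction $\alpha$ of the uniform prior.

The learning rate, $\alpha$, and the loss sequence below are all hypothetical:

```python
import math


def fixed_share(loss_rounds, eta, alpha):
    """Exponential weights over n experts; after each update, mix in alpha of the uniform prior."""
    n = len(loss_rounds[0])
    prior = [1.0 / n] * n
    w = list(prior)
    history = []
    for losses in loss_rounds:
        history.append(list(w))  # weights used to predict in this round
        v = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
        z = sum(v)
        v = [vi / z for vi in v]  # exponential weights (Bayesian) posterior
        w = [(1 - alpha) * vi + alpha * pi for vi, pi in zip(v, prior)]  # weak restart
    return history


# Expert 0 is good for 50 rounds, then expert 1 is good for 50 rounds.
losses = [[0.0, 1.0]] * 50 + [[1.0, 0.0]] * 50
print(fixed_share(losses, eta=1.0, alpha=0.01)[-1])  # has switched to expert 1
print(fixed_share(losses, eta=1.0, alpha=0.0)[-1])   # plain Bayes lags behind
```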

no code implementations • NeurIPS 2011 • Wouter M. Koolen, Wojciech Kotlowski, Manfred K. Warmuth

In this extension, the alphabet of $n$ outcomes is replaced by the set of all dyads, i.e., outer products $\mathbf{u}\mathbf{u}^\top$ where $\mathbf{u}$ is a vector in $\mathbb{R}^n$ of unit length.

no code implementations • NeurIPS 2010 • Jacob D. Abernethy, Manfred K. Warmuth

We study repeated zero-sum games against an adversary on a budget.


Contact us on: hello@paperswithcode.com.
Papers With Code is a free resource with all data licensed under CC-BY-SA.