no code implementations • 8 Mar 2018 • Deborah Cohen, Amit Daniely, Amir Globerson, Gal Elidan
Complex classifiers may exhibit "embarrassing" failures in cases where humans can easily provide a justified classification.
no code implementations • 7 May 2018 • Craig Boutilier, Alon Cohen, Amit Daniely, Avinatan Hassidim, Yishay Mansour, Ofer Meshi, Martin Mladenov, Dale Schuurmans
From an RL perspective, we show that Q-learning with sampled action sets is sound.
no code implementations • NeurIPS 2016 • Amit Daniely, Roy Frostig, Yoram Singer
We develop a general duality between neural networks and compositional kernels, striving towards a better understanding of deep learning.
no code implementations • NeurIPS 2017 • Amit Daniely
We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network, as defined in Daniely, Frostig and Singer.
no code implementations • 22 Mar 2017 • Amit Daniely, Roy Frostig, Vineet Gupta, Yoram Singer
We describe and analyze a simple random feature scheme (RFS) from prescribed compositional kernels.
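The abstract's RFS targets compositional kernels; as a hedged illustration of the general random-feature idea only (this is the classic random Fourier feature map for the Gaussian kernel, not the paper's scheme):

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, seed=0):
    """Explicit feature map Z such that Z @ Z.T approximates the
    Gaussian kernel K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(42)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, n_features=20000, gamma=0.5)
K_approx = Z @ Z.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-0.5 * sq_dists)
```

For large `n_features` the two Gram matrices agree entrywise up to roughly $1/\sqrt{n_{\text{features}}}$ error, which is the point of replacing a kernel with an explicit random feature map.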
no code implementations • 27 Feb 2017 • Amit Daniely
As many functions of the above form can be well approximated by poly-size depth three networks with poly-bounded weights, this establishes a separation between depth two and depth three networks w.r.t. the uniform distribution on $\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}$.
no code implementations • 30 Nov 2016 • Gali Noti, Effi Levi, Yoav Kolumbus, Amit Daniely
A large body of work in behavioral fields attempts to develop models that describe the way people, as opposed to rational agents, make decisions.
no code implementations • 19 Apr 2016 • Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar
In stark contrast, our approach of improper learning, which uses a larger hypothesis class, allows the sketch size to have a logarithmic dependence on the degree.
no code implementations • 21 May 2015 • Amit Daniely
We show that no efficient learning algorithm has non-trivial worst-case performance even under the guarantees that $\mathrm{Err}_{\mathrm{HALF}}(\mathcal{D}) \le \eta$ for arbitrarily small constant $\eta>0$, and that $\mathcal{D}$ is supported in $\{\pm 1\}^n\times \{\pm 1\}$.
no code implementations • 11 Mar 2016 • Galit Bary-Weisberg, Amit Daniely, Shai Shalev-Shwartz
The model of learning with \emph{local membership queries} interpolates between the PAC model and the membership queries model by allowing the learner to query the label of any example that is similar to an example in the training set.
no code implementations • 26 Oct 2014 • Amit Daniely
We present a PTAS for agnostically learning halfspaces w.r.t.
no code implementations • 25 Feb 2015 • Amit Daniely, Alon Gonen, Shai Shalev-Shwartz
Strongly adaptive algorithms are algorithms whose performance on every time interval is close to optimal.
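Formally, strong adaptivity is stated in terms of interval regret; as a hedged sketch of the standard definition (the usual formulation for online learning with losses $\ell_t$ over a decision set $\mathcal{K}$, not quoted from this abstract):
$$\mathrm{Regret}(I) \;=\; \sum_{t \in I} \ell_t(x_t) \;-\; \min_{x^\star \in \mathcal{K}} \sum_{t \in I} \ell_t(x^\star),$$
and an algorithm is strongly adaptive if, for every contiguous interval $I \subseteq [T]$, its interval regret is within a $\mathrm{poly}(\log T)$ factor of the best regret achievable on $|I|$ rounds in isolation.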
no code implementations • 13 Aug 2013 • Amit Daniely, Sivan Sabato, Shai Ben-David, Shai Shalev-Shwartz
We study the sample complexity of multiclass prediction in several learning settings.
no code implementations • 13 Apr 2014 • Amit Daniely, Shai Shalev-Shwartz
Using the recently developed framework of [Daniely et al., 2014], we show that under a natural assumption on the complexity of refuting random K-SAT formulas, learning DNF formulas is hard.
no code implementations • 30 Jul 2014 • Maria-Florina Balcan, Amit Daniely, Ruta Mehta, Ruth Urner, Vijay V. Vazirani
We advance this line of work by providing sample complexity guarantees and efficient algorithms for a number of important classes.
no code implementations • 10 May 2014 • Amit Daniely, Shai Shalev-Shwartz
Furthermore, we show that the sample complexity of these learners is better than the sample complexity of the ERM rule, thus settling in the negative an open question due to Collins (2005).
no code implementations • 3 Nov 2012 • Amit Daniely, Nati Linial, Shai Shalev-Shwartz
The best approximation ratio achievable by an efficient algorithm is $O\left(\frac{1/\gamma}{\sqrt{\log(1/\gamma)}}\right)$ and is achieved using an algorithm from the above class.
no code implementations • 10 Nov 2013 • Amit Daniely, Nati Linial, Shai Shalev-Shwartz
The biggest challenge in proving complexity results is to establish hardness of {\em improper learning} (a.k.a.
no code implementations • NeurIPS 2013 • Amit Daniely, Nati Linial, Shai Shalev-Shwartz
That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task?
no code implementations • 5 Feb 2013 • Amit Daniely, Tom Helbertal
We consider two scenarios of multiclass online learning of a hypothesis class $H\subseteq Y^X$.
no code implementations • NeurIPS 2019 • Amit Daniely, Vitaly Feldman
The only lower bound we are aware of is for PAC learning an artificial class of functions with respect to a uniform distribution (Kasiviswanathan et al. 2011).
no code implementations • NeurIPS 2012 • Amit Daniely, Sivan Sabato, Shai S. Shwartz
We analyze both the estimation error and the approximation error of these methods.
no code implementations • 7 Apr 2019 • Amit Daniely, Yishay Mansour
Our end result is an online algorithm that can combine a "base" online algorithm, having a guaranteed competitive ratio, with a range of online algorithms that guarantee a small regret over any interval of time.
no code implementations • 20 Jun 2019 • Alon Brutzkus, Amit Daniely, Eran Malach
In recent years, there have been many attempts to understand popular heuristics.
no code implementations • 11 Jul 2019 • Alon Brutzkus, Amit Daniely, Eran Malach
Since its inception in the 1980s, ID3 has become one of the most successful and widely used algorithms for learning decision trees.
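As a hedged sketch of the core splitting rule ID3 is built on (information gain over binary features; a toy illustration, not the paper's analysis):

```python
import numpy as np

def entropy(y):
    """Shannon entropy (bits) of a vector of non-negative integer labels."""
    counts = np.bincount(y)
    p = counts[counts > 0] / len(y)
    return float(-(p * np.log2(p)).sum())

def best_split(X, y):
    """Return (feature index, gain) maximizing information gain,
    assuming binary features in the columns of X."""
    base = entropy(y)
    best_j, best_gain = None, 0.0
    for j in range(X.shape[1]):
        mask = X[:, j] == 1
        if mask.all() or not mask.any():
            continue  # degenerate split: all examples on one side
        gain = base - (mask.mean() * entropy(y[mask])
                       + (~mask).mean() * entropy(y[~mask]))
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain

# Feature 0 perfectly predicts the label, feature 1 is noise.
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y = np.array([1, 1, 0, 0])
j, gain = best_split(X, y)
```

ID3 then recurses on this rule: split on the best feature and repeat on each side until the leaves are (nearly) pure.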
no code implementations • NeurIPS 2019 • Amit Daniely, Elad Granot
We show that for any depth $t$, if the inputs are in $[-1, 1]^d$, the sample complexity of $H$ is $\tilde O\left(\frac{dR^2}{\epsilon^2}\right)$.
no code implementations • NeurIPS 2020 • Amit Daniely
Many results in recent years established polynomial time learnability of various models via neural network algorithms.
no code implementations • 9 Feb 2020 • Yossi Arjevani, Amit Daniely, Stefanie Jegelka, Hongzhou Lin
Recent advances in randomized incremental methods for minimizing $L$-smooth $\mu$-strongly convex finite sums have culminated in tight complexity of $\tilde{O}((n+\sqrt{n L/\mu})\log(1/\epsilon))$ and $O(n+\sqrt{nL/\epsilon})$, where $\mu>0$ and $\mu=0$, respectively, and $n$ denotes the number of individual functions.
no code implementations • NeurIPS 2020 • Amit Daniely, Eran Malach
On the other hand, under the same distributions, these parities cannot be learned efficiently by linear methods.
no code implementations • 28 Mar 2020 • Amit Daniely
We prove that a single step of gradient descent over a depth-two network, with $q$ hidden neurons, starting from orthogonal initialization, can memorize $\Omega\left(\frac{dq}{\log^4(d)}\right)$ independent and randomly labeled Gaussians in $\mathbb{R}^d$.
no code implementations • NeurIPS 2020 • Amit Daniely, Gal Vardi
A natural approach to settle the discrepancy is to assume that the network's weights are "well-behaved" and possess some generic properties that may allow efficient learning.
no code implementations • NeurIPS 2020 • Amit Daniely, Hadas Schacham
We consider ReLU networks with random weights, in which the dimension decreases at each layer.
no code implementations • 20 Jan 2021 • Amit Daniely, Gal Vardi
We also establish lower bounds on the complexity of learning intersections of a constant number of halfspaces, and ReLU networks with a constant number of hidden neurons.
no code implementations • 20 May 2021 • Amit Daniely, Elad Granot
In this work, we present a polynomial-time algorithm that can learn a depth-two ReLU network from queries under mild general position assumptions.
no code implementations • NeurIPS 2021 • Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain
In the general non-convex smooth optimization setting, we give a simple and efficient algorithm that requires $O( \sigma^2/\epsilon^4 + \tau/\epsilon^2 )$ steps for finding an $\epsilon$-stationary point $x$, where $\tau$ is the \emph{average} delay $\smash{\frac{1}{T}\sum_{t=1}^T d_t}$ and $\sigma^2$ is the variance of the stochastic gradients.
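As a hedged toy model of the setting (gradient descent where the gradient applied at step $t$ was computed $d_t$ steps earlier; an illustrative simulation, not the paper's algorithm or analysis):

```python
import numpy as np
from collections import deque

def delayed_sgd(grad, x0, steps, delays, lr):
    """Gradient descent in which the gradient computed at step t
    only arrives (and is applied) at step t + delays[t]."""
    x = np.asarray(x0, dtype=float)
    in_flight = deque()  # (arrival_step, stale_gradient)
    for t in range(steps):
        in_flight.append((t + delays[t], grad(x)))
        while in_flight and in_flight[0][0] <= t:
            _, g = in_flight.popleft()
            x = x - lr * g  # update with a stale gradient
    return x

# Minimize f(x) = ||x||^2 / 2 (so grad(x) = x) under a constant delay of 5.
x_final = delayed_sgd(grad=lambda x: x, x0=[5.0, -3.0],
                      steps=500, delays=[5] * 500, lr=0.1)
```

With this step size the iterates still contract despite the staleness; pushing the step size toward the undelayed stability limit makes the delayed recursion diverge, which is the tension delay-dependent bounds capture.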
no code implementations • 10 Feb 2022 • Olivier Bousquet, Amit Daniely, Haim Kaplan, Yishay Mansour, Shay Moran, Uri Stemmer
Our transformation readily implies monotone learners in a variety of contexts: for example, it extends Pestov's result to classification tasks with an arbitrary number of labels.
no code implementations • 26 Sep 2022 • Amit Daniely, Gal Katzhendler
Recently, Daniely and Granot [arXiv:1910.05697] introduced a new notion of complexity called Approximate Description Length (ADL).
no code implementations • 17 Nov 2022 • Amit Daniely, Elad Granot
We investigate the sample complexity of bounded two-layer neural networks using different activation functions.
no code implementations • 23 Nov 2023 • Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain
We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice.
no code implementations • 15 Jan 2024 • Amit Daniely, Mariano Schain, Gilad Yehudai
Optimizing neural networks is a difficult task that is still not well understood.
1 code implementation • ICLR 2020 • Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely
A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity.