Search Results for author: Amit Daniely

Found 41 papers, 1 paper with code

The Implicit Bias of Depth: How Incremental Learning Drives Generalization

1 code implementation ICLR 2020 Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely

A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity.

Binary Classification, Incremental Learning
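
A toy sketch of this incremental-learning picture (my own illustration under simplifying assumptions, not code from the paper): gradient descent on a depth-two "diagonal" linear model $f(x)=\langle u\odot v,x\rangle$ with small initialization, fit to a sparse linear target, tends to recover the large target coordinates first and the smaller ones later.

```python
# Toy illustration (not the paper's code): gradient descent on a depth-2
# diagonal linear model w = u * v, fit to a sparse linear target. With a
# small initialization, the large coordinates are typically learned first,
# i.e. the solution space is traversed in increasing order of complexity.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 200
w_star = np.zeros(d)
w_star[:3] = [5.0, 2.0, 0.5]             # sparse target, decreasing magnitudes
X = rng.standard_normal((n, d))
y = X @ w_star

u = np.full(d, 1e-3)                      # near-zero initialization
v = np.full(d, 1e-3)
lr = 1e-3
for step in range(20001):
    residual = X @ (u * v) - y            # gradient of 0.5/n * ||X(u*v) - y||^2
    grad_w = X.T @ residual / n
    u, v = u - lr * grad_w * v, v - lr * grad_w * u
    if step % 5000 == 0:
        print(step, np.round(u * v, 2))   # coordinates "switch on" one at a time
```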

Learning Rules-First Classifiers

no code implementations 8 Mar 2018 Deborah Cohen, Amit Daniely, Amir Globerson, Gal Elidan

Complex classifiers may exhibit "embarrassing" failures in cases where humans can easily provide a justified classification.

General Classification, Sentiment Analysis

Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

no code implementations NeurIPS 2016 Amit Daniely, Roy Frostig, Yoram Singer

We develop a general duality between neural networks and compositional kernels, striving towards a better understanding of deep learning.

SGD Learns the Conjugate Kernel Class of the Network

no code implementations NeurIPS 2017 Amit Daniely

We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network, as defined in Daniely, Frostig and Singer.

Random Features for Compositional Kernels

no code implementations 22 Mar 2017 Amit Daniely, Roy Frostig, Vineet Gupta, Yoram Singer

We describe and analyze a simple random feature scheme (RFS) from prescribed compositional kernels.
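
A rough sketch of the random-features idea for a ReLU layer (an illustrative approximation under my own assumptions, not necessarily the exact scheme analyzed in the paper): random Gaussian ReLU features give an unbiased finite-dimensional estimate of the corresponding dual (arc-cosine) kernel.

```python
# Illustrative random feature scheme (not the paper's exact construction):
# m random ReLU features approximate the degree-1 arc-cosine kernel, which is
# the dual kernel of the ReLU activation for unit-norm inputs.
import numpy as np

rng = np.random.default_rng(0)
d, m = 50, 20000

def relu_features(x, W):
    # phi(x) = sqrt(2/m) * relu(W @ x); E[phi(x) . phi(y)] equals the dual kernel
    return np.sqrt(2.0 / W.shape[0]) * np.maximum(W @ x, 0.0)

def arccos_kernel(x, y):
    # Closed form for unit vectors (Cho & Saul): (sin t + (pi - t) cos t) / pi
    t = np.arccos(np.clip(x @ y, -1.0, 1.0))
    return (np.sin(t) + (np.pi - t) * np.cos(t)) / np.pi

x = rng.standard_normal(d); x /= np.linalg.norm(x)
y = rng.standard_normal(d); y /= np.linalg.norm(y)
W = rng.standard_normal((m, d))           # random Gaussian weights

print("random-feature estimate:", relu_features(x, W) @ relu_features(y, W))
print("closed-form dual kernel:", arccos_kernel(x, y))
```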

Depth Separation for Neural Networks

no code implementations 27 Feb 2017 Amit Daniely

As many functions of the above form can be well approximated by poly-size depth three networks with poly-bounded weights, this establishes a separation between depth two and depth three networks w.r.t. the uniform distribution on $\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}$.

Behavior-Based Machine-Learning: A Hybrid Approach for Predicting Human Decision Making

no code implementations 30 Nov 2016 Gali Noti, Effi Levi, Yoav Kolumbus, Amit Daniely

A large body of work in behavioral fields attempts to develop models that describe the way people, as opposed to rational agents, make decisions.

BIG-bench Machine Learning, Decision Making

Sketching and Neural Networks

no code implementations 19 Apr 2016 Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar

In stark contrast, our approach of improper learning, which uses a larger hypothesis class, allows the sketch size to have a logarithmic dependence on the degree.

Complexity Theoretic Limitations on Learning Halfspaces

no code implementations 21 May 2015 Amit Daniely

We show that no efficient learning algorithm has non-trivial worst-case performance even under the guarantees that $\mathrm{Err}_{\mathrm{HALF}}(\mathcal{D}) \le \eta$ for arbitrarily small constant $\eta>0$, and that $\mathcal{D}$ is supported in $\{\pm 1\}^n\times \{\pm 1\}$.

Distribution Free Learning with Local Queries

no code implementations 11 Mar 2016 Galit Bary-Weisberg, Amit Daniely, Shai Shalev-Shwartz

The model of learning with local membership queries interpolates between the PAC model and the membership queries model by allowing the learner to query the label of any example that is similar to an example in the training set.
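
For concreteness (restated here from memory, so the exact parameterization should be treated as an assumption): over the Boolean cube, a query $x'\in\{0,1\}^n$ is called $k$-local if it lies within Hamming distance $k$ of some training example $x_i$; taking $k=0$ essentially recovers the PAC model, while unrestricted $k$ recovers the full membership-queries model.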

A PTAS for Agnostically Learning Halfspaces

no code implementations 26 Oct 2014 Amit Daniely

We present a PTAS for agnostically learning halfspaces w.r.t.

regression

Strongly Adaptive Online Learning

no code implementations 25 Feb 2015 Amit Daniely, Alon Gonen, Shai Shalev-Shwartz

Strongly adaptive algorithms are algorithms whose performance on every time interval is close to optimal.
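
One common way to make this precise (a hedged restatement with assumed notation, not a quote from the paper): an online algorithm is strongly adaptive if for every contiguous interval $I=[q,s]\subseteq[T]$ its regret on $I$ is at most $R(|I|)\cdot\mathrm{poly}(\log T)$, where $R(k)$ denotes the minimax regret achievable over $k$ rounds of the underlying online problem.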

Complexity theoretic limitations on learning DNF's

no code implementations 13 Apr 2014 Amit Daniely, Shai Shalev-Shwartz

Using the recently developed framework of [Daniely et al., 2014], we show that under a natural assumption on the complexity of refuting random K-SAT formulas, learning DNF formulas is hard.

Learning Economic Parameters from Revealed Preferences

no code implementations 30 Jul 2014 Maria-Florina Balcan, Amit Daniely, Ruta Mehta, Ruth Urner, Vijay V. Vazirani

In this work, we advance this line of research by providing sample complexity guarantees and efficient algorithms for a number of important classes.

Open-Ended Question Answering

Optimal Learners for Multiclass Problems

no code implementations 10 May 2014 Amit Daniely, Shai Shalev-Shwartz

Furthermore, we show that the sample complexity of these learners is better than the sample complexity of the ERM rule, thus settling in the negative an open question due to Collins (2005).

Binary Classification, Open-Ended Question Answering

The complexity of learning halfspaces using generalized linear methods

no code implementations 3 Nov 2012 Amit Daniely, Nati Linial, Shai Shalev-Shwartz

The best approximation ratio achievable by an efficient algorithm is $O\left(\frac{1/\gamma}{\sqrt{\log(1/\gamma)}}\right)$ and is achieved using an algorithm from the above class.

regression

From average case complexity to improper learning complexity

no code implementations 10 Nov 2013 Amit Daniely, Nati Linial, Shai Shalev-Shwartz

The biggest challenge in proving complexity results is to establish hardness of improper learning (a.k.a.

Learning Theory

More data speeds up training time in learning halfspaces over sparse vectors

no code implementations NeurIPS 2013 Amit Daniely, Nati Linial, Shai Shalev-Shwartz

That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task?

PAC learning

The price of bandit information in multiclass online classification

no code implementations 5 Feb 2013 Amit Daniely, Tom Helbertal

We consider two scenarios of multiclass online learning of a hypothesis class $H\subseteq Y^X$.

Classification, General Classification, +1

Locally Private Learning without Interaction Requires Separation

no code implementations NeurIPS 2019 Amit Daniely, Vitaly Feldman

The only lower bound we are aware of is for PAC learning an artificial class of functions with respect to a uniform distribution (Kasiviswanathan et al. 2011).

PAC learning

Competitive ratio versus regret minimization: achieving the best of both worlds

no code implementations 7 Apr 2019 Amit Daniely, Yishay Mansour

Our end result is an online algorithm that can combine a "base" online algorithm, having a guaranteed competitive ratio, with a range of online algorithms that guarantee a small regret over any interval of time.

ID3 Learns Juntas for Smoothed Product Distributions

no code implementations 20 Jun 2019 Alon Brutzkus, Amit Daniely, Eran Malach

In recent years, there have been many attempts to understand popular heuristics.

On the Optimality of Trees Generated by ID3

no code implementations 11 Jul 2019 Alon Brutzkus, Amit Daniely, Eran Malach

Since its inception in the 1980s, ID3 has become one of the most successful and widely used algorithms for learning decision trees.

Generalization Bounds for Neural Networks via Approximate Description Length

no code implementations NeurIPS 2019 Amit Daniely, Elad Granot

We show that for any depth $t$, if the inputs are in $[-1, 1]^d$, the sample complexity of $H$ is $\tilde O\left(\frac{dR^2}{\epsilon^2}\right)$.

Generalization Bounds

Neural Networks Learning and Memorization with (almost) no Over-Parameterization

no code implementations NeurIPS 2020 Amit Daniely

Many results in recent years have established polynomial-time learnability of various models via neural network algorithms.

Memorization

On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions

no code implementations 9 Feb 2020 Yossi Arjevani, Amit Daniely, Stefanie Jegelka, Hongzhou Lin

Recent advances in randomized incremental methods for minimizing $L$-smooth $\mu$-strongly convex finite sums have culminated in tight complexities of $\tilde{O}((n+\sqrt{n L/\mu})\log(1/\epsilon))$ for $\mu>0$ and $O(n+\sqrt{nL/\epsilon})$ for $\mu=0$, where $n$ denotes the number of individual functions.

Learning Parities with Neural Networks

no code implementations NeurIPS 2020 Amit Daniely, Eran Malach

On the other hand, under the same distributions, these parities cannot be learned efficiently by linear methods.

Memorizing Gaussians with no over-parameterization via gradient descent on neural networks

no code implementations 28 Mar 2020 Amit Daniely

We prove that a single step of gradient descent over a depth-two network, with $q$ hidden neurons, starting from orthogonal initialization, can memorize $\Omega\left(\frac{dq}{\log^4(d)}\right)$ independent and randomly labeled Gaussians in $\mathbb{R}^d$.


Hardness of Learning Neural Networks with Natural Weights

no code implementations NeurIPS 2020 Amit Daniely, Gal Vardi

A natural approach to settle the discrepancy is to assume that the network's weights are "well-behaved" and possess some generic properties that may allow efficient learning.

Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations

no code implementations NeurIPS 2020 Amit Daniely, Hadas Schacham

We consider ReLU networks with random weights, in which the dimension decreases at each layer.

From Local Pseudorandom Generators to Hardness of Learning

no code implementations 20 Jan 2021 Amit Daniely, Gal Vardi

We also establish lower bounds on the complexity of learning intersections of a constant number of halfspaces, and ReLU networks with a constant number of hidden neurons.

PAC learning

An Exact Poly-Time Membership-Queries Algorithm for Extracting a Three-Layer ReLU Network

no code implementations 20 May 2021 Amit Daniely, Elad Granot

In this work, we present a polynomial-time algorithm that can learn a depth-two ReLU network from queries under mild general position assumptions.

BIG-bench Machine Learning, Model extraction, +1

Asynchronous Stochastic Optimization Robust to Arbitrary Delays

no code implementations NeurIPS 2021 Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

In the general non-convex smooth optimization setting, we give a simple and efficient algorithm that requires $O( \sigma^2/\epsilon^4 + \tau/\epsilon^2 )$ steps for finding an $\epsilon$-stationary point $x$, where $\tau$ is the average delay $\frac{1}{T}\sum_{t=1}^T d_t$ and $\sigma^2$ is the variance of the stochastic gradients.

Distributed Optimization
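
A minimal simulation of the delayed-gradient setting this bound concerns (a generic sketch of asynchronous SGD with stale gradients, not the paper's specific algorithm or step sizes): each update at step $t$ applies a stochastic gradient that was computed at an iterate from $d_t$ steps earlier.

```python
# Generic sketch of SGD with arbitrarily delayed (stale) stochastic gradients.
# This simulates the setting described above; it is not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
dim, T, lr, sigma = 10, 2000, 0.02, 0.1

def stochastic_grad(x):
    # Noisy gradient of the smooth toy objective f(x) = 0.5 * ||x||^2
    return x + sigma * rng.standard_normal(dim)

x = rng.standard_normal(dim)
history = [x.copy()]                      # past iterates, for stale gradients
for t in range(T):
    d_t = rng.integers(0, 11)             # arbitrary (here random) delay d_t
    stale = history[max(0, len(history) - 1 - d_t)]
    x = x - lr * stochastic_grad(stale)   # update with a delayed gradient
    history.append(x.copy())

print("final gradient norm despite delays:", np.linalg.norm(x))
```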

Monotone Learning

no code implementations 10 Feb 2022 Olivier Bousquet, Amit Daniely, Haim Kaplan, Yishay Mansour, Shay Moran, Uri Stemmer

Our transformation readily implies monotone learners in a variety of contexts: for example, it extends Pestov's result to classification tasks with an arbitrary number of labels.

Binary Classification, Classification, +1

Approximate Description Length, Covering Numbers, and VC Dimension

no code implementations 26 Sep 2022 Amit Daniely, Gal Katzhendler

Recently, Daniely and Granot [arXiv:1910.05697] introduced a new notion of complexity called Approximate Description Length (ADL).

Generalization Bounds

On the Sample Complexity of Two-Layer Networks: Lipschitz vs. Element-Wise Lipschitz Activation

no code implementations 17 Nov 2022 Amit Daniely, Elad Granot

We investigate the sample complexity of bounded two-layer neural networks using different activation functions.

Locally Optimal Descent for Dynamic Stepsize Scheduling

no code implementations 23 Nov 2023 Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice.

Scheduling, Stochastic Optimization

RedEx: Beyond Fixed Representation Methods via Convex Optimization

no code implementations 15 Jan 2024 Amit Daniely, Mariano Schain, Gilad Yehudai

Optimizing neural networks is a difficult task that is still not well understood.
