Search Results for author: Shai Shalev-Shwartz

Found 57 papers, 9 papers with code

Provable Guarantees on Learning Hierarchical Generative Models with Deep CNNs

no code implementations Eran Malach, Shai Shalev-Shwartz

To show any positive theoretical results, one must make assumptions on the data distribution.

Less is More: Selective Layer Finetuning with SubTuning

1 code implementation 13 Feb 2023 Gal Kaplun, Andrey Gurevich, Tal Swisa, Mazor David, Shai Shalev-Shwartz, Eran Malach

Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance.
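
The core recipe of selective layer finetuning is easy to sketch. Below is a minimal, hypothetical PyTorch example (not the paper's SubTuning code, and the ResNet-18 backbone and layer choice are assumptions for illustration): freeze a pretrained backbone, then unfreeze only a chosen subset of layers plus the new task head before training.

```python
import torch
from torchvision import models

# Hypothetical setup: a pretrained ResNet-18 backbone adapted to a new 10-class task.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)      # new task head

for p in model.parameters():
    p.requires_grad = False                                # freeze everything
for module in (model.layer4, model.fc):                    # chosen subset of layers to finetune
    for p in module.parameters():
        p.requires_grad = True

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9)
```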

Multi-Task Learning

Standing on the Shoulders of Giant Frozen Language Models

no code implementations 21 Apr 2022 Yoav Levine, Itay Dalmedigos, Ori Ram, Yoel Zeldes, Daniel Jannai, Dor Muhlgay, Yoni Osin, Opher Lieber, Barak Lenz, Shai Shalev-Shwartz, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham

To demonstrate this, we introduce three novel methods for leveraging frozen models: input-dependent prompt tuning, frozen readers, and recursive LMs, each of which vastly improves on current frozen-model approaches.

Knowledge Distillation: Bad Models Can Be Good Role Models

no code implementations 28 Mar 2022 Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz

We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data.
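
For orientation, here is the standard soft-label distillation loss in PyTorch, where the student matches the teacher's softened output distribution on (possibly unlabeled) inputs. This is a generic sketch of the student-imitates-teacher setup, not the paper's specific construction.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: KL divergence between the student's and the
    teacher's temperature-softened output distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy usage with random logits for a batch of 8 examples and 100 classes.
loss = distillation_loss(torch.randn(8, 100), torch.randn(8, 100))
```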

Knowledge Distillation, Learning Theory

The Connection Between Approximation, Depth Separation and Learnability in Neural Networks

no code implementations 31 Jan 2021 Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks.

The Implications of Local Correlation on Learning Some Deep Functions

no code implementations NeurIPS 2020 Eran Malach, Shai Shalev-Shwartz

In fact, the proofs of such hardness results show that even weakly learning deep networks is hard.

Computational Separation Between Convolutional and Fully-Connected Networks

no code implementations ICLR 2021 Eran Malach, Shai Shalev-Shwartz

Convolutional neural networks (CNNs) exhibit unmatched performance in a multitude of computer vision tasks.

When Hardness of Approximation Meets Hardness of Learning

no code implementations 18 Aug 2020 Eran Malach, Shai Shalev-Shwartz

A supervised learning algorithm has access to a distribution of labeled examples, and needs to return a function (hypothesis) that correctly labels the examples.

On the Ethics of Building AI in a Responsible Manner

no code implementations 30 Mar 2020 Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

The AI-alignment problem arises when there is a discrepancy between the goals that a human designer specifies to an AI learner and a potential catastrophic outcome that does not reflect what the human designer really wants.

BIG-bench Machine Learning, Ethics

Proving the Lottery Ticket Hypothesis: Pruning is All You Need

no code implementations ICML 2020 Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

The lottery ticket hypothesis (Frankle and Carbin, 2018) states that a randomly-initialized network contains a small subnetwork that, when trained in isolation, can compete with the performance of the original network.

Learning Boolean Circuits with Neural Networks

no code implementations 25 Oct 2019 Eran Malach, Shai Shalev-Shwartz

To separate hard-to-learn from easy-to-learn distributions, we observe the property of local correlation: correlation between local patterns of the input and the target label.

The Implicit Bias of Depth: How Incremental Learning Drives Generalization

1 code implementation ICLR 2020 Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely

A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity.
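
A toy simulation (my own illustration, not taken from the paper) makes the incremental dynamics visible: gradient descent on a depth-2 "diagonal" linear model with small initialization picks up the coordinates of a sparse target roughly one at a time, largest first.

```python
import numpy as np

# Toy depth-2 model f(x) = (u * v) . x fitted to a sparse target by gradient descent.
rng = np.random.default_rng(0)
d, n, lr, steps = 5, 200, 0.05, 3000
w_star = np.array([4.0, 2.0, 1.0, 0.0, 0.0])       # sparse ground truth
X = rng.standard_normal((n, d))
y = X @ w_star

u = np.full(d, 1e-3)                                # small init => implicit bias
v = np.full(d, 1e-3)
for t in range(steps):
    w = u * v                                       # effective linear predictor
    g = X.T @ (X @ w - y) / n                       # gradient of the squared loss wrt w
    u, v = u - lr * g * v, v - lr * g * u           # chain rule through the product u * v
    if t % 1000 == 0:
        print(t, np.round(u * v, 3))                # coordinates emerge largest-first
```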

Binary Classification, Incremental Learning

Discriminative Active Learning

2 code implementations ICLR 2019 Daniel Gissin, Shai Shalev-Shwartz

We propose a new batch mode active learning algorithm designed for neural networks and large query batch sizes.
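
A rough sketch of the selection rule follows, assuming precomputed feature representations; the function name and signature are hypothetical, and the authors' released code additionally handles sub-batch retraining and neural features. The idea: train a classifier to tell labeled from unlabeled points, then query the pool points that look most "unlabeled", i.e. least represented in the labeled set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dal_select(features_labeled, features_pool, batch_size):
    """Pick a query batch in the spirit of discriminative active learning (a sketch)."""
    X = np.vstack([features_labeled, features_pool])
    y = np.concatenate([np.zeros(len(features_labeled)),   # 0 = labeled
                        np.ones(len(features_pool))])      # 1 = unlabeled
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p_unlabeled = clf.predict_proba(features_pool)[:, 1]
    return np.argsort(-p_unlabeled)[:batch_size]            # indices into the pool
```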

Active Learning, Binary Classification +3

Decoupling Gating from Linearity

no code implementations ICLR 2019 Jonathan Fiat, Eran Malach, Shai Shalev-Shwartz

Specifically, we show a memorization result for networks of size $\tilde{\Omega}(\frac{m}{d})$, and improved generalization bounds.

Generalization Bounds, Memorization

Is Deeper Better only when Shallow is Good?

1 code implementation NeurIPS 2019 Eran Malach, Shai Shalev-Shwartz

Using this result we prove that, at least in some distributions, the success of learning deep networks depends on whether the distribution can be well approximated by shallower networks, and we conjecture that this property holds in general.

Learning Theory, Open-Ended Question Answering

A Provably Correct Algorithm for Deep Learning that Actually Works

no code implementations 26 Mar 2018 Eran Malach, Shai Shalev-Shwartz

We describe a layer-by-layer algorithm for training deep convolutional networks, where each step involves gradient updates for a two layer network followed by a simple clustering algorithm.

Clustering

SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data

no code implementations ICLR 2018 Alon Brutzkus, Amir Globerson, Eran Malach, Shai Shalev-Shwartz

Neural networks exhibit good generalization behavior in the over-parameterized regime, where the number of network parameters exceeds the number of observations.

Generalization Bounds

On a Formal Model of Safe and Scalable Self-driving Cars

3 code implementations 21 Aug 2017 Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

In the second part we describe a design of a system that adheres to our safety assurance requirements and is scalable to millions of cars.

Autonomous Driving, Self-Driving Cars

Weight Sharing is Crucial to Successful Optimization

no code implementations 2 Jun 2017 Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah

Exploiting the great expressive power of Deep Neural Network architectures relies on the ability to train them.

Failures of Gradient-Based Deep Learning

1 code implementation ICML 2017 Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah

In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming the state of the art.

Fast Rates for Empirical Risk Minimization of Strict Saddle Problems

no code implementations 16 Jan 2017 Alon Gonen, Shai Shalev-Shwartz

We derive bounds on the sample complexity of empirical risk minimization (ERM) in the context of minimizing non-convex risks that admit the strict saddle property.

Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving

no code implementations 11 Oct 2016 Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

Second, the Markov Decision Process model often used in robotics is problematic in our case because of the unpredictable behavior of other agents in this multi-agent scenario.

Autonomous Driving, Multi-agent Reinforcement Learning +3

Learning a Metric Embedding for Face Recognition using the Multibatch Method

no code implementations NeurIPS 2016 Oren Tadmor, Yonatan Wexler, Tal Rosenwein, Shai Shalev-Shwartz, Amnon Shashua

This work is motivated by the engineering task of achieving a near state-of-the-art face recognition on a minimal computing budget running on an embedded system.

Face Recognition

On the Sample Complexity of End-to-end Training vs. Semantic Abstraction Training

no code implementations 23 Apr 2016 Shai Shalev-Shwartz, Amnon Shashua

We compare the end-to-end training approach to a modular approach in which a system is decomposed into semantically meaningful components.

Autonomous Driving

Distribution Free Learning with Local Queries

no code implementations 11 Mar 2016 Galit Bary-Weisberg, Amit Daniely, Shai Shalev-Shwartz

The model of learning with local membership queries interpolates between the PAC model and the membership queries model by allowing the learner to query the label of any example that is similar to an example in the training set.

Solving Ridge Regression using Sketched Preconditioned SVRG

no code implementations 7 Feb 2016 Alon Gonen, Francesco Orabona, Shai Shalev-Shwartz

We develop a novel preconditioning method for ridge regression, based on recent linear sketching methods.

regression

SDCA without Duality, Regularization, and Individual Convexity

no code implementations 4 Feb 2016 Shai Shalev-Shwartz

Stochastic Dual Coordinate Ascent is a popular method for solving regularized loss minimization for the case of convex losses.

Long-term Planning by Short-term Prediction

no code implementations 4 Feb 2016 Shai Shalev-Shwartz, Nir Ben-Zrihem, Aviad Cohen, Amnon Shashua

We argue that dual versions of the MDP framework (that depend on the value function and the $Q$ function) are problematic for autonomous driving applications due to the non-Markovian nature of the natural state space representation, and due to the continuous state and action spaces.

Autonomous Driving

Minimizing the Maximal Loss: How and Why?

no code implementations 4 Feb 2016 Shai Shalev-Shwartz, Yonatan Wexler

Second, it is often argued that it makes no sense to minimize the loss on the training set too much, as this will not be reflected in the generalization loss.
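
As a point of reference for the "how" part, a naive way to attack the maximal loss is subgradient descent on max_i loss_i, stepping only on the current worst example. The sketch below is my own hedged illustration with the hinge loss and random data, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))
y[y == 0] = 1.0                                     # guard against zero labels

w = np.zeros(d)
for t in range(1, 501):
    losses = np.maximum(0.0, 1.0 - y * (X @ w))     # per-example hinge losses
    i = int(np.argmax(losses))                      # current worst example
    if losses[i] > 0:
        w += (1.0 / t) * y[i] * X[i]                # subgradient step on the max loss
print("max hinge loss:", np.maximum(0.0, 1.0 - y * (X @ w)).max())
```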

Average Stability is Invariant to Data Preconditioning. Implications to Exp-concave Empirical Risk Minimization

no code implementations 15 Jan 2016 Alon Gonen, Shai Shalev-Shwartz

We show that the average stability notion introduced by Kearns and Ron (1999) and Bousquet and Elisseeff (2002) is invariant to data preconditioning, for a wide class of generalized linear models that includes most of the known exp-concave losses.

Beyond Convexity: Stochastic Quasi-Convex Optimization

no code implementations NeurIPS 2015 Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz

The Normalized Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent which updates according to the direction of the gradients, rather than the gradients themselves.
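
In code, the update differs from gradient descent only by a normalization. This is a minimal sketch of the deterministic rule; the paper analyzes a stochastic mini-batch variant.

```python
import numpy as np

def normalized_gradient_descent(grad, x0, lr=0.1, steps=1000):
    """Normalized Gradient Descent: step along the gradient *direction* only,
    discarding its magnitude."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm == 0:
            break
        x = x - lr * g / norm
    return x

# Example: a quasi-convex, non-convex 1-D objective f(x) = sqrt(|x|).
x_min = normalized_gradient_descent(
    lambda x: np.sign(x) * 0.5 / np.sqrt(np.abs(x) + 1e-12), x0=np.array([5.0]))
```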

Faster SGD Using Sketched Conditioning

no code implementations 8 Jun 2015 Alon Gonen, Shai Shalev-Shwartz

We propose a novel method for speeding up stochastic optimization algorithms via sketching methods, which recently became a powerful tool for accelerating algorithms for numerical linear algebra.

Stochastic Optimization

On Lower and Upper Bounds for Smooth and Strongly Convex Optimization Problems

no code implementations 23 Mar 2015 Yossi Arjevani, Shai Shalev-Shwartz, Ohad Shamir

This, in turn, reveals a powerful connection between a class of optimization algorithms and the analytic theory of polynomials whereby new lower and upper bounds are derived.


On Graduated Optimization for Stochastic Non-Convex Problems

1 code implementation 12 Mar 2015 Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz

We extend our algorithm and analysis to the setting of stochastic non-convex optimization with noisy gradient feedback, attaining the same convergence rate.
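
The overall scheme can be sketched generically. The code below is an illustrative zero-order version under my own assumptions, not the paper's GradOpt algorithm or its analysis: minimize a sequence of smoothed surrogates of the objective, starting with heavy smoothing and warm-starting each finer stage from the previous solution.

```python
import numpy as np

def graduated_minimize(f, x0, deltas=(2.0, 1.0, 0.5, 0.1),
                       steps=300, lr=0.05, samples=20, seed=0):
    """Coarse-to-fine minimization of f via randomly smoothed surrogates.
    The gradient of the delta-smoothed objective is estimated with the
    standard two-point sphere-sampling estimator."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    dim = x.size
    for delta in deltas:                                  # decreasing smoothing radius
        for _ in range(steps):
            g = np.zeros_like(x)
            for _ in range(samples):
                u = rng.standard_normal(dim)
                u *= delta / np.linalg.norm(u)            # ||u|| = delta
                g += dim * (f(x + u) - f(x - u)) / (2 * delta) * (u / delta)
            x -= lr * g / samples
        # the minimizer of the current surrogate warm-starts the next stage
    return x
```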

Strongly Adaptive Online Learning

no code implementations 25 Feb 2015 Amit Daniely, Alon Gonen, Shai Shalev-Shwartz

Strongly adaptive algorithms are algorithms whose performance on every time interval is close to optimal.

SDCA without Duality

no code implementations 22 Feb 2015 Shai Shalev-Shwartz

Stochastic Dual Coordinate Ascent is a popular method for solving regularized loss minimization for the case of convex losses.

SelfieBoost: A Boosting Algorithm for Deep Learning

no code implementations 13 Nov 2014 Shai Shalev-Shwartz

We describe and analyze a new boosting algorithm for deep learning called SelfieBoost.

Optimal Learners for Multiclass Problems

no code implementations 10 May 2014 Amit Daniely, Shai Shalev-Shwartz

Furthermore, we show that the sample complexity of these learners is better than the sample complexity of the ERM rule, thus settling in the negative an open question due to Collins (2005).

Binary Classification, Open-Ended Question Answering

Subspace Learning with Partial Information

no code implementations 19 Feb 2014 Alon Gonen, Dan Rosenbaum, Yonina Eldar, Shai Shalev-Shwartz

The goal of subspace learning is to find a $k$-dimensional subspace of $\mathbb{R}^d$, such that the expected squared distance between instance vectors and the subspace is as small as possible.
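
With fully observed data this objective has the classical closed-form solution via the SVD; the short reference sketch below shows that baseline. The paper's point is the harder partial-information setting, which is not addressed here.

```python
import numpy as np

def best_subspace(X, k):
    """Full-information solution to the subspace-learning objective: the top-k
    right singular vectors of the data matrix span the k-dimensional subspace
    minimizing the average squared distance of the instances to the subspace."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]                        # rows form an orthonormal basis of the subspace

X = np.random.default_rng(0).standard_normal((500, 20))
basis = best_subspace(X, k=3)
```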

From average case complexity to improper learning complexity

no code implementations 10 Nov 2013 Amit Daniely, Nati Linial, Shai Shalev-Shwartz

The biggest challenge in proving complexity results is to establish hardness of improper learning (a.k.a. representation-independent learning).

Learning Theory

Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization

no code implementations 10 Sep 2013 Shai Shalev-Shwartz, Tong Zhang

We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure.

BIG-bench Machine Learning, regression

Accelerated Mini-Batch Stochastic Dual Coordinate Ascent

no code implementations NeurIPS 2013 Shai Shalev-Shwartz, Tong Zhang

Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning.
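
For context, here is plain single-example SDCA for ridge regression (squared loss), using the well-known closed-form coordinate update. It is a baseline sketch only; the paper's accelerated mini-batch variant is not reproduced here.

```python
import numpy as np

def sdca_ridge(X, y, lam, epochs=20, seed=0):
    """Plain SDCA for (1/n) * sum of squared losses + (lam/2) * ||w||^2,
    with the closed-form dual coordinate update for the squared loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                 # dual variables, one per example
    w = np.zeros(d)                     # primal iterate: w = X.T @ alpha / (lam * n)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x_i = X[i]
            delta = (y[i] - x_i @ w - alpha[i]) / (1.0 + x_i @ x_i / (lam * n))
            alpha[i] += delta
            w += delta * x_i / (lam * n)
    return w
```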

BIG-bench Machine Learning

An Algorithm for Training Polynomial Networks

no code implementations 26 Apr 2013 Roi Livni, Shai Shalev-Shwartz, Ohad Shamir

The main goal of this paper is the derivation of an efficient layer-by-layer algorithm for training such networks, which we denote as the Basis Learner.

Learning Sparse Low-Threshold Linear Classifiers

no code implementations 13 Dec 2012 Sivan Sabato, Shai Shalev-Shwartz, Nathan Srebro, Daniel Hsu, Tong Zhang

We consider the problem of learning a non-negative linear classifier with a $1$-norm of at most $k$, and a fixed threshold, under the hinge-loss.

The complexity of learning halfspaces using generalized linear methods

no code implementations 3 Nov 2012 Amit Daniely, Nati Linial, Shai Shalev-Shwartz

The best approximation ratio achievable by an efficient algorithm is $O\left(\frac{1/\gamma}{\sqrt{\log(1/\gamma)}}\right)$ and is achieved using an algorithm from the above class.

regression

Efficient Active Learning of Halfspaces: an Aggressive Approach

no code implementations 17 Aug 2012 Alon Gonen, Sivan Sabato, Shai Shalev-Shwartz

Our efficient aggressive active learner of half-spaces has formal approximation guarantees that hold when the pool is separable with a margin.

Active Learning

ShareBoost: Efficient multiclass learning with feature sharing

no code implementations NeurIPS 2011 Shai Shalev-Shwartz, Yonatan Wexler, Amnon Shashua

We consider the problem of learning a multiclass predictor that uses only a few features; in particular, the number of used features should increase sub-linearly with the number of possible classes.

Mind the Duality Gap: Logarithmic regret algorithms for online optimization

no code implementations NeurIPS 2008 Shai Shalev-Shwartz, Sham M. Kakade

We describe a primal-dual framework for the design and analysis of online strongly convex optimization algorithms.

Fast Rates for Regularized Objectives

no code implementations NeurIPS 2008 Karthik Sridharan, Shai Shalev-Shwartz, Nathan Srebro

We show that the empirical minimizer of a stochastic strongly convex objective, where the stochastic component is linear, converges to the population minimizer with rate $O(1/n)$.
