Search Results for author: Shai Shalev-Shwartz

Found 57 papers, 9 papers with code

Provable Guarantees on Learning Hierarchical Generative Models with Deep CNNs

no code implementations Eran Malach, Shai Shalev-Shwartz

To show any positive theoretical results, one must make assumptions on the data distribution.

Less is More: Selective Layer Finetuning with SubTuning

1 code implementation 13 Feb 2023 Gal Kaplun, Andrey Gurevich, Tal Swisa, Mazor David, Shai Shalev-Shwartz, Eran Malach

Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance.
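
The core recipe of selective layer finetuning is easy to sketch. Below is a minimal, hypothetical PyTorch example (not the paper's SubTuning code, and the ResNet-18 backbone and layer choice are assumptions for illustration): freeze a pretrained backbone, then unfreeze only a chosen subset of layers plus the new task head before training.

```python
import torch
from torchvision import models

# Hypothetical setup: a pretrained ResNet-18 backbone adapted to a new 10-class task.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)      # new task head

for p in model.parameters():
    p.requires_grad = False                                # freeze everything
for module in (model.layer4, model.fc):                    # chosen subset of layers to finetune
    for p in module.parameters():
        p.requires_grad = True

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9)
```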

Multi-Task Learning

Standing on the Shoulders of Giant Frozen Language Models

no code implementations 21 Apr 2022 Yoav Levine, Itay Dalmedigos, Ori Ram, Yoel Zeldes, Daniel Jannai, Dor Muhlgay, Yoni Osin, Opher Lieber, Barak Lenz, Shai Shalev-Shwartz, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham

To demonstrate this, we introduce three novel methods for leveraging frozen models: input-dependent prompt tuning, frozen readers, and recursive LMs, each of which vastly improves on current frozen-model approaches.

Knowledge Distillation: Bad Models Can Be Good Role Models

no code implementations 28 Mar 2022 Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz

We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data.
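
For orientation, here is the standard soft-label distillation loss in PyTorch, where the student matches the teacher's softened output distribution on (possibly unlabeled) inputs. This is a generic sketch of the student-imitates-teacher setup, not the paper's specific construction.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: KL divergence between the student's and the
    teacher's temperature-softened output distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy usage with random logits for a batch of 8 examples and 100 classes.
loss = distillation_loss(torch.randn(8, 100), torch.randn(8, 100))
```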

Knowledge Distillation, Learning Theory

The Connection Between Approximation, Depth Separation and Learnability in Neural Networks

no code implementations 31 Jan 2021 Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks.

The Implications of Local Correlation on Learning Some Deep Functions

no code implementations NeurIPS 2020 Eran Malach, Shai Shalev-Shwartz

In fact, the proofs of such hardness results show that even weakly learning deep networks is hard.

Computational Separation Between Convolutional and Fully-Connected Networks

no code implementations ICLR 2021 Eran Malach, Shai Shalev-Shwartz

Convolutional neural networks (CNNs) exhibit unmatched performance in a multitude of computer vision tasks.

When Hardness of Approximation Meets Hardness of Learning

no code implementations 18 Aug 2020 Eran Malach, Shai Shalev-Shwartz

A supervised learning algorithm has access to a distribution of labeled examples, and needs to return a function (hypothesis) that correctly labels the examples.

On the Ethics of Building AI in a Responsible Manner

no code implementations 30 Mar 2020 Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

The AI-alignment problem arises when there is a discrepancy between the goals that a human designer specifies to an AI learner and a potential catastrophic outcome that does not reflect what the human designer really wants.

BIG-bench Machine Learning, Ethics

Proving the Lottery Ticket Hypothesis: Pruning is All You Need

no code implementations ICML 2020 Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

The lottery ticket hypothesis (Frankle and Carbin, 2018) states that a randomly-initialized network contains a small subnetwork that, when trained in isolation, can compete with the performance of the original network.

Learning Boolean Circuits with Neural Networks

no code implementations 25 Oct 2019 Eran Malach, Shai Shalev-Shwartz

To separate hard-to-learn from easy-to-learn distributions, we observe the property of local correlation: correlation between local patterns of the input and the target label.

The Implicit Bias of Depth: How Incremental Learning Drives Generalization

1 code implementation ICLR 2020 Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely

A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity.
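
A toy simulation (my own illustration, not taken from the paper) makes the incremental dynamics visible: gradient descent on a depth-2 "diagonal" linear model with small initialization picks up the coordinates of a sparse target roughly one at a time, largest first.

```python
import numpy as np

# Toy depth-2 model f(x) = (u * v) . x fitted to a sparse target by gradient descent.
rng = np.random.default_rng(0)
d, n, lr, steps = 5, 200, 0.05, 3000
w_star = np.array([4.0, 2.0, 1.0, 0.0, 0.0])       # sparse ground truth
X = rng.standard_normal((n, d))
y = X @ w_star

u = np.full(d, 1e-3)                                # small init => implicit bias
v = np.full(d, 1e-3)
for t in range(steps):
    w = u * v                                       # effective linear predictor
    g = X.T @ (X @ w - y) / n                       # gradient of the squared loss wrt w
    u, v = u - lr * g * v, v - lr * g * u           # chain rule through the product u * v
    if t % 1000 == 0:
        print(t, np.round(u * v, 3))                # coordinates emerge largest-first
```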

Binary Classification, Incremental Learning

Discriminative Active Learning

2 code implementations ICLR 2019 Daniel Gissin, Shai Shalev-Shwartz

We propose a new batch mode active learning algorithm designed for neural networks and large query batch sizes.
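
A rough sketch of the selection rule follows, assuming precomputed feature representations; the function name and signature are hypothetical, and the authors' released code additionally handles sub-batch retraining and neural features. The idea: train a classifier to tell labeled from unlabeled points, then query the pool points that look most "unlabeled", i.e. least represented in the labeled set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dal_select(features_labeled, features_pool, batch_size):
    """Pick a query batch in the spirit of discriminative active learning (a sketch)."""
    X = np.vstack([features_labeled, features_pool])
    y = np.concatenate([np.zeros(len(features_labeled)),   # 0 = labeled
                        np.ones(len(features_pool))])      # 1 = unlabeled
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p_unlabeled = clf.predict_proba(features_pool)[:, 1]
    return np.argsort(-p_unlabeled)[:batch_size]            # indices into the pool
```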

Active Learning, Binary Classification +3

Decoupling Gating from Linearity

no code implementations ICLR 2019 Jonathan Fiat, Eran Malach, Shai Shalev-Shwartz

Specifically, we show a memorization result for networks of size $\tilde{\Omega}(\frac{m}{d})$, and improved generalization bounds.

Generalization Bounds, Memorization

Is Deeper Better only when Shallow is Good?

1 code implementation NeurIPS 2019 Eran Malach, Shai Shalev-Shwartz

Using this result we prove that, at least in some distributions, the success of learning deep networks depends on whether the distribution can be well approximated by shallower networks, and we conjecture that this property holds in general.

Learning Theory, Open-Ended Question Answering

A Provably Correct Algorithm for Deep Learning that Actually Works

no code implementations 26 Mar 2018 Eran Malach, Shai Shalev-Shwartz

We describe a layer-by-layer algorithm for training deep convolutional networks, where each step involves gradient updates for a two layer network followed by a simple clustering algorithm.

Clustering

SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data

no code implementations ICLR 2018 Alon Brutzkus, Amir Globerson, Eran Malach, Shai Shalev-Shwartz

Neural networks exhibit good generalization behavior in the over-parameterized regime, where the number of network parameters exceeds the number of observations.

Generalization Bounds

On a Formal Model of Safe and Scalable Self-driving Cars

3 code implementations 21 Aug 2017 Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

In the second part we describe a design of a system that adheres to our safety assurance requirements and is scalable to millions of cars.

Autonomous Driving, Self-Driving Cars

Weight Sharing is Crucial to Successful Optimization

no code implementations 2 Jun 2017 Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah

Exploiting the great expressive power of Deep Neural Network architectures relies on the ability to train them.

Failures of Gradient-Based Deep Learning

1 code implementation ICML 2017 Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah

In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming the state of the art.

Fast Rates for Empirical Risk Minimization of Strict Saddle Problems

no code implementations 16 Jan 2017 Alon Gonen, Shai Shalev-Shwartz

We derive bounds on the sample complexity of empirical risk minimization (ERM) in the context of minimizing non-convex risks that admit the strict saddle property.

Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving

no code implementations 11 Oct 2016 Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

Second, the Markov Decision Process model often used in robotics is problematic in our case because of the unpredictable behavior of other agents in this multi-agent scenario.

Autonomous Driving, Multi-agent Reinforcement Learning +3

Learning a Metric Embedding for Face Recognition using the Multibatch Method

no code implementations NeurIPS 2016 Oren Tadmor, Yonatan Wexler, Tal Rosenwein, Shai Shalev-Shwartz, Amnon Shashua

This work is motivated by the engineering task of achieving a near state-of-the-art face recognition on a minimal computing budget running on an embedded system.

Face Recognition

On the Sample Complexity of End-to-end Training vs. Semantic Abstraction Training

no code implementations 23 Apr 2016 Shai Shalev-Shwartz, Amnon Shashua

We compare the end-to-end training approach to a modular approach in which a system is decomposed into semantically meaningful components.

Autonomous Driving

Distribution Free Learning with Local Queries

no code implementations 11 Mar 2016 Galit Bary-Weisberg, Amit Daniely, Shai Shalev-Shwartz

The model of learning with local membership queries interpolates between the PAC model and the membership queries model by allowing the learner to query the label of any example that is similar to an example in the training set.

Solving Ridge Regression using Sketched Preconditioned SVRG

no code implementations 7 Feb 2016 Alon Gonen, Francesco Orabona, Shai Shalev-Shwartz

We develop a novel preconditioning method for ridge regression, based on recent linear sketching methods.

regression

SDCA without Duality, Regularization, and Individual Convexity

no code implementations 4 Feb 2016 Shai Shalev-Shwartz

Stochastic Dual Coordinate Ascent is a popular method for solving regularized loss minimization for the case of convex losses.

Long-term Planning by Short-term Prediction

no code implementations 4 Feb 2016 Shai Shalev-Shwartz, Nir Ben-Zrihem, Aviad Cohen, Amnon Shashua

We argue that dual versions of the MDP framework (that depend on the value function and the $Q$ function) are problematic for autonomous driving applications due to the non-Markovian nature of the natural state space representation, and due to the continuous state and action spaces.

Autonomous Driving

Minimizing the Maximal Loss: How and Why?

no code implementations 4 Feb 2016 Shai Shalev-Shwartz, Yonatan Wexler

Second, it is often argued that it makes no sense to minimize the loss on the training set too much, as this will not be reflected in the generalization loss.
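
As a point of reference for the "how" part, a naive way to attack the maximal loss is subgradient descent on max_i loss_i, stepping only on the current worst example. The sketch below is my own hedged illustration with the hinge loss and random data, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))
y[y == 0] = 1.0                                     # guard against zero labels

w = np.zeros(d)
for t in range(1, 501):
    losses = np.maximum(0.0, 1.0 - y * (X @ w))     # per-example hinge losses
    i = int(np.argmax(losses))                      # current worst example
    if losses[i] > 0:
        w += (1.0 / t) * y[i] * X[i]                # subgradient step on the max loss
print("max hinge loss:", np.maximum(0.0, 1.0 - y * (X @ w)).max())
```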

Average Stability is Invariant to Data Preconditioning. Implications to Exp-concave Empirical Risk Minimization

no code implementations 15 Jan 2016 Alon Gonen, Shai Shalev-Shwartz

We show that the average stability notion introduced by Kearns and Ron (1999) and Bousquet and Elisseeff (2002) is invariant to data preconditioning, for a wide class of generalized linear models that includes most of the known exp-concave losses.

Beyond Convexity: Stochastic Quasi-Convex Optimization

no code implementations NeurIPS 2015 Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz

The Normalized Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent which updates according to the direction of the gradients, rather than the gradients themselves.
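
In code, the update differs from gradient descent only by a normalization. This is a minimal sketch of the deterministic rule; the paper analyzes a stochastic mini-batch variant.

```python
import numpy as np

def normalized_gradient_descent(grad, x0, lr=0.1, steps=1000):
    """Normalized Gradient Descent: step along the gradient *direction* only,
    discarding its magnitude."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm == 0:
            break
        x = x - lr * g / norm
    return x

# Example: a quasi-convex, non-convex 1-D objective f(x) = sqrt(|x|).
x_min = normalized_gradient_descent(
    lambda x: np.sign(x) * 0.5 / np.sqrt(np.abs(x) + 1e-12), x0=np.array([5.0]))
```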

Faster SGD Using Sketched Conditioning

no code implementations 8 Jun 2015 Alon Gonen, Shai Shalev-Shwartz

We propose a novel method for speeding up stochastic optimization algorithms via sketching methods, which recently became a powerful tool for accelerating algorithms for numerical linear algebra.

Stochastic Optimization

On Lower and Upper Bounds for Smooth and Strongly Convex Optimization Problems

no code implementations 23 Mar 2015 Yossi Arjevani, Shai Shalev-Shwartz, Ohad Shamir

This, in turn, reveals a powerful connection between a class of optimization algorithms and the analytic theory of polynomials whereby new lower and upper bounds are derived.


On Graduated Optimization for Stochastic Non-Convex Problems

1 code implementation 12 Mar 2015 Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz

We extend our algorithm and analysis to the setting of stochastic non-convex optimization with noisy gradient feedback, attaining the same convergence rate.
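
The overall scheme can be sketched generically. The code below is an illustrative zero-order version under my own assumptions, not the paper's GradOpt algorithm or its analysis: minimize a sequence of smoothed surrogates of the objective, starting with heavy smoothing and warm-starting each finer stage from the previous solution.

```python
import numpy as np

def graduated_minimize(f, x0, deltas=(2.0, 1.0, 0.5, 0.1),
                       steps=300, lr=0.05, samples=20, seed=0):
    """Coarse-to-fine minimization of f via randomly smoothed surrogates.
    The gradient of the delta-smoothed objective is estimated with the
    standard two-point sphere-sampling estimator."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    dim = x.size
    for delta in deltas:                                  # decreasing smoothing radius
        for _ in range(steps):
            g = np.zeros_like(x)
            for _ in range(samples):
                u = rng.standard_normal(dim)
                u *= delta / np.linalg.norm(u)            # ||u|| = delta
                g += dim * (f(x + u) - f(x - u)) / (2 * delta) * (u / delta)
            x -= lr * g / samples
        # the minimizer of the current surrogate warm-starts the next stage
    return x
```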

Strongly Adaptive Online Learning

no code implementations 25 Feb 2015 Amit Daniely, Alon Gonen, Shai Shalev-Shwartz

Strongly adaptive algorithms are algorithms whose performance on every time interval is close to optimal.

SDCA without Duality

no code implementations 22 Feb 2015 Shai Shalev-Shwartz

Stochastic Dual Coordinate Ascent is a popular method for solving regularized loss minimization for the case of convex losses.

SelfieBoost: A Boosting Algorithm for Deep Learning

no code implementations 13 Nov 2014 Shai Shalev-Shwartz

We describe and analyze a new boosting algorithm for deep learning called SelfieBoost.

Optimal Learners for Multiclass Problems

no code implementations 10 May 2014 Amit Daniely, Shai Shalev-Shwartz

Furthermore, we show that the sample complexity of these learners is better than the sample complexity of the ERM rule, thus settling in the negative an open question due to Collins (2005).

Binary Classification, Open-Ended Question Answering

Subspace Learning with Partial Information

no code implementations 19 Feb 2014 Alon Gonen, Dan Rosenbaum, Yonina Eldar, Shai Shalev-Shwartz

The goal of subspace learning is to find a $k$-dimensional subspace of $\mathbb{R}^d$, such that the expected squared distance between instance vectors and the subspace is as small as possible.
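
With fully observed data this objective has the classical closed-form solution via the SVD; the short reference sketch below shows that baseline. The paper's point is the harder partial-information setting, which is not addressed here.

```python
import numpy as np

def best_subspace(X, k):
    """Full-information solution to the subspace-learning objective: the top-k
    right singular vectors of the data matrix span the k-dimensional subspace
    minimizing the average squared distance of the instances to the subspace."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]                        # rows form an orthonormal basis of the subspace

X = np.random.default_rng(0).standard_normal((500, 20))
basis = best_subspace(X, k=3)
```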

From average case complexity to improper learning complexity

no code implementations 10 Nov 2013 Amit Daniely, Nati Linial, Shai Shalev-Shwartz

The biggest challenge in proving complexity results is to establish hardness of improper learning (a.k.a. representation-independent learning).

Learning Theory

Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization

no code implementations 10 Sep 2013 Shai Shalev-Shwartz, Tong Zhang

We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure.

BIG-bench Machine Learning, regression

Accelerated Mini-Batch Stochastic Dual Coordinate Ascent

no code implementations NeurIPS 2013 Shai Shalev-Shwartz, Tong Zhang

Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning.
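
For context, here is plain single-example SDCA for ridge regression (squared loss), using the well-known closed-form coordinate update. It is a baseline sketch only; the paper's accelerated mini-batch variant is not reproduced here.

```python
import numpy as np

def sdca_ridge(X, y, lam, epochs=20, seed=0):
    """Plain SDCA for (1/n) * sum of squared losses + (lam/2) * ||w||^2,
    with the closed-form dual coordinate update for the squared loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                 # dual variables, one per example
    w = np.zeros(d)                     # primal iterate: w = X.T @ alpha / (lam * n)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x_i = X[i]
            delta = (y[i] - x_i @ w - alpha[i]) / (1.0 + x_i @ x_i / (lam * n))
            alpha[i] += delta
            w += delta * x_i / (lam * n)
    return w
```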

BIG-bench Machine Learning

An Algorithm for Training Polynomial Networks

no code implementations 26 Apr 2013 Roi Livni, Shai Shalev-Shwartz, Ohad Shamir

The main goal of this paper is the derivation of an efficient layer-by-layer algorithm for training such networks, which we denote as the Basis Learner.

Learning Sparse Low-Threshold Linear Classifiers

no code implementations 13 Dec 2012 Sivan Sabato, Shai Shalev-Shwartz, Nathan Srebro, Daniel Hsu, Tong Zhang

We consider the problem of learning a non-negative linear classifier with a $1$-norm of at most $k$, and a fixed threshold, under the hinge-loss.

The complexity of learning halfspaces using generalized linear methods

no code implementations 3 Nov 2012 Amit Daniely, Nati Linial, Shai Shalev-Shwartz

The best approximation ratio achievable by an efficient algorithm is $O\left(\frac{1/\gamma}{\sqrt{\log(1/\gamma)}}\right)$ and is achieved using an algorithm from the above class.

regression

Efficient Active Learning of Halfspaces: an Aggressive Approach

no code implementations 17 Aug 2012 Alon Gonen, Sivan Sabato, Shai Shalev-Shwartz

Our efficient aggressive active learner of half-spaces has formal approximation guarantees that hold when the pool is separable with a margin.

Active Learning

ShareBoost: Efficient multiclass learning with feature sharing

no code implementations NeurIPS 2011 Shai Shalev-Shwartz, Yonatan Wexler, Amnon Shashua

We consider the problem of learning a multiclass predictor that uses only a few features; in particular, the number of used features should increase sub-linearly with the number of possible classes.

Mind the Duality Gap: Logarithmic regret algorithms for online optimization

no code implementations NeurIPS 2008 Shai Shalev-Shwartz, Sham M. Kakade

We describe a primal-dual framework for the design and analysis of online strongly convex optimization algorithms.

Fast Rates for Regularized Objectives

no code implementations NeurIPS 2008 Karthik Sridharan, Shai Shalev-Shwartz, Nathan Srebro

We show that the empirical minimizer of a stochastic strongly convex objective, where the stochastic component is linear, converges to the population minimizer with rate $O(1/n)$.
