no code implementations • Eran Malach, Shai Shalev-Shwartz
To show any positive theoretical results, one must make assumptions on the data distribution.
no code implementations • 3 Dec 2024 • Shai Shalev-Shwartz, Amnon Shashua, Gal Beniamini, Yoav Levine, Or Sharir, Noam Wies, Ido Ben-Shaul, Tomer Nussbaum, Shir Granot Peled
Artificial Expert Intelligence (AEI) seeks to transcend the limitations of both Artificial General Intelligence (AGI) and narrow AI by integrating domain-specific expertise with critical, precise reasoning capabilities akin to those of top human experts.
no code implementations • 7 May 2024 • Kai-Chia Mo, Shai Shalev-Shwartz, Nisæl Shártov
We describe a novel subgradient following apparatus for calculating the optimum of convex problems with variational penalties.
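The entry only names the setting, so as a loose illustration (not the paper's subgradient-following apparatus) the sketch below runs a plain diminishing-step subgradient method on a 1-D fused-lasso objective, where the total-variation term stands in for a variational penalty; the step-size schedule and problem sizes are arbitrary choices for the example.

```python
import numpy as np

def fused_lasso_subgradient(y, lam=1.0, steps=2000):
    """Minimize 0.5*||x - y||^2 + lam * sum_i |x[i+1] - x[i]|
    with a plain diminishing-step subgradient method."""
    x = np.zeros_like(y, dtype=float)
    for t in range(1, steps + 1):
        g = x - y                        # gradient of the quadratic data term
        d = np.sign(np.diff(x))          # subgradient signs of the forward differences
        g[:-1] -= lam * d                # |x[i+1]-x[i]| contributes -sign(.) to x[i]
        g[1:] += lam * d                 # ... and +sign(.) to x[i+1]
        x -= g / np.sqrt(t)              # classical 1/sqrt(t) step size
    return x

rng = np.random.default_rng(1)
y = np.concatenate([np.ones(20), 3 * np.ones(20)]) + 0.3 * rng.standard_normal(40)
print(fused_lasso_subgradient(y, lam=2.0)[::10])
```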
2 code implementations • 28 Mar 2024 • Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avshalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, Yoav Shoham
We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture.
no code implementations • 26 Oct 2023 • Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann
Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals.
1 code implementation • 13 Feb 2023 • Gal Kaplun, Andrey Gurevich, Tal Swisa, Mazor David, Shai Shalev-Shwartz, Eran Malach
Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance.
no code implementations • 1 May 2022 • Ehud Karpas, Omri Abend, Yonatan Belinkov, Barak Lenz, Opher Lieber, Nir Ratner, Yoav Shoham, Hofit Bata, Yoav Levine, Kevin Leyton-Brown, Dor Muhlgay, Noam Rozen, Erez Schwartz, Gal Shachaf, Shai Shalev-Shwartz, Amnon Shashua, Moshe Tennenholtz
Huge language models (LMs) have ushered in a new era for AI, serving as a gateway to natural-language-based knowledge tasks.
no code implementations • 21 Apr 2022 • Yoav Levine, Itay Dalmedigos, Ori Ram, Yoel Zeldes, Daniel Jannai, Dor Muhlgay, Yoni Osin, Opher Lieber, Barak Lenz, Shai Shalev-Shwartz, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham
To demonstrate this, we introduce three novel methods for leveraging frozen models: input-dependent prompt tuning, frozen readers, and recursive LMs, each of which vastly improves on current frozen-model approaches.
no code implementations • 28 Mar 2022 • Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz
We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data.
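For concreteness, here is a minimal numpy sketch of the generic distillation setup the sentence refers to: a linear student imitating a fixed linear "teacher" on unlabeled data via a soft cross-entropy loss. The temperature, dimensions, and learning rate are arbitrary, and this is not the construction analyzed in the paper.

```python
import numpy as np

def softmax(z, T):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
d, k, n, T = 10, 3, 500, 2.0

# A fixed "teacher": any function that produces logits over k classes.
W_teacher = rng.standard_normal((d, k))

# Unlabeled inputs: the student never sees ground-truth labels.
X = rng.standard_normal((n, d))
P_teacher = softmax(X @ W_teacher, T)                # softened teacher outputs

# Train a linear student to match the teacher via soft cross-entropy.
W_student = np.zeros((d, k))
for step in range(500):
    P_student = softmax(X @ W_student, T)
    grad = X.T @ (P_student - P_teacher) / (n * T)   # gradient of the distillation loss
    W_student -= 0.5 * grad

agreement = (P_student.argmax(1) == P_teacher.argmax(1)).mean()
print(f"student/teacher top-1 agreement: {agreement:.2%}")
```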
no code implementations • 29 Sep 2021 • Alon Brutzkus, Amir Globerson, Eran Malach, Shai Shalev-Shwartz
Convolutional networks (CNNs) are computationally hard to learn.
no code implementations • 31 Jan 2021 • Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir
On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks.
no code implementations • NeurIPS 2020 • Eran Malach, Shai Shalev-Shwartz
In fact, the proofs of such hardness results show that even weakly learning deep networks is hard.
no code implementations • ICLR 2021 • Eran Malach, Shai Shalev-Shwartz
Convolutional neural networks (CNNs) exhibit unmatched performance in a multitude of computer vision tasks.
no code implementations • 18 Aug 2020 • Eran Malach, Shai Shalev-Shwartz
A supervised learning algorithm has access to a distribution of labeled examples, and needs to return a function (hypothesis) that correctly labels the examples.
no code implementations • 30 Mar 2020 • Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua
The AI-alignment problem arises when the goals that a human designer specifies to an AI learner lead to outcomes, potentially catastrophic ones, that do not reflect what the human designer really wants.
no code implementations • ICML 2020 • Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir
The lottery ticket hypothesis (Frankle and Carbin, 2018) states that a randomly-initialized network contains a small subnetwork that, when trained in isolation, can compete with the performance of the original network.
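As context, below is a minimal numpy sketch of the original train, magnitude-prune, rewind-to-init, retrain recipe of Frankle and Carbin, applied to a logistic-regression "network"; the paper above proves a stronger, training-free statement, which this sketch does not reproduce, and all sizes and hyperparameters are arbitrary.

```python
import numpy as np

def train(X, y, w0, mask, lr=0.1, steps=300):
    """Gradient descent on the logistic loss, restricted to unmasked weights."""
    w = w0 * mask
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / len(y)
        w -= lr * grad * mask            # pruned weights stay frozen at zero
    return w

rng = np.random.default_rng(0)
n, d = 1000, 50
X = rng.standard_normal((n, d))
w_true = np.concatenate([rng.standard_normal(5), np.zeros(d - 5)])   # sparse ground truth
y = (X @ w_true + 0.1 * rng.standard_normal(n) > 0).astype(float)

w_init = 0.1 * rng.standard_normal(d)

# 1) Train the dense model.
w_dense = train(X, y, w_init, np.ones(d))

# 2) Prune: keep the 20% largest-magnitude trained weights.
keep = (np.abs(w_dense) >= np.quantile(np.abs(w_dense), 0.8)).astype(float)

# 3) Rewind the surviving weights to their initialization and retrain.
w_ticket = train(X, y, w_init, keep)

for name, w in [("dense", w_dense), ("ticket", w_ticket)]:
    acc = ((X @ w > 0) == y).mean()
    print(f"{name}: train accuracy {acc:.3f}")
```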
no code implementations • 25 Oct 2019 • Eran Malach, Shai Shalev-Shwartz
To separate hard-to-learn from easy-to-learn distributions, we observe the property of local correlation: correlation between local patterns of the input and the target label.
1 code implementation • ICLR 2020 • Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely
A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity.
no code implementations • ACL 2020 • Yoav Levine, Barak Lenz, Or Dagan, Ori Ram, Dan Padnos, Or Sharir, Shai Shalev-Shwartz, Amnon Shashua, Yoav Shoham
The ability to learn from large unlabeled corpora has allowed neural language models to advance the frontier in natural language understanding.
Ranked #11 on Word Sense Disambiguation on Words in Context
2 code implementations • ICLR 2019 • Daniel Gissin, Shai Shalev-Shwartz
We propose a new batch mode active learning algorithm designed for neural networks and large query batch sizes.
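To make the setting concrete, here is a generic pool-based batch active-learning loop in numpy, using plain uncertainty scoring as a placeholder selection rule; it is not the selection criterion proposed in the paper, and the pool size, batch size, and budget are arbitrary.

```python
import numpy as np

def fit_logreg(X, y, lr=0.5, steps=300):
    """Tiny logistic-regression stand-in for the task model."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(0)
n, d, budget, batch = 2000, 20, 5, 100
X = rng.standard_normal((n, d))
y = (X @ rng.standard_normal(d) > 0).astype(float)

labeled = rng.choice(n, size=batch, replace=False).tolist()   # initial random labels
for round_ in range(budget):
    w = fit_logreg(X[labeled], y[labeled])
    pool = np.setdiff1d(np.arange(n), labeled)
    p = 1.0 / (1.0 + np.exp(-X[pool] @ w))
    scores = -np.abs(p - 0.5)                  # most uncertain examples score highest
    query = pool[np.argsort(scores)[-batch:]]  # query a whole batch at once
    labeled.extend(query.tolist())
    acc = ((X @ w > 0) == y).mean()
    print(f"round {round_}: {len(labeled) - batch} labels used, accuracy {acc:.3f}")
```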
no code implementations • ICLR 2019 • Jonathan Fiat, Eran Malach, Shai Shalev-Shwartz
Specifically, we show a memorization result for networks of size $\tilde{\Omega}(\frac{m}{d})$, and improved generalization bounds.
1 code implementation • NeurIPS 2019 • Eran Malach, Shai Shalev-Shwartz
Using this result we prove that, at least for some distributions, the success of learning deep networks depends on whether the distribution can be well approximated by shallower networks, and we conjecture that this property holds in general.
no code implementations • 26 Mar 2018 • Eran Malach, Shai Shalev-Shwartz
We describe a layer-by-layer algorithm for training deep convolutional networks, where each step involves gradient updates for a two layer network followed by a simple clustering algorithm.
no code implementations • ICLR 2018 • Alon Brutzkus, Amir Globerson, Eran Malach, Shai Shalev-Shwartz
Neural networks exhibit good generalization behavior in the over-parameterized regime, where the number of network parameters exceeds the number of observations.
3 code implementations • 21 Aug 2017 • Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua
In the second part we describe a design of a system that adheres to our safety assurance requirements and is scalable to millions of cars.
1 code implementation • NeurIPS 2017 • Eran Malach, Shai Shalev-Shwartz
Unfortunately, this approach often leads to noisy labels.
no code implementations • 2 Jun 2017 • Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah
Exploiting the great expressive power of Deep Neural Network architectures relies on the ability to train them.
1 code implementation • ICML 2017 • Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah
In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming the state of the art.
no code implementations • 16 Jan 2017 • Alon Gonen, Shai Shalev-Shwartz
We derive bounds on the sample complexity of empirical risk minimization (ERM) in the context of minimizing non-convex risks that admit the strict saddle property.
no code implementations • 11 Oct 2016 • Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua
Second, the Markov Decision Process model often used in robotics is problematic in our case because of the unpredictable behavior of other agents in this multi-agent scenario.
no code implementations • NeurIPS 2016 • Oren Tadmor, Yonatan Wexler, Tal Rosenwein, Shai Shalev-Shwartz, Amnon Shashua
This work is motivated by the engineering task of achieving near state-of-the-art face recognition on a minimal computing budget running on an embedded system.
no code implementations • 23 Apr 2016 • Shai Shalev-Shwartz, Amnon Shashua
We compare the end-to-end training approach to a modular approach in which a system is decomposed into semantically meaningful components.
no code implementations • 11 Mar 2016 • Galit Bary-Weisberg, Amit Daniely, Shai Shalev-Shwartz
The model of learning with local membership queries interpolates between the PAC model and the membership queries model by allowing the learner to query the label of any example that is similar to an example in the training set.
no code implementations • 7 Feb 2016 • Alon Gonen, Francesco Orabona, Shai Shalev-Shwartz
We develop a novel preconditioning method for ridge regression, based on recent linear sketching methods.
no code implementations • 4 Feb 2016 • Shai Shalev-Shwartz, Nir Ben-Zrihem, Aviad Cohen, Amnon Shashua
We argue that dual versions of the MDP framework (that depend on the value function and the $Q$ function) are problematic for autonomous driving applications due to the non-Markovian nature of the natural state space representation, and due to the continuous state and action spaces.
no code implementations • 4 Feb 2016 • Shai Shalev-Shwartz
Stochastic Dual Coordinate Ascent is a popular method for solving regularized loss minimization in the case of convex losses.
no code implementations • 4 Feb 2016 • Shai Shalev-Shwartz, Yonatan Wexler
Second, it is often argued that there is no point in minimizing the loss on the training set too much, as the improvement will not be reflected in the generalization loss.
no code implementations • 15 Jan 2016 • Alon Gonen, Shai Shalev-Shwartz
We show that the average stability notion introduced by Kearns and Ron (1999) and Bousquet and Elisseeff (2002) is invariant to data preconditioning, for a wide class of generalized linear models that includes most of the known exp-concave losses.
no code implementations • NeurIPS 2015 • Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz
The Normalized Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent which updates according to the direction of the gradients, rather than the gradients themselves.
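A minimal numpy sketch of the NGD update described in this sentence: each step moves a fixed distance along the gradient direction, so progress does not stall where the gradient magnitude vanishes. The example function and step size are arbitrary illustrations, not taken from the paper.

```python
import numpy as np

def ngd(grad, x0, lr=0.5, steps=100, eps=1e-12):
    """Normalized gradient descent: move a fixed distance along the gradient direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        x = x - lr * g / (np.linalg.norm(g) + eps)
    return x

# f(x) = sqrt(|x|): its gradient shrinks far from the minimum, so plain GD with a
# fixed step crawls there, while NGD advances by lr per iteration regardless.
grad_f = lambda x: 0.5 * np.sign(x) / np.sqrt(np.abs(x) + 1e-12)
print("NGD iterate:", ngd(grad_f, x0=[25.0]))
```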
no code implementations • 8 Jun 2015 • Alon Gonen, Shai Shalev-Shwartz
We propose a novel method for speeding up stochastic optimization algorithms via sketching methods, which recently became a powerful tool for accelerating algorithms for numerical linear algebra.
no code implementations • 23 Mar 2015 • Yossi Arjevani, Shai Shalev-Shwartz, Ohad Shamir
This, in turn, reveals a powerful connection between a class of optimization algorithms and the analytic theory of polynomials whereby new lower and upper bounds are derived.
1 code implementation • 12 Mar 2015 • Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz
We extend our algorithm and analysis to the setting of stochastic non-convex optimization with noisy gradient feedback, attaining the same convergence rate.
no code implementations • 25 Feb 2015 • Amit Daniely, Alon Gonen, Shai Shalev-Shwartz
Strongly adaptive algorithms are algorithms whose performance on every time interval is close to optimal.
no code implementations • 22 Feb 2015 • Shai Shalev-Shwartz
Stochastic Dual Coordinate Ascent is a popular method for solving regularized loss minimization in the case of convex losses.
no code implementations • 13 Nov 2014 • Shai Shalev-Shwartz
We describe and analyze a new boosting algorithm for deep learning called SelfieBoost.
1 code implementation • NeurIPS 2014 • Roi Livni, Shai Shalev-Shwartz, Ohad Shamir
It is well-known that neural networks are computationally hard to train.
no code implementations • 10 May 2014 • Amit Daniely, Shai Shalev-Shwartz
Furthermore, we show that the sample complexity of these learners is better than the sample complexity of the ERM rule, thus settling in the negative an open question due to Collins (2005).
no code implementations • 19 Feb 2014 • Alon Gonen, Dan Rosenbaum, Yonina Eldar, Shai Shalev-Shwartz
The goal of subspace learning is to find a $k$-dimensional subspace of $\mathbb{R}^d$, such that the expected squared distance between instance vectors and the subspace is as small as possible.
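The stated objective is the classical principal-subspace problem; as a point of reference (the paper studies learning-theoretic aspects of this problem, not the computation below), a short numpy sketch recovers the best $k$-dimensional subspace of a sample via the SVD. The data dimensions are arbitrary.

```python
import numpy as np

def best_subspace(X, k):
    """Top-k principal subspace: minimizes the average squared distance
    of the (centered) rows of X to a k-dimensional subspace."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                       # d x k orthonormal basis

def avg_sq_distance(X, V):
    Xc = X - X.mean(axis=0)
    residual = Xc - (Xc @ V) @ V.T        # component orthogonal to the subspace
    return (residual ** 2).sum(axis=1).mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 10))   # roughly 3-dimensional data
X += 0.05 * rng.standard_normal(X.shape)
V = best_subspace(X, k=3)
print("average squared distance to the learned subspace:", avg_sq_distance(X, V))
```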
no code implementations • 10 Nov 2013 • Amit Daniely, Nati Linial, Shai Shalev-Shwartz
The biggest challenge in proving complexity results is to establish hardness of improper learning (a.k.a. representation-independent learning).
no code implementations • 10 Sep 2013 • Shai Shalev-Shwartz, Tong Zhang
We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure.
no code implementations • 13 Aug 2013 • Amit Daniely, Sivan Sabato, Shai Ben-David, Shai Shalev-Shwartz
We study the sample complexity of multiclass prediction in several learning settings.
no code implementations • NeurIPS 2013 • Shai Shalev-Shwartz, Tong Zhang
Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning.
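As a concrete instance of the SDCA scheme this sentence refers to, the numpy sketch below applies the closed-form coordinate update for the squared loss with L2 regularization and compares against the exact ridge solution. It uses the standard primal-dual relation $w = X^\top \alpha / (\lambda n)$; the problem sizes and regularization constant are arbitrary, and this is only a minimal illustration of the method, not the paper's analysis.

```python
import numpy as np

def sdca_ridge(X, y, lam=0.1, epochs=20, seed=0):
    """SDCA for (1/n) * sum_i 0.5*(w.x_i - y_i)^2 + (lam/2)*||w||^2.
    Maintains dual variables alpha and the primal iterate w = X^T alpha / (lam * n)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    sq_norms = (X ** 2).sum(axis=1)
    for _ in range(epochs):
        for i in rng.permutation(n):
            # closed-form maximization of the dual over the single coordinate alpha_i
            delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + sq_norms[i] / (lam * n))
            alpha[i] += delta
            w += (delta / (lam * n)) * X[i]
    return w

rng = np.random.default_rng(1)
n, d = 400, 20
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w_sdca = sdca_ridge(X, y, lam=0.01)
# closed-form ridge solution for comparison
w_exact = np.linalg.solve(X.T @ X / n + 0.01 * np.eye(d), X.T @ y / n)
print("distance to exact ridge solution:", np.linalg.norm(w_sdca - w_exact))
```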
no code implementations • 26 Apr 2013 • Roi Livni, Shai Shalev-Shwartz, Ohad Shamir
The main goal of this paper is the derivation of an efficient layer-by-layer algorithm for training such networks, which we denote as the Basis Learner.
no code implementations • 13 Dec 2012 • Sivan Sabato, Shai Shalev-Shwartz, Nathan Srebro, Daniel Hsu, Tong Zhang
We consider the problem of learning a non-negative linear classifier with a $1$-norm of at most $k$, and a fixed threshold, under the hinge-loss.
no code implementations • 3 Nov 2012 • Amit Daniely, Nati Linial, Shai Shalev-Shwartz
The best approximation ratio achievable by an efficient algorithm is $O\left(\frac{1/\gamma}{\sqrt{\log(1/\gamma)}}\right)$ and is achieved using an algorithm from the above class.
no code implementations • 17 Aug 2012 • Alon Gonen, Sivan Sabato, Shai Shalev-Shwartz
Our efficient aggressive active learner of half-spaces has formal approximation guarantees that hold when the pool is separable with a margin.
no code implementations • NeurIPS 2011 • Shai Shalev-Shwartz, Yonatan Wexler, Amnon Shashua
We consider the problem of learning a multiclass predictor that uses only a few features, and in particular, the number of used features should increase sub-linearly with the number of possible classes.
no code implementations • NeurIPS 2008 • Shai Shalev-Shwartz, Sham M. Kakade
We describe a primal-dual framework for the design and analysis of online strongly convex optimization algorithms.
no code implementations • NeurIPS 2008 • Karthik Sridharan, Shai Shalev-Shwartz, Nathan Srebro
We show that the empirical minimizer of a stochastic strongly convex objective, where the stochastic component is linear, converges to the population minimizer with rate $O(1/n)$.