1 code implementation • 16 Feb 2023 • Hadar Sivan, Moshe Gabel, Assaf Schuster
Popular machine learning approaches forgo second-order information due to the difficulty of computing curvature in high dimensions.
no code implementations • 28 Jul 2022 • Guy Shapira, Assaf Schuster
We present REDEEMER (REinforcement baseD cEp pattErn MinER), a novel reinforcement and active learning approach for mining CEP patterns that expands the extracted knowledge while reducing the required human effort.
no code implementations • 18 Jun 2022 • Yuval Sieradzki, Nitzan Hodos, Gal Yehuda, Assaf Schuster
We show that a CFNN can approximate the indicator of a $d$-dimensional ball to arbitrary accuracy with only 2 layers and $\mathcal{O}(1)$ neurons, whereas a 2-layer deterministic network was shown to require $\Omega(e^d)$ neurons (arXiv:1610.09887), an exponential improvement.
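A toy numerical illustration (not the construction from the paper) of why injected randomness can capture radial structure cheaply: a single stochastic threshold with a uniformly random bias already encodes a smooth function of $\|x\|$ in expectation. All names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
x = rng.normal(size=d)
x = 0.6 * x / np.linalg.norm(x)       # a point with ||x|| = 0.6, inside the unit ball

# One "coin-flipping" threshold: output 1{||x||^2 <= b} with a random bias b ~ U[0, 1].
# For ||x|| <= 1 its expectation is exactly 1 - ||x||^2, so averaging a few independent
# flips recovers a radial function of x from a single stochastic comparison.
flips = (np.sum(x ** 2) <= rng.uniform(0.0, 1.0, size=10_000)).astype(float)
print(flips.mean(), "vs", 1.0 - np.sum(x ** 2))   # both approximately 0.64
```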
1 code implementation • USENIX Annual Technical Conference 2021 • Saar Eliad, Ido Hakimi, Alon De Jager, Mark Silberstein, Assaf Schuster
Fine-tuning is an increasingly common technique that leverages transfer learning to dramatically expedite the training of huge, high-quality models.
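A minimal PyTorch sketch of the transfer-learning pattern the sentence refers to: freeze a pretrained backbone and train only a small task head. The model and data here are toy stand-ins, not the huge models the paper targets, and this is not the paper's own training system.

```python
import torch
import torch.nn as nn

# Toy stand-in for a large pretrained backbone (in practice loaded from a checkpoint).
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256), nn.ReLU())
head = nn.Linear(256, 10)                 # small task-specific head, trained from scratch

for p in backbone.parameters():           # freeze the backbone: only the head is updated
    p.requires_grad = False

opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                   # toy fine-tuning loop on random data
    x = torch.randn(32, 128)
    y = torch.randint(0, 10, (32,))
    with torch.no_grad():                 # frozen backbone: no gradients needed here
        feats = backbone(x)
    loss = loss_fn(head(feats), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```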
no code implementations • 23 Jun 2021 • Rotem Zamir Aviv, Ido Hakimi, Assaf Schuster, Kfir Y. Levy
We consider stochastic convex optimization problems, where several machines act asynchronously in parallel while sharing a common memory.
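A minimal sketch of the setting described, assuming a convex least-squares objective and Python threads over a shared NumPy vector as a single-process stand-in for machines sharing common memory (under CPython's GIL the threads interleave rather than run truly in parallel; the pattern, not the speedup, is the point).

```python
import threading
import numpy as np

# A convex least-squares problem: minimize 0.5/n * ||A w - b||^2.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 20))
b = A @ rng.normal(size=20) + 0.01 * rng.normal(size=1000)
w = np.zeros(20)                          # the shared parameter vector ("common memory")

def worker(seed, steps=2000, lr=1e-3, batch=8):
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        idx = local_rng.integers(0, A.shape[0], size=batch)
        g = A[idx].T @ (A[idx] @ w - b[idx]) / batch   # minibatch gradient at the current w
        w[:] = w - lr * g                 # lock-free asynchronous write to shared memory

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("relative residual:", np.linalg.norm(A @ w - b) / np.linalg.norm(b))
```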
no code implementations • ICLR 2020 • Saar Barkai, Ido Hakimi, Assaf Schuster
In this paper we define the Gap as a measure of gradient staleness and propose Gap-Aware (GA), a novel asynchronous-distributed method that penalizes stale gradients in proportion to the Gap and performs well even when scaling to a large number of workers.
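A hedged sketch of the idea as stated in the abstract: measure how stale an incoming gradient is and dampen it in proportion. The staleness measure below (master updates since the worker read the parameters) is only a stand-in for the paper's Gap, and the names are illustrative, not the paper's API.

```python
import numpy as np

master_version = 0                                     # bumped on every applied update

def apply_stale_gradient(params, grad, lr, read_version):
    """Apply a possibly stale gradient, dampened in proportion to its staleness."""
    global master_version
    gap = max(1, master_version - read_version + 1)    # illustrative stand-in for the Gap
    master_version += 1
    return params - (lr / gap) * grad                  # a k-updates-stale gradient counts ~1/k

# Toy usage: the second gradient was read one master update ago, so it is halved.
w = np.zeros(3)
w = apply_stale_gradient(w, np.ones(3), lr=0.1, read_version=0)   # gap 1: full step
w = apply_stale_gradient(w, np.ones(3), lr=0.1, read_version=0)   # gap 2: half step
print(w)                                               # [-0.15 -0.15 -0.15]
```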
no code implementations • ICML 2020 • Gal Yehuda, Moshe Gabel, Assaf Schuster
Can deep neural networks learn to solve any task, and in particular problems of high complexity?
no code implementations • 28 Nov 2019 • Michael Kamp, Mario Boley, Michael Mock, Daniel Keren, Assaf Schuster, Izchak Sharfman
Intuitively, the learning performance of such a protocol is optimal if it incurs approximately the same loss as a hypothetical serial setting would.
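One plausible way to make this concrete (an illustrative formalization, not necessarily the paper's exact criterion): writing $\ell(f_t, z_t)$ for the loss on the $t$-th example, the protocol is optimal if $\sum_{t=1}^{T} \ell(f_t^{\text{dist}}, z_t) \approx \sum_{t=1}^{T} \ell(f_t^{\text{serial}}, z_t)$, i.e. distributing the computation costs almost nothing in cumulative loss.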
no code implementations • 24 Sep 2019 • Saar Barkai, Ido Hakimi, Assaf Schuster
In this paper we define the Gap as a measure of gradient staleness and propose Gap-Aware (GA), a novel asynchronous-distributed method that penalizes stale gradients in proportion to the Gap and performs well even when scaling to a large number of workers.
no code implementations • 26 Jul 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
We propose DANA: a novel technique for asynchronous distributed SGD with momentum that mitigates gradient staleness by computing the gradient at an estimated future position of the model's parameters.
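A hedged sketch of the core idea as described in the abstract: before computing a gradient, extrapolate where the master parameters will be after the updates expected to arrive during the worker's delay, and evaluate the gradient there. The extrapolation rule below is an illustrative Nesterov-style guess, not the paper's exact estimator.

```python
import numpy as np

def dana_style_gradient(params, momentum_buf, grad_fn, lr, momentum, expected_delay):
    """Evaluate the gradient at an estimated future position of the parameters."""
    # Extrapolate roughly `expected_delay` further momentum steps ahead of the
    # current parameters (hypothetical rule, for illustration only).
    lookahead = params - lr * momentum * momentum_buf * expected_delay
    return grad_fn(lookahead)             # gradient at the estimated future point

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
buf = np.array([0.2, -0.4])
g = dana_style_gradient(w, buf, grad_fn=lambda v: v, lr=0.1, momentum=0.9, expected_delay=4)
print(g)                                  # evaluated at the lookahead point, not at w
```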
no code implementations • ICLR 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
We propose DANA, a novel approach that scales out of the box to large clusters using the same hyperparameters and learning schedule optimized for training on a single worker, while maintaining similar final accuracy without additional overhead.