Search Results for author: Ido Hakimi

Found 8 papers, 2 papers with code

q2d: Turning Questions into Dialogs to Teach Models How to Search

no code implementations • 27 Apr 2023 • Yonatan Bitton, Shlomi Cohen-Ganor, Ido Hakimi, Yoad Lewenberg, Roee Aharoni, Enav Weinreb

One of the exciting capabilities of recent language models for dialog is their ability to independently search for relevant information to ground a given dialog response.

Language Modelling, Large Language Model, +1

Learning Under Delayed Feedback: Implicitly Adapting to Gradient Delays

no code implementations • 23 Jun 2021 • Rotem Zamir Aviv, Ido Hakimi, Assaf Schuster, Kfir Y. Levy

We consider stochastic convex optimization problems, where several machines act asynchronously in parallel while sharing a common memory.
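
The setting described in this abstract is asynchronous SGD with delayed (stale) gradients: each machine computes a gradient at an old copy of the shared parameters, and that gradient is only applied several updates later. Below is a minimal single-process sketch of that delay on a hypothetical convex quadratic; all names and values are illustrative and not taken from the paper.

```python
import numpy as np
from collections import deque

# Toy simulation of asynchronous SGD with delayed (stale) gradients:
# each gradient is computed at the current iterate but applied to the
# shared parameters only `delay` steps later.

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

def grad(x):
    """Gradient of the convex objective 0.5 * ||A @ x - b||^2."""
    return A.T @ (A @ x - b)

def delayed_sgd(delay=5, lr=1e-3, steps=2000):
    x = np.zeros(10)
    in_flight = deque()                     # gradients still "in flight"
    for _ in range(steps):
        in_flight.append(grad(x))           # computed at the current iterate
        if len(in_flight) > delay:
            x -= lr * in_flight.popleft()   # applied `delay` steps late
    return x

x = delayed_sgd()
print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```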

Gap-Aware Mitigation of Gradient Staleness

no code implementations • ICLR 2020 • Saar Barkai, Ido Hakimi, Assaf Schuster

In this paper, we define the Gap as a measure of gradient staleness and propose Gap-Aware (GA), a novel asynchronous-distributed method that penalizes stale gradients in proportion to the Gap and performs well even when scaling to large numbers of workers.

Cloud Computing
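
As a rough illustration of the idea in the abstract, here is a hedged sketch of a Gap-Aware-style update: a stale gradient is dampened in proportion to a "gap" that measures how far the master parameters have moved since the worker read them. The normalization by a running-average step size and the clipping at 1 are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

# Hedged sketch of a Gap-Aware-style penalty (not the paper's exact rule):
# the stale gradient is divided by a "gap" measuring how far the master
# parameters moved since the worker read them, normalized by a running
# average of the master's step size.

def gap_aware_update(master_w, worker_snapshot, stale_grad, lr, avg_step, eps=1e-12):
    """Apply a stale gradient, penalized in proportion to the gap."""
    gap = np.linalg.norm(master_w - worker_snapshot) / (avg_step + eps)
    gap = max(gap, 1.0)                      # a fresh gradient is never amplified
    master_w = master_w - lr * stale_grad / gap
    return master_w, gap
```

With this rule, a nearly fresh gradient yields a gap close to 1 and is applied almost unchanged, while a very stale gradient is scaled down accordingly.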

Gap Aware Mitigation of Gradient Staleness

no code implementations • 24 Sep 2019 • Saar Barkai, Ido Hakimi, Assaf Schuster

In this paper, we define the Gap as a measure of gradient staleness and propose Gap-Aware (GA), a novel asynchronous-distributed method that penalizes stale gradients in proportion to the Gap and performs well even when scaling to large numbers of workers.

Cloud Computing

Taming Momentum in a Distributed Asynchronous Environment

no code implementations • 26 Jul 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster

We propose DANA: a novel technique for asynchronous distributed SGD with momentum that mitigates gradient staleness by computing the gradient on an estimated future position of the model's parameters.

Distributed Computing
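
A hedged sketch of the DANA idea from the abstract: rather than evaluating the gradient at the parameters a worker just read, extrapolate them along the momentum direction to where the master is expected to be by the time this worker's update arrives. The exact extrapolation rule below (replaying the momentum buffer once per in-flight worker) is an assumption for illustration, not the paper's implementation.

```python
import numpy as np

# Hedged sketch of DANA-style gradient computation: extrapolate the
# parameters along the momentum direction before computing the gradient,
# so the gradient roughly matches the parameters the master will hold
# when this worker's update finally arrives.

def estimate_future_params(w, momentum_buf, lr, n_workers):
    """Step `n_workers` times along the current momentum direction."""
    lookahead = w.copy()
    for _ in range(n_workers):
        lookahead -= lr * momentum_buf
    return lookahead

def dana_worker_gradient(w, momentum_buf, grad_fn, lr, n_workers):
    future_w = estimate_future_params(w, momentum_buf, lr, n_workers)
    return grad_fn(future_w)  # gradient at the estimated future position
```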

DANA: Scalable Out-of-the-box Distributed ASGD Without Retuning

no code implementations • ICLR 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster

We propose DANA, a novel approach that scales out-of-the-box to large clusters using the same hyperparameters and learning schedule optimized for training on a single worker, while maintaining similar final accuracy without additional overhead.

Distributed Computing

Faster Neural Network Training with Approximate Tensor Operations

1 code implementation • NeurIPS 2021 • Menachem Adelman, Kfir Y. Levy, Ido Hakimi, Mark Silberstein

We propose a novel technique for faster deep neural network training which systematically applies sample-based approximation to the constituent tensor operations, i.e., matrix multiplications and convolutions.
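
As a rough illustration of sample-based approximation of a matrix multiplication, here is a generic column-row sampling sketch in the spirit of the abstract: A @ B is estimated from k sampled column-row pairs, importance-weighted so the estimate stays unbiased. The sampling distribution and scaling below are standard choices and not necessarily the paper's exact algorithm.

```python
import numpy as np

# Generic column-row sampling approximation of A @ B: sample k
# column-row pairs with probability proportional to the product of
# their norms, then rescale to keep the estimator unbiased.

def approx_matmul(A, B, k, rng=np.random.default_rng(0)):
    n = A.shape[1]                                  # shared inner dimension
    probs = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    probs = probs / probs.sum()
    idx = rng.choice(n, size=k, replace=True, p=probs)
    scale = 1.0 / (k * probs[idx])                  # importance-sampling weights
    return (A[:, idx] * scale) @ B[idx, :]

A = np.random.randn(64, 256)
B = np.random.randn(256, 32)
exact = A @ B
approx = approx_matmul(A, B, k=64)
print("relative error:", np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```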
