Search Results for author: Tal Ben-Nun

Found 27 papers, 14 papers with code

Lion Cub: Minimizing Communication Overhead in Distributed Lion

no code implementations · 25 Nov 2024 · Satoki Ishikawa, Tal Ben-Nun, Brian Van Essen, Rio Yokota, Nikoli Dryden

Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottleneck.

Quantization
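
Lion's update direction is the elementwise sign of a momentum term, which is what makes aggressive compression natural in the distributed setting: each parameter's contribution fits in one bit. A minimal sketch of that general idea, with hypothetical helper names and majority-vote aggregation rather than Lion Cub's actual protocol:

```python
import numpy as np

def compress_signs(update: np.ndarray) -> np.ndarray:
    """Pack the elementwise sign of an update into one bit per parameter."""
    bits = (update >= 0).astype(np.uint8)       # 1 for non-negative, 0 for negative
    return np.packbits(bits)                    # ~32x smaller than float32

def decompress_signs(packed: np.ndarray, n: int) -> np.ndarray:
    """Unpack the bits back into a +/-1 vector."""
    bits = np.unpackbits(packed)[:n]
    return np.where(bits == 1, 1.0, -1.0).astype(np.float32)

# Each worker communicates packed sign bits; aggregation is an elementwise majority vote.
n = 10
worker_updates = [np.random.randn(n) for _ in range(4)]
votes = [decompress_signs(compress_signs(u), n) for u in worker_updates]
majority = np.sign(np.sum(votes, axis=0))
print(majority)
```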

Cached Operator Reordering: A Unified View for Fast GNN Training

no code implementations · 23 Aug 2023 · Julia Bazinska, Andrei Ivanov, Tal Ben-Nun, Nikoli Dryden, Maciej Besta, Siyuan Shen, Torsten Hoefler

Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering.

Graph Attention · Graph Classification +1
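
The reordering trade-off is easy to see in a single GNN layer, which multiplies a sparse adjacency matrix A with dense feature and weight matrices X and W: (A @ X) @ W and A @ (X @ W) compute the same result at different cost. A sketch with illustrative sizes (this is the classical ordering argument, not the paper's cached-reordering algorithm):

```python
import numpy as np
import scipy.sparse as sp

N, F, F_out, nnz = 10_000, 256, 64, 200_000      # illustrative sizes
rows = np.random.randint(0, N, size=nnz)
cols = np.random.randint(0, N, size=nnz)
A = sp.csr_matrix((np.ones(nnz, dtype=np.float32), (rows, cols)), shape=(N, N))
X = np.random.rand(N, F).astype(np.float32)      # node features
W = np.random.rand(F, F_out).astype(np.float32)  # layer weights

cost_aggregate_first = A.nnz * F + N * F * F_out      # (A @ X) @ W
cost_transform_first = N * F * F_out + A.nnz * F_out  # A @ (X @ W)

# Pick the cheaper parenthesization; here F_out < F, so transforming first wins.
out = A @ (X @ W) if cost_transform_first < cost_aggregate_first else (A @ X) @ W
print(f"{cost_aggregate_first:.2e} vs {cost_transform_first:.2e} multiplies")
```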

STen: Productive and Efficient Sparsity in PyTorch

no code implementations · 15 Apr 2023 · Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Saleh Ashkboos, Torsten Hoefler

As deep learning models grow, sparsity is becoming an increasingly critical component of deep neural networks, enabling improved performance and reduced storage.
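
As a rough illustration of what sparsity buys (plain PyTorch, not STen's API): magnitude-prune a weight matrix, store it in a sparse format, and run the sparse-dense product.

```python
import torch

w = torch.randn(1024, 1024)
threshold = w.abs().quantile(0.9)         # keep only the largest 10% of weights
mask = w.abs() >= threshold
w_sparse = (w * mask).to_sparse()         # COO storage: indices + values only

x = torch.randn(1024, 32)
y_sparse = torch.sparse.mm(w_sparse, x)   # sparse kernel
y_dense = (w * mask) @ x                  # dense reference
print(torch.allclose(y_sparse, y_dense, atol=1e-5))
```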

A Theory of I/O-Efficient Sparse Neural Network Inference

no code implementations · 3 Jan 2023 · Niels Gleinig, Tal Ben-Nun, Torsten Hoefler

Computation happens between a small fast memory and a large slow memory: we can only process data that is stored in fast memory, which incurs data movement (input/output operations, or I/Os) between the two units.
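
A back-of-envelope sketch of that two-level model; the per-layer count below is illustrative only, not the paper's bounds:

```python
# Two-level I/O model: a fast memory of bounded size and an unbounded slow
# memory. Operands must be in fast memory to be used, so when nothing is
# resident, a sparse layer reads each input and nonzero weight at least once
# and writes each output once.
def sparse_layer_io(n_in: int, n_out: int, nnz: int) -> int:
    return n_in + nnz + n_out

# Example: a 90%-sparse 4096x4096 layer.
n = 4096
print(sparse_layer_io(n, n, nnz=int(0.1 * n * n)))  # ~1.69e6 I/Os
```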

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts

1 code implementation · 29 Jun 2022 · Saleh Ashkboos, Langwen Huang, Nikoli Dryden, Tal Ben-Nun, Peter Dueben, Lukas Gianinazzi, Luca Kummer, Torsten Hoefler

We propose the ENS-10 prediction correction task for improving the forecast quality at a 48-hour lead time through ensemble post-processing.

Weather Forecasting
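
A minimal post-processing baseline on synthetic data, in the spirit of the task but not the paper's models: fit a linear correction of the ensemble mean by least squares against observations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, members = 1000, 10
truth = rng.normal(size=n)                                  # "observations"
ens = truth[:, None] + 0.5 + rng.normal(size=(n, members))  # biased, noisy forecast

mean = ens.mean(axis=1)
X = np.stack([np.ones(n), mean], axis=1)
a, b = np.linalg.lstsq(X, truth, rcond=None)[0]             # fit bias correction
corrected = a + b * mean

print("RMSE raw:      ", np.sqrt(np.mean((mean - truth) ** 2)))
print("RMSE corrected:", np.sqrt(np.mean((corrected - truth) ** 2)))
```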

A Data-Centric Optimization Framework for Machine Learning

1 code implementation · 20 Oct 2021 · Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler

Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a dramatically growing demand for compute.

Machine Learning

Learning Combinatorial Node Labeling Algorithms

no code implementations · 7 Jun 2021 · Lukas Gianinazzi, Maximilian Fries, Nikoli Dryden, Tal Ben-Nun, Maciej Besta, Torsten Hoefler

We present a novel neural architecture to solve graph optimization problems where the solution consists of arbitrary node labels, allowing us to solve hard problems like graph coloring.

Machine Learning · Graph Attention +1
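
For intuition, graph coloring is naturally framed as sequential node labeling: visit nodes in some order and give each the smallest color its neighbors do not use. The classical greedy baseline below is the kind of labeling policy the paper's architecture learns instead:

```python
def greedy_coloring(adj: dict[int, set[int]]) -> dict[int, int]:
    """Label each node with the smallest color unused by its colored neighbors."""
    colors: dict[int, int] = {}
    for node in adj:
        used = {colors[nb] for nb in adj[node] if nb in colors}
        colors[node] = next(c for c in range(len(adj)) if c not in used)
    return colors

# A 5-cycle: greedy needs three colors under this visiting order.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(greedy_coloring(cycle))  # {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
```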

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

no code implementations · 31 Jan 2021 · Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste

The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components.
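
Magnitude pruning, one family of methods the survey covers, takes a few lines in PyTorch:

```python
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.9)  # zero 90% by |w|

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.1%}")                # ~90.0%

prune.remove(layer, "weight")   # bake the mask in, making pruning permanent
```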

Clairvoyant Prefetching for Distributed Machine Learning I/O

no code implementations · 21 Jan 2021 · Nikoli Dryden, Roman Böhringer, Tal Ben-Nun, Torsten Hoefler

I/O is emerging as a major bottleneck for machine learning training, especially in distributed environments.

Machine Learning
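
The property such a prefetcher exploits is that a seeded shuffle fixes the whole sample access order before the epoch begins, so future requests are known exactly. A toy sketch with a hypothetical load_sample stand-in, not the paper's system:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def load_sample(idx: int) -> bytes:
    return bytes(int(idx))  # stand-in for fetching sample `idx` from slow storage

order = np.random.default_rng(seed=42).permutation(10_000)  # known in advance
pool = ThreadPoolExecutor(max_workers=4)
lookahead = 8
futures = {i: pool.submit(load_sample, order[i]) for i in range(lookahead)}

for step in range(len(order)):
    sample = futures.pop(step).result()     # already fetched (or in flight)
    if step + lookahead < len(order):       # keep the pipeline full
        futures[step + lookahead] = pool.submit(load_sample, order[step + lookahead])
    # ... train on `sample` ...
pool.shutdown()
```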

MixSize: Training Convnets With Mixed Image Sizes for Improved Accuracy, Speed and Scale Resiliency

2 code implementations · 1 Jan 2021 · Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry

Although CNNs are trained on images of a specific size, it is well established that they can be used to evaluate a wide range of image sizes at test time by adjusting the size of intermediate feature maps.
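
For example, standard torchvision ResNets end in a global adaptive pooling layer, so the same weights evaluate images of many sizes:

```python
import torch
from torchvision.models import resnet18

# AdaptiveAvgPool2d collapses whatever spatial extent the last stage produces,
# so the classifier head sees a fixed-size vector at every input resolution.
model = resnet18(weights=None).eval()
with torch.no_grad():
    for size in (64, 128, 224, 288):
        logits = model(torch.randn(1, 3, size, size))
        print(size, tuple(logits.shape))   # (1, 1000) at every size
```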

Deep Data Flow Analysis

no code implementations · 21 Nov 2020 · Chris Cummins, Hugh Leather, Zacharias Fisches, Tal Ben-Nun, Torsten Hoefler, Michael O'Boyle

Compiler architects increasingly look to machine learning when building heuristics for compiler optimization.

Machine Learning · Compiler Optimization

Deep Learning for Post-Processing Ensemble Weather Forecasts

1 code implementation · 18 May 2020 · Peter Grönquist, Chengyuan Yao, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Shigang Li, Torsten Hoefler

Applied to global data, our mixed models achieve a relative improvement in ensemble forecast skill (CRPS) of over 14%.

Deep Learning · Weather Forecasting
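
CRPS for an ensemble can be computed from the standard identity CRPS = E|X - y| - (1/2)E|X - X'|, where X and X' are independent draws from the ensemble and y is the observation:

```python
import numpy as np

def crps_ensemble(members: np.ndarray, obs: float) -> float:
    """Continuous ranked probability score of an ensemble forecast (lower is better)."""
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

members = np.random.default_rng(0).normal(loc=0.3, scale=1.0, size=10)
print(crps_ensemble(members, obs=0.0))
```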

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis

2 code implementations · 23 Mar 2020 · Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Hugh Leather

We introduce ProGraML - Program Graphs for Machine Learning - a novel graph-based program representation using a low-level, language-agnostic, and portable format; and machine learning models capable of performing complex downstream tasks over these graphs.

Machine Learning · Deep Learning
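
An illustrative toy in the spirit of that representation, not ProGraML's actual schema: instruction nodes joined by separate control-flow and data-flow edges.

```python
import networkx as nx

g = nx.MultiDiGraph()
for inst in ("a = load %x", "b = add a, 1", "store b, %y"):
    g.add_node(inst, kind="instruction")

g.add_edge("a = load %x", "b = add a, 1", flow="control")
g.add_edge("b = add a, 1", "store b, %y", flow="control")
g.add_edge("a = load %x", "b = add a, 1", flow="data")   # `a` feeds the add
g.add_edge("b = add a, 1", "store b, %y", flow="data")   # `b` feeds the store

print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")  # 3 nodes, 4 edges
```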

Predicting Weather Uncertainty with Deep Convnets

no code implementations · 2 Nov 2019 · Peter Grönquist, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Luca Lavarini, Shigang Li, Torsten Hoefler

Modern weather forecast models perform uncertainty quantification using ensemble prediction systems, which collect nonparametric statistics based on multiple perturbed simulations.

Uncertainty Quantification · Weather Forecasting
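
The ensemble idea in miniature, using a toy chaotic map in place of a weather model: perturb the initial state, run every member forward, and read uncertainty off the spread.

```python
import numpy as np

def logistic_map(x0: float, steps: int, r: float = 3.9) -> float:
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)   # chaotic for r = 3.9: tiny perturbations diverge
    return x

rng = np.random.default_rng(1)
members = [logistic_map(0.5 + 1e-4 * rng.standard_normal(), steps=50)
           for _ in range(20)]
print("ensemble mean:", np.mean(members), " spread (std):", np.std(members))
```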

Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency

2 code implementations · 12 Aug 2019 · Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry

Although CNNs are trained on images of a specific size, it is well established that they can be used to evaluate a wide range of image sizes at test time by adjusting the size of intermediate feature maps.

Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations

no code implementations · 12 Aug 2019 · Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler

Load imbalance is pervasive in distributed deep learning training systems, caused either by inherent imbalance in the learned tasks or by the system itself.

Deep Learning
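
A sketch of the idea behind a partial collective, not the paper's implementation: start a non-blocking allreduce and, if stragglers have not arrived when gradients are needed, fall back to the previous reduced result. Assumes mpi4py and a launch under mpirun.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
grad = np.random.rand(1024)
reduced = np.zeros_like(grad)
prev = np.zeros_like(grad)     # last completed reduction

req = comm.Iallreduce(grad, reduced, op=MPI.SUM)  # non-blocking collective
if req.Test():                 # all ranks arrived in time: use fresh gradients
    step_grad = reduced.copy()
else:                          # stragglers: proceed with the stale result
    step_grad = prev.copy()
    req.Wait()                 # the collective must still complete
prev[:] = reduced
# ... apply step_grad to the model ...
```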

Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs

3 code implementations · 27 Feb 2019 · Tal Ben-Nun, Johannes De Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, Torsten Hoefler

With the ubiquity of accelerators, such as FPGAs and GPUs, the complexity of high-performance programming is increasing beyond the skill-set of the average scientist in domains outside of computer science.

Programming Languages · Distributed, Parallel, and Cluster Computing · Performance
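
SDFGs are realized in the DaCe framework, whose Python frontend parses annotated functions into dataflow graphs. A minimal sketch along the lines of DaCe's canonical example; exact API details may differ across versions:

```python
import dace
import numpy as np

N = dace.symbol("N")           # symbolic size, specialized at call time

@dace.program
def axpy(a: dace.float64, x: dace.float64[N], y: dace.float64[N]):
    y[:] = a * x + y           # parsed into a stateful dataflow multigraph

x = np.random.rand(1024)
y = np.random.rand(1024)
axpy(2.0, x, y)                # N inferred from the arrays; SDFG compiled and run
```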

Graph Processing on FPGAs: Taxonomy, Survey, Challenges

no code implementations · 25 Feb 2019 · Maciej Besta, Dimitri Stanojevic, Johannes De Fine Licht, Tal Ben-Nun, Torsten Hoefler

To facilitate understanding of this emerging domain, we present the first survey and taxonomy on graph computations on FPGAs.

Distributed, Parallel, and Cluster Computing · Hardware Architecture

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning

1 code implementation · 29 Jan 2019 · Tal Ben-Nun, Maciej Besta, Simon Huber, Alexandros Nikolaos Ziogas, Daniel Peter, Torsten Hoefler

We introduce Deep500: the first customizable benchmarking infrastructure that enables fair comparison of the plethora of deep learning frameworks, algorithms, libraries, and techniques.

Benchmarking · Deep Learning +1
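
The core of fair benchmarking, sketched without Deep500's API: warm up to exclude one-time costs, then report a robust statistic over many timed runs.

```python
import time
import statistics
import torch

def benchmark(fn, warmup: int = 10, runs: int = 50) -> float:
    for _ in range(warmup):
        fn()                              # exclude JIT/caching one-time costs
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return statistics.median(times)       # median resists outliers

x = torch.randn(1024, 1024)
print(f"matmul median: {benchmark(lambda: x @ x) * 1e3:.2f} ms")
```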

Augment your batch: better training with larger batches

1 code implementation · 27 Jan 2019 · Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry

We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of deep neural networks and datasets.
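
Batch augmentation itself is simple: replicate each sample several times with different random augmentations and train on the enlarged batch. A sketch with a stand-in transform:

```python
import torch

def augment(x: torch.Tensor) -> torch.Tensor:
    """Stand-in for a real augmentation: random horizontal flip or noise."""
    if torch.rand(1).item() < 0.5:
        return torch.flip(x, dims=[-1])
    return x + 0.1 * torch.randn_like(x)

def augmented_batch(batch: torch.Tensor, m: int = 4) -> torch.Tensor:
    """Each of the m copies sees a different random augmentation."""
    return torch.cat([augment(batch) for _ in range(m)], dim=0)

batch = torch.randn(32, 3, 32, 32)
print(augmented_batch(batch).shape)   # torch.Size([128, 3, 32, 32])
```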

Neural Code Comprehension: A Learnable Representation of Code Semantics

1 code implementation · NeurIPS 2018 · Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler

In this paper, we propose a novel processing technique to learn code semantics, and apply it to a variety of program analysis tasks.

Clustering
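
A loose sketch of the underlying intuition, learning embeddings for code statements from the contexts they appear in; the paper builds much richer contexts from LLVM IR dataflow, while this toy just treats opcode sequences as sentences (gensim assumed as the embedding library):

```python
from gensim.models import Word2Vec

traces = [
    ["load", "add", "store"],
    ["load", "mul", "store"],
    ["load", "add", "mul", "store"],
] * 100                                    # toy corpus of opcode sequences

model = Word2Vec(sentences=traces, vector_size=16, window=2, min_count=1, sg=1)
print(model.wv.most_similar("add", topn=2))  # ops in similar contexts cluster
```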

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

1 code implementation · 13 Apr 2018 · Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used in deep learning.

Deep Learning
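
The micro-batching idea in plain PyTorch (μ-cuDNN itself applies it inside cuDNN convolution calls): split the mini-batch into chunks and accumulate gradients so the optimizer step is unchanged.

```python
import torch

model = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

batch = torch.randn(256, 3, 32, 32)
micro_batches = batch.chunk(4)            # four micro-batches of 64
opt.zero_grad()
for micro in micro_batches:
    loss = model(micro).pow(2).mean() / len(micro_batches)  # scale to match full batch
    loss.backward()                       # gradients accumulate across chunks
opt.step()
```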

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

1 code implementation · 26 Feb 2018 · Tal Ben-Nun, Torsten Hoefler

We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning.

Deep Learning · Neural Architecture Search +1
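
Data parallelism, one of the concurrency schemes the survey models, reduces to averaging per-worker gradients with an allreduce. A sketch with torch.distributed:

```python
import torch
import torch.distributed as dist

def allreduce_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all workers so every replica steps identically."""
    world = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world

# Usage under a launcher such as torchrun: dist.init_process_group("gloo"),
# then call allreduce_gradients(model) after each loss.backward().
```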
