Search Results for author: Torsten Hoefler

Found 66 papers, 36 papers with code

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

1 code implementation26 Jan 2024 Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman

Large language models have become the cornerstone of natural language processing, but their use comes with substantial costs in terms of compute and memory resources.
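
The row-and-column deletion named in the title can be sketched in a few lines. The snippet below illustrates only the slicing step with a toy norm-based selection criterion; it is not the paper's method, which applies orthogonal transformations first so that the deleted slices carry little signal.

```python
import torch

def slice_linear(layer: torch.nn.Linear, keep: int) -> torch.nn.Linear:
    """Keep only the `keep` output rows with the largest norm (toy criterion)."""
    idx = layer.weight.norm(dim=1).topk(keep).indices.sort().values
    sliced = torch.nn.Linear(layer.in_features, keep, bias=layer.bias is not None)
    with torch.no_grad():
        sliced.weight.copy_(layer.weight[idx])   # delete the remaining rows
        if layer.bias is not None:
            sliced.bias.copy_(layer.bias[idx])
    return sliced

smaller = slice_linear(torch.nn.Linear(4096, 4096), keep=3072)  # ~25% fewer rows
```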

Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts

no code implementations25 Jan 2024 Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Onur Mutlu, Torsten Hoefler

Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph.

Mathematical Reasoning Prompt Engineering

Swing: Short-cutting Rings for Higher Bandwidth Allreduce

no code implementations17 Jan 2024 Daniele De Sensi, Tommaso Bonato, David Saam, Torsten Hoefler

The allreduce collective operation accounts for a significant fraction of the runtime of workloads running on distributed systems.
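
For context, the baseline that ring short-cutting improves on can be simulated in plain Python. This is the textbook ring allreduce (reduce-scatter followed by allgather), not Swing's algorithm:

```python
# Textbook ring allreduce on p ranks, simulated sequentially: reduce-scatter
# leaves the full sum of chunk (r+1) % p on rank r, then allgather circulates
# the reduced chunks to every rank.
def ring_allreduce(data):
    p = len(data)                             # data[r][c]: chunk c on rank r
    for s in range(p - 1):                    # reduce-scatter
        for r in range(p):
            c = (r - s) % p                   # chunk rank r forwards this step
            data[(r + 1) % p][c] += data[r][c]
    for s in range(p - 1):                    # allgather
        for r in range(p):
            c = (r + 1 - s) % p               # reduced chunk rank r forwards
            data[(r + 1) % p][c] = data[r][c]

p = 4
data = [[10 * r + c for c in range(p)] for r in range(p)]
ring_allreduce(data)
assert all(row == [60 + 4 * c for c in range(p)] for row in data)
```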

DiffDA: a Diffusion model for weather-scale Data Assimilation

no code implementations11 Jan 2024 Langwen Huang, Lukas Gianinazzi, Yuejiang Yu, Peter D. Dueben, Torsten Hoefler

The experiments also show that the initial conditions assimilated from sparse observations (less than 0.77% of gridded data) and a 48-hour forecast can be used for forecast models with a loss of lead time of at most 24 hours compared to initial conditions from state-of-the-art data assimilation in ERA5.

Denoising Weather Forecasting

How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark

no code implementations21 Dec 2023 Eldar Kurtic, Torsten Hoefler, Dan Alistarh

Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task.

Knowledge Distillation Language Modelling
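
A minimal sketch of the simplest baseline in this space, one-shot magnitude pruning. The paper's point is that recovering accuracy afterwards requires careful fine-tuning (e.g., with knowledge distillation), which is not shown here:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """One-shot unstructured magnitude pruning: zero the smallest-|w| entries."""
    k = max(1, int(sparsity * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(768, 768)
w_sparse = magnitude_prune(w, sparsity=0.9)       # ~90% of entries zeroed
```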

HOT: Higher-Order Dynamic Graph Representation Learning with Efficient Transformers

no code implementations30 Nov 2023 Maciej Besta, Afonso Claudino Catarino, Lukas Gianinazzi, Nils Blach, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler

A fundamental workload in this setting is dynamic link prediction: using a history of graph updates to predict whether a given pair of vertices will become connected.

Dynamic Link Prediction Graph Representation Learning

Chameleon: a heterogeneous and disaggregated accelerator system for retrieval-augmented language models

no code implementations15 Oct 2023 Wenqi Jiang, Marco Zeller, Roger Waleffe, Torsten Hoefler, Gustavo Alonso

The heterogeneity ensures efficient acceleration of both LM inference and retrieval, while the accelerator disaggregation enables the system to independently scale both types of accelerators to fulfill diverse RALM requirements.

Language Modelling Retrieval +1

QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models

1 code implementation13 Oct 2023 Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh

We show, for the first time, that the majority of inference computations for large generative models such as LLaMA, OPT, and Falcon can be performed with both weights and activations being cast to 4 bits, in a way that leads to practical speedups, while at the same time maintaining good accuracy.

Computational Efficiency Quantization
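
A minimal sketch of symmetric 4-bit quantization, the basic operation involved. Details such as scale granularity, outlier handling, and fast kernels are the paper's contribution and are not captured here:

```python
import torch

def quantize_int4(x: torch.Tensor):
    """Symmetric per-tensor quantization to the int4 range [-8, 7]."""
    scale = x.abs().max() / 7.0
    q = torch.clamp(torch.round(x / scale), -8, 7)
    return q, scale

x = torch.randn(16, 16)
q, scale = quantize_int4(x)
x_hat = q * scale                                  # dequantized approximation
print((x_hat - x).abs().mean())                    # mean quantization error
```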

Cached Operator Reordering: A Unified View for Fast GNN Training

no code implementations23 Aug 2023 Julia Bazinska, Andrei Ivanov, Tal Ben-Nun, Nikoli Dryden, Maciej Besta, Siyuan Shen, Torsten Hoefler

Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering.

Graph Attention Graph Classification +1

Graph of Thoughts: Solving Elaborate Problems with Large Language Models

1 code implementation18 Aug 2023 Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler

We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT).

Co-design Hardware and Algorithm for Vector Search

1 code implementation19 Jun 2023 Wenqi Jiang, Shigang Li, Yu Zhu, Johannes De Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso

Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents.

Information Retrieval Retrieval
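
At its core, vector search scores queries against documents by inner products; an exact brute-force version fits in a few lines. Real systems approximate this with ANN indexes, which is what the paper's hardware co-design targets:

```python
import numpy as np

def search(queries: np.ndarray, docs: np.ndarray, k: int) -> np.ndarray:
    """Exact top-k by inner-product score; one row of doc ids per query."""
    scores = queries @ docs.T                      # (num_queries, num_docs)
    return np.argsort(-scores, axis=1)[:, :k]

docs = np.random.randn(10_000, 128).astype(np.float32)
queries = np.random.randn(4, 128).astype(np.float32)
top10 = search(queries, docs, k=10)
```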

ASDL: A Unified Interface for Gradient Preconditioning in PyTorch

2 code implementations8 May 2023 Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler

Gradient preconditioning is a key technique to integrate the second-order information into gradients for improving and extending gradient-based learning algorithms.
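
In its simplest diagonal form, gradient preconditioning rescales each gradient coordinate by an inverse curvature estimate; ASDL generalizes this to structured preconditioners such as K-FAC behind one interface. The sketch below is illustrative and is not the ASDL API:

```python
import torch

def precondition(grad: torch.Tensor, curvature_diag: torch.Tensor,
                 damping: float = 1e-3) -> torch.Tensor:
    """Diagonal preconditioning: solve (F + damping * I) u = g elementwise."""
    return grad / (curvature_diag + damping)

g = torch.randn(1000)
fisher_diag = g ** 2                     # crude diagonal curvature estimate
update = precondition(g, fisher_diag)    # rescaled gradient step direction
```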

STen: Productive and Efficient Sparsity in PyTorch

no code implementations15 Apr 2023 Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Saleh Ashkboos, Torsten Hoefler

As deep learning models grow, sparsity is becoming an increasingly critical component of deep neural networks, enabling improved performance and reduced storage.

Myths and Legends in High-Performance Computing

no code implementations6 Jan 2023 Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Torsten Hoefler

While some laws end, new directions are emerging, such as algorithmic scaling or novel architecture research.

A Theory of I/O-Efficient Sparse Neural Network Inference

no code implementations3 Jan 2023 Niels Gleinig, Tal Ben-Nun, Torsten Hoefler

We can only process data that is stored in fast memory, which incurs data movement (input/output-operations, or I/Os) between the two units.

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices

1 code implementation25 Nov 2022 Kazuki Osawa, Shigang Li, Torsten Hoefler

Pipeline parallelism enables efficient training of Large Language Models (LLMs) on large-scale distributed accelerator clusters.

Spatial Mixture-of-Experts

1 code implementation24 Nov 2022 Nikoli Dryden, Torsten Hoefler

Many data have an underlying dependence on spatial location; it may be weather on the Earth, a simulation on a mesh, or a registered image.

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

11 code implementations31 Oct 2022 Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh

In this paper, we address this challenge, and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information, that is both highly-accurate and highly-efficient.

Language Modelling Model Compression +1

Compressing multidimensional weather and climate data into neural networks

1 code implementation22 Oct 2022 Langwen Huang, Torsten Hoefler

We propose a new method of compressing this multidimensional weather and climate data: a coordinate-based neural network is trained to overfit the data, and the resulting parameters are taken as a compact representation of the original grid-based data.
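
The method is easy to sketch end to end; all hyperparameters below are illustrative, and the synthetic field stands in for real weather data:

```python
import torch

model = torch.nn.Sequential(             # coordinate network: (lat, lon) -> value
    torch.nn.Linear(2, 256), torch.nn.GELU(),
    torch.nn.Linear(256, 256), torch.nn.GELU(),
    torch.nn.Linear(256, 1),
)
coords = torch.rand(10_000, 2)                            # normalized coordinates
values = torch.sin(10 * coords).sum(-1, keepdim=True)     # stand-in for a field
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(500):                     # deliberately overfit the grid
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(coords), values)
    loss.backward()
    opt.step()
# The trained parameters are the compressed representation; decompression
# is evaluating the network back at the grid coordinates.
```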

Neural Graph Databases

no code implementations20 Sep 2022 Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, Torsten Hoefler

In general, LPG2vec enables combining the predictive power of the most powerful GNNs with the full scope of information encoded in the LPG model, paving the way for neural graph databases, a class of systems where the vast complexity of maintained data will benefit from modern and future graph machine learning methods.

Efficient Quantized Sparse Matrix Operations on Tensor Cores

1 code implementation14 Sep 2022 Shigang Li, Kazuki Osawa, Torsten Hoefler

We propose Magicube, a high-performance sparse-matrix library for low-precision integers on Tensor cores.

Quantization

HammingMesh: A Network Topology for Large-Scale Deep Learning

no code implementations3 Sep 2022 Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott

Numerous microarchitectural optimizations unlocked tremendous processing power for deep neural networks that in turn fueled the AI revolution.

Scheduling

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts

1 code implementation29 Jun 2022 Saleh Ashkboos, Langwen Huang, Nikoli Dryden, Tal Ben-Nun, Peter Dueben, Lukas Gianinazzi, Luca Kummer, Torsten Hoefler

We propose the ENS-10 prediction correction task for improving the forecast quality at a 48-hour lead time through ensemble post-processing.

Weather Forecasting

Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis

no code implementations19 May 2022 Maciej Besta, Torsten Hoefler

To alleviate this, we first design a taxonomy of parallelism in GNNs, considering data and model parallelism, and different forms of pipelining.

Graph Classification Link Prediction +1

Near-Optimal Sparse Allreduce for Distributed Deep Learning

1 code implementation19 Jan 2022 Shigang Li, Torsten Hoefler

However, it is very challenging to obtain real performance improvement because of (1) the difficulty of achieving a scalable and efficient sparse allreduce algorithm and (2) the sparsification overhead.
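
The sparsification step in (2) typically keeps only the top-k gradient entries, as in the sketch below; the paper's contribution is the communication algorithm that allreduces such sparse vectors efficiently, which is not shown:

```python
import torch

def topk_sparsify(grad: torch.Tensor, density: float = 0.01):
    """Keep only the largest-magnitude entries; returns (indices, values)."""
    k = max(1, int(density * grad.numel()))
    indices = grad.abs().flatten().topk(k).indices
    return indices, grad.flatten()[indices]

idx, vals = topk_sparsify(torch.randn(1_000_000))   # 1% of the gradient
```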

A Data-Centric Optimization Framework for Machine Learning

1 code implementation20 Oct 2021 Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler

Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a dramatically growing demand for compute.

BIG-bench Machine Learning

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines

1 code implementation14 Jul 2021 Shigang Li, Torsten Hoefler

For a GPT-2 model with 1.3 billion parameters running on 2,048 GPU nodes of the Piz Daint supercomputer, Chimera improves the training throughput by 1.16x-2.34x over the state-of-the-art synchronous and asynchronous pipeline approaches.

Scheduling

Learning Combinatorial Node Labeling Algorithms

no code implementations7 Jun 2021 Lukas Gianinazzi, Maximilian Fries, Nikoli Dryden, Tal Ben-Nun, Maciej Besta, Torsten Hoefler

We present a novel neural architecture to solve graph optimization problems where the solution consists of arbitrary node labels, allowing us to solve hard problems like graph coloring.

BIG-bench Machine Learning Graph Attention +1

Motif Prediction with Graph Neural Networks

no code implementations26 May 2021 Maciej Besta, Raphael Grob, Cesare Miglioli, Nicola Bernold, Grzegorz Kwasniewski, Gabriel Gjini, Raghavendra Kanakagiri, Saleh Ashkboos, Lukas Gianinazzi, Nikoli Dryden, Torsten Hoefler

We also successfully apply our architecture for predicting more arbitrary clusters and communities, illustrating its potential for graph mining beyond motif analysis.

Graph Mining Link Prediction

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

no code implementations31 Jan 2021 Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste

The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components.
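
One family the survey covers is dynamic sparsity, which alternates pruning and growth during training. A toy cycle follows; the criteria and schedule are illustrative, and practical methods grow connections using gradient information rather than randomly:

```python
import torch

def prune_and_grow(weight: torch.Tensor, mask: torch.Tensor, frac: float = 0.1):
    """One cycle: drop the weakest surviving weights, re-enable as many others."""
    n = int(frac * mask.sum())
    alive = (weight * mask).abs().flatten()
    alive[mask.flatten() == 0] = float("inf")        # never "drop" dead weights
    mask.flatten()[alive.topk(n, largest=False).indices] = 0   # prune
    dead = (mask.flatten() == 0).nonzero().flatten()
    mask.flatten()[dead[torch.randperm(len(dead))[:n]]] = 1    # grow (random)
    return mask

w = torch.randn(256, 256)
m = (torch.rand_like(w) < 0.1).float()   # start at 10% density
m = prune_and_grow(w, m)
```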

Clairvoyant Prefetching for Distributed Machine Learning I/O

no code implementations21 Jan 2021 Nikoli Dryden, Roman Böhringer, Tal Ben-Nun, Torsten Hoefler

I/O is emerging as a major bottleneck for machine learning training, especially in distributed environments.

BIG-bench Machine Learning

MixSize: Training Convnets With Mixed Image Sizes for Improved Accuracy, Speed and Scale Resiliency

2 code implementations1 Jan 2021 Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry

Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range of image sizes at test time, by adjusting the size of intermediate feature maps.
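
The size flexibility comes from convolutions being size-agnostic and a global pooling layer collapsing whatever spatial extent remains, as this minimal sketch shows:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),       # collapses any HxW to 1x1
    torch.nn.Flatten(),
    torch.nn.Linear(64, 10),
)
for size in (128, 160, 224):             # mixed sizes, one set of weights
    x = torch.randn(8, 3, size, size)
    assert model(x).shape == (8, 10)
```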

Extracting Clean Performance Models from Tainted Programs

2 code implementations31 Dec 2020 Marcin Copik, Alexandru Calotoiu, Tobias Grosser, Nicolas Wicki, Felix Wolf, Torsten Hoefler

Performance models are well-known instruments to understand the scaling behavior of parallel applications.

Distributed, Parallel, and Cluster Computing Performance
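
A performance model in this sense is a closed-form function fit to measured runtimes; a minimal example with hypothetical measurements follows (the paper is about extracting such models automatically and reliably, which this sketch does not attempt):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(p, c, a, b):                   # t(p) = c + a * p^b
    return c + a * p ** b

procs = np.array([2.0, 4.0, 8.0, 16.0, 32.0])
times = np.array([10.1, 5.3, 2.9, 1.7, 1.1])       # hypothetical measurements
(c, a, b), _ = curve_fit(model, procs, times, p0=(0.1, 10.0, -1.0))
print(f"t(p) = {c:.2f} + {a:.2f} * p^{b:.2f}")
```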

Deep Data Flow Analysis

no code implementations21 Nov 2020 Chris Cummins, Hugh Leather, Zacharias Fisches, Tal Ben-Nun, Torsten Hoefler, Michael O'Boyle

Compiler architects increasingly look to machine learning when building heuristics for compiler optimization.

BIG-bench Machine Learning Compiler Optimization

Log(Graph): A Near-Optimal High-Performance Graph Representation

no code implementations29 Oct 2020 Maciej Besta, Dimitri Stanojevic, Tijana Zivic, Jagpreet Singh, Maurice Hoerold, Torsten Hoefler

Our high-performance Log(Graph) implementation based on modern bitwise operations and state-of-the-art succinct data structures achieves high compression ratios as well as performance.

Neural Parameter Allocation Search

1 code implementation ICLR 2022 Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko

We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.

Image Classification Phrase Grounding

Deep Learning for Post-Processing Ensemble Weather Forecasts

1 code implementation18 May 2020 Peter Grönquist, Chengyuan Yao, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Shigang Li, Torsten Hoefler

Applied to global data, our mixed models achieve a relative improvement in ensemble forecast skill (CRPS) of over 14%.

Weather Forecasting
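
CRPS, the skill metric quoted above, compares the forecast's cumulative distribution F against the observed value y; the reported improvement is a relative reduction of this quantity (lower is better):

```latex
\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \big( F(x) - \mathbb{1}\{x \ge y\} \big)^2 \, dx
```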

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis

2 code implementations23 Mar 2020 Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Hugh Leather

We introduce ProGraML - Program Graphs for Machine Learning - a novel graph-based program representation using a low level, language agnostic, and portable format; and machine learning models capable of performing complex downstream tasks over these graphs.

BIG-bench Machine Learning

Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems

no code implementations29 Dec 2019 Maciej Besta, Marc Fischer, Vasiliki Kalavri, Michael Kapralov, Torsten Hoefler

We also crystallize the meaning of different concepts associated with streaming graph processing, such as dynamic, temporal, online, and time-evolving graphs, edge-centric processing, models for the maintenance of updates, and graph databases.

Distributed, Parallel, and Cluster Computing Databases Data Structures and Algorithms Performance

Predicting Weather Uncertainty with Deep Convnets

no code implementations2 Nov 2019 Peter Grönquist, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Luca Lavarini, Shigang Li, Torsten Hoefler

Modern weather forecast models perform uncertainty quantification using ensemble prediction systems, which collect nonparametric statistics based on multiple perturbed simulations.

Uncertainty Quantification Weather Forecasting

hlslib: Software Engineering for Hardware Design

2 code implementations10 Oct 2019 Johannes de Fine Licht, Torsten Hoefler

High-level synthesis (HLS) tools have brought FPGA development into the mainstream, by allowing programmers to design architectures using familiar languages such as C, C++, and OpenCL.

Hardware Architecture Distributed, Parallel, and Cluster Computing Software Engineering

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication

1 code implementation26 Aug 2019 Grzegorz Kwasniewski, Marko Kabić, Maciej Besta, Joost VandeVondele, Raffaele Solcà, Torsten Hoefler

The key idea behind COSMA is to derive an optimal (up to a factor of 0.03% for 10 MB of fast memory) sequential schedule and then parallelize it, preserving I/O optimality.

Computational Complexity Distributed, Parallel, and Cluster Computing Performance
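
The I/O optimality here is with respect to the red-blue pebbling lower bound on data movement for matrix multiplication. Its classical form, for multiplying two n x n matrices with a fast memory of size S, is

```latex
Q \;=\; \Omega\!\left(\frac{n^{3}}{\sqrt{S}}\right),
```

with the paper sharpening the constants, which is where the quoted 0.03% figure comes from.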

Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency

2 code implementations12 Aug 2019 Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry

Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range of image sizes at test time, by adjusting the size of intermediate feature maps.

Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations

no code implementations12 Aug 2019 Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler

Load imbalance pervasively exists in distributed deep learning training systems, either caused by the inherent imbalance in learned tasks or by the system itself.

FBLAS: Streaming Linear Algebra on FPGA

1 code implementation18 Jul 2019 Tiziano De Matteis, Johannes De Fine Licht, Torsten Hoefler

Spatial computing architectures pose an attractive alternative to mitigate control and data movement overheads typical of load-store architectures.

Distributed, Parallel, and Cluster Computing

Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs

3 code implementations27 Feb 2019 Tal Ben-Nun, Johannes De Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, Torsten Hoefler

With the ubiquity of accelerators, such as FPGAs and GPUs, the complexity of high-performance programming is increasing beyond the skill-set of the average scientist in domains outside of computer science.

Programming Languages Distributed, Parallel, and Cluster Computing Performance

Graph Processing on FPGAs: Taxonomy, Survey, Challenges

no code implementations25 Feb 2019 Maciej Besta, Dimitri Stanojevic, Johannes De Fine Licht, Tal Ben-Nun, Torsten Hoefler

To facilitate understanding of this emerging domain, we present the first survey and taxonomy on graph computations on FPGAs.

Distributed, Parallel, and Cluster Computing Hardware Architecture

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning

1 code implementation29 Jan 2019 Tal Ben-Nun, Maciej Besta, Simon Huber, Alexandros Nikolaos Ziogas, Daniel Peter, Torsten Hoefler

We introduce Deep500: the first customizable benchmarking infrastructure that enables fair comparison of the plethora of deep learning frameworks, algorithms, libraries, and techniques.

Benchmarking

Augment your batch: better training with larger batches

1 code implementation27 Jan 2019 Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry

We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of deep neural networks and datasets.
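
Batch augmentation replicates each batch several times under fresh random transforms before the forward pass; a sketch follows, where the transform choice and replication factor are illustrative:

```python
import torch
from torchvision import transforms

augment = transforms.Compose([           # illustrative transform choice
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
])

def augment_batch(images: torch.Tensor, m: int = 4) -> torch.Tensor:
    """(B, C, H, W) -> (B*m, C, H, W): m independently augmented copies."""
    return torch.cat(
        [torch.stack([augment(img) for img in images]) for _ in range(m)])

big = augment_batch(torch.rand(32, 3, 32, 32))   # effective batch of 128
```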

The Convergence of Sparsified Gradient Methods

no code implementations NeurIPS 2018 Dan Alistarh, Torsten Hoefler, Mikael Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli

Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace.

Quantization

Neural Code Comprehension: A Learnable Representation of Code Semantics

1 code implementation NeurIPS 2018 Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler

In this paper, we propose a novel processing technique to learn code semantics, and apply it to a variety of program analysis tasks.

Clustering

Transformations of High-Level Synthesis Codes for High-Performance Computing

2 code implementations21 May 2018 Johannes de Fine Licht, Simon Meierhans, Torsten Hoefler

Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems.

Distributed, Parallel, and Cluster Computing Programming Languages I.1.3; C.1.4; D.1.3

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

1 code implementation13 Apr 2018 Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used in deep learning.
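
Micro-batching splits a mini-batch so that each convolution call fits a given workspace budget, letting cuDNN choose faster algorithms. The split size below is arbitrary; choosing it well is the paper's subject:

```python
import torch

def conv_microbatched(conv: torch.nn.Conv2d, x: torch.Tensor, micro: int = 64):
    """Run a convolution in micro-batches; the result equals conv(x), piecewise."""
    return torch.cat([conv(x[i:i + micro]) for i in range(0, len(x), micro)])

conv = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1)
y = conv_microbatched(conv, torch.randn(256, 3, 224, 224))
```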

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

1 code implementation26 Feb 2018 Tal Ben-Nun, Torsten Hoefler

We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning.

Neural Architecture Search Stochastic Optimization

Scaling betweenness centrality using communication-efficient sparse matrix multiplication

2 code implementations22 Sep 2016 Edgar Solomonik, Maciej Besta, Flavio Vella, Torsten Hoefler

Betweenness centrality (BC) is a crucial graph problem that measures the significance of a vertex by the number of shortest paths leading through it.

Distributed, Parallel, and Cluster Computing Discrete Mathematics Mathematical Software G.1.0; G.2.2
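
Formally, with sigma_st the number of shortest s-t paths and sigma_st(v) the number of those passing through v:

```latex
BC(v) \;=\; \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}
```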

AllConcur: Leaderless Concurrent Atomic Broadcast (Extended Version)

1 code implementation20 Aug 2016 Marius Poke, Torsten Hoefler, Colin W. Glass

In this work, we propose AllConcur, a distributed system that provides agreement through a leaderless concurrent atomic broadcast algorithm, thus, not suffering from the bottleneck of a central coordinator.

Distributed, Parallel, and Cluster Computing

Sparse Tensor Algebra as a Parallel Programming Model

2 code implementations30 Nov 2015 Edgar Solomonik, Torsten Hoefler

Dense and sparse tensors allow the representation of most bulk data structures in computational science applications.

Mathematical Software
