1 code implementation • 20 Jan 2025 • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler
Reasoning language models (RLMs), also known as Large Reasoning Models (LRMs), such as OpenAI's o1 and o3, DeepSeek-V3, and Alibaba's QwQ, have redefined AI's problem-solving capabilities by extending LLMs with advanced reasoning mechanisms.
no code implementations • 5 Jan 2025 • Saleh Ashkboos, Mahdi Nikdan, Soroush Tabesh, Roberto L. Castro, Torsten Hoefler, Dan Alistarh
Our approach ensures that all large matrix multiplications during the forward and backward passes are executed in lower precision.
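As a toy illustration of the general idea (not the paper's method), the following numpy sketch rounds matmul operands to a lower precision (fp16 here) while accumulating in fp32, mimicking how low-precision passes trade accuracy for speed:

```python
import numpy as np

def low_precision_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Round operands to fp16, then accumulate the products in fp32,
    # as mixed-precision hardware paths do.
    a16 = a.astype(np.float16).astype(np.float32)
    b16 = b.astype(np.float16).astype(np.float32)
    return a16 @ b16

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 128)).astype(np.float32)
w = rng.standard_normal((128, 32)).astype(np.float32)
err = np.abs(low_precision_matmul(x, w) - x @ w).max()
print(f"max abs deviation from the fp32 product: {err:.4f}")
```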
no code implementations • 17 Nov 2024 • Saleh Ashkboos, Bram Verhoef, Torsten Hoefler, Evangelos Eleftheriou, Martino Dazzi
We address these challenges by proposing EfQAT, which generalizes both schemes by optimizing only a subset of the parameters of a quantized model.
1 code implementation • 17 Oct 2024 • Patrik Okanovic, Andreas Kirsch, Jannes Kasper, Torsten Hoefler, Andreas Krause, Nezihe Merve Gürel
We introduce MODEL SELECTOR, a framework for label-efficient selection of pretrained classifiers.
no code implementations • 8 Oct 2024 • Marcin Chrapek, Anjo Vahldiek-Oberwagner, Marcin Spoczynski, Scott Constable, Mona Vij, Torsten Hoefler
To our knowledge, our work is the first to show the practicality of TEEs for securing FMs.
no code implementations • 26 Aug 2024 • Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler
Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale supercomputers.
no code implementations • 22 Aug 2024 • Maciej Besta, Robert Gerstenberger, Patrick Iff, Pournima Sonawane, Juan Gómez Luna, Raghavendra Kanakagiri, Rui Min, Grzegorz Kwaśniewski, Onur Mutlu, Torsten Hoefler, Raja Appuswamy, Aidan O'Mahony
Knowledge graphs (KGs) have attracted significant attention in recent years, particularly in the Semantic Web community, and have also gained popularity in application domains such as data mining and search engines.
2 code implementations • 21 Aug 2024 • Elias Frantar, Roberto L. Castro, Jiale Chen, Torsten Hoefler, Dan Alistarh
As inference on Large Language Models (LLMs) emerges as an important workload in machine learning applications, weight quantization has become a standard technique for efficient GPU deployment.
no code implementations • 18 Jun 2024 • Maciej Besta, Florian Scheidl, Lukas Gianinazzi, Grzegorz Kwasniewski, Shachar Klaiman, Jürgen Müller, Torsten Hoefler
Then, we use our taxonomy to analyze and compare the available HOGNN models.
2 code implementations • 7 Jun 2024 • Maciej Besta, Ales Kubicek, Roman Niggli, Robert Gerstenberger, Lucas Weitzendorf, Mingyuan Chi, Patrick Iff, Joanna Gajda, Piotr Nyczyk, Jürgen Müller, Hubert Niewiadomski, Marcin Chrapek, Michał Podstawski, Torsten Hoefler
Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) by enabling the retrieval of documents into the LLM context to provide more accurate and relevant responses.
1 code implementation • 4 Jun 2024 • Maciej Besta, Lorenzo Paleari, Ales Kubicek, Piotr Nyczyk, Robert Gerstenberger, Patrick Iff, Tomasz Lehmann, Hubert Niewiadomski, Torsten Hoefler
Large Language Models (LLMs) are revolutionizing various domains, yet verifying their answers remains a significant challenge, especially for intricate open-ended tasks such as consolidation, summarization, and extraction of knowledge.
3 code implementations • 30 Mar 2024 • Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Pashmina Cameron, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman
We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits.
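A toy numpy sketch of why rotations help quantization (illustrative only; QuaRot itself fuses Hadamard rotations into the model): rotating a weight matrix by a random orthogonal matrix spreads outlier channels across dimensions, and rotating back after quantization recovers the original basis:

```python
import numpy as np

def quant_int4(x):
    # Symmetric round-to-nearest 4-bit quantization, one scale per tensor.
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256))
w[:, 0] *= 20.0                                        # inject an outlier channel
q, _ = np.linalg.qr(rng.standard_normal((256, 256)))   # random orthogonal rotation

err_plain = np.abs(quant_int4(w) - w).mean()
err_rot = np.abs(quant_int4(w @ q) @ q.T - w).mean()   # rotate, quantize, undo
print(f"plain: {err_plain:.4f}  rotated: {err_rot:.4f}")
```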
1 code implementation • 26 Jan 2024 • Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman
Large language models have become the cornerstone of natural language processing, but their use comes with substantial costs in terms of compute and memory resources.
no code implementations • 25 Jan 2024 • Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler
Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph.
no code implementations • 17 Jan 2024 • Daniele De Sensi, Tommaso Bonato, David Saam, Torsten Hoefler
The allreduce collective operation accounts for a significant fraction of the runtime of workloads running on distributed systems.
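For readers unfamiliar with the operation, here is a single-process simulation (a sketch, not the paper's algorithm) of the classic ring allreduce: a reduce-scatter phase followed by an allgather phase, each taking p-1 steps:

```python
import numpy as np

def ring_allreduce(x):
    # x[r, c] is chunk c of rank r's input; p ranks, p chunks per vector.
    p = len(x)
    data = x.astype(float)
    for s in range(p - 1):            # reduce-scatter: pass partial sums around
        for r in range(p):
            c = (r - s) % p
            data[(r + 1) % p, c] += data[r, c]
    for s in range(p - 1):            # allgather: circulate the finished chunks
        for r in range(p):
            c = (r + 1 - s) % p
            data[(r + 1) % p, c] = data[r, c]
    return data

x = np.arange(16.0).reshape(4, 4)     # 4 ranks, 4 chunks each
print(ring_allreduce(x))              # every row equals x.sum(axis=0)
```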
1 code implementation • 11 Jan 2024 • Langwen Huang, Lukas Gianinazzi, Yuejiang Yu, Peter D. Dueben, Torsten Hoefler
The experiments also show that initial conditions assimilated from sparse observations (less than 0.96% of the gridded data) and a 48-hour forecast can be used by forecast models with a loss of lead time of at most 24 hours compared to initial conditions from state-of-the-art data assimilation in ERA5.
no code implementations • 21 Dec 2023 • Eldar Kurtic, Torsten Hoefler, Dan Alistarh
Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task.
no code implementations • 30 Nov 2023 • Maciej Besta, Afonso Claudino Catarino, Lukas Gianinazzi, Nils Blach, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler
A fundamental workload in this setting is dynamic link prediction: using a history of graph updates to predict whether a given pair of vertices will become connected.
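As a minimal illustration of the task (a static heuristic baseline, not one of the models the paper studies), a common-neighbors score ranks vertex pairs by how many neighbors they share:

```python
def common_neighbors(adj, u, v):
    # Vertices sharing many neighbours are likelier to become connected;
    # dynamic methods additionally weight edges by recency in the update history.
    return len(adj[u] & adj[v])

adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
print(common_neighbors(adj, 0, 3))  # 2 shared neighbours: a likely future edge
```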
no code implementations • 15 Oct 2023 • Wenqi Jiang, Marco Zeller, Roger Waleffe, Torsten Hoefler, Gustavo Alonso
The heterogeneity ensures efficient acceleration of both LM inference and retrieval, while the accelerator disaggregation enables the system to independently scale both types of accelerators to fulfill diverse RALM requirements.
1 code implementation • 13 Oct 2023 • Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh
We show, for the first time, that the majority of inference computations for large generative models such as LLaMA, OPT, and Falcon can be performed with both weights and activations being cast to 4 bits, in a way that leads to practical speedups, while at the same time maintaining good accuracy.
1 code implementation • 3 Oct 2023 • Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, Basilio B. Fraguela, Torsten Hoefler
We present the V:N:M format, which enables the execution of arbitrary N:M ratios on SPTCs.
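For context, N:M sparsity keeps N nonzeros in every group of M consecutive weights (2:4 is what NVIDIA's sparse tensor cores support natively). A small numpy sketch of the pruning step, illustrative only:

```python
import numpy as np

def prune_n_m(w, n=2, m=4):
    # Zero out the (m - n) smallest-magnitude weights in each group of m
    # consecutive values; w.size must be divisible by m.
    groups = w.reshape(-1, m).copy()
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(w.shape)

w = np.random.default_rng(0).standard_normal(16)
print(prune_n_m(w))   # exactly 2 nonzeros survive in every group of 4
```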
no code implementations • 16 Sep 2023 • Torsten Hoefler, Bjorn Stevens, Andreas F. Prein, Johanna Baehr, Thomas Schulthess, Thomas F. Stocker, John Taylor, Daniel Klocke, Pekka Manninen, Piers M. Forster, Tobias Kölling, Nicolas Gruber, Hartwig Anzt, Claudia Frauen, Florian Ziemen, Milan Klöwer, Karthik Kashinath, Christoph Schär, Oliver Fuhrer, Bryan N. Lawrence
Participants of the Berlin Summit on Earth Virtualization Engines (EVEs) discussed ideas and concepts to improve our ability to cope with climate change.
no code implementations • 23 Aug 2023 • Julia Bazinska, Andrei Ivanov, Tal Ben-Nun, Nikoli Dryden, Maciej Besta, Siyuan Shen, Torsten Hoefler
Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering.
2 code implementations • 18 Aug 2023 • Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler
We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT).
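A conceptual sketch of what the graph structure buys (hypothetical code, not the GoT framework's API; llm() is a stand-in for a model call): thoughts are nodes, and an aggregation step merges several branches into one node, something chains and trees cannot express:

```python
def llm(prompt: str) -> str:
    return f"<answer to: {prompt!r}>"      # stand-in for a real model call

class Thought:
    def __init__(self, text, parents=()):
        self.text, self.parents = text, list(parents)

def generate(t, n):                        # branch: n child thoughts
    return [Thought(llm(f"refine: {t.text}"), [t]) for _ in range(n)]

def aggregate(ts):                         # merge: many parents, one child
    return Thought(llm("combine: " + " | ".join(t.text for t in ts)), ts)

root = Thought("sort the list [3, 1, 2]")
merged = aggregate(generate(root, 3))      # a node with in-degree 3
print(merged.text, "| parents:", len(merged.parents))
```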
no code implementations • ICCV 2023 • Yunqiang Li, Jan C. van Gemert, Torsten Hoefler, Bert Moons, Evangelos Eleftheriou, Bram-Ernst Verhoef
Deep learning algorithms are increasingly employed at the edge.
1 code implementation • 19 Jun 2023 • Wenqi Jiang, Shigang Li, Yu Zhu, Johannes De Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso
Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems. Search engines like Google and Bing process tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents.
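At its core, the similarity evaluation is a top-k search over inner products; the brute-force version below (a sketch; production engines use approximate indexes such as IVF or HNSW to avoid scanning every vector) shows the primitive being accelerated:

```python
import numpy as np

def top_k(query, index, k=5):
    # Exact top-k by inner product over the whole index.
    scores = index @ query
    top = np.argpartition(scores, -k)[-k:]
    return top[np.argsort(scores[top])[::-1]]

rng = np.random.default_rng(0)
index = rng.standard_normal((100_000, 128)).astype(np.float32)
q = rng.standard_normal(128).astype(np.float32)
print(top_k(q, index))   # indices of the 5 most similar vectors
```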
1 code implementation • 5 Jun 2023 • Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh
Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities.
2 code implementations • 8 May 2023 • Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler
Gradient preconditioning is a key technique to integrate the second-order information into gradients for improving and extending gradient-based learning algorithms.
no code implementations • 15 Apr 2023 • Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Saleh Ashkboos, Torsten Hoefler
As deep learning models grow, sparsity is becoming an increasingly critical component of deep neural networks, enabling improved performance and reduced storage.
no code implementations • 14 Mar 2023 • Lukas Trümper, Tal Ben-Nun, Philipp Schaad, Alexandru Calotoiu, Torsten Hoefler
Performance optimization is an increasingly challenging but often repetitive task.
no code implementations • 6 Jan 2023 • Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Torsten Hoefler
While some laws end, new directions are emerging, such as algorithmic scaling or novel architecture research.
no code implementations • 3 Jan 2023 • Niels Gleinig, Tal Ben-Nun, Torsten Hoefler
We can only process data that is stored in fast memory, which incurs data movement (input/output-operations, or I/Os) between the two units.
1 code implementation • 25 Nov 2022 • Kazuki Osawa, Shigang Li, Torsten Hoefler
Pipeline parallelism enables efficient training of Large Language Models (LLMs) on large-scale distributed accelerator clusters.
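To see what such schemes optimize, the sketch below (illustrative; not Chimera's bidirectional schedule) prints a naive GPipe-style forward schedule and its pipeline "bubble", the idle fraction that advanced schedules shrink:

```python
def naive_pipeline(stages, microbatches):
    # Stage s idles for s slots at the start and (stages - 1 - s) at the end.
    width = stages - 1 + microbatches
    for s in range(stages):
        slots = ["---"] * s + [f"F{m:02d}" for m in range(microbatches)]
        slots += ["---"] * (stages - 1 - s)
        print(f"stage {s}: " + " ".join(slots))
    print(f"bubble fraction: {(stages - 1) / width:.0%}")

naive_pipeline(stages=4, microbatches=8)   # 27% of slots are idle
```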
1 code implementation • 24 Nov 2022 • Nikoli Dryden, Torsten Hoefler
Much data has an underlying dependence on spatial location, whether it is weather on the Earth, a simulation on a mesh, or a registered image.
16 code implementations • 31 Oct 2022 • Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh
In this paper, we address this challenge, and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information, that is both highly-accurate and highly-efficient.
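The core idea is error compensation: quantize one weight column at a time and fold the rounding error into the not-yet-quantized columns using (approximate) second-order information. A simplified numpy sketch in the spirit of GPTQ (the real algorithm uses a Cholesky factorization and blocked updates):

```python
import numpy as np

def quantize_gptq_like(W, X, bits=4, damp=0.01):
    # W: (out, in) weights; X: (n, in) calibration inputs for this layer.
    W = W.copy()
    H = X.T @ X + damp * np.eye(X.shape[1])   # proxy Hessian of the layer loss
    Hinv = np.linalg.inv(H)
    lim = 2 ** (bits - 1) - 1
    for i in range(W.shape[1]):               # quantize columns left to right
        scale = np.abs(W[:, i]).max() / lim + 1e-12
        q = np.clip(np.round(W[:, i] / scale), -lim - 1, lim) * scale
        err = (W[:, i] - q) / Hinv[i, i]
        W[:, i] = q
        W[:, i + 1:] -= np.outer(err, Hinv[i, i + 1:])   # compensate downstream
    return W

rng = np.random.default_rng(0)
X, W = rng.standard_normal((512, 64)), rng.standard_normal((32, 64))
Wq = quantize_gptq_like(W, X)
print(f"mean output error: {np.abs(X @ Wq.T - X @ W.T).mean():.4f}")
```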
1 code implementation • 22 Oct 2022 • Langwen Huang, Torsten Hoefler
We propose a new method of compressing this multidimensional weather and climate data: a coordinate-based neural network is trained to overfit the data, and the resulting parameters are taken as a compact representation of the original grid-based data.
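A minimal PyTorch sketch of the recipe under toy assumptions (a synthetic 128x128 field and a small ReLU MLP; the paper's networks and data are far larger): overfit a coordinate-to-value network, then ship its weights instead of the grid:

```python
import torch
import torch.nn as nn

side = 128
axes = torch.linspace(-1, 1, side)
grid = torch.stack(torch.meshgrid(axes, axes, indexing="ij"), dim=-1)
coords = grid.reshape(-1, 2)
values = torch.sin(3 * coords[:, :1]) * torch.cos(5 * coords[:, 1:])  # toy field

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):                      # deliberately overfit the data
    opt.zero_grad()
    loss = ((net(coords) - values) ** 2).mean()
    loss.backward()
    opt.step()

n_params = sum(p.numel() for p in net.parameters())
print(f"MSE {loss.item():.2e}; {n_params} params vs {values.numel()} grid values")
```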
no code implementations • 20 Sep 2022 • Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, Torsten Hoefler
In general, LPG2vec enables combining the predictive power of the most powerful GNNs with the full scope of information encoded in the LPG model, paving the way for neural graph databases, a class of systems where the vast complexity of maintained data will benefit from modern and future graph machine learning methods.
1 code implementation • 14 Sep 2022 • Shigang Li, Kazuki Osawa, Torsten Hoefler
We propose Magicube, a high-performance sparse-matrix library for low-precision integers on Tensor cores.
no code implementations • 3 Sep 2022 • Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott
Numerous microarchitectural optimizations unlocked tremendous processing power for deep neural networks that in turn fueled the AI revolution.
1 code implementation • 29 Jun 2022 • Saleh Ashkboos, Langwen Huang, Nikoli Dryden, Tal Ben-Nun, Peter Dueben, Lukas Gianinazzi, Luca Kummer, Torsten Hoefler
We propose the ENS-10 prediction correction task for improving the forecast quality at a 48-hour lead time through ensemble post-processing.
no code implementations • 19 May 2022 • Maciej Besta, Torsten Hoefler
To alleviate this, we first design a taxonomy of parallelism in GNNs, considering data and model parallelism, and different forms of pipelining.
1 code implementation • 19 Jan 2022 • Shigang Li, Torsten Hoefler
However, it is very challenging to obtain real performance improvement because of (1) the difficulty of designing a scalable and efficient sparse allreduce algorithm and (2) the sparsification overhead.
1 code implementation • 20 Oct 2021 • Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler
Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a dramatically growing demand for compute.
1 code implementation • 14 Jul 2021 • Shigang Li, Torsten Hoefler
For a GPT-2 model with 1.3 billion parameters running on 2,048 GPU nodes of the Piz Daint supercomputer, Chimera improves the training throughput by 1.16x-2.34x over the state-of-the-art synchronous and asynchronous pipeline approaches.
no code implementations • 7 Jun 2021 • Lukas Gianinazzi, Maximilian Fries, Nikoli Dryden, Tal Ben-Nun, Maciej Besta, Torsten Hoefler
We present a novel neural architecture to solve graph optimization problems where the solution consists of arbitrary node labels, allowing us to solve hard problems like graph coloring.
no code implementations • 26 May 2021 • Maciej Besta, Raphael Grob, Cesare Miglioli, Nicola Bernold, Grzegorz Kwasniewski, Gabriel Gjini, Raghavendra Kanakagiri, Saleh Ashkboos, Lukas Gianinazzi, Nikoli Dryden, Torsten Hoefler
We also successfully apply our architecture for predicting more arbitrary clusters and communities, illustrating its potential for graph mining beyond motif analysis.
no code implementations • 5 Mar 2021 • Maciej Besta, Zur Vonarburg-Shmaria, Yannick Schaffner, Leonardo Schwarz, Grzegorz Kwasniewski, Lukas Gianinazzi, Jakub Beranek, Kacper Janda, Tobias Holenstein, Sebastian Leisinger, Peter Tatkowski, Esref Ozdemir, Adrian Balla, Marcin Copik, Philipp Lindenberger, Pavel Kalvoda, Marek Konieczny, Onur Mutlu, Torsten Hoefler
We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms.
no code implementations • 31 Jan 2021 • Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components.
no code implementations • 21 Jan 2021 • Nikoli Dryden, Roman Böhringer, Tal Ben-Nun, Torsten Hoefler
I/O is emerging as a major bottleneck for machine learning training, especially in distributed environments.
2 code implementations • 1 Jan 2021 • Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry
Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range of image sizes at test time, by adjusting the size of intermediate feature maps.
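The mechanism is simple to demonstrate in PyTorch (a sketch of the general technique, not the paper's training recipe): a global pooling layer collapses whatever spatial extent the convolutions produce, so the same weights accept many input sizes:

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),     # collapses any spatial size to 1x1
    nn.Flatten(),
    nn.Linear(16, 10),
)
for size in (128, 224, 320):     # same weights, different test-time sizes
    print(size, net(torch.randn(1, 3, size, size)).shape)
```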
2 code implementations • 31 Dec 2020 • Marcin Copik, Alexandru Calotoiu, Tobias Grosser, Nicolas Wicki, Felix Wolf, Torsten Hoefler
Performance models are well-known instruments to understand the scaling behavior of parallel applications.
Distributed, Parallel, and Cluster Computing • Performance
no code implementations • 21 Nov 2020 • Chris Cummins, Hugh Leather, Zacharias Fisches, Tal Ben-Nun, Torsten Hoefler, Michael O'Boyle
Compiler architects increasingly look to machine learning when building heuristics for compiler optimization.
no code implementations • 29 Oct 2020 • Maciej Besta, Dimitri Stanojevic, Tijana Zivic, Jagpreet Singh, Maurice Hoerold, Torsten Hoefler
Our high-performance Log(Graph) implementation based on modern bitwise operations and state-of-the-art succinct data structures achieves high compression ratios as well as performance.
1 code implementation • 30 Jun 2020 • Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
Transformers are one of the most important machine learning workloads today.
1 code implementation • ICLR 2022 • Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko
We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.
1 code implementation • 18 May 2020 • Peter Grönquist, Chengyuan Yao, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Shigang Li, Torsten Hoefler
Applied to global data, our mixed models achieve a relative improvement in ensemble forecast skill (CRPS) of over 14%.
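For reference, CRPS for an ensemble forecast can be estimated from samples as E|X - y| - 0.5 E|X - X'|; a small numpy sketch with made-up numbers:

```python
import numpy as np

def crps_ensemble(members: np.ndarray, obs: float) -> float:
    # Sample-based CRPS: E|X - y| - 0.5 * E|X - X'| (lower is better).
    spread = np.abs(members[:, None] - members[None, :]).mean()
    return np.abs(members - obs).mean() - 0.5 * spread

members = np.array([14.2, 15.1, 13.8, 14.9, 15.5])   # e.g. temperature in degC
print(f"CRPS: {crps_ensemble(members, obs=14.6):.3f}")
```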
no code implementations • 30 Apr 2020 • Shigang Li, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan Alistarh, Torsten Hoefler
For evaluation, we train ResNet-50 on ImageNet; Transformer for machine translation; and deep reinforcement learning for navigation at scale.
2 code implementations • 23 Mar 2020 • Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Hugh Leather
We introduce ProGraML - Program Graphs for Machine Learning - a novel graph-based program representation using a low level, language agnostic, and portable format; and machine learning models capable of performing complex downstream tasks over these graphs.
no code implementations • 29 Dec 2019 • Maciej Besta, Marc Fischer, Vasiliki Kalavri, Michael Kapralov, Torsten Hoefler
We also crystallize the meaning of different concepts associated with streaming graph processing, such as dynamic, temporal, online, and time-evolving graphs, edge-centric processing, models for the maintenance of updates, and graph databases.
Distributed, Parallel, and Cluster Computing • Databases • Data Structures and Algorithms • Performance
no code implementations • 2 Nov 2019 • Peter Grönquist, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Luca Lavarini, Shigang Li, Torsten Hoefler
Modern weather forecast models perform uncertainty quantification using ensemble prediction systems, which collect nonparametric statistics based on multiple perturbed simulations.
2 code implementations • 10 Oct 2019 • Johannes de Fine Licht, Torsten Hoefler
High-level synthesis (HLS) tools have brought FPGA development into the mainstream, by allowing programmers to design architectures using familiar languages such as C, C++, and OpenCL.
Hardware Architecture • Distributed, Parallel, and Cluster Computing • Software Engineering
1 code implementation • 26 Aug 2019 • Grzegorz Kwasniewski, Marko Kabić, Maciej Besta, Joost VandeVondele, Raffaele Solcà, Torsten Hoefler
The key idea behind COSMA is to derive an optimal (up to a factor of 0.03% for 10 MB of fast memory) sequential schedule and then parallelize it, preserving I/O optimality.
Computational Complexity • Distributed, Parallel, and Cluster Computing • Performance
2 code implementations • 12 Aug 2019 • Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry
Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range of image sizes at test time, by adjusting the size of intermediate feature maps.
no code implementations • 12 Aug 2019 • Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler
Load imbalance pervasively exists in distributed deep learning training systems, either caused by the inherent imbalance in learned tasks or by the system itself.
1 code implementation • 18 Jul 2019 • Tiziano De Matteis, Johannes De Fine Licht, Torsten Hoefler
Spatial computing architectures pose an attractive alternative to mitigate control and data movement overheads typical of load-store architectures.
Distributed, Parallel, and Cluster Computing
3 code implementations • 27 Feb 2019 • Tal Ben-Nun, Johannes De Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, Torsten Hoefler
With the ubiquity of accelerators, such as FPGAs and GPUs, the complexity of high-performance programming is increasing beyond the skill-set of the average scientist in domains outside of computer science.
Programming Languages • Distributed, Parallel, and Cluster Computing • Performance
no code implementations • 25 Feb 2019 • Maciej Besta, Dimitri Stanojevic, Johannes De Fine Licht, Tal Ben-Nun, Torsten Hoefler
To facilitate understanding of this emerging domain, we present the first survey and taxonomy on graph computations on FPGAs.
Distributed, Parallel, and Cluster Computing • Hardware Architecture
1 code implementation • 29 Jan 2019 • Tal Ben-Nun, Maciej Besta, Simon Huber, Alexandros Nikolaos Ziogas, Daniel Peter, Torsten Hoefler
We introduce Deep500: the first customizable benchmarking infrastructure that enables fair comparison of the plethora of deep learning frameworks, algorithms, libraries, and techniques.
1 code implementation • 27 Jan 2019 • Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry
We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of deep neural networks and datasets.
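The transform itself is a one-liner in PyTorch (a sketch under assumed names; `augment` stands for any stochastic transform): replicate each sample several times within the batch, each replica with its own augmentation:

```python
import torch

def batch_augment(x, m, augment):
    # Each of the m replicas of the batch gets its own random augmentation,
    # growing the effective batch from B to m * B without new data loading.
    return torch.cat([augment(x) for _ in range(m)], dim=0)

def random_flip(x):
    return torch.flip(x, dims=[-1]) if torch.rand(()) < 0.5 else x

batch = torch.randn(8, 3, 32, 32)
print(batch_augment(batch, m=4, augment=random_flip).shape)  # (32, 3, 32, 32)
```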
no code implementations • NeurIPS 2018 • Dan Alistarh, Torsten Hoefler, Mikael Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli
Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace.
1 code implementation • NeurIPS 2018 • Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler
In this paper, we propose a novel processing technique to learn code semantics, and apply it to a variety of program analysis tasks.
2 code implementations • 21 May 2018 • Johannes de Fine Licht, Simon Meierhans, Torsten Hoefler
Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems.
Distributed, Parallel, and Cluster Computing • Programming Languages • I.1.3; C.1.4; D.1.3
1 code implementation • 13 Apr 2018 • Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka
NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used in deep learning.
1 code implementation • 26 Feb 2018 • Tal Ben-Nun, Torsten Hoefler
We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning.
no code implementations • 22 Feb 2018 • Cedric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler
This allreduce is the single communication and thus scalability bottleneck for most machine learning workloads.
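One standard way to shrink that communication (a sketch of the generic top-k idea, not necessarily the paper's exact scheme) is to exchange only the largest-magnitude gradient entries:

```python
import numpy as np

def topk_sparsify(grad, k):
    # Keep only the k largest-magnitude entries; real systems retain the
    # rest as a local residual that is added back before the next step.
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

g = np.random.default_rng(0).standard_normal(1_000_000)
idx, vals = topk_sparsify(g, k=1_000)        # send ~0.1% of the entries
print(f"kept {np.abs(vals).sum() / np.abs(g).sum():.1%} of total magnitude")
```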
3 code implementations • 22 Sep 2016 • Edgar Solomonik, Maciej Besta, Flavio Vella, Torsten Hoefler
Betweenness centrality (BC) is a crucial graph problem that measures the significance of a vertex by the number of shortest paths leading through it.
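For reference, the standard sequential baseline is Brandes' algorithm, shown below for unweighted graphs (a textbook sketch; the paper's contribution is communication-efficient parallel BC, not this loop):

```python
from collections import deque

def brandes_bc(adj):
    # adj: dict vertex -> iterable of neighbours (directed; for undirected
    # graphs, halve the final scores).
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1      # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        queue = deque([s])
        while queue:                                    # BFS from s
            v = queue.popleft(); stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                    # back-propagate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}            # a path graph
print(brandes_bc(adj))                                  # middle vertices score highest
```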
Distributed, Parallel, and Cluster Computing • Discrete Mathematics • Mathematical Software • G.1.0; G.2.2
1 code implementation • 20 Aug 2016 • Marius Poke, Torsten Hoefler, Colin W. Glass
In this work, we propose AllConcur, a distributed system that provides agreement through a leaderless concurrent atomic broadcast algorithm and thus does not suffer from the bottleneck of a central coordinator.
Distributed, Parallel, and Cluster Computing
3 code implementations • 30 Nov 2015 • Edgar Solomonik, Torsten Hoefler
Dense and sparse tensors allow the representation of most bulk data structures in computational science applications.
Mathematical Software