Search Results for author: Quentin Anthony

Found 22 papers, 13 papers with code

Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark

no code implementations • 16 Jan 2025 • Alexis Roger, Prateek Humane, Daniel Z. Kaplan, Kshitij Gupta, Qi Sun, George Adamopoulos, Jonathan Siu Chi Lim, Quentin Anthony, Edwin Fennell, Irina Rish

The proliferation of Vision-Language Models (VLMs) in the past several years calls for rigorous and comprehensive evaluation methods and benchmarks.

Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning

no code implementations • 8 Jan 2025 • Lang Xu, Quentin Anthony, Jacob Hatef, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

In this work, we propose a collection of communication and optimization strategies for ZeRO++ to reduce communication costs and improve memory utilization.

Language Modeling, Language Modelling +1

The Zamba2 Suite: Technical Report

no code implementations • 22 Nov 2024 • Paolo Glorioso, Quentin Anthony, Yury Tokpanov, Anna Golubeva, Vasudev Shyam, James Whittington, Jonathan Pilault, Beren Millidge

In this technical report, we present the Zamba2 series -- a suite of 1.2B, 2.7B, and 7.4B parameter hybrid Mamba2-transformer models that achieve state-of-the-art performance against the leading open-weights models of their class while delivering substantial gains in inference latency, throughput, and memory efficiency.

Zyda-2: a 5 Trillion Token High-Quality Dataset

no code implementations • 9 Nov 2024 • Yury Tokpanov, Paolo Glorioso, Quentin Anthony, Beren Millidge

In this technical report, we present Zyda-2: a five trillion token dataset for language model pretraining.

Language Modeling, Language Modelling

Accelerating Large Language Model Training with Hybrid GPU-based Compression

no code implementations • 4 Sep 2024 • Lang Xu, Quentin Anthony, Qinghua Zhou, Nawras Alnaasan, Radha R. Gulhane, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Using the adjusted hybrid compression scheme, we demonstrate a 17.3% increase in TFLOPS per GPU and a 12.7% increase in samples per second while reaching baseline loss convergence.

Language Modeling, Language Modelling +1

Demystifying the Communication Characteristics for Distributed Transformer Models

no code implementations • 19 Aug 2024 • Quentin Anthony, Benjamin Michalowicz, Jacob Hatef, Lang Xu, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda

Deep learning (DL) models based on the transformer architecture have revolutionized many DL applications such as large language models (LLMs), vision transformers, audio generation, and time series prediction.

Audio Generation, Time Series +1

Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters

1 code implementation • 7 Aug 2024 • Vasudev Shyam, Jonathan Pilault, Emily Shepperd, Quentin Anthony, Beren Millidge

Self-attention is the core mathematical operation of modern transformer architectures and is also a significant computational bottleneck due to its quadratic complexity in the sequence length.
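
For reference, the quadratic cost mentioned above comes from the full score matrix that standard self-attention materializes. The sketch below is a minimal NumPy implementation of plain scaled dot-product self-attention, not the paper's tree-based decoding; all shapes and weight names are illustrative.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Minimal scaled dot-product self-attention.

    x: (seq_len, d_model) token representations; Wq/Wk/Wv: (d_model, d_head).
    The (seq_len, seq_len) score matrix is the source of the quadratic cost.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (seq_len, seq_len): O(n^2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # (seq_len, d_head)

# Toy usage with illustrative sizes
rng = np.random.default_rng(0)
n, d = 8, 16
x = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)   # shape (8, 16)
```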

Zyda: A 1.3T Dataset for Open Language Modeling

1 code implementation • 4 Jun 2024 • Yury Tokpanov, Beren Millidge, Paolo Glorioso, Jonathan Pilault, Adam Ibrahim, James Whittington, Quentin Anthony

The size of large language models (LLMs) has scaled dramatically in recent years and their computational and data requirements have surged correspondingly.

Language Modeling, Language Modelling

Zamba: A Compact 7B SSM Hybrid Model

no code implementations • 26 May 2024 • Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge

Zamba is pretrained in two phases: the first phase is based on existing web datasets, while the second one consists of annealing the model over high-quality instruct and synthetic datasets, and is characterized by a rapid learning rate decay.

Mamba

Simple and Scalable Strategies to Continually Pre-train Large Language Models

1 code implementation • 13 Mar 2024 • Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.

Continual Learning, Language Modelling
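
As a concrete illustration of the replay ingredient mentioned above, the sketch below interleaves batches from the previous (upstream) dataset into the new (downstream) stream with a small probability. The function name, iterator interface, and the 5% replay fraction are assumptions for illustration, not the paper's exact recipe.

```python
import random

def replay_stream(new_batches, old_batches, replay_frac=0.05, seed=0):
    """Yield training batches for continual pre-training.

    With probability `replay_frac`, the next batch is drawn from the previous
    (upstream) dataset; otherwise it comes from the new (downstream) dataset.
    The 5% default is illustrative, not the value used in the paper.
    """
    rng = random.Random(seed)
    new_it, old_it = iter(new_batches), iter(old_batches)
    while True:
        source = old_it if rng.random() < replay_frac else new_it
        try:
            yield next(source)
        except StopIteration:
            return  # stop once either stream is exhausted

# Example: mix placeholder batches from two toy "datasets".
mixed = list(replay_stream(range(100), range(1000, 1100)))
```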

BlackMamba: Mixture of Experts for State-Space Models

1 code implementation • 1 Feb 2024 • Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge

In this paper, we present BlackMamba, a novel architecture that combines the Mamba SSM with MoE to obtain the benefits of both.

Language Modeling, Language Modelling +2

The Case for Co-Designing Model Architectures with Hardware

1 code implementation • 25 Jan 2024 • Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda

While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models.

Deep Learning

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

1 code implementation • 16 Jan 2024 • Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Unlike previous methods, our solution can be directly applied to pre-trained MoE models without any fine-tuning or accuracy degradation.

Continual Pre-Training of Large Language Models: How to (re)warm your model?

2 code implementations • 8 Aug 2023 • Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.

Language Modelling
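
The schedule named above is simple to write down; the sketch below is a minimal version of a linear warmup followed by cosine decay. The warmup length and peak/minimum learning rates are placeholder values, not the settings from the paper.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps=2000,
                     peak_lr=3e-4, min_lr=3e-5):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    All hyperparameter defaults are illustrative placeholders.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps           # linear ramp up
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))      # decays from 1 to 0
    return min_lr + (peak_lr - min_lr) * cosine

# "Re-warming" in the continual pre-training setting amounts to restarting
# this schedule from step 0 when switching to the downstream dataset.
```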

Emergent and Predictable Memorization in Large Language Models

2 code implementations • NeurIPS 2023 • Stella Biderman, USVSN Sai Prashanth, Lintang Sutawika, Hailey Schoelkopf, Quentin Anthony, Shivanshu Purohit, Edward Raff

Memorization, or the tendency of large language models (LLMs) to output entire sequences from their training data verbatim, is a key concern for safely deploying language models.

Memorization
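
To make the definition above concrete, the sketch below checks whether a model reproduces a training sequence verbatim when prompted with its prefix. `generate_fn` is a hypothetical greedy-decoding callable, and the 32-token prompt/continuation lengths are illustrative, not the paper's exact protocol.

```python
def is_memorized(sequence, generate_fn, prompt_len=32, cont_len=32):
    """Return True if the model completes a training sequence verbatim.

    `sequence` is a list of token ids from the training data.
    `generate_fn(prompt_ids, num_tokens)` is a hypothetical callable that
    greedily decodes `num_tokens` token ids given a prompt.
    """
    prompt = sequence[:prompt_len]
    target = sequence[prompt_len:prompt_len + cont_len]
    if len(target) < cont_len:
        return False  # sequence too short to test at these lengths
    continuation = generate_fn(prompt, cont_len)
    return list(continuation[:cont_len]) == list(target)

def memorization_rate(sequences, generate_fn, **kwargs):
    """Fraction of training sequences the model completes verbatim."""
    hits = sum(is_memorized(s, generate_fn, **kwargs) for s in sequences)
    return hits / max(1, len(sequences))
```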

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

no code implementations • 15 Mar 2023 • Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar Panda

However, such distributed DL parallelism strategies require a varied mixture of collective and point-to-point communication operations across a broad range of message sizes and scales.

Deep Learning
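
The mixture described above can be illustrated with a small mpi4py sketch: a collective allreduce of gradients (the data-parallel pattern) alongside a point-to-point send/recv of activations (the pipeline-parallel pattern). This only illustrates the communication pattern; it is not MCR-DL's API, and the buffer sizes are arbitrary.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, world = comm.Get_rank(), comm.Get_size()

# Collective: sum local "gradients" across all ranks, as data parallelism does.
local_grad = np.full(4, float(rank))
summed_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, summed_grad, op=MPI.SUM)

# Point-to-point: forward "activations" to the next rank, as pipeline stages do.
activations = np.full(4, float(rank))
if rank + 1 < world:
    comm.Send(activations, dest=rank + 1, tag=0)
if rank > 0:
    recv_buf = np.empty(4)
    comm.Recv(recv_buf, source=rank - 1, tag=0)

# Run with, e.g.: mpirun -np 4 python mix_of_ops.py
```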

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

10 code implementations • BigScience (ACL) 2022 • Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach

We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license.

Ranked #95 on Multi-task Language Understanding on MMLU (using extra training data)

Language Modeling, Language Modelling +1

HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow

no code implementations • 12 Nov 2019 • Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda

Four major problems we focus on are: 1) defining a notion of a distributed model across processes, 2) implementing forward/back-propagation across process boundaries that requires explicit communication, 3) obtaining parallel speedup on an inherently sequential task, and 4) achieving scalability without losing out on a model's accuracy.
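
Problem 2 above, explicit communication for forward/back-propagation across process boundaries, can be sketched with two MPI ranks each owning one partition of a model: activations flow forward, gradients flow back. This is only an illustration of the communication pattern under assumed toy layers (a tanh partition and a squared-error-style loss), not HyPar-Flow's implementation.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # assumes exactly 2 ranks, one model partition each
hidden = 4

if rank == 0:
    x = np.random.randn(hidden)
    act = np.tanh(x)                        # toy stand-in for partition 0's layers
    comm.Send(act, dest=1, tag=0)           # forward: activations cross the boundary
    grad_act = np.empty(hidden)
    comm.Recv(grad_act, source=1, tag=1)    # backward: gradient w.r.t. sent activations
    grad_x = grad_act * (1.0 - act ** 2)    # continue backprop through tanh locally
elif rank == 1:
    act = np.empty(hidden)
    comm.Recv(act, source=0, tag=0)         # receive activations from partition 0
    loss = float(np.sum(act ** 2))          # toy loss on partition 1
    grad_act = 2.0 * act                    # d(loss)/d(act)
    comm.Send(grad_act, dest=0, tag=1)      # backward: send gradient across the boundary

# Run with: mpirun -np 2 python cross_process_fwd_bwd.py
```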
