Search Results for author: Yuxiong He

Found 50 papers, 30 papers with code

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

2 code implementations25 Jan 2024 Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song

However, existing systems do not provide Tensor Core support for FP6 quantization and struggle to achieve practical performance improvements during LLM inference.

Quantization
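
As a rough illustration of the weight-only low-bit idea this paper builds on, the sketch below snaps weights onto a toy FP6-like grid (1 sign, 3 exponent, 2 mantissa bits) and dequantizes them before a dense matmul. It is a NumPy approximation under those assumptions, not the paper's Tensor Core kernel, and all names are illustrative.

```python
import numpy as np

def fp6_grid():
    # Positive values of a toy e3m2 float format (bias 3); illustrative only.
    vals = {0.0}
    for e in range(8):          # 3 exponent bits
        for m in range(4):      # 2 mantissa bits
            if e == 0:          # subnormal range
                vals.add((m / 4.0) * 2.0 ** (1 - 3))
            else:               # normal range
                vals.add((1.0 + m / 4.0) * 2.0 ** (e - 3))
    return np.array(sorted(vals))

def quantize_dequantize(w):
    # Snap each weight magnitude to the nearest grid point after per-tensor scaling.
    grid = fp6_grid()
    scale = np.abs(w).max() / grid.max()
    idx = np.abs(np.abs(w) / scale - grid[:, None]).argmin(axis=0)
    return np.sign(w) * grid[idx] * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)
w = rng.standard_normal((16, 16)).astype(np.float32)
w_q = quantize_dequantize(w.ravel()).reshape(w.shape)
y = x @ w_q   # weights stored in ~6 bits, GEMM still runs in higher precision
```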

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

2 code implementations9 Jan 2024 Connor Holmes, Masahiro Tanaka, Michael Wyatt, Ammar Ahmad Awan, Jeff Rasley, Samyam Rajbhandari, Reza Yazdani Aminabadi, Heyang Qin, Arash Bakhtiari, Lev Kurilenko, Yuxiong He

The deployment and scaling of large language models (LLMs) have become critical as they permeate various applications, demanding high-throughput and low-latency serving systems.

Benchmarking Text Generation

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

no code implementations6 Oct 2023 Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, J. Gregory Pauloski, Logan Ward, Valerie Hayot, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi Hanson, Thomas E Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin Aji, Angela Dalton, Michael Schulte, Karl Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens

In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences.

DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention

2 code implementations25 Sep 2023 Zhewei Yao, Xiaoxia Wu, Conglong Li, Minjia Zhang, Heyang Qin, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He

Most existing multi-modal models are hindered by their inability to adeptly manage interleaved image-and-text inputs in multi-image, multi-round dialogues, and they face substantial constraints in training resources and data accessibility, limiting their adaptability and scalability across varied interaction settings.

Language Modelling

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

1 code implementation25 Sep 2023 Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He

Computation in a typical Transformer-based large language model (LLM) can be characterized by batch size, hidden dimension, number of layers, and sequence length.

Language Modelling Large Language Model
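
A standard back-of-envelope estimate (not taken from the paper) makes this characterization concrete: per-layer compute has a term quadratic in the hidden dimension and a term quadratic in the sequence length, so very long sequences are dominated by attention. The constants below are the usual rough approximations.

```python
def per_layer_flops(batch, seq, hidden):
    # Rough forward-pass estimates: QKV/output projections + 4x MLP vs. attention scores.
    dense = 24 * batch * seq * hidden ** 2
    attention = 4 * batch * seq ** 2 * hidden
    return dense, attention

for seq in (2_048, 32_768, 262_144):
    dense, attn = per_layer_flops(batch=1, seq=seq, hidden=8_192)
    print(f"seq={seq:>7}: dense {dense:.2e} FLOPs, attention {attn:.2e} FLOPs")
```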

ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats

1 code implementation19 Jul 2023 Xiaoxia Wu, Zhewei Yao, Yuxiong He

In the complex domain of large language models (LLMs), striking a balance between computational efficiency and maintaining model quality is a formidable challenge.

Computational Efficiency Quantization

ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

1 code implementation16 Jun 2023 Guanhua Wang, Heyang Qin, Sam Ade Jacobs, Connor Holmes, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He

Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPU clusters due to its ease of use, efficiency, and good scalability.

Quantization

Selective Guidance: Are All the Denoising Steps of Guided Diffusion Important?

no code implementations16 May 2023 Pareesa Ameneh Golnari, Zhewei Yao, Yuxiong He

This study examines the impact of optimizing the Stable Diffusion (SD) guided inference pipeline.

Denoising

ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

2 code implementations15 Mar 2023 Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He

Post-training quantization (PTQ) has emerged as a promising technique for mitigating memory consumption and computational costs in large language models (LLMs).

Quantization
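
As a hedged sketch of the low-rank compensation idea named in the title (illustrative NumPy, not the paper's exact procedure): quantize weights with round-to-nearest INT4, then fit a small low-rank term to the residual quantization error.

```python
import numpy as np

def rtn_int4(w):
    # Symmetric round-to-nearest onto the 4-bit grid [-7, 7].
    scale = np.abs(w).max() / 7
    return np.clip(np.round(w / scale), -7, 7) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)
w_q = rtn_int4(w)

u, s, vt = np.linalg.svd(w - w_q, full_matrices=False)
rank = 8
lorc = (u[:, :rank] * s[:rank]) @ vt[:rank]   # low-rank compensation of the error

print(np.linalg.norm(w - w_q), np.linalg.norm(w - (w_q + lorc)))
```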

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

no code implementations15 Mar 2023 Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar Panda

However, such distributed DL parallelism strategies require a varied mixture of collective and point-to-point communication operations across a broad range of message sizes and scales.

Scaling Vision-Language Models with Sparse Mixture of Experts

no code implementations13 Mar 2023 Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He

The field of natural language processing (NLP) has made significant strides in recent years, particularly in the development of large-scale vision-language models (VLMs).

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training

1 code implementation11 Mar 2023 Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele

Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert blocks to a base model, increasing the number of parameters without impacting computational costs.
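
A minimal sketch (illustrative, not DeepSpeed's implementation) of why MoE adds parameters without adding per-token compute: each token is routed to a single expert, so only one expert's weights are used per token.

```python
import numpy as np

class Top1MoE:
    """Toy sparsely activated FFN: many experts, one used per token."""
    def __init__(self, d_model, d_ff, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.router = rng.standard_normal((d_model, n_experts)) * 0.02
        self.w_in = rng.standard_normal((n_experts, d_model, d_ff)) * 0.02
        self.w_out = rng.standard_normal((n_experts, d_ff, d_model)) * 0.02

    def __call__(self, x):                                 # x: [tokens, d_model]
        choice = (x @ self.router).argmax(axis=-1)         # top-1 routing
        y = np.zeros_like(x)
        for e in range(self.w_in.shape[0]):
            sel = choice == e
            hidden = np.maximum(x[sel] @ self.w_in[e], 0.0)
            y[sel] = hidden @ self.w_out[e]
        return y

moe = Top1MoE(d_model=64, d_ff=256, n_experts=8)
tokens = np.random.default_rng(1).standard_normal((32, 64))
out = moe(tokens)   # 8x the FFN parameters, but each token touches only one expert
```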

Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases

1 code implementation27 Jan 2023 Xiaoxia Wu, Cheng Li, Reza Yazdani Aminabadi, Zhewei Yao, Yuxiong He

Improving the deployment efficiency of transformer-based language models has been challenging given their high computation and memory cost.

Quantization

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

1 code implementation7 Dec 2022 Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He

Compared to the rapidly evolving model architectures, how to use training data efficiently (especially for expensive foundation model pretraining) is both less explored and harder to realize, due to the lack of a convenient framework focused on data efficiency capabilities.

Language Modelling Large Language Model

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

1 code implementation17 Nov 2022 Zhewei Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng Li, Yuxiong He

Large-scale transformer models have become the de-facto architectures for various machine learning applications, e.g., CV and NLP.

BiFeat: Supercharge GNN Training via Graph Feature Quantization

1 code implementation29 Jul 2022 Yuxin Ma, Ping Gong, Jun Yi, Zhewei Yao, Cheng Li, Yuxiong He, Feng Yan

We identify the main accuracy impact factors in graph feature quantization and theoretically prove that BiFeat training converges to a network where the loss is within $\epsilon$ of the optimal loss of the uncompressed network.

Quantization

DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

2 code implementations30 Jun 2022 Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He

DeepSpeed Inference reduces latency by up to 7.3x over the state-of-the-art for latency-oriented scenarios and increases throughput by over 1.5x for throughput-oriented scenarios.

Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding

no code implementations30 Jun 2022 Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

Furthermore, we present an inexpensive, heuristic-driven search algorithm that identifies promising heterogeneous compression configurations that meet a compression ratio constraint.

Natural Language Understanding Quantization

Extreme Compression for Pre-trained Transformers Made Simple and Efficient

1 code implementation4 Jun 2022 Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He

Extreme compression, particularly ultra-low bit precision (binary/ternary) quantization, has been proposed to fit large NLP models on resource-constrained devices.

Knowledge Distillation Quantization

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

3 code implementations4 Jun 2022 Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He

How to efficiently serve ever-larger trained natural language models in practice has become exceptionally challenging even for powerful cloud servers due to their prohibitive memory/computation requirements.

Knowledge Distillation Quantization

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam

1 code implementation12 Feb 2022 Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He

1-bit gradient compression and local steps are two representative techniques that enable drastic communication reduction in distributed SGD.

Open-Ended Question Answering
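
A hedged sketch of the first of those two techniques, sign-based 1-bit compression with error feedback, which is a standard building block in this line of work; it is not the exact 0/1 Adam update rule, and the names are illustrative.

```python
import numpy as np

def one_bit_compress(grad, error):
    corrected = grad + error                   # fold in previously lost signal
    scale = np.abs(corrected).mean()           # one floating-point scalar per tensor
    compressed = scale * np.sign(corrected)    # 1 bit per element on the wire
    return compressed, corrected - compressed  # new error to carry forward

rng = np.random.default_rng(0)
error = np.zeros(1_000)
for step in range(5):
    grad = rng.standard_normal(1_000)
    sent, error = one_bit_compress(grad, error)   # `sent` is what gets all-reduced
```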

ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise

no code implementations29 Jan 2022 Minjia Zhang, Niranjan Uma Naresh, Yuxiong He

In recent years, large pre-trained Transformer-based language models have led to dramatic improvements in many natural language understanding tasks.

Natural Language Understanding

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

3 code implementations14 Jan 2022 Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He

As the training of giant dense models hits the boundary on the availability and capability of the hardware resources today, Mixture-of-Experts (MoE) models become one of the most promising model architectures due to their significant training cost reduction compared to a quality-equivalent dense model.

Model Compression

SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement

1 code implementation NeurIPS 2021 Heyang Qin, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He

In this paper, we propose a fully automated and lightweight adaptive batching methodology to enable fine-grained batch size adaptation (e.g., at a mini-batch level) that can achieve state-of-the-art performance with record-breaking batch sizes.

Computational Efficiency
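
A hedged sketch of the signal named in the title, gradient similarity between two halves of a batch; the threshold rule below is purely illustrative and not SimiGrad's actual controller.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def grad(w, x, y):
    # Gradient of mean squared error for a toy linear model.
    return 2.0 * x.T @ (x @ w - y) / len(y)

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 10))
y = x @ rng.standard_normal(10) + 0.1 * rng.standard_normal(256)
w = np.zeros(10)

g_first, g_second = grad(w, x[:128], y[:128]), grad(w, x[128:], y[128:])
similarity = cosine(g_first, g_second)
batch_size = 512 if similarity > 0.9 else 256   # illustrative adaptation rule
print(similarity, batch_size)
```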

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

no code implementations NeurIPS 2021 Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

In particular, we propose to formulate the NxM sparsity as a constrained optimization problem and use Alternating Direction Method of Multipliers (ADMM) to optimize the downstream tasks while taking the underlying hardware constraints into consideration.

Knowledge Distillation Natural Language Understanding
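
A hedged sketch of the N:M constraint itself, shown as the magnitude-based projection such ADMM loops alternate with (2:4 here); the full ADMM optimizer and the hardware considerations from the paper are omitted.

```python
import numpy as np

def project_nm(w, n=2, m=4):
    # Keep the n largest-magnitude weights in every group of m, zero the rest.
    groups = w.reshape(-1, m)
    keep = np.argsort(-np.abs(groups), axis=1)[:, :n]
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.random.default_rng(0).standard_normal((8, 16))
w_24 = project_nm(w)
assert (w_24.reshape(-1, 4) != 0).sum(axis=1).max() <= 2   # 2:4 structure holds
```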

ScaLA: Speeding-Up Fine-tuning of Pre-trained Transformer Networks via Efficient and Scalable Adversarial Perturbation

no code implementations29 Sep 2021 Minjia Zhang, Niranjan Uma Naresh, Yuxiong He

To address this challenge, we propose ScaLA, a scalable and robust method for large-batch optimization of transformer networks via adversarial perturbation.

Demystifying Hyperparameter Optimization in Federated Learning

no code implementations29 Sep 2021 Syed Zawad, Jun Yi, Minjia Zhang, Cheng Li, Feng Yan, Yuxiong He

Such data heterogeneity and privacy requirements pose unique challenges for hyperparameter optimization in federated learning: training dynamics change across clients even within the same training round, and they are difficult to measure due to privacy constraints.

Federated Learning Hyperparameter Optimization +1

HoloFormer: Deep Compression of Pre-Trained Transformers via Unified Optimization of N:M Sparsity and Integer Quantization

no code implementations29 Sep 2021 Minjia Zhang, Connor Holmes, Yuxiong He, Bo Wu

In this work, we propose a unified, systematic approach to learning N:M sparsity and integer quantization for pre-trained Transformers using the Alternating Directions Method of Multipliers (ADMM).

Quantization

Scalable and Efficient MoE Training for Multitask Multilingual Models

1 code implementation22 Sep 2021 Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla

By combining the efficient system and training methods, we are able to significantly scale up large multitask multilingual models for language generation which results in a great improvement in model accuracy.

Machine Translation Text Generation

The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models

1 code implementation13 Aug 2021 Conglong Li, Minjia Zhang, Yuxiong He

To reduce the wall-clock training time, a common practice is to increase the batch size and learning rate.

LAMBADA Text Generation
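
A hedged sketch of a sequence length warmup schedule of the kind named in the title (a simple linear ramp; the paper's exact schedule and hyperparameters may differ).

```python
def warmup_seq_len(step, warmup_steps=10_000, start_len=64, full_len=2_048):
    # Linearly grow the training sequence length, rounded to a multiple of 8.
    if step >= warmup_steps:
        return full_len
    length = start_len + (full_len - start_len) * step / warmup_steps
    return max(start_len, int(length) // 8 * 8)

for step in (0, 2_500, 5_000, 10_000):
    print(step, warmup_seq_len(step))
```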

LEAP: Learnable Pruning for Transformer-based Models

1 code implementation30 May 2021 Zhewei Yao, Xiaoxia Wu, Linjian Ma, Sheng Shen, Kurt Keutzer, Michael W. Mahoney, Yuxiong He

Moreover, in order to reduce hyperparameter tuning, a novel adaptive regularization coefficient is deployed to control the regularization penalty adaptively.

QQP

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning

no code implementations16 Apr 2021 Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He

It requires 800 NVIDIA V100 GPUs just to fit a trillion parameter model for training, and such clusters are simply out of reach for most data scientists.
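
The back-of-envelope accounting below (the standard mixed-precision Adam estimate, not figures taken from the paper) shows why hundreds of 32 GB GPUs are needed before activations and buffers are even counted.

```python
params = 1.0e12                       # one trillion parameters
bytes_per_param = 2 + 2 + 4 + 4 + 4   # fp16 weights + fp16 grads + fp32 weights/momentum/variance
model_states_tb = params * bytes_per_param / 1e12
gpus_for_states = params * bytes_per_param / 32e9   # 32 GB per V100
print(f"{model_states_tb:.0f} TB of model states, >= {gpus_for_states:.0f} GPUs just to hold them")
```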

1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed

1 code implementation13 Apr 2021 Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He

To this end, we design a new communication-efficient algorithm, 1-bit LAMB, which introduces a novel way to support adaptive layerwise learning rates under compression.

8k

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed

2 code implementations4 Feb 2021 Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He

One of the most effective methods is error-compensated compression, which offers robust convergence speed even under 1-bit compression.

ZeRO-Offload: Democratizing Billion-Scale Model Training

3 code implementations18 Jan 2021 Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He

By combining compute and memory efficiency with ease of use, ZeRO-Offload democratizes large-scale model training, making it accessible even to data scientists with access to just a single GPU.

Computational Efficiency

APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm

no code implementations26 Aug 2020 Hanlin Tang, Shaoduo Gan, Samyam Rajbhandari, Xiangru Lian, Ji Liu, Yuxiong He, Ce Zhang

Adam is an important optimization algorithm for achieving both efficiency and accuracy when training many important workloads such as BERT and ImageNet.

SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network

no code implementations4 Nov 2019 Reza Yazdani, Olatunji Ruwase, Minjia Zhang, Yuxiong He, Jose-Maria Arnau, Antonio Gonzalez

To solve these issues, we propose an intelligent tile-based dispatching mechanism for increasing the adaptiveness of RNN computation, in order to handle the data dependencies efficiently.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

10 code implementations4 Oct 2019 Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He

Large deep learning models offer significant accuracy gains, but training billions to trillions of parameters is challenging.

Cross-Lingual Document Classification Image Generation +1
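
A hedged sketch of the memory accounting behind ZeRO's partitioning stages (standard mixed-precision Adam assumptions; exact numbers depend on configuration): sharding optimizer states, gradients, and parameters across the data-parallel group divides those terms by the group size.

```python
def per_gpu_gb(params, n_gpus, shard_os=True, shard_grad=False, shard_param=False):
    # Bytes per parameter: 2 (fp16 weights) + 2 (fp16 grads) + 12 (fp32 optimizer states).
    p16, g16, os32 = 2.0 * params, 2.0 * params, 12.0 * params
    if shard_param: p16 /= n_gpus
    if shard_grad:  g16 /= n_gpus
    if shard_os:    os32 /= n_gpus
    return (p16 + g16 + os32) / 1e9

for name, kwargs in [("no sharding", dict(shard_os=False)),
                     ("stage 1",     dict()),
                     ("stage 2",     dict(shard_grad=True)),
                     ("stage 3",     dict(shard_grad=True, shard_param=True))]:
    print(name, round(per_gpu_gb(7.5e9, 64, **kwargs), 1), "GB per GPU")
```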

Learning to Anneal and Prune Proximity Graphs for Similarity Search

no code implementations25 Sep 2019 Minjia Zhang, Wenhan Wang, Yuxiong He

This paper studies similarity search, which is a crucial enabler of many feature vector-based applications.

Stochastic Optimization

Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory

no code implementations11 Sep 2018 Minjia Zhang, Yuxiong He

With the advancement of machine learning and deep learning, vector search has become instrumental to many information retrieval systems for finding the best matches to user queries based on their semantic similarities. These online services require the search architecture to be both effective, with high accuracy, and efficient, with low latency and memory footprint, which existing work fails to offer.

Information Retrieval Retrieval
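
For context, a hedged sketch of the exact brute-force version of the operation being served (top-k by cosine similarity); systems like Zoom trade this exhaustive scan for approximate, SSD-resident indexes to hit latency and memory targets.

```python
import numpy as np

def top_k(query, corpus, k=5):
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                                  # cosine similarity to every vector
    best = np.argpartition(-scores, k)[:k]
    return best[np.argsort(-scores[best])]

rng = np.random.default_rng(0)
corpus = rng.standard_normal((100_000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)
print(top_k(query, corpus))
```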

Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models

no code implementations NeurIPS 2018 Minjia Zhang, Xiaodong Liu, Wenhan Wang, Jianfeng Gao, Yuxiong He

Neural language models (NLMs) have recently gained a renewed interest by achieving state-of-the-art performance across many natural language processing (NLP) tasks.

Language Modelling Machine Translation +1

Learning Intrinsic Sparse Structures within Long Short-Term Memory

no code implementations ICLR 2018 Wei Wen, Yuxiong He, Samyam Rajbhandari, Minjia Zhang, Wenhan Wang, Fang Liu, Bin Hu, Yiran Chen, Hai Li

This work aims to learn structurally-sparse Long Short-Term Memory (LSTM) by reducing the sizes of basic structures within LSTM units, including input updates, gates, hidden states, cell states and outputs.

Language Modelling Model Compression +1
