Search Results for author: Lizhong Chen

Found 11 papers, 4 papers with code

Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models

1 code implementation • 7 Dec 2023 • Victor Agostinelli, Max Wild, Matthew Raffel, Kazi Ahmed Asif Fuad, Lizhong Chen

Large language models (LLMs) with billions of parameters, pretrained on massive amounts of data, are now capable of performance near or above the state of the art on a variety of downstream natural language processing tasks.

Machine Translation, NMT, +1

Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation

1 code implementation • 3 Jul 2023 • Matthew Raffel, Lizhong Chen

Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass, with nearly identical translation quality, compared with the state-of-the-art approach that employs both left context and memory banks.

Translation

Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation

1 code implementation • 3 Jul 2023 • Matthew Raffel, Drew Penney, Lizhong Chen

Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation.

Translation

Partitioning-Guided K-Means: Extreme Empty Cluster Resolution for Extreme Model Compression

no code implementations • 24 Jun 2023 • Tianhong Huang, Victor Agostinelli, Lizhong Chen

Compactness in deep learning can be critical to a model's viability in low-resource applications, and a common approach to extreme model compression is quantization.

Model Compression, Quantization

Improving Autoregressive NLP Tasks via Modular Linearized Attention

no code implementations • 17 Apr 2023 • Victor Agostinelli, Lizhong Chen

Various natural language processing (NLP) tasks require models that are efficient and small, given their ultimate application at the edge or in other resource-constrained environments.

Computational Efficiency, Machine Translation, +2

RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments

no code implementations • 10 Apr 2023 • Drew Penney, Bin Li, Lizhong Chen, Jaroslaw J. Sydir, Anna Drewek-Ossowicka, Ramesh Illikkal, Charlie Tai, Ravi Iyer, Andrew Herdrich

Resource sharing between multiple workloads has become a prominent practice among cloud service providers, motivated by demand for improved resource utilization and reduced cost of ownership.

PROMPT: Learning Dynamic Resource Allocation Policies for Network Applications

no code implementations • 19 Jan 2022 • Drew Penney, Bin Li, Jaroslaw Sydir, Lizhong Chen, Charlie Tai, Stefan Lee, Eoin Walsh, Thomas Long

A growing number of service providers are exploring methods to improve server utilization and reduce power consumption by co-scheduling high-priority latency-critical workloads with best-effort workloads.

Scheduling

UVMBench: A Comprehensive Benchmark Suite for Researching Unified Virtual Memory in GPUs

1 code implementation • 20 Jul 2020 • Yongbin Gu, Wenxuan Wu, Yunfan Li, Lizhong Chen

The recent introduction of Unified Virtual Memory (UVM) in GPUs offers a new programming model that allows GPUs and CPUs to share the same virtual memory space, shifts complex memory management from programmers to the GPU driver/hardware, and enables kernel execution even when memory is oversubscribed.

Hardware Architecture

A Survey of Machine Learning Applied to Computer Architecture Design

no code implementations • 26 Sep 2019 • Drew D. Penney, Lizhong Chen

Machine learning has delivered significant benefits in diverse fields but, with a few exceptions, has had limited impact on computer architecture.

BIG-bench Machine Learning