1 code implementation • 7 Dec 2023 • Victor Agostinelli, Max Wild, Matthew Raffel, Kazi Ahmed Asif Fuad, Lizhong Chen
Large language models (LLMs) with billions of parameters, pretrained on massive amounts of data, now achieve performance near or better than the state of the art in a variety of downstream natural language processing tasks.
1 code implementation • 3 Jul 2023 • Matthew Raffel, Lizhong Chen
Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass with nearly identical translation quality when compared with the state-of-the-art approach that employs both left context and memory banks.
1 code implementation • 3 Jul 2023 • Matthew Raffel, Drew Penney, Lizhong Chen
Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation.
no code implementations • 24 Jun 2023 • Tianhong Huang, Victor Agostinelli, Lizhong Chen
Compactness in deep learning can be critical to a model's viability in low-resource applications, and a common approach to extreme model compression is quantization.
no code implementations • 17 Apr 2023 • Victor Agostinelli, Lizhong Chen
Various natural language processing (NLP) tasks necessitate models that are small and efficient because they are ultimately deployed at the edge or in other resource-constrained environments.
no code implementations • 10 Apr 2023 • Drew Penney, Bin Li, Lizhong Chen, Jaroslaw J. Sydir, Anna Drewek-Ossowicka, Ramesh Illikkal, Charlie Tai, Ravi Iyer, Andrew Herdrich
Resource sharing between multiple workloads has become a prominent practice among cloud service providers, motivated by demand for improved resource utilization and reduced cost of ownership.
no code implementations • 19 Jan 2022 • Drew Penney, Bin Li, Jaroslaw Sydir, Lizhong Chen, Charlie Tai, Stefan Lee, Eoin Walsh, Thomas Long
A growing number of service providers are exploring methods to improve server utilization and reduce power consumption by co-scheduling high-priority latency-critical workloads with best-effort workloads.
1 code implementation • 20 Jul 2020 • Yongbin Gu, Wenxuan Wu, Yunfan Li, Lizhong Chen
The recent introduction of Unified Virtual Memory (UVM) in GPUs offers a new programming model that allows GPUs and CPUs to share the same virtual memory space, shifts complex memory management from programmers to the GPU driver/hardware, and enables kernel execution even when memory is oversubscribed.
Hardware Architecture
no code implementations • 7 Jul 2020 • Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, Rico Amslinger, Matteo Andreozzi, Adrià Armejach, Nils Asmussen, Brad Beckmann, Srikant Bharadwaj, Gabe Black, Gedare Bloom, Bobby R. Bruce, Daniel Rodrigues Carvalho, Jeronimo Castrillon, Lizhong Chen, Nicolas Derumigny, Stephan Diestelhorst, Wendy Elsasser, Carlos Escuin, Marjan Fariborz, Amin Farmahini-Farahani, Pouya Fotouhi, Ryan Gambord, Jayneel Gandhi, Dibakar Gope, Thomas Grass, Anthony Gutierrez, Bagus Hanindhito, Andreas Hansson, Swapnil Haria, Austin Harris, Timothy Hayes, Adrian Herrera, Matthew Horsnell, Syed Ali Raza Jafri, Radhika Jagtap, Hanhwi Jang, Reiley Jeyapaul, Timothy M. Jones, Matthias Jung, Subash Kannoth, Hamidreza Khaleghzadeh, Yuetsu Kodama, Tushar Krishna, Tommaso Marinelli, Christian Menard, Andrea Mondelli, Miquel Moreto, Tiago Mück, Omar Naji, Krishnendra Nathella, Hoa Nguyen, Nikos Nikoleris, Lena E. Olson, Marc Orr, Binh Pham, Pablo Prieto, Trivikram Reddy, Alec Roelke, Mahyar Samani, Andreas Sandberg, Javier Setoain, Boris Shingarov, Matthew D. Sinclair, Tuan Ta, Rahul Thakur, Giacomo Travaglini, Michael Upton, Nilay Vaish, Ilias Vougioukas, William Wang, Zhengrong Wang, Norbert Wehn, Christian Weis, David A. Wood, Hongil Yoon, Éder F. Zulian
The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research.
Hardware Architecture
no code implementations • 26 Sep 2019 • Drew D. Penney, Lizhong Chen
Machine learning has enabled significant benefits in diverse fields, but, with a few exceptions, has had limited impact on computer architecture.
no code implementations • 11 May 2019 • Ting-Ru Lin, Drew Penney, Massoud Pedram, Lizhong Chen
Machine learning applied to architecture design presents a promising opportunity with broad applications.