no code implementations • 27 Feb 2023 • Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami
In this work, we survey different approaches for efficient Transformer inference, including: (i) analysis and profiling of the bottlenecks in existing Transformer architectures and their similarities and differences with previous convolutional models; (ii) implications of Transformer architecture on hardware, including the impact of non-linear operations such as Layer Normalization, Softmax, and GELU, as well as linear operations, on hardware design; (iii) approaches for optimizing a fixed Transformer architecture; (iv) challenges in finding the right mapping and scheduling of operations for Transformer models; and (v) approaches for optimizing Transformer models by adapting the architecture using neural architecture search.
1 code implementation • 15 Feb 2023 • Sehoon Kim, Karttikeya Mangalam, Suhong Moon, John Canny, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer
To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications.
1 code implementation • 20 Jul 2022 • Geonung Kim, Kyoungkook Kang, Seongtae Kim, Hwayoon Lee, Sehoon Kim, Jonghyun Kim, Seung-Hwan Baek, Sunghyun Cho
In this paper, we propose BigColor, a novel colorization approach that provides vivid colorization for diverse in-the-wild images with complex structures.
3 code implementations • 2 Jun 2022 • Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.
Ranked #27 on Speech Recognition on LibriSpeech test-clean
Automatic Speech Recognition Automatic Speech Recognition (ASR)
1 code implementation • 29 Mar 2022 • Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami
To address this, we propose a fast post-training pruning framework for Transformers that does not require any retraining.
no code implementations • NeurIPS 2021 • Taebum Kim, Eunji Jeong, Geon-Woo Kim, Yunmo Koo, Sehoon Kim, Gyeong-In Yu, Byung-Gon Chun
Recently, several systems have been proposed to combine the usability of imperative programming with the optimized performance of symbolic graph execution.
1 code implementation • 16 Dec 2021 • Ho Young Jhoo, Sehoon Kim, Woosung Song, Kyuyeon Park, DongKwon Lee, Kwangkeun Yi
We present an automatic static analyzer PyTea that detects tensor-shape errors in PyTorch code.
no code implementations • 25 Oct 2021 • Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bahr, Jurgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomas E. Muller Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Kyle J Hazelwood, Christian Herwig, Babar Khan, Sehoon Kim, Thomas Klijnsma, Yaling Liu, Kin Ho Lo, Tri Nguyen, Gianantonio Pezzullo, Seyedramin Rasoulinezhad, Ryan A. Rivera, Kate Scholberg, Justin Selig, Sougata Sen, Dmitri Strukov, William Tang, Savannah Thais, Kai Lukas Unger, Ricardo Vilalta, Belinavon Krosigk, Thomas K. Warburton, Maria Acosta Flechas, Anthony Aportela, Thomas Calvet, Leonardo Cristella, Daniel Diaz, Caterina Doglioni, Maria Domenica Galati, Elham E Khoda, Farah Fahim, Davide Giri, Benjamin Hawks, Duc Hoang, Burt Holzman, Shih-Chieh Hsu, Sergo Jindariani, Iris Johnson, Raghav Kansal, Ryan Kastner, Erik Katsavounidis, Jeffrey Krupa, Pan Li, Sandeep Madireddy, Ethan Marx, Patrick McCormack, Andres Meza, Jovan Mitrevski, Mohammed Attia Mohammed, Farouk Mokhtar, Eric Moreno, Srishti Nagu, Rohin Narayan, Noah Palladino, Zhiqiang Que, Sang Eon Park, Subramanian Ramamoorthy, Dylan Rankin, Simon Rothman, ASHISH SHARMA, Sioni Summers, Pietro Vischia, Jean-Roch Vlimant, Olivia Weng
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery.
1 code implementation • 2 Jul 2021 • Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
We extensively test the performance of LTP on GLUE tasks and show that our method outperforms the prior state-of-the-art token pruning methods by up to ~2. 5% higher accuracy with the same amount of FLOPs.
1 code implementation • 31 Mar 2021 • Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer
End-to-end neural network models achieve improved performance on various automatic speech recognition (ASR) tasks.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 25 Mar 2021 • Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks.
1 code implementation • 22 Jan 2021 • Shixing Yu, Zhewei Yao, Amir Gholami, Zhen Dong, Sehoon Kim, Michael W Mahoney, Kurt Keutzer
To address this problem, we introduce a new Hessian Aware Pruning (HAP) method coupled with a Neural Implant approach that uses second-order sensitivity as a metric for structured pruning.
3 code implementations • 5 Jan 2021 • Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks.
Natural Language Inference Natural Language Understanding +1
1 code implementation • 10 Jun 2019 • Gyeong-In Yu, Saeed Amizadeh, Sehoon Kim, Artidoro Pagnoni, Byung-Gon Chun, Markus Weimer, Matteo Interlandi
To this end, we propose a framework that translates a pre-trained ML pipeline into a neural network and fine-tunes the ML models within the pipeline jointly using backpropagation.