Search Results for author: Raghuraman Krishnamoorthi

Found 21 papers, 8 papers with code

Quantizing deep convolutional networks for efficient inference: A whitepaper

4 code implementations • 21 Jun 2018 • Raghuraman Krishnamoorthi

Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision post-training produces classification accuracies within 2% of floating point networks for a wide variety of CNN architectures.

Quantization
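
The whitepaper's headline result combines per-channel weight scales with a single per-tensor (per-layer) scale for activations. A minimal post-training sketch of both, assuming PyTorch tensors; this is an illustration of the scheme, not the whitepaper's exact calibration procedure:

```python
import torch

def quantize_weights_per_channel(w: torch.Tensor, num_bits: int = 8):
    """Symmetric per-output-channel quantization of a conv/linear weight.

    w: weight with the output channel as dim 0, e.g. [out_ch, in_ch, kh, kw].
    Returns the int8 tensor plus one scale per output channel.
    """
    qmax = 2 ** (num_bits - 1) - 1                            # 127 for 8 bits
    max_abs = w.abs().amax(dim=tuple(range(1, w.dim())), keepdim=True)
    scale = max_abs.clamp(min=1e-8) / qmax                    # one scale per channel
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale.squeeze()

def quantize_activations_per_tensor(x: torch.Tensor, num_bits: int = 8):
    """Asymmetric per-tensor (per-layer) quantization of an activation."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-x_min / scale).clamp(qmin, qmax)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.uint8)
    return q, scale, zero_point
```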

Check-N-Run: A Checkpointing System for Training Deep Learning Recommendation Models

no code implementations • 17 Oct 2020 • Assaf Eisenman, Kiran Kumar Matam, Steven Ingram, Dheevatsa Mudigere, Raghuraman Krishnamoorthi, Krishnakumar Nair, Misha Smelyanskiy, Murali Annavaram

While Check-N-Run is applicable to long-running ML jobs, we focus on checkpointing recommendation models, which are currently the largest ML models, with terabytes of model size.

Quantization • Recommendation Systems
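
The full paper describes differential checkpointing and checkpoint quantization for terabyte-scale models. A rough sketch of the differential idea, with illustrative class and method names rather than the system's actual API, is writing out only the embedding rows touched since the previous checkpoint:

```python
import numpy as np

class IncrementalCheckpointer:
    """Illustrative differential checkpointing for a large embedding table.

    Only rows updated since the previous checkpoint are written; the real
    system also quantizes checkpoints and decouples them from training.
    """

    def __init__(self, table: np.ndarray):
        self.table = table
        self.dirty_rows = set()

    def record_update(self, row_ids):
        self.dirty_rows.update(int(r) for r in row_ids)

    def checkpoint(self):
        rows = sorted(self.dirty_rows)
        delta = {r: self.table[r].copy() for r in rows}
        self.dirty_rows.clear()
        return delta                       # persist this much smaller delta

# usage: after each training step, record which embedding rows were touched
table = np.random.randn(1_000_000, 64).astype(np.float32)
ckpt = IncrementalCheckpointer(table)
ckpt.record_update([3, 42, 99_999])
delta = ckpt.checkpoint()                  # only 3 rows instead of 1M
```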

Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale

no code implementations • 26 May 2021 • Zhaoxia Deng, Jongsoo Park, Ping Tak Peter Tang, Haixin Liu, Jie Yang, Hector Yuen, Jianyu Huang, Daya Khudia, Xiaohan Wei, Ellie Wen, Dhruv Choudhary, Raghuraman Krishnamoorthi, Carole-Jean Wu, Satish Nadathur, Changkyu Kim, Maxim Naumov, Sam Naghshineh, Mikhail Smelyanskiy

We share in this paper our search strategies to adapt reference recommendation models to low-precision hardware, our optimization of low-precision compute kernels, and the design and development of a tool chain to maintain our models' accuracy throughout their lifespan, during which topic trends and users' interests inevitably evolve.

Recommendation Systems
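
Low-precision recommendation inference commonly stores embedding tables with row-wise 8-bit quantization, keeping a per-row scale and bias next to the integer data and dequantizing during the pooled lookup. The sketch below shows that layout in NumPy as an illustration only; it is not the paper's optimized kernels:

```python
import numpy as np

def quantize_embedding_rowwise(table: np.ndarray):
    """Row-wise asymmetric uint8 quantization of an fp32 embedding table."""
    mins = table.min(axis=1, keepdims=True)
    maxs = table.max(axis=1, keepdims=True)
    scales = np.maximum(maxs - mins, 1e-8) / 255.0
    q = np.round((table - mins) / scales).astype(np.uint8)
    return q, scales.astype(np.float32), mins.astype(np.float32)

def embedding_bag_dequant(q, scales, mins, indices):
    """Sum-pooled lookup that dequantizes rows on the fly."""
    rows = q[indices].astype(np.float32) * scales[indices] + mins[indices]
    return rows.sum(axis=0)

table = np.random.randn(10_000, 64).astype(np.float32)
q, s, b = quantize_embedding_rowwise(table)
pooled = embedding_bag_dequant(q, s, b, np.array([1, 5, 9]))
```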

BiT: Robustly Binarized Multi-distilled Transformer

2 code implementations • 25 May 2022 • Zechun Liu, Barlas Oguz, Aasish Pappu, Lin Xiao, Scott Yih, Meng Li, Raghuraman Krishnamoorthi, Yashar Mehdad

Modern pre-trained transformers have rapidly advanced the state-of-the-art in machine learning, but have also grown in parameters and computational complexity, making them increasingly difficult to deploy in resource-constrained environments.

Binarization
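
A minimal sketch of weight binarization with a per-row scaling factor and a straight-through estimator, the basic building block behind binarized transformers; this illustrates the mechanism only and omits BiT's multi-distillation training recipe:

```python
import torch

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    """Binarize a linear weight [out, in] to {-alpha, +alpha} per output row,
    with a straight-through estimator so gradients still reach the latent
    full-precision weights."""
    alpha = w.abs().mean(dim=1, keepdim=True)          # per-row scale
    w_bin = alpha * torch.sign(w)
    # forward uses w_bin; backward behaves like identity w.r.t. w
    return w + (w_bin - w).detach()
```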

DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks

1 code implementation • 2 Jun 2022 • Yonggan Fu, Haichuan Yang, Jiayi Yuan, Meng Li, Cheng Wan, Raghuraman Krishnamoorthi, Vikas Chandra, Yingyan Lin

Efficient deep neural network (DNN) models equipped with compact operators (e.g., depthwise convolutions) have shown great potential in reducing DNNs' theoretical complexity (e.g., the total number of weights/operations) while maintaining a decent model accuracy.
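
DepthShrinker's compression idea is to remove activation functions inside compact blocks and then merge the now-linear layers into a single dense operator that maps better to real hardware. A toy sketch of that merging step with two fully connected layers, as an illustration rather than the paper's conv-block procedure:

```python
import torch
import torch.nn as nn

# Two linear layers with the nonlinearity between them removed are exactly
# equivalent to one merged linear layer: y = W2 (W1 x + b1) + b2.
l1, l2 = nn.Linear(64, 128), nn.Linear(128, 32)

merged = nn.Linear(64, 32)
with torch.no_grad():
    merged.weight.copy_(l2.weight @ l1.weight)
    merged.bias.copy_(l2.weight @ l1.bias + l2.bias)

x = torch.randn(4, 64)
assert torch.allclose(l2(l1(x)), merged(x), atol=1e-5)
```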

Learning a Dual-Mode Speech Recognition Model via Self-Pruning

no code implementations • 25 Jul 2022 • Chunxi Liu, Yuan Shangguan, Haichuan Yang, Yangyang Shi, Raghuraman Krishnamoorthi, Ozlem Kalinli

There is growing interest in unifying the streaming and full-context automatic speech recognition (ASR) networks into a single end-to-end ASR model to simplify the model training and deployment for both use cases.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2
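
A dual-mode ASR encoder typically shares weights between a streaming mode (limited right context) and a full-context mode, switching only the attention mask. A hedged sketch of building such masks; this shows the dual-mode idea only, not the paper's self-pruning method:

```python
import torch

def attention_mask(seq_len: int, streaming: bool, right_context: int = 0):
    """Boolean mask where True means the position may be attended to.

    Full-context mode sees everything; streaming mode sees the past plus a
    small fixed right context, so one encoder can serve both modes.
    """
    if not streaming:
        return torch.ones(seq_len, seq_len, dtype=torch.bool)
    idx = torch.arange(seq_len)
    return idx[None, :] <= idx[:, None] + right_context

full = attention_mask(8, streaming=False)
stream = attention_mask(8, streaming=True, right_context=2)
```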

Fast Point Cloud Generation with Straight Flows

1 code implementation • CVPR 2023 • Lemeng Wu, Dilin Wang, Chengyue Gong, Xingchao Liu, Yunyang Xiong, Rakesh Ranjan, Raghuraman Krishnamoorthi, Vikas Chandra, Qiang Liu

We perform evaluations on multiple 3D tasks and find that our PSF performs comparably to the standard diffusion model, outperforming other efficient 3D point cloud generation methods.

Point Cloud Completion
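
Point Straight Flows (PSF) learns nearly straight generation trajectories so that sampling needs only a few integration steps. A generic rectified-flow style sampling loop as a sketch of that idea; `velocity_net` is a placeholder for a learned velocity model, not the paper's released implementation:

```python
import torch

@torch.no_grad()
def sample_straight_flow(velocity_net, num_points=2048, dim=3, steps=2):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (point cloud).

    If the learned flow is nearly straight, very few Euler steps suffice.
    """
    x = torch.randn(1, num_points, dim)            # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt)
        x = x + dt * velocity_net(x, t)            # one Euler step
    return x
```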

PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion

no code implementations • 12 Dec 2022 • Lemeng Wu, Dilin Wang, Meng Li, Yunyang Xiong, Raghuraman Krishnamoorthi, Qiang Liu, Vikas Chandra

Fusing 3D LiDAR features with 2D camera features is a promising technique for enhancing the accuracy of 3D detection, thanks to their complementary physical properties.
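
The common starting point for LiDAR-camera fusion is projecting 3D points into the image plane and gathering per-point 2D features. A hedged sketch of that projection and gather step; it illustrates generic fusion plumbing, not PathFusion's path-consistency mechanism:

```python
import torch

def gather_camera_features(points_xyz, cam_feats, intrinsics):
    """Project LiDAR points into the image and sample per-point 2D features.

    points_xyz: [N, 3] points in the camera frame, cam_feats: [C, H, W],
    intrinsics: 3x3 camera matrix. Returns [N, C] features (zeros when a
    point falls outside the image or behind the camera).
    """
    C, H, W = cam_feats.shape
    uvw = points_xyz @ intrinsics.T                    # pinhole projection
    z = uvw[:, 2].clamp(min=1e-6)
    u = (uvw[:, 0] / z).round().long()
    v = (uvw[:, 1] / z).round().long()
    valid = (points_xyz[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = torch.zeros(points_xyz.shape[0], C)
    out[valid] = cam_feats[:, v[valid], u[valid]].T    # [C, M] -> [M, C]
    return out
```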

LLM-QAT: Data-Free Quantization Aware Training for Large Language Models

no code implementations • 29 May 2023 • Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, Vikas Chandra

Several post-training quantization methods have been applied to large language models (LLMs), and have been shown to perform well down to 8-bits.

Data Free Quantization
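
Quantization-aware training inserts "fake" quantization into the forward pass while keeping full-precision gradients via a straight-through estimator; LLM-QAT's data-free twist is to generate its training data from the pretrained model itself. A minimal fake-quant sketch of the first part only, not the paper's data-generation pipeline:

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator.

    The forward pass sees quantized values; the backward pass treats the
    rounding as identity so the full-precision weights keep training.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (x_q - x).detach()
```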

Binary and Ternary Natural Language Generation

1 code implementation • 2 Jun 2023 • Zechun Liu, Barlas Oguz, Aasish Pappu, Yangyang Shi, Raghuraman Krishnamoorthi

For machine translation, we achieved BLEU scores of 21.7 and 17.6 on the WMT16 En-Ro benchmark, compared with a full-precision mBART model score of 26.8.

Machine Translation • Quantization • +2
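
For the ternary setting, weights are commonly mapped to {-alpha, 0, +alpha} using a threshold on their magnitude. The sketch below uses the widely known ternary-weight-network recipe as a generic illustration; it is not necessarily this paper's exact formulation:

```python
import torch

def ternarize(w: torch.Tensor) -> torch.Tensor:
    """Map weights to {-alpha, 0, +alpha} with a magnitude threshold,
    keeping gradients via a straight-through estimator."""
    delta = 0.7 * w.abs().mean()                       # threshold heuristic
    mask = (w.abs() > delta).float()
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    w_t = alpha * torch.sign(w) * mask
    return w + (w_t - w).detach()
```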

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

no code implementations • 5 Sep 2023 • Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra

Results demonstrate that our TODM Supernet matches or surpasses the performance of manually tuned models, with up to a relative 3% improvement in word error rate (WER), while keeping the cost of training many models at a small constant.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2
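
Supernet-based "train once, deploy many" training samples differently sized sub-networks from one shared model at every step, so each deployable size stays accurate without a separate training run. A toy sketch of sampling layer widths from shared weights; the class and sizes are illustrative, not TODM's RNN-T specifics:

```python
import random
import torch
import torch.nn as nn

class SlimmableLinear(nn.Linear):
    """Linear layer that can run with only the first `width` output units,
    so one set of weights serves several model sizes."""

    def forward(self, x, width=None):
        w = self.weight if width is None else self.weight[:width]
        b = self.bias if width is None else self.bias[:width]
        return nn.functional.linear(x, w, b)

layer = SlimmableLinear(256, 512)
x = torch.randn(8, 256)
for _ in range(3):                              # one supernet training step each
    width = random.choice([128, 256, 512])      # sample a sub-network size
    y = layer(x, width=width)                   # loss/backward omitted
```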

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

1 code implementation • 14 Oct 2023 • Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny

Motivated by this, we target to build a unified interface for completing many vision-language tasks including image description, visual question answering, and visual grounding, among others.

Language Modelling • Large Language Model • +4
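
The unified interface distinguishes tasks through task-identifier tokens in the prompt, so one model can switch between captioning, VQA, grounding, and so on. A hedged sketch of such prompt construction; the token names and template below are illustrative, not the exact ones in the released MiniGPT-v2 code:

```python
# Illustrative prompt construction with task-identifier tokens; the actual
# identifiers and template in the MiniGPT-v2 release may differ.
TASK_TOKENS = {
    "caption": "[caption]",
    "vqa": "[vqa]",
    "grounding": "[grounding]",
}

def build_prompt(task: str, instruction: str) -> str:
    return f"[INST] <Img><ImageHere></Img> {TASK_TOKENS[task]} {instruction} [/INST]"

print(build_prompt("vqa", "What color is the car?"))
```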

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

1 code implementation • 1 Dec 2023 • Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra

On segment anything tasks such as zero-shot instance segmentation, our EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably, with a significant gain (e.g., ~4 AP on COCO/LVIS) over other fast SAM models.

Image Classification • Instance Segmentation • +5
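
SAMI pretrains a lightweight encoder by masking image patches and training it to reconstruct features produced by the SAM image encoder. A schematic loss computation under that reading; `student` and `decoder` are placeholder modules and the masking/reconstruction details are an assumption, not the paper's exact architecture:

```python
import torch
import torch.nn.functional as F

def sami_loss(student, decoder, teacher_feats, patches, mask_ratio=0.75):
    """Masked-image feature reconstruction against frozen SAM encoder features.

    patches: [B, N, D] patch embeddings; teacher_feats: [B, N, D_t] targets.
    student/decoder are placeholders for the lightweight encoder and the
    reconstruction head that predicts features for all N tokens.
    """
    B, N, D = patches.shape
    keep = int(N * (1 - mask_ratio))
    idx = torch.rand(B, N).argsort(dim=1)[:, :keep]      # random visible subset
    visible = torch.gather(patches, 1, idx.unsqueeze(-1).expand(-1, -1, D))
    recon = decoder(student(visible), idx, N)            # predict all N tokens
    return F.mse_loss(recon, teacher_feats)
```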

Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images

no code implementations • 4 Dec 2023 • Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi, Fanyi Xiao, Yong Jae Lee

We present a new framework leveraging off-the-shelf generative models to generate synthetic training images, addressing multiple challenges: class name ambiguity, lack of diversity in naive prompts, and domain shifts.

Domain Generalization • Text-to-Image Generation
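
One of the challenges the framework calls out is the lack of diversity in naive prompts for the generative model, along with ambiguous class names. A trivial template-based sketch of prompt diversification, purely illustrative and not the paper's prompt strategy:

```python
import itertools
import random

CLASS_CONTEXT = {"crane": "a tall construction machine"}   # disambiguate class names
BACKGROUNDS = ["on a construction site", "at sunset", "in heavy rain"]
STYLES = ["a photo of", "a wide-angle shot of", "a close-up of"]

def diversified_prompts(class_name: str, n: int = 5):
    desc = CLASS_CONTEXT.get(class_name, class_name)
    combos = list(itertools.product(STYLES, BACKGROUNDS))
    random.shuffle(combos)
    return [f"{s} {desc} {bg}" for s, bg in combos[:n]]

print(diversified_prompts("crane"))
```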

Gen2Det: Generate to Detect

no code implementations • 7 Dec 2023 • Saksham Suri, Fanyi Xiao, Animesh Sinha, Sean Chang Culatana, Raghuraman Krishnamoorthi, Chenchen Zhu, Abhinav Shrivastava

In the long-tailed detection setting on LVIS, Gen2Det improves the performance on rare categories by a large margin while also significantly improving the performance on other categories, e.g., we see an improvement of 2.13 Box AP and 1.84 Mask AP over just training on real data on LVIS with Mask R-CNN.

Image Generation • Object • +2

Communication Efficient Distributed Training with Distributed Lion

no code implementations • 30 Mar 2024 • Bo Liu, Lemeng Wu, Lizhang Chen, Kaizhao Liang, Jiaxu Zhu, Chen Liang, Raghuraman Krishnamoorthi, Qiang Liu

The Lion optimizer has been a promising competitor to AdamW for training large AI models, with advantages in memory, computation, and sample efficiency.
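
Lion keeps only a momentum buffer and applies sign-based updates, which is what makes the updates communicated in a distributed variant cheap (each coordinate contributes only +/-1). A minimal single-worker Lion step, shown as a sketch of the base optimizer rather than the paper's distributed algorithm:

```python
import torch

@torch.no_grad()
def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
    """One Lion update: the applied update is the sign of an interpolated
    momentum, combined with decoupled weight decay."""
    update = torch.sign(beta1 * momentum + (1 - beta1) * grad)
    param.add_(update + wd * param, alpha=-lr)         # signed update + weight decay
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)    # update momentum buffer
    return param, momentum
```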
