no code implementations • 6 Sep 2024 • Yecheng Wu, Zhuoyang Zhang, Junyu Chen, Haotian Tang, Dacheng Li, Yunhao Fang, Ligeng Zhu, Enze Xie, Hongxu Yin, Li Yi, Song Han, Yao Lu
VILA-U is a Unified foundation model that integrates Video, Image, Language understanding and generation.
1 code implementation • 19 Aug 2024 • Fuzhao Xue, Yukang Chen, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han
We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing.
no code implementations • 26 Jul 2024 • Zhijian Liu, Zhuoyang Zhang, Samir Khaki, Shang Yang, Haotian Tang, Chenfeng Xu, Kurt Keutzer, Song Han
Finally, it leverages a gated ensembler to apply these sparse refinements to the initial coarse predictions.
1 code implementation • 7 May 2024 • Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han
The key insight driving QServe is that the efficiency of LLM serving on GPUs is critically influenced by operations on low-throughput CUDA cores.
no code implementations • CVPR 2024 • Norman Mu, Jingwei Ji, Zhenpei Yang, Nate Harada, Haotian Tang, Kan Chen, Charles R. Qi, Runzhou Ge, Kratarth Goel, Zoey Yang, Scott Ettinger, Rami Al-Rfou, Dragomir Anguelov, Yin Zhou
This symbolic representation is a high-level abstraction of the real world, which may render the motion prediction model vulnerable to perception errors (e.g., failures in detecting open-vocabulary obstacles) while missing salient information from the scene context (e.g., poor road conditions).
1 code implementation • 25 Oct 2023 • Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han
On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads.
3 code implementations • 21 Sep 2023 • Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia
For example, training with a context length of 8192 requires 16x the computational cost in self-attention layers compared with a context length of 2048.
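The 16x figure follows from self-attention cost growing roughly with the square of the sequence length. The sketch below only checks that arithmetic; the function name `attention_cost_ratio` is hypothetical, and the pure quadratic model is a simplifying assumption that ignores the terms that scale linearly with length.

```python
def attention_cost_ratio(long_ctx: int, short_ctx: int) -> float:
    """Relative self-attention cost of two context lengths.

    Assumes cost is proportional to n^2 (the n x n attention score matrix),
    which is a simplification and not a measurement from the paper.
    """
    return (long_ctx / short_ctx) ** 2


if __name__ == "__main__":
    # 8192 / 2048 = 4, and 4 squared is 16.
    print(attention_cost_ratio(8192, 2048))
```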
10 code implementations • 1 Jun 2023 • Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han
We propose Activation-aware Weight Quantization (AWQ), a hardware-friendly approach for LLM low-bit weight-only quantization.
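To illustrate what low-bit weight-only quantization means in general, here is a minimal sketch of plain round-to-nearest 4-bit quantization of one weight group. It is not the AWQ method: the activation-aware part is omitted entirely, and the function names, group contents, and asymmetric scheme are all assumptions chosen for illustration.

```python
def quantize_group(weights, n_bits=4):
    """Asymmetric round-to-nearest quantization of one weight group.

    Generic baseline sketch only; AWQ's activation-aware scaling is NOT
    implemented here.
    """
    q_max = (1 << n_bits) - 1                 # 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / q_max or 1.0    # fall back to 1.0 for constant groups
    zero_point = round(-w_min / scale)        # integer code that represents 0.0
    codes = [min(q_max, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero_point) * scale for c in codes]


if __name__ == "__main__":
    group = [-0.5, -0.1, 0.0, 0.3, 1.0]       # hypothetical weight values
    codes, scale, zero_point = quantize_group(group)
    print(codes, scale, zero_point)
    print(dequantize_group(codes, scale, zero_point))
```

Only the integer codes plus one scale and zero point per group need to be stored, which is where the memory saving of weight-only quantization comes from.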
1 code implementation • CVPR 2023 • Xuanyao Chen, Zhijian Liu, Haotian Tang, Li Yi, Hang Zhao, Song Han
High-resolution images enable neural networks to learn richer visual representations.
no code implementations • CVPR 2023 • Zhijian Liu, Xinyu Yang, Haotian Tang, Shang Yang, Song Han
Transformer, as an alternative to CNN, has been proven effective in many modalities (e.g., texts and images).
1 code implementation • 19 Oct 2022 • Xueru Wen, Changjiang Zhou, Haotian Tang, Luguang Liang, Yu Jiang, Hong Qi
Named entity recognition is a traditional task in natural language processing.
1 code implementation • 19 Oct 2022 • Xueru Wen, Changjiang Zhou, Haotian Tang, Luguang Liang, Yu Jiang, Hong Qi
Named entity recognition is a fundamental task in natural language processing, identifying the span and category of entities in unstructured texts.
1 code implementation • 26 May 2022 • Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, Song Han
Multi-sensor fusion is essential for an accurate and reliable autonomous driving system.
Ranked #4 on 3D Object Detection on nuScenes
no code implementations • 25 Apr 2022 • Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition.
no code implementations • 25 Apr 2022 • Zhijian Liu, Haotian Tang, Shengyu Zhao, Kevin Shao, Song Han
3D neural networks are widely used in real-world applications (e.g., AR/VR headsets, self-driving cars).
1 code implementation • 21 Apr 2022 • Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han
TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement.
6 code implementations • ECCV 2020 • Haotian Tang, Zhijian Liu, Shengyu Zhao, Yujun Lin, Ji Lin, Hanrui Wang, Song Han
Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely.
Ranked #1 on Robust 3D Semantic Segmentation on SemanticKITTI-C
4 code implementations • NeurIPS 2019 • Zhijian Liu, Haotian Tang, Yujun Lin, Song Han
The computation cost and memory footprints of the voxel-based models grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution.
Ranked #1 on 3D Object Detection on KITTI Pedestrian Hard val
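The cubic growth of voxel-based models mentioned above can be made concrete with a small calculation. This is a sketch under stated assumptions: a dense cubic grid, one float32 value per voxel by default, and a hypothetical helper name `voxel_grid_bytes`. It does not model any particular network.

```python
def voxel_grid_bytes(resolution: int, channels: int = 1, bytes_per_value: int = 4) -> int:
    """Memory of a dense resolution^3 voxel grid.

    Assumes a dense cubic grid with `channels` values per voxel and
    `bytes_per_value` bytes each (4 for float32); illustrative only.
    """
    return resolution ** 3 * channels * bytes_per_value


if __name__ == "__main__":
    # Doubling the resolution multiplies the footprint by 2^3 = 8.
    for r in (32, 64, 128):
        print(r, voxel_grid_bytes(r))
```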