Search Results for author: Haotian Tang

Found 18 papers, 12 papers with code

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

1 code implementation19 Aug 2024 Fuzhao Xue, Yukang Chen, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han

We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing.

Video Captioning Video Understanding

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

1 code implementation7 May 2024 Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han

The key insight driving QServe is that the efficiency of LLM serving on GPUs is critically influenced by operations on low-throughput CUDA cores.

Language Modelling Large Language Model +1

MoST: Multi-modality Scene Tokenization for Motion Prediction

no code implementations CVPR 2024 Norman Mu, Jingwei Ji, Zhenpei Yang, Nate Harada, Haotian Tang, Kan Chen, Charles R. Qi, Runzhou Ge, Kratarth Goel, Zoey Yang, Scott Ettinger, Rami Al-Rfou, Dragomir Anguelov, Yin Zhou

This symbolic representation is a high-level abstraction of the real world, which may render the motion prediction model vulnerable to perception errors (e. g., failures in detecting open-vocabulary obstacles) while missing salient information from the scene context (e. g., poor road conditions).

General Knowledge motion prediction

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

1 code implementation25 Oct 2023 Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han

On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads.

Autonomous Driving Recommendation Systems

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

3 code implementations21 Sep 2023 Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

For example, training on the context length of 8192 needs 16x computational costs in self-attention layers as that of 2048.

4k Instruction Following +3

Type-supervised sequence labeling based on the heterogeneous star graph for named entity recognition

1 code implementation19 Oct 2022 Xueru Wen, Changjiang Zhou, Haotian Tang, Luguang Liang, Yu Jiang, Hong Qi

Named entity recognition is a fundamental task in natural language processing, identifying the span and category of entities in unstructured texts.

Graph Attention named-entity-recognition +4

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

no code implementations25 Apr 2022 Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition.

Model Compression Neural Architecture Search +3

TorchSparse: Efficient Point Cloud Inference Engine

1 code implementation21 Apr 2022 Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han

TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement.

Autonomous Driving

Point-Voxel CNN for Efficient 3D Deep Learning

4 code implementations NeurIPS 2019 Zhijian Liu, Haotian Tang, Yujun Lin, Song Han

The computation cost and memory footprints of the voxel-based models grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution.

3D Object Detection 3D Semantic Segmentation +2

Cannot find the paper you are looking for? You can Submit a new open access paper.