Search Results for author: Ke Hong

Found 3 papers, 1 paper with code

FlashDecoding++: Faster Large Language Model Inference on GPUs

no code implementations • 2 Nov 2023 • Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong, Yu Wang

A single and static dataflow may lead to a 50.25% performance loss for GEMMs of different shapes in LLM inference.

Language Modelling, Large Language Model

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

1 code implementation • 25 Oct 2023 • Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han

On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads.

Autonomous Driving, Recommendation Systems

Ada3D: Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection

no code implementations • ICCV 2023 • Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Yali Zhao, Linfeng Zhang, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang

One reason for this high resource consumption is the presence of a large number of redundant background points in LiDAR point clouds, resulting in spatial redundancy in both 3D voxel and dense BEV map representations.

3D Object Detection, Autonomous Driving +1
