no code implementations • 3 Jan 2025 • Yunzhe Li, Facheng Hu, Hongzi Zhu, Quan Liu, Xiaoke Zhao, Jiangang Shen, Shan Chang, Minyi Guo
Achieving uncontrolled online prediction on mobile devices, referred to as the flexible user perception (FUP) problem, is attractive but hard.
1 code implementation • 18 Dec 2024 • Jiacheng Liu, Peng Tang, Wenfeng Wang, Yuhang Ren, Xiaofeng Hou, Pheng-Ann Heng, Minyi Guo, Chao Li
This comprehensive survey systematically analyzes the current landscape of inference optimization techniques for MoE models across the entire system stack.
no code implementations • 4 Dec 2024 • Guangda Liu, Chengwei Li, Jieru Zhao, Chenqi Zhang, Minyi Guo
To achieve efficient and accurate recallable KV cache compression, we introduce ClusterKV, which recalls tokens at the granularity of semantic clusters.
1 code implementation • 24 Nov 2024 • Zhengyi Li, Kang Yang, Jin Tan, Wen-jie Lu, Haoqi Wu, Xiao Wang, Yu Yu, Derun Zhao, Yancheng Zheng, Minyi Guo, Jingwen Leng
For the linear layer, we propose a new 2PC paradigm along with an encoding approach to securely compute matrix multiplications based on an outer-product insight, which achieves $2.9\times \sim 12.5\times$ performance improvements compared to the state-of-the-art (SOTA) protocol.
no code implementations • 3 Nov 2024 • Peng Tang, Jiacheng Liu, Xiaofeng Hou, YiFei PU, Jing Wang, Pheng-Ann Heng, Chao Li, Minyi Guo
We present HOBBIT, a mixed precision expert offloading system to enable flexible and efficient MoE inference.
no code implementations • 28 Oct 2024 • Kunyun Wang, Jieru Zhao, Shuo Yang, Wenchao Ding, Minyi Guo
To address these issues, we propose a memory-efficient scheduling method to eliminate memory overhead and an online adjustment mechanism to minimize accuracy degradation.
no code implementations • 11 Sep 2024 • Zhenyu Ning, Jieru Zhao, Qihao Jin, Wenchao Ding, Minyi Guo
In this paper, we introduce Inf-MLLM, an efficient inference framework for MLLMs, which enables streaming inference of MLLMs on a single GPU with infinite context.
1 code implementation • 22 Jul 2024 • Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, Junping Zhao, Ke Zhang, Minyi Guo, Jingwen Leng
This study introduces the vTensor, an innovative tensor structure for LLM inference based on GPU virtual memory management (VMM).
no code implementations • 21 Jul 2024 • Mingzhe Gao, Jieru Zhao, Zhe Lin, Wenchao Ding, Xiaofeng Hou, Yu Feng, Chao Li, Minyi Guo
Recently, the use of large language models (LLMs) for software code generation, e.g., C/C++ and Python, has proven a great success.
no code implementations • 13 Jun 2024 • Yunsong Zhou, Michael Simon, Zhenghao Peng, Sicheng Mo, Hongzi Zhu, Minyi Guo, Bolei Zhou
In this work, we introduce a simulator-conditioned scene generation framework called SimGen that can learn to generate diverse driving scenes by mixing data from the simulator and the real world.
no code implementations • 24 Mar 2024 • Chunyu Xue, Weihao Cui, Han Zhao, Quan Chen, Shulai Zhang, Pengyu Yang, Jing Yang, Shaobo Li, Minyi Guo
The exponentially enlarged scheduling space and ever-changing optimal parallelism plan from adaptive parallelism together result in the contradiction between low-overhead and accurate performance data acquisition for efficient cluster scheduling.
1 code implementation • 7 Mar 2024 • Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li
Hereby, we introduce the Embodied Language Model (ELM), a comprehensive framework tailored for agents' understanding of driving scenes with large spatial and temporal spans.
1 code implementation • CVPR 2024 • Quan Liu, Hongzi Zhu, Zhenxi Wang, Yunsong Zhou, Shan Chang, Minyi Guo
Registration of point clouds collected from a pair of distant vehicles provides a comprehensive and accurate 3D view of the driving scenario, which is vital for driving safety related applications, yet existing literature suffers from the expensive pose label acquisition and the deficiency to generalize to new data distributions.
1 code implementation • 14 Jan 2024 • Mingzhe Gao, Jieru Zhao, Zhe Lin, Minyi Guo
High-level synthesis (HLS) notably speeds up the hardware design process by avoiding RTL programming.
no code implementations • 27 Sep 2023 • Jiawen Wang, Quan Chen, Deze Zeng, Zhuo Song, Chen Chen, Minyi Guo
With the collaborative serving mechanism, only part of node representations are updated during the update phase, and the final representations are calculated in the inference phase.
no code implementations • 16 Aug 2023 • Shuwen Lu, Zhihui Zhang, Cong Guo, Jingwen Leng, Yangjie Zhou, Minyi Guo
However, designing GNN accelerators faces two fundamental challenges: the high bandwidth requirement of GNN models and the diversity of GNN models.
no code implementations • 23 Jul 2023 • Guan Shen, Jieru Zhao, Zeke Wang, Zhe Lin, Wenchao Ding, Chentao Wu, Quan Chen, Minyi Guo
Along with the fast evolution of deep neural networks, the hardware system is also developing rapidly.
2 code implementations • ICCV 2023 • Quan Liu, Hongzi Zhu, Yunsong Zhou, Hongyang Li, Shan Chang, Minyi Guo
Registration of distant outdoor LiDAR point clouds is crucial to extending the 3D vision of collaborative autonomous vehicles, yet it is challenging due to the small overlapping area and a huge disparity between observed point densities.
Ranked #1 on Point Cloud Registration on nuScenes (Distant PCR)
no code implementations • 27 May 2023 • Yangjie Zhou, Yaoxu Song, Jingwen Leng, Zihan Liu, Weihao Cui, Zhendong Zhang, Cong Guo, Quan Chen, Li Li, Minyi Guo
Graph neural networks (GNNs) are powerful tools for exploring and learning from graph structures and features.
1 code implementation • 4 May 2023 • Quan Liu, Yunsong Zhou, Hongzi Zhu, Shan Chang, Minyi Guo
Such features are then used for online distant point cloud registration.
Ranked #3 on Point Cloud Registration on nuScenes (Distant PCR)
no code implementations • CVPR 2023 • Yunsong Zhou, Hongzi Zhu, Quan Liu, Shan Chang, Minyi Guo
Mobile monocular 3D object detection (Mono3D) (e.g., on a vehicle, a drone, or a robot) is an important yet challenging task.
no code implementations • 23 Mar 2023 • Yunsong Zhou, Quan Liu, Hongzi Zhu, Yunzhe Li, Shan Chang, Minyi Guo
To this end, we utilize a pose detection network to estimate the pose of the camera and then construct a feature map portraying pixel-level ground depth according to the 3D-to-2D perspective geometry.
no code implementations • 22 Sep 2022 • Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Ying Cao, Quanlu Zhang, Yunxin Liu, Fan Yang, Minyi Guo
An activation function is an element-wise mathematical function and plays a crucial role in deep neural networks (DNN).
1 code implementation • 30 Aug 2022 • Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu
In this work, we propose a fixed-length adaptive numerical data type called ANT to achieve low-bit quantization with tiny hardware overheads.
no code implementations • 25 Aug 2022 • Zhengyi Li, Cong Guo, Zhanda Zhu, Yangjie Zhou, Yuxian Qiu, Xiaotian Gao, Jingwen Leng, Minyi Guo
To deal with the runtime overhead, we use a coarse-grained version of the border function.
no code implementations • 29 Jun 2022 • Guan Shen, Jieru Zhao, Quan Chen, Jingwen Leng, Chao Li, Minyi Guo
However, the quadratic complexity of self-attention w.r.t. the sequence length incurs heavy computational and memory burdens, especially for tasks with long sequences.
1 code implementation • ACL 2022 • Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo
To address the above limitations, we propose the Transkimmer architecture, which learns to identify hidden state tokens that are not required by each layer.
1 code implementation • ICLR 2022 • Cong Guo, Yuxian Qiu, Jingwen Leng, Xiaotian Gao, Chen Zhang, Yunxin Liu, Fan Yang, Yuhao Zhu, Minyi Guo
This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low computation and memory requirements.
1 code implementation • 16 Dec 2021 • Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo, Yuhao Zhu
We further prune the hidden states corresponding to the unnecessary positions early in lower layers, achieving significant inference-time speedup.
no code implementations • 8 Sep 2021 • Shulai Zhang, Zirui Li, Quan Chen, Wenli Zheng, Jingwen Leng, Minyi Guo
Federated learning (FL) is a distributed machine learning paradigm that allows clients to collaboratively train a model over their own local data.
no code implementations • ICCV 2021 • Yunsong Zhou, Hongzi Zhu, Chunqin Li, Tiankai Cui, Shan Chang, Minyi Guo
In this paper, we propose a light-weight semantic segmentation framework for large-scale point cloud series, called TempNet, which can improve both the accuracy and the stability of existing semantic segmentation models by combining a novel frame aggregation scheme.
no code implementations • 1 Jan 2021 • Yue Guan, Jingwen Leng, Yuhao Zhu, Minyi Guo
Following this idea, we propose Block Skim Transformer (BST) to improve and accelerate the processing of transformer QA models.
no code implementations • COLING 2020 • Yue Guan, Jingwen Leng, Chao Li, Quan Chen, Minyi Guo
Recent research on the multi-head attention mechanism, especially that in pre-trained models such as BERT, has shown us heuristics and clues in analyzing various aspects of the mechanism.
no code implementations • 2 Nov 2020 • Yue Guan, Jingwen Leng, Chao Li, Quan Chen, Minyi Guo
Recent research on the multi-head attention mechanism, especially that in pre-trained models such as BERT, has shown us heuristics and clues in analyzing various aspects of the mechanism.
no code implementations • 2 Sep 2020 • Zhihui Zhang, Jingwen Leng, Lingxiao Ma, Youshan Miao, Chao Li, Minyi Guo
Graph neural networks (GNN) represent an emerging line of deep learning models that operate on graph structures.
1 code implementation • 29 Aug 2020 • Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu
Network pruning can reduce the high computation cost of deep neural network (DNN) models.
no code implementations • 18 Feb 2020 • Cong Guo, Yangjie Zhou, Jingwen Leng, Yuhao Zhu, Zidong Du, Quan Chen, Chao Li, Bin Yao, Minyi Guo
We propose Simultaneous Multi-mode Architecture (SMA), a novel architecture design and execution model that offers general-purpose programmability on DNN accelerators in order to accelerate end-to-end applications.
no code implementations • CVPR 2019 • Yuxian Qiu, Jingwen Leng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu
Recently, researchers have started decomposing deep neural network models according to their semantics or functions.
no code implementations • 12 Apr 2019 • Shiheng Ma, Jingcai Guo, Song Guo, Minyi Guo
Our approach employs the inception backbone network to capture rich features of traffic distribution on the whole area.
8 code implementations • 18 Mar 2019 • Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, Minyi Guo
To alleviate the sparsity and cold-start problems of collaborative-filtering-based recommender systems, researchers and engineers usually collect attributes of users and items and design delicate algorithms to exploit this additional information.
Ranked #1 on Click-Through Rate Prediction on Book-Crossing
3 code implementations • 23 Jan 2019 • Hongwei Wang, Fuzheng Zhang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo
Collaborative filtering often suffers from sparsity and cold-start problems in real recommendation scenarios; therefore, researchers and engineers usually use side information to address these issues and improve the performance of recommender systems.
no code implementations • 27 Sep 2018 • Yuxian Qiu, Jingwen Leng, Yuhao Zhu, Quan Chen, Chao Li, Minyi Guo
Despite their enormous success, there is still no solid understanding of the working mechanism of deep neural networks.
9 code implementations • 9 Mar 2018 • Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo
To address the sparsity and cold-start problems of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance.
Ranked #2 on Click-Through Rate Prediction on Book-Crossing
no code implementations • 6 Mar 2018 • Huan Yang, Baoyuan Wang, Noranart Vesdapunt, Minyi Guo, Sing Bing Kang
We propose a reinforcement learning approach for real-time exposure control of a mobile camera that is personalizable.
4 code implementations • 25 Jan 2018 • Hongwei Wang, Fuzheng Zhang, Xing Xie, Minyi Guo
To solve the above problems, in this paper, we propose a deep knowledge-aware network (DKN) that incorporates knowledge graph representation into news recommendation.
Ranked #5 on News Recommendation on MIND
1 code implementation • 3 Dec 2017 • Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, Qi Liu
First, due to the lack of explicit sentiment links in mainstream social networks, we establish a labeled heterogeneous sentiment dataset, which consists of users' sentiment relations, social relations, and profile knowledge, using an entity-level sentiment extraction method.
1 code implementation • 3 Dec 2017 • Hongwei Wang, Jia Wang, Miao Zhao, Jiannong Cao, Minyi Guo
The JTS-MF model calculates similarity among users and votings by combining their TEWE representation with structural information from social networks, and preserves this topic-semantic-social similarity during matrix factorization.
5 code implementations • 22 Nov 2017 • Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Wei-Nan Zhang, Fuzheng Zhang, Xing Xie, Minyi Guo
The goal of graph representation learning is to embed each vertex in a graph into a low-dimensional vector space.
Ranked #1 on Node Classification on Wikipedia
no code implementations • ICCV 2015 • Huan Yang, Baoyuan Wang, Stephen Lin, David Wipf, Minyi Guo, Baining Guo
With the growing popularity of short-form video sharing platforms such as Instagram and Vine, there has been an increasing need for techniques that automatically extract highlights from video.