Search Results for author: Minyi Guo

Found 49 papers, 20 papers with code

Prism: Mining Task-aware Domains in Non-i.i.d. IMU Data for Flexible User Perception

no code implementations 3 Jan 2025 Yunzhe Li, Facheng Hu, Hongzi Zhu, Quan Liu, Xiaoke Zhao, Jiangang Shen, Shan Chang, Minyi Guo

Achieving uncontrolled online prediction on mobile devices, referred to as the flexible user perception (FUP) problem, is attractive but hard.

A Survey on Inference Optimization Techniques for Mixture of Experts Models

1 code implementation 18 Dec 2024 Jiacheng Liu, Peng Tang, Wenfeng Wang, Yuhang Ren, Xiaofeng Hou, Pheng-Ann Heng, Minyi Guo, Chao Li

This comprehensive survey systematically analyzes the current landscape of inference optimization techniques for MoE models across the entire system stack.

Computational Efficiency Distributed Computing +4

ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression

no code implementations 4 Dec 2024 Guangda Liu, Chengwei Li, Jieru Zhao, Chenqi Zhang, Minyi Guo

To achieve efficient and accurate recallable KV cache compression, we introduce ClusterKV, which recalls tokens at the granularity of semantic clusters.

2k Logical Reasoning

Nimbus: Secure and Efficient Two-Party Inference for Transformers

1 code implementation 24 Nov 2024 Zhengyi Li, Kang Yang, Jin Tan, Wen-jie Lu, Haoqi Wu, Xiao Wang, Yu Yu, Derun Zhao, Yancheng Zheng, Minyi Guo, Jingwen Leng

For the linear layer, we propose a new 2PC paradigm along with an encoding approach to securely compute matrix multiplications based on an outer-product insight, which achieves $2.9\times \sim 12.5\times$ performance improvements compared to the state-of-the-art (SOTA) protocol.

HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference

no code implementations 3 Nov 2024 Peng Tang, Jiacheng Liu, Xiaofeng Hou, YiFei PU, Jing Wang, Pheng-Ann Heng, Chao Li, Minyi Guo

We present HOBBIT, a mixed precision expert offloading system to enable flexible and efficient MoE inference.

SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity

no code implementations 28 Oct 2024 Kunyun Wang, Jieru Zhao, Shuo Yang, Wenchao Ding, Minyi Guo

To address these issues, we propose a memory-efficient scheduling method to eliminate memory overhead and an online adjustment mechanism to minimize accuracy degradation.

Autonomous Driving object-detection +2

Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU

no code implementations 11 Sep 2024 Zhenyu Ning, Jieru Zhao, Qihao Jin, Wenchao Ding, Minyi Guo

In this paper, we introduce Inf-MLLM, an efficient inference framework for MLLMs, which enables streaming inference of MLLMs on a single GPU with infinite context.

Autonomous Driving

vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving

1 code implementation 22 Jul 2024 Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, Junping Zhao, Ke Zhang, Minyi Guo, Jingwen Leng

This study introduces the vTensor, an innovative tensor structure for LLM inference based on GPU virtual memory management (VMM).

Management

AutoVCoder: A Systematic Framework for Automated Verilog Code Generation using LLMs

no code implementations 21 Jul 2024 Mingzhe Gao, Jieru Zhao, Zhe Lin, Wenchao Ding, Xiaofeng Hou, Yu Feng, Chao Li, Minyi Guo

Recently, the use of large language models (LLMs) for software code generation, e.g., C/C++ and Python, has proven a great success.

Code Generation Dataset Generation +1

SimGen: Simulator-conditioned Driving Scene Generation

no code implementations 13 Jun 2024 Yunsong Zhou, Michael Simon, Zhenghao Peng, Sicheng Mo, Hongzi Zhu, Minyi Guo, Bolei Zhou

In this work, we introduce a simulator-conditioned scene generation framework called SimGen that can learn to generate diverse driving scenes by mixing data from the simulator and the real world.

Autonomous Driving Data Augmentation +3

A Codesign of Scheduling and Parallelization for Large Model Training in Heterogeneous Clusters

no code implementations 24 Mar 2024 Chunyu Xue, Weihao Cui, Han Zhao, Quan Chen, Shulai Zhang, Pengyu Yang, Jing Yang, Shaobo Li, Minyi Guo

The exponentially enlarged scheduling space and ever-changing optimal parallelism plan from adaptive parallelism together result in the contradiction between low-overhead and accurate performance data acquisition for efficient cluster scheduling.

Scheduling

Embodied Understanding of Driving Scenarios

1 code implementation 7 Mar 2024 Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li

Hereby, we introduce the Embodied Language Model (ELM), a comprehensive framework tailored for agents' understanding of driving scenes with large spatial and temporal spans.

Autonomous Driving Language Modeling +2

Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension

1 code implementation CVPR 2024 Quan Liu, Hongzi Zhu, Zhenxi Wang, Yunsong Zhou, Shan Chang, Minyi Guo

Registration of point clouds collected from a pair of distant vehicles provides a comprehensive and accurate 3D view of the driving scenario, which is vital for driving safety related applications, yet existing literature suffers from the expensive pose label acquisition and the deficiency to generalize to new data distributions.

Point Cloud Registration

STAG: Enabling Low Latency and Low Staleness of GNN-based Services with Dynamic Graphs

no code implementations 27 Sep 2023 Jiawen Wang, Quan Chen, Deze Zeng, Zhuo Song, Chen Chen, Minyi Guo

With the collaborative serving mechanism, only part of the node representations is updated during the update phase, and the final representations are calculated in the inference phase.

Accelerating Generic Graph Neural Networks via Architecture, Compiler, Partition Method Co-Design

no code implementations 16 Aug 2023 Shuwen Lu, Zhihui Zhang, Cong Guo, Jingwen Leng, Yangjie Zhou, Minyi Guo

However, designing GNN accelerators faces two fundamental challenges: the high bandwidth requirement of GNN models and the diversity of GNN models.

Graph Learning graph partitioning

Density-invariant Features for Distant Point Cloud Registration

2 code implementations ICCV 2023 Quan Liu, Hongzi Zhu, Yunsong Zhou, Hongyang Li, Shan Chang, Minyi Guo

Registration of distant outdoor LiDAR point clouds is crucial to extending the 3D vision of collaborative autonomous vehicles, and yet is challenging due to the small overlapping area and a huge disparity between observed point densities.

Autonomous Vehicles Contrastive Learning +1

AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs

no code implementations 27 May 2023 Yangjie Zhou, Yaoxu Song, Jingwen Leng, Zihan Liu, Weihao Cui, Zhendong Zhang, Cong Guo, Quan Chen, Li Li, Minyi Guo

Graph neural networks (GNNs) are powerful tools for exploring and learning from graph structures and features.

MoGDE: Boosting Mobile Monocular 3D Object Detection with Ground Depth Estimation

no code implementations 23 Mar 2023 Yunsong Zhou, Quan Liu, Hongzi Zhu, Yunzhe Li, Shan Chang, Minyi Guo

To this end, we utilize a pose detection network to estimate the pose of the camera and then construct a feature map portraying pixel-level ground depth according to the 3D-to-2D perspective geometry.

Depth Estimation Monocular 3D Object Detection +1

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training

no code implementations 22 Sep 2022 Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Ying Cao, Quanlu Zhang, Yunxin Liu, Fan Yang, Minyi Guo

An activation function is an element-wise mathematical function and plays a crucial role in deep neural networks (DNN).

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization

1 code implementation 30 Aug 2022 Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu

In this work, we propose a fixed-length adaptive numerical data type called ANT to achieve low-bit quantization with tiny hardware overheads.

Quantization

SALO: An Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences

no code implementations 29 Jun 2022 Guan Shen, Jieru Zhao, Quan Chen, Jingwen Leng, Chao Li, Minyi Guo

However, the quadratic complexity of self-attention w.r.t. the sequence length incurs heavy computational and memory burdens, especially for tasks with long sequences.

Transkimmer: Transformer Learns to Layer-wise Skim

1 code implementation ACL 2022 Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo

To address the above limitations, we propose the Transkimmer architecture, which learns to identify hidden state tokens that are not required by each layer.

Computational Efficiency

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation

1 code implementation ICLR 2022 Cong Guo, Yuxian Qiu, Jingwen Leng, Xiaotian Gao, Chen Zhang, Yunxin Liu, Fan Yang, Yuhao Zhu, Minyi Guo

This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low computation and memory requirements.

Data Free Quantization

Block-Skim: Efficient Question Answering for Transformer

1 code implementation 16 Dec 2021 Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo, Yuhao Zhu

We further prune the hidden states corresponding to the unnecessary positions early in lower layers, achieving significant inference-time speedup.

Extractive Question-Answering Question Answering

Dubhe: Towards Data Unbiasedness with Homomorphic Encryption in Federated Learning Client Selection

no code implementations 8 Sep 2021 Shulai Zhang, Zirui Li, Quan Chen, Wenli Zheng, Jingwen Leng, Minyi Guo

Federated learning (FL) is a distributed machine learning paradigm that allows clients to collaboratively train a model over their own local data.

Federated Learning

TempNet: Online Semantic Segmentation on Large-Scale Point Cloud Series

no code implementations ICCV 2021 Yunsong Zhou, Hongzi Zhu, Chunqin Li, Tiankai Cui, Shan Chang, Minyi Guo

In this paper, we propose a light-weight semantic segmentation framework for large-scale point cloud series, called TempNet, which can improve both the accuracy and the stability of existing semantic segmentation models by combining a novel frame aggregation scheme.

Autonomous Driving Point Cloud Segmentation +4

Block Skim Transformer for Efficient Question Answering

no code implementations 1 Jan 2021 Yue Guan, Jingwen Leng, Yuhao Zhu, Minyi Guo

Following this idea, we proposed Block Skim Transformer (BST) to improve and accelerate the processing of transformer QA models.

Language Modeling Language Modelling +2

How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention

no code implementations COLING 2020 Yue Guan, Jingwen Leng, Chao Li, Quan Chen, Minyi Guo

Recent research on the multi-head attention mechanism, especially that in pre-trained models such as BERT, has shown us heuristics and clues in analyzing various aspects of the mechanism.

Clustering

How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention

no code implementations 2 Nov 2020 Yue Guan, Jingwen Leng, Chao Li, Quan Chen, Minyi Guo

Recent research on the multi-head attention mechanism, especially that in pre-trained models such as BERT, has shown us heuristics and clues in analyzing various aspects of the mechanism.

Clustering

Architectural Implications of Graph Neural Networks

no code implementations 2 Sep 2020 Zhihui Zhang, Jingwen Leng, Lingxiao Ma, Youshan Miao, Chao Li, Minyi Guo

Graph neural networks (GNN) represent an emerging line of deep learning models that operate on graph structures.

Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration

no code implementations 18 Feb 2020 Cong Guo, Yangjie Zhou, Jingwen Leng, Yuhao Zhu, Zidong Du, Quan Chen, Chao Li, Bin Yao, Minyi Guo

We propose Simultaneous Multi-mode Architecture (SMA), a novel architecture design and execution model that offers general-purpose programmability on DNN accelerators in order to accelerate end-to-end applications.

Adversarial Defense Through Network Profiling Based Path Extraction

no code implementations CVPR 2019 Yuxian Qiu, Jingwen Leng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu

Recently, researchers have started decomposing deep neural network models according to their semantics or functions.

Adversarial Defense

Position-Aware Convolutional Networks for Traffic Prediction

no code implementations 12 Apr 2019 Shiheng Ma, Jingcai Guo, Song Guo, Minyi Guo

Our approach employs the inception backbone network to capture rich features of traffic distribution on the whole area.

Management Position +1

Knowledge Graph Convolutional Networks for Recommender Systems

8 code implementations 18 Mar 2019 Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, Minyi Guo

To alleviate the sparsity and cold start problems of collaborative filtering based recommender systems, researchers and engineers usually collect attributes of users and items, and design delicate algorithms to exploit this additional information.

Click-Through Rate Prediction Collaborative Filtering +3

Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation

3 code implementations 23 Jan 2019 Hongwei Wang, Fuzheng Zhang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo

Collaborative filtering often suffers from sparsity and cold start problems in real recommendation scenarios; therefore, researchers and engineers usually use side information to address the issues and improve the performance of recommender systems.

Collaborative Filtering Knowledge Graph Embedding +4

Effective Path: Know the Unknowns of Neural Network

no code implementations 27 Sep 2018 Yuxian Qiu, Jingwen Leng, Yuhao Zhu, Quan Chen, Chao Li, Minyi Guo

Despite their enormous success, there is still no solid understanding of deep neural networks' working mechanism.

RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems

9 code implementations 9 Mar 2018 Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo

To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance.

Click-Through Rate Prediction Collaborative Filtering +2

DKN: Deep Knowledge-Aware Network for News Recommendation

4 code implementations 25 Jan 2018 Hongwei Wang, Fuzheng Zhang, Xing Xie, Minyi Guo

To solve the above problems, in this paper, we propose a deep knowledge-aware network (DKN) that incorporates knowledge graph representation into news recommendation.

Click-Through Rate Prediction Common Sense Reasoning +2

SHINE: Signed Heterogeneous Information Network Embedding for Sentiment Link Prediction

1 code implementation 3 Dec 2017 Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, Qi Liu

First, due to the lack of explicit sentiment links in mainstream social networks, we establish a labeled heterogeneous sentiment dataset, which consists of users' sentiment relations, social relations and profile knowledge, using an entity-level sentiment extraction method.

Link Prediction Network Embedding +2

Joint Topic-Semantic-aware Social Recommendation for Online Voting

1 code implementation 3 Dec 2017 Hongwei Wang, Jia Wang, Miao Zhao, Jiannong Cao, Minyi Guo

The JTS-MF model calculates similarity among users and votings by combining their TEWE representation and structural information of social networks, and preserves this topic-semantic-social similarity during matrix factorization.

Unsupervised Extraction of Video Highlights Via Robust Recurrent Auto-encoders

no code implementations ICCV 2015 Huan Yang, Baoyuan Wang, Stephen Lin, David Wipf, Minyi Guo, Baining Guo

With the growing popularity of short-form video sharing platforms such as Instagram and Vine, there has been an increasing need for techniques that automatically extract highlights from video.
