ProxyBNN: Learning Binarized Neural Networks via Proxy Matrices

no code implementations ECCV 2020 Xiangyu He, Zitao Mo, Ke Cheng, Weixiang Xu, Qinghao Hu, Peisong Wang, Qingshan Liu, Jian Cheng

The matrix composed of basis vectors is referred to as the proxy matrix, and auxiliary variables serve as the coefficients of this linear combination.
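
The linear combination described above can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function names `matmul` and `proxy_binarize`, the toy shapes, and the use of a plain sign function for binarization are all assumptions made for the example.

```python
# Hypothetical sketch: pre-binarization weights as a linear combination of
# basis vectors (rows of the proxy matrix), followed by sign binarization.
# Names, shapes, and the sign rule are assumptions, not taken from the paper.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sign(x):
    """Map a real value to +1 or -1 (zero is sent to +1 by convention here)."""
    return 1 if x >= 0 else -1

def proxy_binarize(coeffs, proxy):
    """Combine the basis vectors with the coefficients, then binarize."""
    pre_binary = matmul(coeffs, proxy)  # each row is one linear combination
    return [[sign(v) for v in row] for row in pre_binary]

# Two basis vectors of length 3 (the proxy matrix) and two coefficient rows.
proxy = [[1.0, -2.0, 0.5],
         [0.25, 1.0, -1.0]]
coeffs = [[0.5, -1.0],
          [-0.25, 0.75]]

weights = proxy_binarize(coeffs, proxy)
print(weights)
```

In this toy setup the coefficients are the trainable auxiliary variables, and the binary weights are derived from them rather than stored directly.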

FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale

1 code implementation 23 Sep 2024 Zeyu Zhu, Peisong Wang, Qinghao Hu, Gang Li, Xiaoyao Liang, Jian Cheng

However, through an in-depth analysis, we observe that the efficiency of existing sampling-based training frameworks is still limited due to the key bottlenecks lying in all three phases of sampling-based training, i.e., subgraph sample, memory IO, and computation.

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

1 code implementation 19 Aug 2024 Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han

We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing.

TorchGT: A Holistic System for Large-scale Graph Transformer Training

no code implementations 19 Jul 2024 Meng Zhang, Jie Sun, Qinghao Hu, Peng Sun, Zeke Wang, Yonggang Wen, Tianwei Zhang

While there emerge inspiring algorithm advancements, their practical adoption is still limited, particularly on real-world graphs involving up to millions of nodes.

DeltaZip: Efficient Serving of Multiple Full-Model-Tuned LLMs

1 code implementation 8 Dec 2023 Xiaozhe Yao, Qinghao Hu, Ana Klimovic

Fine-tuning large language models (LLMs) greatly improves model quality for downstream tasks.

SpikingNeRF: Making Bio-inspired Neural Networks See through the Real World

1 code implementation 20 Sep 2023 Xingting Yao, Qinghao Hu, Fei Zhou, Tielong Liu, Zitao Mo, Zeyu Zhu, Zhengyang Zhuge, Jian Cheng

In SpikingNeRF, each sampled point on the ray is matched to a particular time step and represented in a hybrid manner where the voxel grids are maintained as well.

Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication

no code implementations 2 Mar 2023 Meng Zhang, Qinghao Hu, Peng Sun, Yonggang Wen, Tianwei Zhang

Training Graph Neural Networks (GNNs) on large graphs is challenging due to the conflict between the high memory demand and limited GPU memory.


$\rm A^2Q$: Aggregation-Aware Quantization for Graph Neural Networks

1 code implementation 1 Feb 2023 Zeyu Zhu, Fanrong Li, Zitao Mo, Qinghao Hu, Gang Li, Zejian Liu, Xiaoyao Liang, Jian Cheng

Through an in-depth analysis of the topology of GNNs, we observe that the topology of the graph leads to significant differences between nodes, and most of the nodes in a graph appear to have a small aggregation value.


DATE: Dual Assignment for End-to-End Fully Convolutional Object Detection

1 code implementation 25 Nov 2022 Yiqun Chen, Qiang Chen, Qinghao Hu, Jian Cheng

In this paper, we revisit these two assignment methods and find that bringing one-to-many assignment back to end-to-end fully convolutional detectors helps with model convergence.

Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets

1 code implementation 22 Oct 2022 Xiangyu Chen, Qinghao Hu, Kaidong Li, Cuncong Zhong, Guanghui Wang

After carefully examining the self-attention modules, we discover that the number of trivial attention weights is far greater than the important ones and the accumulated trivial weights are dominating the attention in Vision Transformers due to their large quantity, which is not handled by the attention itself.

PalQuant: Accelerating High-precision Networks on Low-precision Accelerators

1 code implementation 3 Aug 2022 Qinghao Hu, Gang Li, Qiman Wu, Jian Cheng

In this paper, we propose the PArallel Low-precision Quantization (PalQuant) method that approximates high-precision computations via learning parallel low-precision representations from scratch.

Soft Threshold Ternary Networks

1 code implementation 4 Apr 2022 Weixiang Xu, Xiangyu He, Tianli Zhao, Qinghao Hu, Peisong Wang, Jian Cheng

The latest STTN shows that ResNet-18 with ternary weights and ternary activations achieves up to 68.2% Top-1 accuracy on ImageNet.


Revisiting Quantization Error in Face Alignment

no code implementations ICCV Workshop 2021 Xing Lan, Qinghao Hu, Jian Cheng

The statistical results show the NME generated by quantization error is even larger than 1/3 of the SOTA item, which is a serious obstacle for making a new breakthrough in face alignment.

Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters

1 code implementation 3 Sep 2021 Qinghao Hu, Peng Sun, Shengen Yan, Yonggang Wen, Tianwei Zhang

Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services in both the research community and industry.

Architecture Aware Latency Constrained Sparse Neural Networks

no code implementations 1 Sep 2021 Tianli Zhao, Qinghao Hu, Xiangyu He, Weixiang Xu, Jiaxing Wang, Cong Leng, Jian Cheng

Acceleration of deep neural networks to meet a specific latency constraint is essential for their deployment on mobile devices.

HIH: Towards More Accurate Face Alignment via Heatmap in Heatmap

1 code implementation 7 Apr 2021 Xing Lan, Qinghao Hu, Qiang Chen, Jian Xue, Jian Cheng

In particular, our HIH reaches 4.08 NME (Normalized Mean Error) on WFLW, and 3.21 on COFW, which exceeds previous methods by a significant margin.

Generative Zero-shot Network Quantization

no code implementations 21 Jan 2021 Xiangyu He, Qinghao Hu, Peisong Wang, Jian Cheng

Convolutional neural networks are able to learn realistic image priors from numerous training samples in low-level image generation and restoration.

A System-Level Solution for Low-Power Object Detection

no code implementations 24 Sep 2019 Fanrong Li, Zitao Mo, Peisong Wang, Zejian Liu, Jiayun Zhang, Gang Li, Qinghao Hu, Xiangyu He, Cong Leng, Yang Zhang, Jian Cheng

As a case study, we evaluate our object detection system on a real-world surveillance video with input size of 512x512, and it turns out that the system can achieve an inference speed of 18 fps at the cost of 6.9W (with display) with an mAP of 66.4 verified on the PASCAL VOC 2012 dataset.

AirFace: Lightweight and Efficient Model for Face Recognition

1 code implementation 29 Jul 2019 Xianyang Li, Feng Wang, Qinghao Hu, Cong Leng

With the development of convolutional neural networks, significant progress has been made in computer vision tasks.

Compact Global Descriptor for Neural Networks

1 code implementation 23 Jul 2019 Xiangyu He, Ke Cheng, Qiang Chen, Qinghao Hu, Peisong Wang, Jian Cheng

Long-range dependency modeling, widely used in capturing spatiotemporal correlation, has been shown to be effective in CNN-dominated computer vision tasks.

Training Binary Weight Networks via Semi-Binary Decomposition

no code implementations ECCV 2018 Qinghao Hu, Gang Li, Peisong Wang, Yifan Zhang, Jian Cheng

In this paper, we propose a novel semi-binary decomposition method which decomposes a matrix into two binary matrices and a diagonal matrix.
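
To make the shape of such a decomposition concrete, the sketch below rebuilds a small real-valued matrix from two binary factors and a diagonal. It is a hedged illustration only: the factor order `U diag(d) V^T`, the {-1, +1} encoding, the function name `semi_binary_reconstruct`, and the toy values are assumptions, and the code shows reconstruction, not how the paper learns the factors.

```python
# Hypothetical sketch: reconstruct W = U * diag(d) * V^T, where U and V hold
# only -1/+1 entries and d is the diagonal of a real-valued diagonal matrix.
# Factor order and the -1/+1 encoding are assumptions for illustration.

def semi_binary_reconstruct(u, d, v):
    """Return U @ diag(d) @ V^T for U (m x r), V (n x r), d (length r)."""
    m, n, r = len(u), len(v), len(d)
    return [
        [sum(u[i][k] * d[k] * v[j][k] for k in range(r)) for j in range(n)]
        for i in range(m)
    ]

u = [[1, -1],
     [1, 1],
     [-1, 1]]          # 3 x 2 binary factor
d = [0.5, 2.0]         # diagonal entries (the only real-valued parameters)
v = [[1, 1],
     [-1, 1]]          # 2 x 2 binary factor

w = semi_binary_reconstruct(u, d, v)
print(w)
```

Under these assumptions, only the `r` diagonal entries are stored at full precision, while every entry of `u` and `v` needs a single bit.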

Semi-Supervised Generative Adversarial Hashing for Image Retrieval

no code implementations ECCV 2018 Guan'an Wang, Qinghao Hu, Jian Cheng, Zeng-Guang Hou

Secondly, we design a novel structure of the generative model and the discriminative model to learn the distribution of triplet-wise information in a semi-supervised way.

Two-Step Quantization for Low-Bit Neural Networks

1 code implementation CVPR 2018 Peisong Wang, Qinghao Hu, Yifan Zhang, Chunjie Zhang, Yang Liu, Jian Cheng

In this paper, we propose a simple yet effective Two-Step Quantization (TSQ) framework, by decomposing the network quantization problem into two steps: code learning and transformation function learning based on the learned codes.

From Hashing to CNNs: Training BinaryWeight Networks via Hashing

no code implementations 8 Feb 2018 Qinghao Hu, Peisong Wang, Jian Cheng

To achieve this goal, we propose a novel approach named BWNH to train Binary Weight Networks via Hashing.

Recent Advances in Efficient Computation of Deep Convolutional Neural Networks

no code implementations 3 Feb 2018 Jian Cheng, Peisong Wang, Gang Li, Qinghao Hu, Hanqing Lu

As for hardware implementation of deep neural networks, a batch of accelerators based on FPGA/ASIC have been proposed in recent years.

Quantized Convolutional Neural Networks for Mobile Devices

1 code implementation CVPR 2016 Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng

Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks.

