Search Results for author: Xuehai Qian

Found 23 papers, 3 papers with code

RobustState: Boosting Fidelity of Quantum State Preparation via Noise-Aware Variational Training

no code implementations • 27 Nov 2023 • Hanrui Wang, Yilian Liu, Pengyu Liu, Jiaqi Gu, Zirui Li, Zhiding Liang, Jinglei Cheng, Yongshan Ding, Xuehai Qian, Yiyu Shi, David Z. Pan, Frederic T. Chong, Song Han

Arbitrary state preparation algorithms can be broadly categorized into arithmetic decomposition (AD) and variational quantum state preparation (VQSP).
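RobustState's contribution is noise-aware variational training, which is not reproduced here; the snippet below is only a minimal, hypothetical illustration of the VQSP idea itself: a parameterized circuit (here a single Ry rotation) is tuned so the prepared state matches a target state. The target state and the grid search are assumptions for illustration.

```python
# Minimal, hypothetical illustration of variational quantum state preparation (VQSP):
# tune a single Ry(theta) rotation so the prepared state matches a target state.
# This is NOT RobustState's algorithm; it only sketches the variational idea.
import numpy as np

target = np.array([np.cos(0.7), np.sin(0.7)])          # assumed target single-qubit state

def prepared_state(theta):
    # Ry(theta)|0> = [cos(theta/2), sin(theta/2)]
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def fidelity(theta):
    return abs(np.vdot(target, prepared_state(theta))) ** 2

# Gradient-free grid search over the single parameter (stand-in for a real optimizer).
thetas = np.linspace(0, 2 * np.pi, 1000)
best = max(thetas, key=fidelity)
print(f"best theta = {best:.3f}, fidelity = {fidelity(best):.6f}")
```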

GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism

no code implementations • 19 Aug 2023 • Jingji Chen, Zhuoming Chen, Xuehai Qian

Communication is a key bottleneck for distributed graph neural network (GNN) training.

QuEst: Graph Transformer for Quantum Circuit Reliability Estimation

1 code implementation • 30 Oct 2022 • Hanrui Wang, Pengyu Liu, Jinglei Cheng, Zhiding Liang, Jiaqi Gu, Zirui Li, Yongshan Ding, Weiwen Jiang, Yiyu Shi, Xuehai Qian, David Z. Pan, Frederic T. Chong, Song Han

Specifically, the TorchQuantum library also supports using data-driven ML models to solve problems in quantum system research, such as predicting the impact of quantum noise on circuit fidelity and improving the quantum circuit compilation efficiency.
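QuEst itself trains a graph transformer on circuit graphs; the following is a deliberately simpler stand-in that only illustrates the data-driven idea: fit a model that maps hand-picked circuit features to fidelity. The features, the synthetic data, and the linear model are all assumptions, not the paper's setup.

```python
# Hypothetical stand-in for data-driven fidelity estimation (QuEst uses a graph
# transformer; here a plain least-squares model over hand-picked circuit features).
import numpy as np

rng = np.random.default_rng(0)
# Assumed features per circuit: [#1-qubit gates, #2-qubit gates, depth]
X = rng.integers(1, 50, size=(200, 3)).astype(float)
# Synthetic "measured" fidelity: decays with gate counts and depth, plus noise.
y = np.exp(-0.002 * X[:, 0] - 0.01 * X[:, 1] - 0.005 * X[:, 2]) + rng.normal(0, 0.01, 200)

A = np.hstack([X, np.ones((200, 1))])                  # add bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)           # fit linear fidelity model

new_circuit = np.array([10.0, 6.0, 12.0, 1.0])         # features + bias term
print("predicted fidelity:", new_circuit @ coef)
```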

GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity

no code implementations • 25 Aug 2021 • Wei Niu, Zhengang Li, Xiaolong Ma, Peiyan Dong, Gang Zhou, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

It necessitates sparse model inference via weight pruning, i.e., DNN weight sparsity, and it is desirable to design a new DNN weight sparsity scheme that facilitates real-time inference on mobile devices while preserving high sparse-model accuracy.

Code Generation · Compiler Optimization
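As a rough illustration of fine-grained structured weight sparsity (not GRIM's actual scheme; the block shape and keep ratio are assumptions), one can prune a weight matrix in small fixed-size blocks so the surviving structure stays regular enough for compiler-friendly code generation:

```python
# Illustrative (not GRIM's actual scheme): fine-grained structured pruning that
# removes entire small blocks of a weight matrix based on their L2 norm.
import numpy as np

def block_prune(w, block=(4, 1), keep_ratio=0.5):
    rows, cols = w.shape
    br, bc = block
    blocks = w.reshape(rows // br, br, cols // bc, bc)
    norms = np.linalg.norm(blocks, axis=(1, 3))          # one norm per block
    threshold = np.quantile(norms, 1 - keep_ratio)
    mask = (norms >= threshold)[:, None, :, None]        # broadcast over block dims
    return (blocks * mask).reshape(rows, cols)

w = np.random.randn(8, 8)
print(block_prune(w, block=(4, 1), keep_ratio=0.5))
```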

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator

no code implementations • 16 Jun 2021 • Geng Yuan, Payman Behnam, Zhengang Li, Ali Shafiee, Sheng Lin, Xiaolong Ma, Hang Liu, Xuehai Qian, Mahdi Nazm Bojnordi, Yanzhi Wang, Caiwen Ding

With weights stored in the ReRAM crossbar cells as conductances, when the input vector is applied to the word lines, the matrix-vector multiplication results are generated as the currents on the bit lines.
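This is the standard analog in-memory multiply: with conductances G encoding the weights and word-line voltages V encoding the input, each bit-line current is I_j = sum_i G_ij * V_i by Ohm's and Kirchhoff's laws. A minimal numeric sketch with ideal devices follows; none of the mixed-signal or device non-ideality issues the paper targets are modeled.

```python
# Idealized ReRAM crossbar MVM: bit-line currents = G^T @ V (Ohm + Kirchhoff).
# Device non-idealities, signed-weight handling, and ADCs are deliberately omitted.
import numpy as np

G = np.abs(np.random.randn(4, 3)) * 1e-6   # conductances (S), rows = word lines
V = np.array([0.2, 0.0, 0.1, 0.3])         # word-line voltages (V), the input vector
I_bitlines = G.T @ V                       # one accumulated current per bit line (A)
print(I_bitlines)
```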

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation

1 code implementation • 4 May 2021 • Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang

Second, the overall design space composed of HW/SW partitioning, hardware optimization, and software optimization is huge.

Bayesian Optimization · Q-Learning
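To see why this space is described as huge, a toy count helps (all option names and sizes below are made up; HASCO navigates the space with methods such as Bayesian optimization and Q-learning rather than enumeration):

```python
# Toy illustration of why exhaustive HW/SW co-design search is infeasible:
# the joint space is the product of partitioning, hardware, and software choices.
from itertools import product

partitionings = range(8)                    # hypothetical HW/SW partition choices
hw_options = product(range(16), range(16))  # e.g. PE-array sizes x buffer sizes (assumed)
sw_options = product(range(32), range(8))   # e.g. loop tilings x loop orderings (assumed)

space = list(product(partitionings, hw_options, sw_options))
print(len(space))                           # 8 * 256 * 256 = 524288 toy points already
```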

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

no code implementations • 8 Dec 2020 • Sung-En Chang, Yanyu Li, Mengshu Sun, Runbin Shi, Hayden K.-H. So, Xuehai Qian, Yanzhi Wang, Xue Lin

Unlike existing methods that use the same quantization scheme for all weights, we propose the first solution that applies different quantization schemes for different rows of the weight matrix.

Edge-computing · Model Compression +1
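A hedged sketch of the row-wise idea only: each row of the weight matrix gets its own quantizer. The two concrete schemes below (power-of-two vs. uniform fixed-point) and the row-assignment heuristic are illustrative assumptions, not necessarily the paper's choices.

```python
# Illustrative row-wise mixed quantization: each row of the weight matrix gets its
# own scheme (power-of-two vs. uniform fixed-point), chosen by a simple heuristic.
import numpy as np

def quant_pow2(row):
    # Round each weight's magnitude to the nearest power of two, keep the sign.
    mags = np.maximum(np.abs(row), 1e-8)
    return np.sign(row) * 2.0 ** np.round(np.log2(mags))

def quant_fixed(row, bits=4):
    scale = np.max(np.abs(row)) / (2 ** (bits - 1) - 1)
    return np.round(row / scale) * scale

def mixed_quantize(w):
    out = np.empty_like(w)
    for i, row in enumerate(w):
        # Assumed heuristic: rows with a large dynamic range use power-of-two levels.
        wide_range = np.max(np.abs(row)) / (np.median(np.abs(row)) + 1e-8) > 4
        out[i] = quant_pow2(row) if wide_range else quant_fixed(row)
    return out

print(mixed_quantize(np.random.randn(4, 8)))
```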

PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices

no code implementations • 23 Apr 2020 • Chunhua Deng, Siyu Liao, Yi Xie, Keshab K. Parhi, Xuehai Qian, Bo Yuan

On the other hand, the recent structured matrix-based approach (i.e., CirCNN) is limited by the relatively complex arithmetic computation (i.e., FFT), a less flexible compression ratio, and its inability to fully utilize input sparsity.

Model Compression
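PermDNN's premise, per the title above, is that permuted diagonal matrices avoid those limitations: a permuted-diagonal block can be multiplied against a vector with one multiply per output element and no FFT. The sketch below only demonstrates that arithmetic property with an arbitrary permutation; it is not the paper's block format or hardware architecture.

```python
# Illustration of a permuted-diagonal block: y = P D x can be computed with one
# multiply per element (no FFT), by gathering x through the permutation.
import numpy as np

n = 8
perm = np.random.permutation(n)      # permutation defining the block structure
diag = np.random.randn(n)            # the n stored (non-zero) weights of the block
x = np.random.randn(n)

# Dense reference: build the permuted diagonal matrix explicitly.
P = np.eye(n)[perm]
dense = P @ np.diag(diag)
y_ref = dense @ x

# Compressed compute: y[i] = diag[perm[i]] * x[perm[i]]
y = diag[perm] * x[perm]
print(np.allclose(y, y_ref))         # True
```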

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

no code implementations • 1 Jan 2020 • Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

Weight pruning of DNNs has been proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained and accurate but not hardware friendly; structured pruning is coarse-grained and hardware-efficient but incurs higher accuracy loss.

Code Generation · Model Compression
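The mechanism that sits between these two extremes can be sketched as pattern-based pruning: each small convolution kernel keeps only the positions of the best-matching mask from a small fixed library. The four masks below are made up for illustration and are not PatDNN's pattern set.

```python
# Illustrative pattern-based pruning: each 3x3 kernel keeps only the 4 positions of
# the best-matching pattern from a small, fixed pattern library (patterns are made up).
import numpy as np

PATTERNS = [  # boolean 3x3 masks, each keeping 4 of 9 weights
    np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]], bool),
    np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]], bool),
    np.array([[0, 0, 0], [1, 1, 0], [1, 1, 0]], bool),
    np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]], bool),
]

def pattern_prune(kernel):
    scores = [np.abs(kernel[m]).sum() for m in PATTERNS]  # magnitude kept per pattern
    best = PATTERNS[int(np.argmax(scores))]
    return kernel * best

k = np.random.randn(3, 3)
print(pattern_prune(k))
```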

Heterogeneity-Aware Asynchronous Decentralized Training

no code implementations • 17 Sep 2019 • Qinyi Luo, Jiaao He, Youwei Zhuo, Xuehai Qian

Is it possible to get the best of both worlds: designing a distributed training method that has both the high performance of All-Reduce in homogeneous environments and the good heterogeneity tolerance of AD-PSGD?

Scheduling

A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology

no code implementations • 22 Jul 2019 • Ruizhe Cai, Ao Ren, Olivia Chen, Ning Liu, Caiwen Ding, Xuehai Qian, Jie Han, Wenhui Luo, Nobuyuki Yoshikawa, Yanzhi Wang

Further, the application of SC to DNNs has been investigated in prior work, which illustrated its suitability, since SC is more compatible with approximate computation.
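That compatibility stems from how simple SC arithmetic is: for unipolar streams, a single AND gate approximates multiplication. A minimal sketch, assuming independent bit-streams:

```python
# Stochastic-computing multiply: for unipolar streams encoding p(a) and p(b),
# a bitwise AND yields a stream whose ones-density approximates a*b.
import numpy as np

rng = np.random.default_rng(1)
N = 4096                                   # stream length (longer = more accurate)
a, b = 0.6, 0.25

stream_a = rng.random(N) < a               # Bernoulli(a) bit-stream
stream_b = rng.random(N) < b               # Bernoulli(b) bit-stream
product_stream = stream_a & stream_b       # one AND gate per bit position

print(product_stream.mean(), "~=", a * b)  # decoded value vs. exact product
```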

Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?

no code implementations • 3 Jul 2019 • Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, Xue Lin, Kaisheng Ma, Yanzhi Wang

Based on the proposed comparison framework, with the same accuracy and quantization, the results show that non-structured pruning is not competitive in terms of either storage or computation efficiency.

Model Compression · Quantization
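One way to see the storage side of this claim is a back-of-the-envelope comparison: non-structured sparsity must store an index per surviving weight, so at moderate sparsity its footprint can exceed even a dense layout. The bit-widths below are assumptions for illustration, not the paper's exact accounting.

```python
# Back-of-the-envelope storage comparison (illustrative assumptions: 8-bit weights,
# 16-bit indices per kept weight for non-structured sparsity).
def dense_bits(n):
    return n * 8

def nonstructured_bits(n, density):
    return int(n * density) * (8 + 16)   # value + index per surviving weight

n = 1_000_000
for density in (0.5, 0.33, 0.1):
    ratio = nonstructured_bits(n, density) / dense_bits(n)
    print(f"density {density:.2f}: non-structured uses {ratio:.2f}x the dense storage")
```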

Hop: Heterogeneity-Aware Decentralized Training

no code implementations • 4 Feb 2019 • Qinyi Luo, JinKun Lin, Youwei Zhuo, Xuehai Qian

Based on a unique characteristic of decentralized training that we have identified, the iteration gap, we propose a queue-based synchronization mechanism that can efficiently implement backup workers and bounded staleness in the decentralized setting.
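Hop's actual queue-based protocol is not reproduced here; the sketch below only shows the generic shape of a bounded-staleness gate driven by an iteration gap, with the data structure, threshold, and API all being illustrative assumptions.

```python
# Generic bounded-staleness gate (illustrative; not Hop's actual queue mechanism):
# a worker proceeds only while no tracked peer lags by more than `bound` iterations.
class StalenessGate:
    def __init__(self, bound=4):
        self.bound = bound
        self.peer_iters = {}                 # peer id -> last reported iteration

    def report(self, peer, iteration):
        self.peer_iters[peer] = iteration

    def can_proceed(self, my_iteration):
        if not self.peer_iters:
            return True
        gap = my_iteration - min(self.peer_iters.values())   # the iteration gap
        return gap <= self.bound

gate = StalenessGate(bound=2)
gate.report("worker1", 10)
gate.report("worker2", 7)
print(gate.can_proceed(9), gate.can_proceed(10))   # True False (gap of 3 exceeds bound)
```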

HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array

no code implementations • 7 Jan 2019 • Linghao Song, Jiachen Mao, Youwei Zhuo, Xuehai Qian, Hai Li, Yiran Chen

In this paper, inspired by recent work in machine learning systems, we propose a solution HyPar to determine layer-wise parallelism for deep neural network training with an array of DNN accelerators.
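HyPar's optimization itself (a communication-model-driven search over layer-wise partitions) is not reproduced; the toy sketch below only conveys the per-layer decision being made, choosing data vs. model parallelism by comparing placeholder communication-cost estimates over made-up layer sizes.

```python
# Toy per-layer parallelism choice (not HyPar's method): pick data vs. model
# parallelism per layer using placeholder communication-cost estimates.
layers = [  # (name, weight_size, activation_size) -- made-up numbers
    ("conv1", 10_000, 800_000),
    ("conv5", 500_000, 100_000),
    ("fc1", 4_000_000, 4_000),
]

def comm_cost(weight_size, activation_size, mode, num_accels=4):
    if mode == "data":    # replicate weights, all-reduce weight gradients
        return weight_size * (num_accels - 1)
    else:                 # "model": shard weights, exchange activations
        return activation_size * (num_accels - 1)

for name, w, a in layers:
    choice = min(("data", "model"), key=lambda m: comm_cost(w, a, m))
    print(name, "->", choice)
```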

E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs

no code implementations • 12 Dec 2018 • Zhe Li, Caiwen Ding, Siyue Wang, Wujie Wen, Youwei Zhuo, Chang Liu, Qinru Qiu, Wenyao Xu, Xue Lin, Xuehai Qian, Yanzhi Wang

It is challenging to build real-time, efficient, and accurate hardware RNN implementations because of the high sensitivity to imprecision accumulation and the requirement for special activation function implementations.

Automatic Speech Recognition (ASR) +3

GraphR: Accelerating Graph Processing Using ReRAM

no code implementations • 21 Aug 2017 • Linghao Song, Youwei Zhuo, Xuehai Qian, Hai Li, Yiran Chen

GRAPHR gains a speedup of 1.16x to 4.12x, and is 3.67x to 10.96x more energy efficient compared to a PIM-based architecture.

Distributed, Parallel, and Cluster Computing · Hardware Architecture

SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

no code implementations • 18 Nov 2016 • Ao Ren, Ji Li, Zhe Li, Caiwen Ding, Xuehai Qian, Qinru Qiu, Bo Yuan, Yanzhi Wang

Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has a high potential for implementing DCNNs with high scalability and an ultra-low hardware footprint.
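Concretely, the bipolar encoding maps a value x in [-1, 1] to a bit-stream whose ones-probability is p = (x + 1) / 2, and decoding counts the ones: x ~= 2 * (ones / length) - 1. A minimal sketch:

```python
# Bipolar stochastic-computing encode/decode: x in [-1, 1] <-> ones-density p = (x+1)/2.
import numpy as np

rng = np.random.default_rng(2)

def encode(x, length=8192):
    return rng.random(length) < (x + 1) / 2      # Bernoulli bit-stream

def decode(stream):
    return 2 * stream.mean() - 1                 # count ones, rescale to [-1, 1]

x = -0.4
print(decode(encode(x)))                         # ~ -0.4, error shrinks with stream length
```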
