Search Results for author: Ziheng Jiang

Found 14 papers, 3 papers with code

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

no code implementations11 Jun 2024 Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Ziheng Jiang, Haibin Lin, Xin Jin, Xin Liu

Overall, it can achieve up to 1.24x speedups for training over Megatron-LM on a cluster of 128 GPUs with various GPU generations and interconnects, and up to 1.66x and 1.30x speedups for prefill and decoding inference over vLLM on a cluster with 8 GPUs with various GPU generations and interconnects.

Federated Remote Physiological Measurement with Imperfect Data

no code implementations11 Mar 2022 Xin Liu, Mingchuan Zhang, Ziheng Jiang, Shwetak Patel, Daniel McDuff

The growing need for technology that supports remote healthcare is being acutely highlighted by an aging population and the COVID-19 pandemic.

Federated Learning · Privacy Preserving

EfficientPhys: Enabling Simple, Fast, and Accurate Camera-Based Vitals Measurement

no code implementations29 Sep 2021 Xin Liu, Brian L. Hill, Ziheng Jiang, Shwetak Patel, Daniel McDuff

Camera-based physiological measurement is a growing field, with neural models providing state-of-the-art performance.

Face Detection

Automated Backend-Aware Post-Training Quantization

no code implementations27 Mar 2021 Ziheng Jiang, Animesh Jain, Andrew Liu, Josh Fromm, Chengqian Ma, Tianqi Chen, Luis Ceze

Quantization is a key technique to reduce the resource requirement and improve the performance of neural network deployment.
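To make the idea concrete, here is a minimal sketch of symmetric per-tensor post-training quantization to int8 — the general family of techniques the paper above builds on. All names are illustrative and not taken from the paper's implementation:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map floats to int8 via one scale."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error per element is at most half a quantization step (scale / 2).
```

Backend-aware quantization, as in the paper, additionally chooses schemes and data types that the target hardware backend executes efficiently, rather than applying one fixed recipe.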

Quantization

SplitSR: An End-to-End Approach to Super-Resolution on Mobile Devices

no code implementations20 Jan 2021 Xin Liu, Yuang Li, Josh Fromm, Yuntao Wang, Ziheng Jiang, Alex Mariakakis, Shwetak Patel

In this work, we demonstrate state-of-the-art latency and accuracy for on-device super-resolution using a novel hybrid architecture called SplitSR and a novel lightweight residual block called SplitSRBlock.

Super-Resolution

MetaPhys: Few-Shot Adaptation for Non-Contact Physiological Measurement

1 code implementation5 Oct 2020 Xin Liu, Ziheng Jiang, Josh Fromm, Xuhai Xu, Shwetak Patel, Daniel McDuff

There are large individual differences in physiological processes, making designing personalized health sensing algorithms challenging.

Meta-Learning

Characterizing Structural Regularities of Labeled Data in Overparameterized Models

1 code implementation8 Feb 2020 Ziheng Jiang, Chiyuan Zhang, Kunal Talwar, Michael C. Mozer

We obtain empirical estimates of this score for individual instances in multiple data sets, and we show that the score identifies out-of-distribution and mislabeled examples at one end of the continuum and strongly regular examples at the other end.

Density Estimation · Out-of-Distribution Detection · +1

A Hardware-Software Blueprint for Flexible Deep Learning Specialization

no code implementations11 Jul 2018 Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility.

Code Generation · Style Transfer

Learning to Optimize Tensor Programs

no code implementations NeurIPS 2018 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective deep learning systems.
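The core observation behind learning to optimize tensor programs is that one operator admits many equivalent implementations — different loop orders and tile sizes — that all compute the same result but differ greatly in speed. A minimal sketch (the tile size is an illustrative knob of the kind a learned cost model would tune; this is not the paper's system):

```python
import numpy as np

def matmul_tiled(A, B, tile=2):
    """Blocked matrix multiplication: same math as A @ B, different loop
    structure. Tiling changes cache behavior, not the result."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=np.result_type(A, B))
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile]
                )
    return C

A = np.arange(16, dtype=np.float64).reshape(4, 4)
B = np.ones((4, 4))
# Every tiling in the search space is numerically equivalent to A @ B;
# a learned cost model's job is to predict which one runs fastest.
```

The paper's contribution is automating the search over such schedule spaces with a statistical cost model instead of hand-tuning.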

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

1 code implementation12 Feb 2018 Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Experimental results show that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art, hand-tuned libraries for low-power CPUs, mobile GPUs, and server-class GPUs.
