no code implementations • 22 Apr 2025 • Xiao Zhang, Yaoyao Ding, Yang Hu, Gennady Pekhimenko
Deep learning (DL) workloads mainly run on accelerators like GPUs.
no code implementations • 17 Apr 2025 • Yaoyao Ding, Bohan Hou, Xiao Zhang, Allan Lin, Tianqi Chen, Cody Yu Hao, Yida Wang, Gennady Pekhimenko
Existing approaches for generating low-precision kernels are limited to weight bit widths that are powers of two and suffer from suboptimal performance due to high-level GPU programming abstractions.
2 code implementations • 18 Oct 2022 • Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko
With the proposed paradigm, we implement a deep learning compiler, Hidet.
1 code implementation • 2 Nov 2020 • Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han
To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-operator parallelization.
1 code implementation • CVPR 2020 • Muyang Li, Ji Lin, Yaoyao Ding, Zhijian Liu, Jun-Yan Zhu, Song Han
Directly applying existing compression methods yields poor performance due to the difficulty of GAN training and the differences in generator architectures.
1 code implementation • 13 May 2019 • Huichu Zhang, Siyuan Feng, Chang Liu, Yaoyao Ding, Yichen Zhu, Zihan Zhou, Wei-Nan Zhang, Yong Yu, Haiming Jin, Zhenhui Li
The most commonly used open-source traffic simulator, SUMO, is, however, not scalable to large road networks and large traffic flows, which hinders the study of reinforcement learning on traffic scenarios.
Multi-agent Reinforcement Learning
reinforcement-learning