no code implementations • 6 Feb 2025 • Juming Xiong, Muyang Li, Ruining Deng, Tianyuan Yao, Regina N Tyree, Girish Hiremath, Yuankai Huo
Image stitching techniques address this issue by providing a continuous and complete visualization of the examined area.
no code implementations • 3 Feb 2025 • Haocheng Xi, Shuo Yang, Yilong Zhao, Chenfeng Xu, Muyang Li, Xiuyu Li, Yujun Lin, Han Cai, Jintao Zhang, Dacheng Li, Jianfei Chen, Ion Stoica, Kurt Keutzer, Song Han
Diffusion Transformers (DiTs) dominate video generation but their high computational cost severely limits real-world applicability, usually requiring tens of minutes to generate a few seconds of video even on high-performance GPUs.
1 code implementation • 30 Jan 2025 • Enze Xie, Junsong Chen, Yuyang Zhao, Jincheng Yu, Ligeng Zhu, Yujun Lin, Zhekai Zhang, Muyang Li, Junyu Chen, Han Cai, Bingchen Liu, Daquan Zhou, Song Han
This paper presents SANA-1. 5, a linear Diffusion Transformer for efficient scaling in text-to-image generation.
Ranked #2 on
Text-to-Image Generation
on GenEval
3 code implementations • 7 Nov 2024 • Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, Song Han
To address this, we co-design an inference engine Nunchaku that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access.
1 code implementation • 14 Oct 2024 • Junyu Chen, Han Cai, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han
With these designs, we improve the autoencoder's spatial compression ratio up to 128 while maintaining the reconstruction quality.
Ranked #13 on
Image Generation
on ImageNet 512x512
1 code implementation • 14 Oct 2024 • Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, Song Han
We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096$\times$4096 resolution.
no code implementations • 2 Oct 2024 • Muyang Li, Juming Xiong, Ruining Deng, Tianyuan Yao, Regina N Tyree, Girish Hiremath, Yuankai Huo
Endoscopy is a crucial tool for diagnosing the gastrointestinal tract, but its effectiveness is often limited by a narrow field of view and the dynamic nature of the internal environment, especially in the esophagus, where complex and repetitive patterns make image stitching challenging.
no code implementations • 19 Jul 2024 • Muyang Li, Can Cui, Quan Liu, Ruining Deng, Tianyuan Yao, Marilyn Lionts, Yuankai Huo
Our extensive experiments across multiple medical datasets reveal that data distillation can significantly reduce dataset size while maintaining comparable model performance to that achieved with the full dataset, suggesting that a small, representative sample of images can serve as a reliable indicator of distillation success.
no code implementations • CVPR 2024 • Han Cai, Muyang Li, Zhuoyang Zhang, Qinsheng Zhang, Ming-Yu Liu, Song Han
In parallel to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weight of the neural network.
2 code implementations • CVPR 2024 • Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han
To overcome this dilemma, we observe the high similarity between the input from adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step.
no code implementations • 25 Oct 2023 • Zhuo Huang, Muyang Li, Li Shen, Jun Yu, Chen Gong, Bo Han, Tongliang Liu
By fully exploring both variant and invariant parameters, our EVIL can effectively identify a robust subnetwork to improve OOD generalization.
no code implementations • 11 Mar 2023 • Muyang Li, Zijian Zhang, Xiangyu Zhao, Wanyu Wang, Minghao Zhao, Runze Wu, Ruocheng Guo
Sequential recommender systems aim to predict users' next interested item given their historical interactions.
1 code implementation • 3 Nov 2022 • Muyang Li, Ji Lin, Chenlin Meng, Stefano Ermon, Song Han, Jun-Yan Zhu
With about $1\%$-area edits, SIGE accelerates DDPM by $3. 0\times$ on NVIDIA RTX 3090 and $4. 6\times$ on Apple M1 Pro GPU, Stable Diffusion by $7. 2\times$ on 3090, and GauGAN by $5. 6\times$ on 3090 and $5. 2\times$ on M1 Pro GPU.
no code implementations • 24 Jul 2022 • Haoran Yang, Xiangyu Zhao, Muyang Li, Hongxu Chen, Guandong Xu
Currently, graph learning models are indispensable tools to help researchers explore graph-structured data.
1 code implementation • CVPR 2022 • Yihan Wang, Muyang Li, Han Cai, Wei-Ming Chen, Song Han
Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation, and introduce two simple approaches to enhance the capacity of LitePose, including Fusion Deconv Head and Large Kernel Convs.
Ranked #5 on
Multi-Person Pose Estimation
on MS COCO
(Validation AP metric)
no code implementations • 25 Apr 2022 • Muyang Li, Xiangyu Zhao, Chuan Lyu, Minghao Zhao, Runze Wu, Ruocheng Guo
In addition, most existing works assume that such sequential dependencies exist solely in the item embeddings, but neglect their existence among the item features.
1 code implementation • CVPR 2020 • Muyang Li, Ji Lin, Yaoyao Ding, Zhijian Liu, Jun-Yan Zhu, Song Han
Directly applying existing compression methods yields poor performance due to the difficulty of GAN training and the differences in generator architectures.