no code implementations • 10 Oct 2024 • Ruoyi Du, Dongyang Liu, Le Zhuo, Qin Qi, Hongsheng Li, Zhanyu Ma, Peng Gao
Rectified Flow Transformers (RFTs) offer superior training and inference efficiency, making them likely the most viable direction for scaling up diffusion models.
1 code implementation • 5 Jun 2024 • Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao
Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions.
2 code implementations • 9 May 2024 • Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details.
1 code implementation • CVPR 2024 • Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma
High-resolution image generation with Generative Artificial Intelligence (GenAI) has immense potential but, due to the enormous capital investment required for training, it is increasingly centralised to a few large corporations, and hidden behind paywalls.
1 code implementation • ICCV 2023 • Ruoyi Du, Wenqing Yu, Heqing Wang, Ting-En Lin, Dongliang Chang, Zhanyu Ma
Despite the remarkable progress of Fine-grained visual classification (FGVC) with years of history, it is still limited to recognizing 2D images.
Fine-Grained Image Classification • Fine-Grained Visual Recognition
no code implementations • ICCV 2023 • Yurong Guo, Ruoyi Du, Yuan Dong, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma
In this paper, we first observe the dependence of task-specific parameter configuration on the target task.
1 code implementation • CVPR 2023 • Ruoyi Du, Dongliang Chang, Kongming Liang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma
Our code is available at https://github.com/PRIS-CV/On-the-fly-Category-Discovery.
no code implementations • CVPR 2023 • Dongliang Chang, Yujun Tong, Ruoyi Du, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma
Therefore, we first propose a feature disentanglement module and a feature re-fusion module to reduce negative transfer and boost positive transfer between different datasets.
1 code implementation • 2 Jun 2022 • Ruoyi Du, Wenqing Yu, Heqing Wang, Dongliang Chang, Ting-En Lin, Yongbin Li, Zhanyu Ma
Fine-grained visual classification (FGVC) has been developed for decades, and related works have exposed a key direction: finding discriminative local regions and revealing subtle differences.
1 code implementation • 1 Jun 2022 • Tian Zhang, Kongming Liang, Ruoyi Du, Xian Sun, Zhanyu Ma, Jun Guo
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen attribute-object compositions in the training set.
1 code implementation • 18 Apr 2022 • Yiming Zhang, Hong Yu, Ruoyi Du, Zhanyu Ma, Yuan Dong
To eliminate this negative effect, in this paper, we propose a two-stage framework for audio captioning: (i) in the first stage, via contrastive learning, we construct a proxy feature space to reduce the distances between captions correlated to the same audio, and (ii) in the second stage, the proxy feature space is used as additional supervision to encourage the model to be optimized in the direction that benefits all the correlated captions.
no code implementations • 20 Jan 2022 • Jingye Wang, Ruoyi Du, Dongliang Chang, Kongming Liang, Zhanyu Ma
Adaptation to out-of-distribution data is a meta-challenge for all statistical learning algorithms that strongly rely on the i.i.d. assumption.
1 code implementation • 6 Dec 2021 • Ruoyi Du, Dongliang Chang, Zhanyu Ma, Yi-Zhe Song, Jun Guo
Despite great strides made on fine-grained visual classification (FGVC), current methods are still heavily reliant on fully-supervised paradigms where ample expert labels are called for.
1 code implementation • 6 Dec 2021 • Dongliang Chang, Kaiyue Pang, Ruoyi Du, Zhanyu Ma, Yi-Zhe Song, Jun Guo
Fig. 1 lays out our approach in answering this question.
2 code implementations • 31 Jan 2021 • Dongliang Chang, Yixiao Zheng, Zhanyu Ma, Ruoyi Du, Kongming Liang
Finally, we can obtain multiple discriminative regions on high-level feature channels and obtain multiple more minute regions within these discriminative regions on middle-level feature channels.
1 code implementation • 21 Dec 2020 • Siqing Zhang, Ruoyi Du, Dongliang Chang, Zhanyu Ma, Jun Guo
Convolutional neural networks (CNNs), which employ the cross-entropy loss (CE-loss) as the loss function, show poor performance since the model can only learn the most discriminative part and ignore other meaningful regions.
Ranked #41 on Fine-Grained Image Classification on CUB-200-2011
5 code implementations • ECCV 2020 • Ruoyi Du, Dongliang Chang, Ayan Kumar Bhunia, Jiyang Xie, Zhanyu Ma, Yi-Zhe Song, Jun Guo
In this work, we propose a novel framework for fine-grained visual classification to tackle these problems.
Ranked #21 on Fine-Grained Image Classification on Stanford Cars