Search Results for author: Ruoyi Du

Found 17 papers, 13 papers with code

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow

no code implementations 10 Oct 2024 Ruoyi Du, Dongyang Liu, Le Zhuo, Qin Qi, Hongsheng Li, Zhanyu Ma, Peng Gao

Rectified Flow Transformers (RFTs) offer superior training and inference efficiency, making them likely the most viable direction for scaling up diffusion models.


Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

1 code implementation 5 Jun 2024 Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao

Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions.

Point Cloud Generation, Text-to-Image Generation

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

2 code implementations 9 May 2024 Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details.

DemoFusion: Democratising High-Resolution Image Generation With No $$$

1 code implementation CVPR 2024 Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma

High-resolution image generation with Generative Artificial Intelligence (GenAI) has immense potential but, due to the enormous capital investment required for training, it is increasingly centralised to a few large corporations, and hidden behind paywalls.

Image Generation

Multi-View Active Fine-Grained Visual Recognition

1 code implementation ICCV 2023 Ruoyi Du, Wenqing Yu, Heqing Wang, Ting-En Lin, Dongliang Chang, Zhanyu Ma

Despite the remarkable progress of fine-grained visual classification (FGVC) over its years of history, it is still limited to recognizing 2D images.

Fine-Grained Image Classification, Fine-Grained Visual Recognition

An Erudite Fine-Grained Visual Classification Model

no code implementations CVPR 2023 Dongliang Chang, Yujun Tong, Ruoyi Du, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma

Therefore, we first propose a feature disentanglement module and a feature re-fusion module to reduce negative transfer and boost positive transfer between different datasets.

Classification, Disentanglement +2

Multi-View Active Fine-Grained Recognition

1 code implementation 2 Jun 2022 Ruoyi Du, Wenqing Yu, Heqing Wang, Dongliang Chang, Ting-En Lin, Yongbin Li, Zhanyu Ma

Fine-grained visual classification (FGVC) has been developed for decades, and related works have exposed a key direction: finding discriminative local regions and revealing subtle differences.

Fine-Grained Image Classification

Learning Invariant Visual Representations for Compositional Zero-Shot Learning

1 code implementation 1 Jun 2022 Tian Zhang, Kongming Liang, Ruoyi Du, Xian Sun, Zhanyu Ma, Jun Guo

Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen attribute-object compositions in the training set.

Attribute, Compositional Zero-Shot Learning +2

Caption Feature Space Regularization for Audio Captioning

1 code implementation 18 Apr 2022 Yiming Zhang, Hong Yu, Ruoyi Du, Zhanyu Ma, Yuan Dong

To eliminate this negative effect, we propose a two-stage framework for audio captioning: (i) in the first stage, we use contrastive learning to construct a proxy feature space that reduces the distances between captions correlated to the same audio, and (ii) in the second stage, the proxy feature space serves as additional supervision, encouraging the model to optimize in a direction that benefits all the correlated captions.

Audio Captioning, Contrastive Learning +1

Domain Generalization via Frequency-domain-based Feature Disentanglement and Interaction

no code implementations 20 Jan 2022 Jingye Wang, Ruoyi Du, Dongliang Chang, Kongming Liang, Zhanyu Ma

Adaptation to out-of-distribution data is a meta-challenge for all statistical learning algorithms that strongly rely on the i.i.d. assumption.

Data Augmentation, Decoder +3

Clue Me In: Semi-Supervised FGVC with Out-of-Distribution Data

1 code implementation 6 Dec 2021 Ruoyi Du, Dongliang Chang, Zhanyu Ma, Yi-Zhe Song, Jun Guo

Despite great strides made on fine-grained visual classification (FGVC), current methods are still heavily reliant on fully-supervised paradigms where ample expert labels are called for.

Fine-Grained Image Classification

Fine-Grained Visual Classification via Simultaneously Learning of Multi-regional Multi-grained Features

2 code implementations 31 Jan 2021 Dongliang Chang, Yixiao Zheng, Zhanyu Ma, Ruoyi Du, Kongming Liang

Finally, we can obtain multiple discriminative regions on high-level feature channels and obtain multiple more minute regions within these discriminative regions on middle-level feature channels.

Fine-Grained Image Classification, General Classification

Knowledge Transfer Based Fine-grained Visual Classification

1 code implementation 21 Dec 2020 Siqing Zhang, Ruoyi Du, Dongliang Chang, Zhanyu Ma, Jun Guo

Convolutional neural networks (CNNs) that employ the cross-entropy loss (CE-loss) as the loss function show poor performance, since the model learns only the most discriminative part and ignores other meaningful regions.

Classification, Fine-Grained Image Classification +2
