1 code implementation • 3 Jan 2025 • Zhengcong Fei, Debang Li, Di Qiu, Changqian Yu, Mingyuan Fan
This paper presents Ingredients, a framework for customizing video creation with video diffusion Transformers by incorporating multiple specific identity (ID) photos.
1 code implementation • 14 Dec 2024 • Zhengcong Fei, Di Qiu, Changqian Yu, Debang Li, Mingyuan Fan, Xiang Wen
This paper investigates a solution for enabling in-context capabilities of video diffusion transformers, with minimal tuning required for activation.
no code implementations • 28 Oct 2024 • Di Qiu, Zheng Chen, Rui Wang, Mingyuan Fan, Changqian Yu, Junshi Huang, Xiang Wen
Recent advancements in character video synthesis still depend on extensive fine-tuning or complex 3D modeling processes, which can restrict accessibility and hinder real-time applicability.
2 code implementations • 1 Sep 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang
This paper explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation, termed FluxMusic.
Ranked #2 on Text-to-Music Generation on MusicCaps
1 code implementation • 16 Jul 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang
In this paper, we present DiT-MoE, a sparse version of the diffusion Transformer that is scalable and competitive with dense networks while exhibiting highly optimized inference.
no code implementations • 20 Jun 2024 • Mingkun Wang, Xiaoguang Ren, Ruochun Jin, Minglong Li, Xiaochuan Zhang, Changqian Yu, Mingxu Wang, Wenjing Yang
Most prior motion prediction endeavors in autonomous driving have inadequately encoded future scenarios, leading to predictions that may fail to accurately capture the diverse movements of agents (e.g., vehicles or pedestrians).
no code implementations • 3 Jun 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Youqiang Zhang, Junshi Huang
This paper unveils Dimba, a new text-to-image diffusion model that employs a distinctive hybrid architecture combining Transformer and Mamba elements.
1 code implementation • 6 Apr 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang
Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields.
2 code implementations • 8 Feb 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang
We endeavor to train diffusion models for image data, wherein the traditional U-Net backbone is supplanted by a state space backbone, functioning on raw patches or latent space.
1 code implementation • 28 Dec 2023 • Zhengze Xu, Dongyue Wu, Changqian Yu, Xiangxiang Chu, Nong Sang, Changxin Gao
Recent real-time semantic segmentation methods usually adopt an additional semantic branch to pursue rich long-range context.
1 code implementation • IEEE Transactions on Image Processing 2023 • Dongyue Wu, Zilin Guo, Aoyan Li, Changqian Yu, Nong Sang, Changxin Gao
We conduct extensive experiments on ADE20K, Cityscapes, and Pascal Context, and the results show that applying the CBL to various popular segmentation networks can significantly improve the mIoU and boundary F-score performance.
Ranked #23 on Semantic Segmentation on Cityscapes val
1 code implementation • 15 May 2023 • Jialong Zuo, Jiahao Hong, Feng Zhang, Changqian Yu, Hanyu Zhou, Changxin Gao, Nong Sang, Jingdong Wang
To address this issue, we propose a novel language-image pre-training framework for person representation learning, termed PLIP.
Ranked #7 on Text based Person Retrieval on ICFG-PEDES
no code implementations • 12 Jan 2023 • Dongyue Wu, Zilin Guo, Aoyan Li, Changqian Yu, Changxin Gao, Nong Sang
Under this novel view, we propose a Class Center Similarity layer (CCS layer) to address the above-mentioned challenges by generating adaptive class centers conditioned on different scenes and supervising the similarities between class centers.
1 code implementation • 20 Sep 2022 • Mingkun Wang, Xinge Zhu, Changqian Yu, Wei Li, Yuexin Ma, Ruochun Jin, Xiaoguang Ren, Dongchun Ren, Mingxu Wang, Wenjing Yang
In view of this, we propose a new goal area-based framework, named Goal Area Network (GANet), for motion forecasting, which models goal areas rather than exact goal coordinates as preconditions for trajectory prediction, performing more robustly and accurately.
Ranked #15 on Motion Forecasting on Argoverse CVPR 2020
1 code implementation • 6 Jun 2022 • Yunsheng Ni, Depu Meng, Changqian Yu, Chengbin Quan, Dongchun Ren, Youjian Zhao
Specifically, we first capture the different representations with different augmentations, then regularize the cosine distance of the representations to enhance the consistency.
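The consistency regularization in the entry above can be illustrated with a minimal sketch. It assumes that each sample yields one representation vector per augmentation and that the penalty is the mean cosine distance between paired vectors; the function names `cosine_distance` and `consistency_loss` are hypothetical and not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def consistency_loss(reps_a, reps_b):
    """Mean cosine distance between paired representations.

    reps_a[i] and reps_b[i] are assumed to be representations of the same
    sample under two different augmentations (an illustrative assumption).
    """
    distances = [cosine_distance(u, v) for u, v in zip(reps_a, reps_b)]
    return sum(distances) / len(distances)
```

Minimizing this quantity pulls the two views of each sample toward the same direction in representation space; how the term is weighted against the main training objective is not specified in the snippet above.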
no code implementations • 24 Feb 2022 • Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang
To this end, we perform inference at each frame.
1 code implementation • 25 Nov 2021 • Rui Wang, Jian Chen, Gang Yu, Li Sun, Changqian Yu, Changxin Gao, Nong Sang
Image manipulation with StyleGAN has been an increasing concern in recent years. Recent works have achieved tremendous success in analyzing several semantic latent spaces to edit the attributes of the generated images. However, due to the limited semantic and spatial manipulation precision in these latent spaces, existing methods fall short in fine-grained StyleGAN image manipulation, i.e., local attribute translation. To address this issue, we discover attribute-specific control units, which consist of multiple channels of feature maps and modulation styles.
2 code implementations • 21 Sep 2021 • Changqian Yu, Yuanjie Shao, Changxin Gao, Nong Sang
The last layer of an FCN is typically a global classifier (a 1x1 convolution) that assigns each pixel a semantic label.
Ranked #20 on Semantic Segmentation on PASCAL Context
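To make the "global classifier" remark concrete, the sketch below shows that a 1x1 convolution is the same linear classifier applied independently at every pixel. It uses plain nested lists rather than a deep learning library, and the function name `pointwise_classifier` and its argument layout are illustrative assumptions, not the paper's code.

```python
def pointwise_classifier(features, weights, biases):
    """Apply one shared linear classifier at every pixel (a 1x1 convolution).

    features: H x W x C nested lists of per-pixel feature vectors.
    weights:  K x C nested lists, one weight vector per class.
    biases:   list of K per-class biases.
    Returns an H x W map holding the highest-scoring class index per pixel.
    """
    label_map = []
    for row in features:
        label_row = []
        for pixel in row:
            # The same weights are reused at every spatial position.
            scores = [
                sum(w * x for w, x in zip(class_weights, pixel)) + bias
                for class_weights, bias in zip(weights, biases)
            ]
            label_row.append(max(range(len(scores)), key=scores.__getitem__))
        label_map.append(label_row)
    return label_map
```

Because the weights do not depend on the pixel location or on the image content, the classifier is "global" in the sense used above.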
15 code implementations • CVPR 2021 • Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang
We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks.
Ranked #33 on Pose Estimation on COCO test-dev (using extra training data)
no code implementations • ECCV 2020 • Changqian Yu, Yifan Liu, Changxin Gao, Chunhua Shen, Nong Sang
In this paper, we present a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy.
7 code implementations • 5 Apr 2020 • Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang
We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for real-time semantic segmentation.
Ranked #1 on Real-Time Semantic Segmentation on COCO-Stuff
2 code implementations • CVPR 2020 • Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang
Given an input image and corresponding ground truth, Affinity Loss constructs an ideal affinity map to supervise the learning of Context Prior.
Ranked #1 on Scene Understanding on ADE20K val
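As a rough illustration of how an ideal affinity map can be derived from ground truth, the sketch below assumes the map is binary and marks every pair of pixels that share a class label. The function name `ideal_affinity_map` is hypothetical, and details such as label downsampling or the exact loss applied to the map are not covered by the snippet above and are left out here.

```python
def ideal_affinity_map(labels):
    """Build a binary pixel-to-pixel affinity map from a label map.

    labels: H x W nested lists of integer class labels.
    Returns an (H*W) x (H*W) matrix whose entry [i][j] is 1 when pixels i and j
    (in row-major order) carry the same label, and 0 otherwise.
    """
    flat = [label for row in labels for label in row]
    return [[1 if a == b else 0 for b in flat] for a in flat]
```

For a 2x2 label map such as `[[0, 0], [1, 0]]`, the result is a 4x4 symmetric matrix with ones on the diagonal; a predicted affinity map of the same shape could then be supervised against it.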
1 code implementation • ECCV 2020 • Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang
For semantic segmentation, most existing real-time deep models are trained on each frame independently and may produce inconsistent results for a video sequence.
Ranked #2 on Video Semantic Segmentation on CamVid
1 code implementation • 19 Jan 2020 • Shizhen Zhao, Changxin Gao, Yuanjie Shao, Lerenhan Li, Changqian Yu, Zhong Ji, Nong Sang
FFU and BFU add the IoU variance to the results of CFU, yielding class-specific foreground and background features, respectively.
no code implementations • CVPR 2019 • Huanyu Liu, Chao Peng, Changqian Yu, Jingbo Wang, Xu Liu, Gang Yu, Wei Jiang
Panoptic segmentation, which needs to assign a category label to each pixel and segment each object instance simultaneously, is a challenging topic.
21 code implementations • ECCV 2018 • Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
Semantic segmentation requires both rich spatial information and a sizeable receptive field.
Ranked #4 on Semantic Segmentation on SkyScapes-Dense
Dichotomous Image Segmentation
Real-Time Semantic Segmentation
3 code implementations • CVPR 2018 • Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
Most existing semantic segmentation methods still face two challenges: intra-class inconsistency and inter-class indistinction.
Ranked #5 on Semantic Segmentation on PASCAL VOC 2012 test