no code implementations • 23 Apr 2025 • Chong Mou, Yanze Wu, Wenxu Wu, Zinan Guo, Pengze Zhang, Yufeng Cheng, Yiming Luo, Fei Ding, Shiwen Zhang, Xinghui Li, Mengtian Li, Mingcong Liu, Yi Zhang, Shaojin Wu, Songtao Zhao, Jian Zhang, Qian He, Xinglong Wu
Recently, extensive research on image customization (e.g., identity, subject, style, background, etc.)
1 code implementation • 20 Apr 2025 • Fulong Ye, Miao Hua, Pengze Zhang, Xinghui Li, Qichao Sun, Songtao Zhao, Qian He, Xinglong Wu
To address this issue, we leverage the accelerated diffusion model SD Turbo, reducing the inference steps to a single iteration, enabling efficient pixel-level end-to-end training with explicit Triplet ID Group supervision.
no code implementations • 16 Feb 2025 • Lijie Liu, Tianxiang Ma, Bingchuan Li, Zhuowei Chen, Jiawei Liu, Qian He, Xinglong Wu
Foundation models for video generation are continuously developing and evolving into various applications, yet subject-consistent video generation remains in the exploratory stage.
no code implementations • 22 Dec 2024 • Bin Xia, Yuechen Zhang, Jingyao Li, Chengyao Wang, Yitong Wang, Xinglong Wu, Bei Yu, Jiaya Jia
We begin by analyzing existing frameworks and the requirements of downstream tasks, proposing a unified framework that integrates both T2I models and various editing tasks.
no code implementations • 5 Dec 2024 • Hui Zhang, Dexiang Hong, Tingwei Gao, Yitong Wang, Jie Shao, Xinglong Wu, Zuxuan Wu, Yu-Gang Jiang
To inherit the advantages of MM-DiT, we use a separate set of network weights to process the layout, treating it as equally important as the image and text modalities.
2 code implementations • 4 Dec 2024 • Liao Qu, Huichao Zhang, Yiheng Liu, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Daniel K. Du, Zehuan Yuan, Xinglong Wu
This design enables direct access to both high-level semantic representations crucial for understanding tasks and fine-grained visual features essential for generation through shared indices.
no code implementations • 17 Nov 2024 • Guoping Xu, Ximing Wu, Wentao Liao, Xinglong Wu, Qing Huang, Chang Li
To address this, we introduce UBBS-Net, a dual-branch deep neural network that learns the relationship between body and boundary for improved segmentation.
no code implementations • 1 Nov 2024 • Yu Tai, Xinglong Wu, HongWei Yang, Hui He, Duanjing Chen, Yuanming Shao, Weizhe Zhang
Specifically, aiming at spatial heterogeneity, we develop a spatial feature modeling layer to capture the fine-grained topological distribution patterns from node- and edge-level representations, respectively.
1 code implementation • 7 Jul 2024 • Xinglong Wu, Anfeng Huang, HongWei Yang, Hui He, Yu Tai, Weizhe Zhang
Multi-modal recommendation greatly enhances the performance of recommender systems by modeling the auxiliary information from multi-modal content.
no code implementations • 5 Jul 2024 • Weibo Gong, Xiaojie Jin, Xin Li, Dongliang He, Xinglong Wu
To address this task, we propose VCoME, a general framework that employs a large multimodal model to generate editing effects for video composition.
1 code implementation • 2 May 2024 • Guoping Xu, Xiaxia Wang, Xinglong Wu, Xuesong Leng, Yongchao Xu
Deep learning has made significant progress in computer vision, specifically in image classification, object detection, and semantic segmentation.
1 code implementation • 22 Apr 2024 • Yutao Cheng, Zhao Zhang, Maoke Yang, Hui Nie, Chunyuan Li, Xinglong Wu, Jie Shao
One existing practice is Graphic Layout Generation (GLG), which aims to lay out sequential design elements.
no code implementations • 26 Aug 2023 • Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool
Compared to traditional DMs, the compact IPR enables DiffI2I to obtain more accurate outcomes and employ a lighter denoising network and fewer iterations.
1 code implementation • ICCV 2023 • Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Luc van Gool
Diffusion model (DM) has achieved SOTA performance by modeling the image synthesis process into a sequential application of a denoising network.
1 code implementation • CVPR 2023 • Ruyang Liu, Jingjia Huang, Ge Li, Jiashi Feng, Xinglong Wu, Thomas H. Li
In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key point for extending image-text pretrained models to the video domain.
Ranked #7 on Video Retrieval on MSR-VTT-1kA (using extra training data)
1 code implementation • 21 Dec 2022 • Jingjia Huang, Yuanqi Chen, Jiashi Feng, Xinglong Wu
Semi-supervised learning based methods are the current SOTA solutions to the noisy-label learning problem. They rely on first learning an unsupervised label cleaner that divides the training samples into a labeled set of clean data and an unlabeled set of noisy data.
Ranked #4 on Image Classification on Clothing1M (using extra training data)
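The partition step described above, dividing training samples into a clean labeled set and a noisy unlabeled set, is often implemented in such pipelines with a small-loss criterion (mislabeled samples tend to incur larger training loss). The sketch below is a generic, minimal illustration of that idea, not the paper's actual method; the function name, the quantile threshold, and the toy losses are all assumptions for illustration:

```python
import numpy as np

def split_clean_noisy(losses, clean_fraction=0.5):
    """Partition samples by the small-loss criterion: samples whose
    per-sample training loss falls below a quantile threshold are
    treated as clean (labeled); the rest are treated as noisy
    (unlabeled) and handed to the semi-supervised learner."""
    losses = np.asarray(losses, dtype=float)
    threshold = np.quantile(losses, clean_fraction)
    clean_idx = np.where(losses <= threshold)[0]
    noisy_idx = np.where(losses > threshold)[0]
    return clean_idx, noisy_idx

# Toy per-sample losses: the two large values mimic mislabeled samples.
losses = [0.1, 0.2, 2.5, 0.15, 3.0, 0.05]
clean, noisy = split_clean_noisy(losses, clean_fraction=0.6)
print(clean, noisy)  # indices of presumed-clean vs presumed-noisy samples
```

In practice such pipelines typically fit a two-component mixture model over the losses rather than using a fixed quantile, but the quantile keeps the sketch self-contained.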
1 code implementation • CVPR 2023 • Jingjia Huang, Yinan Li, Jiashi Feng, Xinglong Wu, Xiaoshuai Sun, Rongrong Ji
We then introduce Clover, a Correlated Video-Language pre-training method, towards a universal Video-Language model for solving multiple video understanding tasks without compromising either performance or efficiency.
Ranked #1 on Video Question Answering on LSMDC-FiB
2 code implementations • 1 Mar 2022 • Zihao Wang, Wei Liu, Qian He, Xinglong Wu, Zili Yi
Once trained, the transformer can generate coherent image tokens conditioned on the text embedding that CLIP's text encoder extracts from an input text.
no code implementations • 17 May 2021 • Andrey Ignatov, Grigory Malivenko, Radu Timofte, Sheng Chen, Xin Xia, Zhaoyan Liu, Yuwei Zhang, Feng Zhu, Jiashi Li, Xuefeng Xiao, Yuan Tian, Xinglong Wu, Christos Kyrkou, Yixin Chen, Zexin Zhang, Yunbo Peng, Yue Lin, Saikat Dutta, Sourya Dipta Das, Nisarg A. Shah, Himanshu Kumar, Chao Ge, Pei-Lin Wu, Jin-Hua Du, Andrew Batutin, Juan Pablo Federico, Konrad Lyda, Levon Khojoyan, Abhishek Thanki, Sayak Paul, Shahid Siddiqui
To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions that can demonstrate a real-time performance on smartphones and IoT platforms.