Search Results for author: Xinglong Wu

Found 19 papers, 10 papers with code

DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning

1 code implementation 20 Apr 2025 Fulong Ye, Miao Hua, Pengze Zhang, Xinghui Li, Qichao Sun, Songtao Zhao, Qian He, Xinglong Wu

To address this issue, we leverage the accelerated diffusion model SD Turbo, reducing the inference steps to a single iteration, enabling efficient pixel-level end-to-end training with explicit Triplet ID Group supervision.

Attribute Face Swapping +1

Phantom: Subject-consistent video generation via cross-modal alignment

no code implementations 16 Feb 2025 Lijie Liu, Tianxiang Ma, Bingchuan Li, Zhuowei Chen, Jiawei Liu, Qian He, Xinglong Wu

The continuous development of foundational models for video generation is evolving into various applications, with subject-consistent video generation still in the exploratory stage.

cross-modal alignment Triplet +1

DreamOmni: Unified Image Generation and Editing

no code implementations 22 Dec 2024 Bin Xia, Yuechen Zhang, Jingyao Li, Chengyao Wang, Yitong Wang, Xinglong Wu, Bei Yu, Jiaya Jia

We begin by analyzing existing frameworks and the requirements of downstream tasks, proposing a unified framework that integrates both T2I models and various editing tasks.

Image Generation

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

no code implementations 5 Dec 2024 HUI ZHANG, Dexiang Hong, Tingwei Gao, Yitong Wang, Jie Shao, Xinglong Wu, Zuxuan Wu, Yu-Gang Jiang

To inherit the advantages of MM-DiT, we use a separate set of network weights to process the layout, treating it as equally important as the image and text modalities.

Layout Generation Layout-to-Image Generation

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

2 code implementations 4 Dec 2024 Liao Qu, Huichao Zhang, Yiheng Liu, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Daniel K. Du, Zehuan Yuan, Xinglong Wu

This design enables direct access to both high-level semantic representations crucial for understanding tasks and fine-grained visual features essential for generation through shared indices.

Image Generation Image Reconstruction +1

DBF-Net: A Dual-Branch Network with Feature Fusion for Ultrasound Image Segmentation

no code implementations 17 Nov 2024 Guoping Xu, Ximing Wu, Wentao Liao, Xinglong Wu, Qing Huang, Chang Li

To address this, we introduce UBBS-Net, a dual-branch deep neural network that learns the relationship between body and boundary for improved segmentation.

Image Segmentation Segmentation +1

How to Bridge Spatial and Temporal Heterogeneity in Link Prediction? A Contrastive Method

no code implementations 1 Nov 2024 Yu Tai, Xinglong Wu, HongWei Yang, Hui He, Duanjing Chen, Yuanming Shao, Weizhe Zhang

Specifically, aiming at spatial heterogeneity, we develop a spatial feature modeling layer to capture the fine-grained topological distribution patterns from node- and edge-level representations, respectively.

Contrastive Learning Link Prediction

Towards Bridging the Cross-modal Semantic Gap for Multi-modal Recommendation

1 code implementation 7 Jul 2024 Xinglong Wu, Anfeng Huang, HongWei Yang, Hui He, Yu Tai, Weizhe Zhang

Multi-modal recommendation greatly enhances the performance of recommender systems by modeling the auxiliary information from multi-modality contents.

cross-modal alignment Multi-modal Recommendation +2

VCoME: Verbal Video Composition with Multimodal Editing Effects

no code implementations 5 Jul 2024 Weibo Gong, Xiaojie Jin, Xin Li, Dongliang He, Xinglong Wu

To address this task, we propose VCoME, a general framework that employs a large multimodal model to generate editing effects for video composition.

Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey

1 code implementation 2 May 2024 Guoping Xu, Xiaxia Wang, Xinglong Wu, Xuesong Leng, Yongchao Xu

Deep learning has made significant progress in computer vision, specifically in image classification, object detection, and semantic segmentation.

Image Classification Image Reconstruction +5

Graphic Design with Large Multimodal Model

1 code implementation 22 Apr 2024 Yutao Cheng, Zhao Zhang, Maoke Yang, Hui Nie, Chunyuan Li, Xinglong Wu, Jie Shao

One existing practice is Graphic Layout Generation (GLG), which aims to lay out sequential design elements.

Layout Generation model

DiffI2I: Efficient Diffusion Model for Image-to-Image Translation

no code implementations 26 Aug 2023 Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool

Compared to traditional DMs, the compact IPR enables DiffI2I to obtain more accurate outcomes and employ a lighter denoising network and fewer iterations.

Denoising Image-to-Image Translation +2

DiffIR: Efficient Diffusion Model for Image Restoration

1 code implementation ICCV 2023 Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Luc van Gool

The diffusion model (DM) has achieved SOTA performance by modeling the image synthesis process as a sequential application of a denoising network.

Denoising Image Generation +2

Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring

1 code implementation CVPR 2023 Ruyang Liu, Jingjia Huang, Ge Li, Jiashi Feng, Xinglong Wu, Thomas H. Li

In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key point for extending image-text pretrained models to the video domain.

Ranked #7 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Representation Learning Text Retrieval +2

Class Prototype-based Cleaner for Label Noise Learning

1 code implementation 21 Dec 2022 Jingjia Huang, Yuanqi Chen, Jiashi Feng, Xinglong Wu

Semi-supervised learning based methods are current SOTA solutions to the noisy-label learning problem, which rely on learning an unsupervised label cleaner first to divide the training samples into a labeled set for clean data and an unlabeled set for noise data.

Ranked #4 on Image Classification on Clothing1M (using extra training data)

Image Classification

Clover: Towards A Unified Video-Language Alignment and Fusion Model

1 code implementation CVPR 2023 Jingjia Huang, Yinan Li, Jiashi Feng, Xinglong Wu, Xiaoshuai Sun, Rongrong Ji

We then introduce Clover, a Correlated Video-Language pre-training method, towards a universal Video-Language model for solving multiple video understanding tasks with neither performance nor efficiency compromise.

Language Modeling Language Modelling +11

CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

2 code implementations 1 Mar 2022 ZiHao Wang, Wei Liu, Qian He, Xinglong Wu, Zili Yi

Once trained, the transformer can generate coherent image tokens based on the text embedding extracted from the text encoder of CLIP upon an input text.

Text-to-Image Generation
