Search Results for author: Xiaoshi Wu

Found 8 papers, 7 papers with code

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

1 code implementation • 4 Apr 2024 • Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li

We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm.

Attribute • Image Captioning • +1

ECNet: Effective Controllable Text-to-Image Diffusion Models

no code implementations • 27 Mar 2024 • Sicheng Li, Keqiang Sun, Zhixin Lai, Xiaoshi Wu, Feng Qiu, Haoran Xie, Kazunori Miyata, Hongsheng Li

Secondly, to overcome the issue of limited conditional supervision, we introduce Diffusion Consistency Loss (DCL), which applies supervision on the denoised latent code at any given time step.
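The snippet above describes supervising the denoised latent at an arbitrary timestep. A minimal sketch of that idea, using the standard DDPM forward-process inversion to recover a clean-latent estimate and penalizing its distance to the target latent (a generic illustration with hypothetical names, not ECNet's exact DCL formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

def denoised_estimate(x_t, eps_pred, alpha_bar):
    """Recover the predicted clean latent x0_hat from a noisy latent x_t,
    given the model's noise prediction and the cumulative schedule value
    alpha_bar at timestep t (standard DDPM inversion)."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)

def diffusion_consistency_loss(x0, x_t, eps_pred, alpha_bar):
    """MSE between the denoised latent estimate and the target latent;
    well defined at any timestep t (DCL-style sketch, names hypothetical)."""
    x0_hat = denoised_estimate(x_t, eps_pred, alpha_bar)
    return float(np.mean((x0_hat - x0) ** 2))

# Toy usage: build x_t from x0 and known noise eps; a perfect noise
# prediction drives the loss to zero at any alpha_bar.
x0 = rng.standard_normal((4, 8))
eps = rng.standard_normal((4, 8))
alpha_bar = 0.7  # cumulative noise-schedule value at some timestep t
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
```

Because the loss is defined directly on the recovered latent rather than on the predicted noise, the same supervision signal applies uniformly across timesteps, which matches the "at any given time step" claim in the abstract.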

Denoising • Text-to-Image Generation

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

1 code implementation • 20 Mar 2024 • Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li

We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation), a diffusion-based pipeline that leverages both the intrinsic data-specific patterns of the source video and the image/video generative prior for effective outpainting.

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

1 code implementation • 15 Jun 2023 • Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, Hongsheng Li

By fine-tuning CLIP on HPD v2, we obtain Human Preference Score v2 (HPS v2), a scoring model that can more accurately predict human preferences on generated images.
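A CLIP-derived preference scorer like the one described reduces, at inference time, to comparing image and prompt embeddings. A minimal sketch of that scoring step with placeholder embeddings (the real HPS v2 is a fine-tuned CLIP checkpoint; the function and array names here are hypothetical):

```python
import numpy as np

def clip_style_score(image_emb, text_emb):
    """Cosine similarity between an image embedding and a prompt embedding,
    scaled to roughly [0, 100] as CLIP-style logits usually are."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_emb / np.linalg.norm(text_emb)
    return 100.0 * float(img @ txt)

def rank_by_preference(candidate_embs, text_emb):
    """Order candidate image embeddings by predicted human preference,
    highest score first."""
    scores = [clip_style_score(e, text_emb) for e in candidate_embs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])

# Toy usage: the candidate aligned with the prompt embedding ranks first.
prompt = np.array([1.0, 0.0])
candidates = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
order = rank_by_preference(candidates, prompt)
```

In practice one would obtain the embeddings from the fine-tuned model's image and text encoders; the ranking step itself is model-agnostic.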

Image Generation

Human Preference Score: Better Aligning Text-to-Image Models with Human Preference

1 code implementation • ICCV 2023 • Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li

To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel.

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

1 code implementation • CVPR 2023 • Xiaoshi Wu, Feng Zhu, Rui Zhao, Hongsheng Li

To overcome these obstacles, we propose CORA, a DETR-style framework that adapts CLIP for Open-vocabulary detection by Region prompting and Anchor pre-matching.

Ranked #6 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Described Object Detection • object-detection • +2

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

1 code implementation • CVPR 2022 • Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai

The model is pre-trained on several uni-modal and multi-modal tasks, and evaluated on a variety of downstream tasks, including novel tasks that did not appear in the pre-training stage.

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision

1 code implementation • ICCV 2021 • Xiaoshi Wu, Hadar Averbuch-Elor, Jin Sun, Noah Snavely

The abundance and richness of Internet photos of landmarks and cities has led to significant progress in 3D vision over the past two decades, including automated 3D reconstructions of the world's landmarks from tourist photos.

Descriptive • Image Captioning • +1
