Search Results for author: Zhongang Qi

Found 30 papers, 12 papers with code

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model

no code implementations15 Mar 2024 Tao Wu, XueWei Li, Zhongang Qi, Di Hu, Xintao Wang, Ying Shan, Xi Li

Controllable spherical panoramic image generation holds substantial applicative potential across a variety of domains. However, it remains a challenging task due to the inherent spherical distortion and geometry characteristics, resulting in low-quality content generation. In this paper, we introduce SphereDiffusion, a novel framework that addresses these unique challenges to generate high-quality and precisely controllable spherical panoramic images. For the spherical distortion characteristic, we embed the semantics of the distorted object with text encoding, then explicitly construct the relationship with text-object correspondence to better exploit the pre-trained knowledge of planar images. Meanwhile, we employ a deformable technique to mitigate the semantic deviation in latent space caused by spherical distortion. For the spherical geometry characteristic, by virtue of spherical rotation invariance, we improve the data diversity and optimization objectives in the training process, enabling the model to better learn the spherical geometry characteristic. Furthermore, we enhance the denoising process of the diffusion model, enabling it to effectively use the learned geometric characteristic to ensure the boundary continuity of the generated images. With these techniques, experiments on the Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation and reduces FID by around 35% on average.

Denoising Image Generation
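The 35% improvement above is reported in FID (Fréchet Inception Distance), the standard metric for generative image quality. As a minimal sketch (not the paper's evaluation code), FID fits a Gaussian to the feature vectors of real and generated images and compares the two Gaussians; the feature extractor is normally an Inception network, which is omitted here:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a, feats_b):
    # Fit a Gaussian (mean, covariance) to each feature set and compute
    # the Frechet distance: ||mu_a - mu_b||^2 + Tr(Ca + Cb - 2*(Ca*Cb)^(1/2)).
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Identical feature distributions give an FID near zero; lower is better, so a 35% relative reduction means the generated panoramas' feature statistics moved substantially closer to the real data's.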

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

1 code implementation7 Dec 2023 Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan

Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts.

Diffusion Personalization Tuning Free Text-to-Image Generation

StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation

no code implementations4 Sep 2023 Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo

StyleAdapter can generate high-quality images that match the content of the prompts and adopt the style of the references (even for unseen styles) in a single pass, which is more flexible and efficient than previous methods.

Image Generation

Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation

no code implementations23 Jun 2023 Qianji Di, Wenxi Ma, Zhongang Qi, Tianxiang Hou, Ying Shan, Hanzi Wang

In this work, we propose a Text-Image-joint Scene Graph Generation (TISGG) model to resolve the unseen triples and improve the generalisation capability of the SGG models.

Graph Generation Scene Graph Generation +1

SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation

1 code implementation6 Jun 2023 XueWei Li, Tao Wu, Zhongang Qi, Gaoang Wang, Ying Shan, Xi Li

Experimental results on the Stanford2D3D Panoramic dataset show that SGAT4PASS significantly improves performance and robustness, with approximately a 2% increase in mIoU; when small 3D disturbances occur in the data, the stability of our performance improves by an order of magnitude.

Semantic Segmentation
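The "small 3D disturbances" above refer to rotating the spherical input. For the yaw axis specifically, rotating the sphere corresponds exactly to a horizontal circular shift of the equirectangular panorama, which makes such disturbances cheap to simulate; the sketch below is a hypothetical illustration of that identity, not the paper's code (pitch and roll disturbances require full spherical resampling, which is omitted):

```python
import numpy as np

def yaw_rotate_equirectangular(img, degrees):
    # A yaw rotation of the viewing sphere is a horizontal circular
    # shift of the equirectangular image: 360 degrees == full width.
    h, w = img.shape[:2]
    shift = int(round(degrees / 360.0 * w)) % w
    return np.roll(img, shift, axis=1)
```

Because the operation is a lossless pixel permutation, a robust panoramic segmenter should produce correspondingly shifted predictions; measuring how far it deviates from that is one way to quantify the robustness the abstract reports.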

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

3 code implementations ICCV 2023 Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, XiaoHu Qie, Yinqiang Zheng

Despite the success in large-scale text-to-image generation and text-conditioned image editing, existing methods still struggle to produce consistent generation and editing results.

Text-based Image Editing

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

2 code implementations CVPR 2023 Guangcong Zheng, Xianpan Zhou, XueWei Li, Zhongang Qi, Ying Shan, Xi Li

To overcome the difficulty of multimodal fusion between image and layout, we propose to construct a structural image patch with region information and transform the patched image into a special layout to fuse with the normal layout in a unified form.

Layout-to-Image Generation Object

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models

2 code implementations16 Feb 2023 Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, XiaoHu Qie

In this paper, we aim to "dig out" the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation more granularly.

Image Generation Style Transfer

ViLEM: Visual-Language Error Modeling for Image-Text Retrieval

no code implementations CVPR 2023 Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Ying Shan, Bing Li, Weiming Hu, XiaoHu Qie, Jianping Wu

ViLEM then enforces the model to discriminate the correctness of each word in the plausible negative texts and further correct the wrong words by resorting to image information.

Contrastive Learning Retrieval +3

Do we really need temporal convolutions in action segmentation?

1 code implementation26 May 2022 Dazhao Du, Bing Su, Yu Li, Zhongang Qi, Lingyu Si, Ying Shan

Most state-of-the-art methods focus on designing temporal convolution-based models, but the inflexibility of temporal convolutions and the difficulties in modeling long-term temporal dependencies restrict the potential of these models.

Action Classification Action Segmentation +1

Accelerating the Training of Video Super-Resolution Models

no code implementations10 May 2022 Lijian Lin, Xintao Wang, Zhongang Qi, Ying Shan

In this work, we show that it is possible to gradually train video models from small to large spatial/temporal sizes, i.e., in an easy-to-hard manner.

Video Super-Resolution

CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation

no code implementations31 Mar 2022 Ziqi Zhang, Yuxin Chen, Zongyang Ma, Zhongang Qi, Chunfeng Yuan, Bing Li, Ying Shan, Weiming Hu

In this paper, we propose CREATE, the first large-scale Chinese shoRt vidEo retrievAl and Title gEneration benchmark, to facilitate research and application in video titling and video retrieval in Chinese.

Retrieval Video Captioning +1

BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild

no code implementations CVPR 2022 Xixi Xu, Zhongang Qi, Jianqi Ma, Honglun Zhang, Ying Shan, XiaoHu Qie

Current research mainly focuses on English characters and digits, while few works study Chinese characters due to the lack of public large-scale, high-quality Chinese datasets, which limits the practical application scenarios of text segmentation.

Segmentation Style Transfer +2

From Heatmaps to Structural Explanations of Image Classifiers

no code implementations13 Sep 2021 Li Fuxin, Zhongang Qi, Saeed Khorram, Vivswan Shitole, Prasad Tadepalli, Minsuk Kahng, Alan Fern

This paper summarizes our endeavors in the past few years in terms of explaining image classifiers, with the aim of including negative results and insights we have gained.

Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution

1 code implementation NeurIPS 2021 Liangbin Xie, Xintao Wang, Chao Dong, Zhongang Qi, Ying Shan

Unlike previous integral gradient methods, our FAIG aims at finding the most discriminative filters instead of input pixels/features for degradation removal in blind SR networks.

Blind Super-Resolution Super-Resolution

Stochastic Block-ADMM for Training Deep Networks

no code implementations1 May 2021 Saeed Khorram, Xiao Fu, Mohamad H. Danesh, Zhongang Qi, Li Fuxin

We prove the convergence of our proposed method and justify its capabilities through experiments in supervised and weakly-supervised settings.

Open-book Video Captioning with Retrieve-Copy-Generate Network

no code implementations CVPR 2021 Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Ying Shan, Bing Li, Ying Deng, Weiming Hu

Due to the rapid emergence of short videos and the requirement for content understanding and creation, the video captioning task has received increasing attention in recent years.

Retrieval Video Captioning

Visualizing Point Cloud Classifiers by Curvature Smoothing

1 code implementation23 Nov 2019 Chen Ziwen, Wenxuan Wu, Zhongang Qi, Li Fuxin

In this paper, we propose a novel approach to visualize features important to the point cloud classifiers.

Data Augmentation General Classification

Visualizing Deep Networks by Optimizing with Integrated Gradients

1 code implementation2 May 2019 Zhongang Qi, Saeed Khorram, Li Fuxin

Understanding and interpreting the decisions made by deep learning models is valuable in many domains.
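This paper builds on the standard Integrated Gradients attribution method. As a minimal sketch of that baseline (not the paper's optimization-based extension), IG averages the model's gradient along a straight path from a baseline input to the actual input, then scales by the input difference; the toy model and its analytic gradient below are illustrative assumptions:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=100):
    # Average gradients at midpoints along the straight line from the
    # baseline to x, then scale by (x - baseline).
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy model f(x) = x0**2 + 3*x1, with analytic gradient [2*x0, 3].
grad = lambda z: np.array([2.0 * z[0], 3.0])
attr = integrated_gradients(grad, np.array([1.0, 2.0]), np.zeros(2))
```

IG satisfies a completeness property: the attributions sum to f(x) - f(baseline). Here that is 7.0, split as 1.0 for the squared coordinate and 6.0 for the linear one.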

Interactive Naming for Explaining Deep Neural Networks: A Formative Study

no code implementations18 Dec 2018 Mandana Hamidi-Haines, Zhongang Qi, Alan Fern, Fuxin Li, Prasad Tadepalli

For this purpose, we developed a user interface for "interactive naming," which allows a human annotator to manually cluster significant activation maps in a test set into meaningful groups called "visual concepts".

General Classification

PointConv: Deep Convolutional Networks on 3D Point Clouds

9 code implementations CVPR 2019 Wenxuan Wu, Zhongang Qi, Li Fuxin

Besides, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks in 2D images of a similar structure.

3D Part Segmentation 3D Point Cloud Classification +1
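The CIFAR-10 experiment above treats an image as a point cloud so that PointConv can be compared against 2D CNNs. A hypothetical sketch of that conversion (not the paper's actual preprocessing): each pixel becomes a 2D point whose feature vector is its colour:

```python
import numpy as np

def image_to_point_cloud(img):
    # Every pixel becomes a point: 2D coordinates normalised to [0, 1],
    # with the pixel's RGB values as the per-point feature vector.
    h, w, c = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    coords /= max(h, w) - 1
    feats = img.reshape(-1, c).astype(float) / 255.0
    return coords, feats

img = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
pts, feats = image_to_point_cloud(img)  # 1024 points with 3-dim features
```

Because the resulting cloud lies on a regular grid, a point-cloud network that matches a CNN here is, in effect, learning the 2D convolution as a special case of its continuous weight function.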

Deep Air Learning: Interpolation, Prediction, and Feature Analysis of Fine-grained Air Quality

no code implementations2 Nov 2017 Zhongang Qi, Tianchun Wang, Guojie Song, Weisong Hu, Xi Li, Zhongfei Zhang

The interpolation, prediction, and feature analysis of fine-grained air quality are three important topics in the area of urban air computing.

feature selection

Embedding Deep Networks into Visual Explanations

no code implementations15 Sep 2017 Zhongang Qi, Saeed Khorram, Fuxin Li

The XNN works by learning a nonlinear embedding of a high-dimensional activation vector of a deep network layer into a low-dimensional explanation space while retaining faithfulness, i.e., the original deep learning predictions can be reconstructed from the few concepts extracted by our explanation network.

Image Classification
