Search Results for author: Yuzhong Zhao

Found 11 papers, 9 papers with code

ControlCap: Controllable Region-level Captioning

1 code implementation • 31 Jan 2024 • Yuzhong Zhao, Yue Liu, Zonghao Guo, Weijia Wu, Chen Gong, Fang Wan, Qixiang Ye

The multimodal model is constrained to generate captions within a few sub-spaces containing the control words, which increases the opportunity of hitting less frequent captions, alleviating the caption degeneration issue.

Ranked #1 on Dense Captioning on Visual Genome

Dense Captioning

Paper
Code

VMamba: Visual State Space Model

2 code implementations • 18 Jan 2024 • Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, YaoWei Wang, Qixiang Ye, Yunfan Liu

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have long been the predominant backbone networks for visual representation learning.

Computational Efficiency Representation Learning

1,468

Paper
Code

Continual Learning for Image Segmentation with Dynamic Query

1 code implementation • 29 Nov 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Lianlei Shan, Hong Zhou, Mike Zheng Shou

Image segmentation based on continual learning exhibits a critical drop of performance, mainly due to catastrophic forgetting and background shift, as they are required to incorporate new classes continually.

Continual Learning Image Segmentation +5

Paper
Code

PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining

1 code implementation • 19 Oct 2023 • Kecen Li, Chen Gong, Zhixiang Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

Then, this function assists in querying the semantic distribution of the sensitive dataset, facilitating the selection of data from the public dataset with analogous semantics for pre-training.

Image Generation

Paper
Code

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

1 code implementation • NeurIPS 2023 • Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.

Depth Estimation Domain Generalization +5

281

Paper
Code

Generative Prompt Model for Weakly Supervised Object Localization

1 code implementation • ICCV 2023 • Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings.

Ranked #1 on Weakly-Supervised Object Localization on CUB-200-2011 (Top-1 Localization Accuracy metric, using extra training data)

Image Denoising Language Modelling +2

Paper
Code

A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension

1 code implementation • 5 May 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Hong Zhou, Mike Zheng Shou, Xiang Bai

Most existing cross-modal language-to-video retrieval (VR) research focuses on single-modal input from video, i. e., visual representation, while the text is omnipresent in human environments and frequently critical to understand video.

Reading Comprehension Retrieval +2

Paper
Code

FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation

1 code implementation • 5 May 2023 • Yuzhong Zhao, Weijia Wu, Zhuang Li, Jiahong Li, Weiqiang Wang

This paper introduces a novel video text synthesis technique called FlowText, which utilizes optical flow estimation to synthesize a large amount of text video data at a low cost for training robust video text spotters.

Optical Flow Estimation Text Spotting

Paper
Code

ICDAR 2023 Video Text Reading Competition for Dense and Small Text

no code implementations • 10 Apr 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai

In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video with various scenarios.

Task 2 Text Detection +2

Paper
Add Code

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

1 code implementation • ICCV 2023 • Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen

In contrast, synthetic data can be freely available using a generative model (e. g., DALL-E, Stable Diffusion).

Image Generation Semantic Segmentation

Paper
Code

Explore Faster Localization Learning For Scene Text Detection

no code implementations • 4 Jul 2022 • Yuzhong Zhao, Yuanqiang Cai, Weijia Wu, Weiqiang Wang

Generally pre-training and long-time training computation are necessary for obtaining a good-performance text detector based on deep networks.

Scene Text Detection Text Detection

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.