Search Results for author: Yuzhong Zhao

Found 18 papers, 13 papers with code

From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis

1 code implementation2 Apr 2025 Kecen Li, Chen Gong, Xiaochen Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

In this work, inspired by curriculum learning, we propose a two-stage DP image synthesis framework, where diffusion models learn to generate DP synthetic images from easy to hard.

Image Generation

Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World

1 code implementation27 Dec 2024 Meiqi Wu, Kaiqi Huang, Yuanqiang Cai, Shiyu Hu, Yuzhong Zhao, Weiqiang Wang

This dataset captures handwritten trajectories in various real-world scenarios using commonly accessible RGB cameras, eliminating the need for complex sensors.

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

no code implementations28 Nov 2024 Feng Liu, Shiwei Zhang, XiaoFeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan

As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising.

Denoising Video Generation

Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

1 code implementation1 Jul 2024 Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, WangMeng Zuo, Qixiang Ye, Jingdong Wang

For this purpose, we establish a new benchmark comprising text prompts that fully reflect multiple dynamics grades, and define a set of dynamics scores corresponding to various temporal granularities to comprehensively evaluate the dynamics of each generated video.

Text-to-Video Generation Video Generation

DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution

1 code implementation25 May 2024 Yuzhong Zhao, Feng Liu, Yue Liu, Mingxiang Liao, Chen Gong, Qixiang Ye, Fang Wan

Unfortunately, most of existing methods using fixed visual inputs remain lacking the resolution adaptability to find out precise language descriptions.

Attribute

ControlCap: Controllable Region-level Captioning

1 code implementation31 Jan 2024 Yuzhong Zhao, Yue Liu, Zonghao Guo, Weijia Wu, Chen Gong, Fang Wan, Qixiang Ye

The multimodal model is constrained to generate captions within a few sub-spaces containing the control words, which increases the opportunity of hitting less frequent captions, alleviating the caption degeneration issue.

Dense Captioning

VMamba: Visual State Space Model

12 code implementations18 Jan 2024 Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, YaoWei Wang, Qixiang Ye, Jianbin Jiao, Yunfan Liu

At the core of VMamba is a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module.

Computational Efficiency Language Modeling +4

Continual Learning for Image Segmentation with Dynamic Query

1 code implementation29 Nov 2023 Weijia Wu, Yuzhong Zhao, Zhuang Li, Lianlei Shan, Hong Zhou, Mike Zheng Shou

Image segmentation based on continual learning exhibits a critical drop of performance, mainly due to catastrophic forgetting and background shift, as they are required to incorporate new classes continually.

Continual Learning Diversity +6

PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining

1 code implementation19 Oct 2023 Kecen Li, Chen Gong, Zhixiang Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

Then, this function assists in querying the semantic distribution of the sensitive dataset, facilitating the selection of data from the public dataset with analogous semantics for pre-training.

Image Generation

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

1 code implementation NeurIPS 2023 Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.

Dataset Generation Decoder +7

Generative Prompt Model for Weakly Supervised Object Localization

1 code implementation ICCV 2023 Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings.

 Ranked #1 on Weakly-Supervised Object Localization on CUB-200-2011 (Top-1 Localization Accuracy metric, using extra training data)

Image Denoising Language Modeling +4

A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension

1 code implementation5 May 2023 Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Hong Zhou, Mike Zheng Shou, Xiang Bai

Most existing cross-modal language-to-video retrieval (VR) research focuses on single-modal input from video, i. e., visual representation, while the text is omnipresent in human environments and frequently critical to understand video.

Reading Comprehension Retrieval +2

FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation

1 code implementation5 May 2023 Yuzhong Zhao, Weijia Wu, Zhuang Li, Jiahong Li, Weiqiang Wang

This paper introduces a novel video text synthesis technique called FlowText, which utilizes optical flow estimation to synthesize a large amount of text video data at a low cost for training robust video text spotters.

Optical Flow Estimation Text Spotting

ICDAR 2023 Video Text Reading Competition for Dense and Small Text

no code implementations10 Apr 2023 Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai

In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video with various scenarios.

Task 2 Text Detection +2

Explore Faster Localization Learning For Scene Text Detection

no code implementations4 Jul 2022 Yuzhong Zhao, Yuanqiang Cai, Weijia Wu, Weiqiang Wang

Generally pre-training and long-time training computation are necessary for obtaining a good-performance text detector based on deep networks.

Scene Text Detection Text Detection

Cannot find the paper you are looking for? You can Submit a new open access paper.