Search Results for author: Weijia Wu

Found 29 papers, 17 papers with code

TextCohesion: Detecting Text for Arbitrary Shapes

no code implementations • 22 Apr 2019 • Weijia Wu, Jici Xing, Hong Zhou

In this paper, we propose a pixel-wise method named TextCohesion for scene text detection, which splits a text instance into five key components: a Text Skeleton and four Directional Pixel Regions.

Curved Text Detection, Text Detection

Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild

1 code implementation • 3 Sep 2020 • Weijia Wu, Ning Lu, Enze Xie

To address the severe domain distribution mismatch, we propose a synthetic-to-real domain adaptation method for scene text detection, which transfers knowledge from synthetic data (source domain) to real data (target domain).

Adversarial Text, Scene Text Detection, +2

Polygon-free: Unconstrained Scene Text Detection with Box Annotations

1 code implementation • 26 Nov 2020 • Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo

For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% of fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves 80%+ labeling costs.

Scene Text Detection, Text Detection

Contrastive Learning of Semantic and Visual Representations for Text Tracking

1 code implementation • 30 Dec 2021 • Zhuang Li, Weijia Wu, Mike Zheng Shou, Jiahong Li, Size Li, Zhongyuan Wang, Hong Zhou

Semantic representation is of great benefit to the video text tracking (VTT) task that requires simultaneously classifying, detecting, and tracking texts in the video.

Contrastive Learning

End-to-End Video Text Spotting with Transformer

1 code implementation • 20 Mar 2022 • Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Hong Zhou, Ping Luo

Recent video text spotting methods usually require the three-staged pipeline, i.e., detecting text in individual images, recognizing localized text, tracking text streams with post-processing to generate final results.

Text Detection, Text Spotting

Data-Free Quantization with Accurate Activation Clipping and Adaptive Batch Normalization

no code implementations • 8 Apr 2022 • Yefei He, Luoming Zhang, Weijia Wu, Hong Zhou

In this paper, we present a simple yet effective data-free quantization method with accurate activation clipping and adaptive batch normalization.

Data Free Quantization

Binarizing by Classification: Is soft function really necessary?

no code implementations • 16 May 2022 • Yefei He, Luoming Zhang, Weijia Wu, Hong Zhou

Extensive experiments demonstrate that the proposed method yields surprising performance both in image classification and human pose estimation tasks.

 Ranked #1 on Binarization on ImageNet (Top 1 Accuracy metric)

Binarization, Binary Classification, +3

Explore Faster Localization Learning For Scene Text Detection

no code implementations • 4 Jul 2022 • Yuzhong Zhao, Yuanqiang Cai, Weijia Wu, Weiqiang Wang

Generally pre-training and long-time training computation are necessary for obtaining a good-performance text detector based on deep networks.

Scene Text Detection, Text Detection

BiViT: Extremely Compressed Binary Vision Transformer

no code implementations • 14 Nov 2022 • Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.

Binarization, object-detection, +1

BiViT: Extremely Compressed Binary Vision Transformers

no code implementations • ICCV 2023 • Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.

Binarization, object-detection, +1

POSGen: Personalized Opening Sentence Generation for Online Insurance Sales

no code implementations • 10 Feb 2023 • Yu Li, Yi Zhang, Weijia Wu, Zimu Zhou, Qiang Li

Such personalized opening sentence generation is challenging because (i) there are limited historical samples for conversation topic recommendation in online insurance sales and (ii) existing text generation schemes often fail to support customized topic ordering based on user preferences.

Chatbot, Management, +2

ICDAR 2023 Video Text Reading Competition for Dense and Small Text

no code implementations • 10 Apr 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai

In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video with various scenarios.

Task 2, Text Detection, +2

FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation

1 code implementation • 5 May 2023 • Yuzhong Zhao, Weijia Wu, Zhuang Li, Jiahong Li, Weiqiang Wang

This paper introduces a novel video text synthesis technique called FlowText, which utilizes optical flow estimation to synthesize a large amount of text video data at a low cost for training robust video text spotters.

Optical Flow Estimation, Text Spotting

A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension

1 code implementation • 5 May 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Hong Zhou, Mike Zheng Shou, Xiang Bai

Most existing cross-modal language-to-video retrieval (VR) research focuses on single-modal input from video, i.e., visual representation, while the text is omnipresent in human environments and frequently critical to understand video.

Reading Comprehension, Retrieval, +2

Generative Prompt Model for Weakly Supervised Object Localization

1 code implementation • ICCV 2023 • Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings.

 Ranked #1 on Weakly-Supervised Object Localization on CUB-200-2011 (Top-1 Localization Accuracy metric, using extra training data)

Image Denoising, Language Modelling, +2

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

1 code implementation • NeurIPS 2023 • Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.

Depth Estimation, Domain Generalization, +5

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

no code implementations • 5 Oct 2023 • Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

While PTQ exhibits efficiency in terms of both time and data usage, it may lead to diminished performance in low bit-width.

Denoising, Image Generation, +1

Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM

no code implementations • 7 Oct 2023 • Luoming Zhang, Wen Fei, Weijia Wu, Yefei He, Zhenyu Lou, Hong Zhou

Fine-grained quantization has smaller quantization loss, consequently achieving superior performance.

Quantization

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

1 code implementation • 12 Oct 2023 • Rui Zhao, YuChao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou

Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion.

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

1 code implementation • 24 Nov 2023 • Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang

In this paper, we introduce an information-enriched diffusion model for paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation.

Image Generation, Language Modelling, +1

Continual Learning for Image Segmentation with Dynamic Query

1 code implementation • 29 Nov 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Lianlei Shan, Hong Zhou, Mike Zheng Shou

Image segmentation based on continual learning exhibits a critical drop of performance, mainly due to catastrophic forgetting and background shift, as they are required to incorporate new classes continually.

Continual Learning, Image Segmentation, +5

ControlCap: Controllable Region-level Captioning

1 code implementation • 31 Jan 2024 • Yuzhong Zhao, Yue Liu, Zonghao Guo, Weijia Wu, Chen Gong, Fang Wan, Qixiang Ye

The multimodal model is constrained to generate captions within a few sub-spaces containing the control words, which increases the opportunity of hitting less frequent captions, alleviating the caption degeneration issue.

Dense Captioning

Towards Accurate Post-training Quantization for Reparameterized Models

1 code implementation • 25 Feb 2024 • Luoming Zhang, Yefei He, Wen Fei, Zhenyu Lou, Weijia Wu, YangWei Ying, Hong Zhou

Our framework outperforms previous methods by approximately 1% for 8-bit PTQ and 2% for 6-bit PTQ, showcasing its superior performance.

Quantization

DragAnything: Motion Control for Anything using Entity Representation

2 code implementations • 12 Mar 2024 • Weijia Wu, Zhuang Li, YuChao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang

We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation.

Object, Video Generation
