Search Results for author: Hongfa Wang

Found 19 papers, 11 papers with code

Inverse-like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling

no code implementations • 8 Jan 2024 • Shi-Xue Zhang, Chun Yang, Xiaobin Zhu, Hongyang Zhou, Hongfa Wang, Xu-Cheng Yin

Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary generated by an initial boundary module (IBM).

Text Detection • Text Spotting

Global and Local Semantic Completion Learning for Vision-Language Pre-training

1 code implementation • 12 Jun 2023 • Rong-Cheng Tu, Yatai Ji, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu

MGSC promotes learning more representative global features, which have a great impact on the performance of downstream tasks, while MLTC reconstructs modal-fusion local tokens, further enhancing accurate comprehension of multimodal data.

Language Modelling • Masked Language Modeling +5

Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders

no code implementations • 25 Apr 2023 • Heng Pan, Chenyang Liu, Wenxiao Wang, Li Yuan, Hongfa Wang, Zhifeng Li, Wei Liu

To study which type of deep features is appropriate as a learning target for MIM, we propose a simple MIM framework that uses a series of well-trained self-supervised models to convert an Image to a feature Vector as the learning target of MIM, where the feature extractor is also known as a teacher model.

Attribute • Vocal Bursts Intensity Prediction
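The abstract above describes using a frozen teacher's feature vectors as the regression target of masked image modeling (MIM). A minimal sketch of that general recipe, with illustrative random stand-ins rather than Img2Vec's actual architecture or shapes:

```python
import numpy as np

# Hypothetical MIM-with-teacher-target sketch: the student model must
# reconstruct the teacher's per-patch feature vectors, but the loss is
# computed only at masked patch positions. All values here are random
# placeholders for a real ViT student and self-supervised teacher.

rng = np.random.default_rng(0)
num_patches, dim = 16, 32
teacher_feats = rng.normal(size=(num_patches, dim))   # frozen teacher output
student_preds = rng.normal(size=(num_patches, dim))   # student predictions

mask = rng.random(num_patches) < 0.75                 # ~75% of patches masked

# MIM regression loss: mean squared error on masked positions only.
loss = np.mean((student_preds[mask] - teacher_feats[mask]) ** 2)
```

In practice the teacher is frozen and only the student receives gradients; the masked positions force the student to infer features from visible context.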

Unsupervised Hashing with Semantic Concept Mining

1 code implementation • 23 Sep 2022 • Rong-Cheng Tu, Xian-Ling Mao, Kevin Qinghong Lin, Chengfei Cai, Weize Qin, Hongfa Wang, Wei Wei, Heyan Huang

Recently, to improve unsupervised image retrieval performance, many unsupervised hashing methods have been proposed that design a semantic similarity matrix based on the similarities between image features extracted by a pre-trained CNN model.

Image Retrieval • Prompt Engineering +4
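The similarity-matrix idea mentioned above can be sketched in a few lines. This is a generic illustration of that family of methods, not the paper's pipeline; the features are random stand-ins for pre-trained CNN embeddings:

```python
import numpy as np

# Build a pseudo-label semantic similarity matrix from (pretend)
# pre-trained CNN features, as in similarity-matrix-based unsupervised
# hashing. The threshold and feature values are illustrative assumptions.

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 128))          # 6 images, 128-d CNN features

# L2-normalize so the dot product equals cosine similarity.
features /= np.linalg.norm(features, axis=1, keepdims=True)
cosine = features @ features.T                # pairwise cosine similarities

# Threshold into a binary {-1, +1} similarity matrix that serves as the
# training signal for learning hash codes.
S = np.where(cosine > 0.0, 1.0, -1.0)
```

The resulting matrix `S` plays the role of missing supervision: pairs marked `+1` are pushed to similar hash codes, pairs marked `-1` apart.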

Adaptive Perception Transformer for Temporal Action Localization

no code implementations • 25 Aug 2022 • Yizheng Ouyang, Tianjin Zhang, Weibo Gu, Hongfa Wang

Moreover, their multi-stage designs cannot produce action boundaries and categories directly.

Temporal Action Localization

DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection

no code implementations • 21 Aug 2022 • Jingyu Lin, Jie Jiang, Yan Yan, Chunchao Guo, Hongfa Wang, Wei Liu, Hanzi Wang

We further propose a parallel design that integrates the convolutional network with a powerful self-attention mechanism to provide complementary clues between the attention path and convolutional path.

Scene Text Detection • Text Detection

Boosting Multi-Modal E-commerce Attribute Value Extraction via Unified Learning Scheme and Dynamic Range Minimization

no code implementations • 15 Jul 2022 • Mengyin Liu, Chao Zhu, Hongyu Gao, Weibo Gu, Hongfa Wang, Wei Liu, Xu-Cheng Yin

Secondly, a text-guided information range minimization method is proposed to adaptively encode the descriptive parts of each modality into an identical space with a powerful pretrained linguistic model.

Attribute • Attribute Value Extraction +2

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

1 code implementation • 4 Jul 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} for four Ego4D challenge tasks, including Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR).

Language Modelling • Object State Change Classification

Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations

no code implementations • 7 Apr 2022 • Jie Jiang, Shaobo Min, Weijie Kong, Dihong Gong, Hongfa Wang, Zhifeng Li, Wei Liu

With multi-level representations for video and text, hierarchical contrastive learning is designed to explore fine-grained cross-modal relationships, i.e., frame-word, clip-phrase, and video-sentence, which enables HCMI to achieve a comprehensive semantic comparison between video and text modalities.

Ranked #1 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Contrastive Learning • Denoising +4
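The contrastive objective applied at each level (frame-word, clip-phrase, video-sentence) is typically a symmetric InfoNCE-style loss over matched pairs. A minimal sketch of that standard loss, assuming paired embeddings and a temperature hyperparameter (this is the general recipe, not HCMI's exact formulation):

```python
import numpy as np

def info_nce(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: the i-th video matches the i-th text."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (v @ t.T) / temperature          # pairwise cosine similarities
    labels = np.arange(logits.shape[0])       # diagonal entries are positives

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)               # stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()   # cross-entropy on the diagonal

    # Average the video-to-text and text-to-video directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Aligned pairs should yield a lower loss than shuffled ones, which is what drives the fine-grained cross-modal matching described above.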

Deep Unsupervised Hashing with Latent Semantic Components

no code implementations • 17 Mar 2022 • Qinghong Lin, Xiaojun Chen, Qin Zhang, Shaotian Cai, Wenzhe Zhao, Hongfa Wang

Firstly, DSCH constructs a semantic component structure by uncovering the fine-grained semantic components of images with a Gaussian Mixture Model (GMM), where an image is represented as a mixture of multiple components and the co-occurrence of semantics is exploited.

Common Sense Reasoning • Image Retrieval +1

Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection

1 code implementation • ICCV 2021 • Shi-Xue Zhang, Xiaobin Zhu, Chun Yang, Hongfa Wang, Xu-Cheng Yin

In this work, we propose a novel adaptive boundary proposal network for arbitrary shape text detection, which can learn to directly produce accurate boundary for arbitrary shape text without any post-processing.

Text Detection

AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition

no code implementations • 10 Oct 2017 • Chun Yang, Xu-Cheng Yin, Zejun Li, Jianwei Wu, Chunchao Guo, Hongfa Wang, Lei Xiao

Recognizing text in the wild is a really challenging task because of complex backgrounds, various illuminations and diverse distortions, even with deep neural networks (convolutional neural networks and recurrent neural networks).

Scene Text Recognition
