Search Results for author: Hongfa Wang

Found 27 papers, 16 papers with code

BalanceBenchmark: A Survey for Imbalanced Learning

1 code implementation • 15 Feb 2025 • Shaoxuan Xu, Menglu Cui, Chengxiang Huang, Hongfa Wang, Di Hu

Multimodal learning has gained attention for its capacity to integrate information from different modalities.

Survey

Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

1 code implementation • 2 Sep 2024 • Qihua Chen, Yue Ma, Hongfa Wang, Junkun Yuan, Wenzhe Zhao, Qi Tian, Hongmei Wang, Shaobo Min, Qifeng Chen, Wei Liu

Coupling these two designs enables us to generate higher-resolution outpainting videos with rich content while keeping spatial and temporal consistency.

LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer

no code implementations • 21 Jul 2024 • Yu Li, Yifan Chen, Gongye Liu, Fei Yin, Qingyan Bai, Jie Wu, Hongfa Wang, Ruihang Chu, Yujiu Yang

To address these challenges, we introduce LayoutDiT, an effective framework that balances content and graphic features to generate high-quality, visually appealing layouts.

Blocking

Video-Language Alignment via Spatio-Temporal Graph Transformer

1 code implementation • 16 Jul 2024 • Shi-Xue Zhang, Hongfa Wang, Xiaobin Zhu, Weibo Gu, Tianjin Zhang, Chun Yang, Wei Liu, Xu-Cheng Yin

In this paper, we propose a novel Spatio-Temporal Graph Transformer module to uniformly learn spatial and temporal contexts for video-language alignment pre-training (dubbed STGT).

Contrastive Learning • Question Answering • +3

Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

no code implementations • 5 Jun 2024 • Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo

While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple character animation and body occlusion.

Image Animation • Video Generation

Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

no code implementations • 4 Jun 2024 • Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen

We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which animates a reference portrait with target landmark sequences.

Portrait Animation

CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation

2 code implementations • 24 May 2024 • Zhuoyan Luo, Yinghao Wu, Tianheng Cheng, Yong Liu, Yicheng Xiao, Hongfa Wang, Xiao-Ping Zhang, Yujiu Yang

By decoupling the intricate referring semantics into different granularities with a visual-linguistic hierarchy, and dynamically aggregating them with intra- and inter-selection, CoHD boosts multi-granularity comprehension with the reciprocal benefit of the hierarchical nature.

Generalized Referring Expression Segmentation • Object • +2

Inverse-like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling

no code implementations • 8 Jan 2024 • Shi-Xue Zhang, Chun Yang, Xiaobin Zhu, Hongyang Zhou, Hongfa Wang, Xu-Cheng Yin

Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary generated by an initial boundary module (IBM).

Text Detection • Text Spotting

Global and Local Semantic Completion Learning for Vision-Language Pre-training

1 code implementation • 12 Jun 2023 • Rong-Cheng Tu, Yatai Ji, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu

MGSC promotes learning more representative global features, which have a great impact on the performance of downstream tasks, while MLTC reconstructs modal-fusion local tokens, further enhancing accurate comprehension of multimodal data.

cross-modal alignment • Image-text Retrieval • +6

Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders

no code implementations • 25 Apr 2023 • Heng Pan, Chenyang Liu, Wenxiao Wang, Li Yuan, Hongfa Wang, Zhifeng Li, Wei Liu

To study which type of deep feature is appropriate as a learning target for MIM, we propose a simple MIM framework with a series of well-trained self-supervised models that convert an Image to a feature Vector as the learning target of MIM, where the feature extractor is also known as the teacher model.

Attribute • Diversity • +1
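The abstract above describes masked image modeling (MIM) with a teacher's feature vectors as the regression target. The following numpy sketch illustrates that general recipe only; the function name, the smooth-L1 choice, and the masking convention are illustrative assumptions, not Img2Vec's actual implementation:

```python
import numpy as np

def mim_feature_target_loss(student_pred, teacher_feats, mask):
    """Masked image modeling with a feature-vector target: the student predicts
    the teacher's features, and only masked patch positions contribute to the loss.
    student_pred, teacher_feats: (num_patches, dim) arrays; mask: (num_patches,) bool.
    Smooth-L1 is a stand-in here; the paper's exact loss may differ."""
    diff = student_pred - teacher_feats
    l2 = 0.5 * diff ** 2                            # quadratic region (|diff| < 1)
    l1 = np.abs(diff) - 0.5                         # linear region (|diff| >= 1)
    smooth = np.where(np.abs(diff) < 1.0, l2, l1)   # per-element smooth-L1
    return smooth[mask].mean()                      # average over masked patches only
```

A perfect student (predictions equal to the teacher's features at masked positions) gives a loss of exactly zero, which is the sanity check to run first.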

Unsupervised Hashing with Semantic Concept Mining

1 code implementation • 23 Sep 2022 • Rong-Cheng Tu, Xian-Ling Mao, Kevin Qinghong Lin, Chengfei Cai, Weize Qin, Hongfa Wang, Wei Wei, Heyan Huang

Recently, to improve unsupervised image retrieval performance, many unsupervised hashing methods have been proposed that design a semantic similarity matrix based on the similarities between image features extracted by a pre-trained CNN model.

Image Retrieval • Prompt Engineering • +4
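The similarity-matrix construction mentioned in the abstract follows a common recipe: cosine similarity over pre-extracted CNN features, thresholded into similar/dissimilar pseudo-labels. This is a minimal numpy sketch of that generic recipe; the function name and threshold value are illustrative, not the paper's:

```python
import numpy as np

def semantic_similarity_matrix(features, threshold=0.5):
    """Build a pseudo-label similarity matrix from pre-extracted CNN features.
    features: (N, D) array of image features from a pre-trained CNN.
    Returns an (N, N) matrix with +1 for similar pairs and -1 for dissimilar ones."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)  # unit rows
    cosine = normed @ normed.T                                           # pairwise cosine
    return np.where(cosine > threshold, 1.0, -1.0)
```

The resulting matrix then supervises hash-code learning: pairs marked +1 are pushed to similar binary codes and -1 pairs apart, without any manual labels.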

Adaptive Perception Transformer for Temporal Action Localization

no code implementations • 25 Aug 2022 • Yizheng Ouyang, Tianjin Zhang, Weibo Gu, Hongfa Wang

Moreover, their multi-stage designs cannot produce action boundaries and categories directly.

Temporal Action Localization

DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection

no code implementations • 21 Aug 2022 • Jingyu Lin, Jie Jiang, Yan Yan, Chunchao Guo, Hongfa Wang, Wei Liu, Hanzi Wang

We further propose a parallel design that integrates the convolutional network with a powerful self-attention mechanism to provide complementary clues between the attention path and convolutional path.

Scene Text Detection • Text Detection

Boosting Multi-Modal E-commerce Attribute Value Extraction via Unified Learning Scheme and Dynamic Range Minimization

no code implementations • 15 Jul 2022 • Mengyin Liu, Chao Zhu, Hongyu Gao, Weibo Gu, Hongfa Wang, Wei Liu, Xu-Cheng Yin

Secondly, a text-guided information-range minimization method is proposed to adaptively encode the descriptive parts of each modality into an identical space with a powerful pre-trained linguistic model.

Attribute • Attribute Value Extraction • +2

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

1 code implementation • 4 Jul 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rong-Cheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

In this report, we propose a video-language pretraining (VLP) based solution, EgoVLP, for four Ego4D challenge tasks, including Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR).

Language Modeling • Language Modelling • +1

Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations

no code implementations • 7 Apr 2022 • Jie Jiang, Shaobo Min, Weijie Kong, Dihong Gong, Hongfa Wang, Zhifeng Li, Wei Liu

With multi-level representations for video and text, hierarchical contrastive learning is designed to explore fine-grained cross-modal relationships, i. e., frame-word, clip-phrase, and video-sentence, which enables HCMI to achieve a comprehensive semantic comparison between video and text modalities.

Ranked #1 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Contrastive Learning • Denoising • +4
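The hierarchical contrastive objective described in the abstract (frame-word, clip-phrase, video-sentence) can be sketched as a weighted sum of symmetric InfoNCE losses, one per granularity. This numpy sketch illustrates that general pattern only; the names, temperature, and equal weighting are assumptions rather than HCMI's actual design:

```python
import numpy as np

def info_nce(video_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE over matched (video_i, text_i) pairs at one granularity."""
    v = video_feats / np.linalg.norm(video_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = v @ t.T / temperature                  # (N, N) similarity logits
    idx = np.arange(len(v))
    def ce(l):                                      # cross-entropy, diagonal targets
        l = l - l.max(axis=1, keepdims=True)        # numerical stability
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[idx, idx]).mean()
    return 0.5 * (ce(logits) + ce(logits.T))        # video-to-text + text-to-video

def hierarchical_loss(levels, weights=None):
    """Sum InfoNCE over (video_feat, text_feat) pairs at each level,
    e.g. frame-word, clip-phrase, video-sentence."""
    weights = weights or [1.0] * len(levels)
    return sum(w * info_nce(v, t) for w, (v, t) in zip(weights, levels))
```

Each level contrasts the matched video/text pair against the other pairs in the batch, so finer levels (frame-word) and coarser levels (video-sentence) are optimized jointly.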

Deep Unsupervised Hashing with Latent Semantic Components

no code implementations • 17 Mar 2022 • Qinghong Lin, Xiaojun Chen, Qin Zhang, Shaotian Cai, Wenzhe Zhao, Hongfa Wang

Firstly, DSCH constructs a semantic component structure by uncovering the fine-grained semantic components of images with a Gaussian Mixture Model (GMM), where an image is represented as a mixture of multiple components and semantic co-occurrences are exploited.

Common Sense Reasoning • Image Retrieval • +1
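The "mixture of multiple components" view in the abstract amounts to soft-assigning each image feature to latent components via GMM responsibilities. This numpy sketch assumes isotropic Gaussians with a shared variance and equal mixing weights for simplicity; it illustrates the idea and is not DSCH's actual model:

```python
import numpy as np

def component_responsibilities(features, means, var=1.0):
    """Soft-assign image features to latent semantic components.
    features: (N, D) image features; means: (K, D) component centers.
    Returns (N, K) responsibilities: each image as a mixture over K components."""
    # squared distance from every feature to every component center, shape (N, K)
    d2 = ((features[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    log_p = -0.5 * d2 / var                         # isotropic Gaussian log-density
    log_p -= log_p.max(axis=1, keepdims=True)       # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)         # normalize rows to sum to 1
```

Each row of the output sums to one, so a single image can contribute fractionally to several semantic components, which is what lets co-occurring semantics be exploited downstream.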

Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection

1 code implementation • ICCV 2021 • Shi-Xue Zhang, Xiaobin Zhu, Chun Yang, Hongfa Wang, Xu-Cheng Yin

In this work, we propose a novel adaptive boundary proposal network for arbitrary shape text detection, which can learn to directly produce accurate boundary for arbitrary shape text without any post-processing.

Decoder • Text Detection

AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition

no code implementations • 10 Oct 2017 • Chun Yang, Xu-Cheng Yin, Zejun Li, Jianwei Wu, Chunchao Guo, Hongfa Wang, Lei Xiao

Recognizing text in the wild is a really challenging task because of complex backgrounds, various illuminations and diverse distortions, even with deep neural networks (convolutional neural networks and recurrent neural networks).

Diversity • Scene Text Recognition
