Search Results for author: Hongtao Xie

Found 37 papers, 24 papers with code

AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation

no code implementations • 8 Apr 2024 • Jiannan Ge, Lingxi Xie, Hongtao Xie, Pandeng Li, Xiaopeng Zhang, Yongdong Zhang, Qi Tian

(1) Mutually-Refined Proposal Extraction.

Image Segmentation Segmentation +3

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

1 code implementation • 11 Mar 2024 • Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang

The Q-Formers are trained using paired images rather than the identical target, in which the reference image and the ground-truth image share the same style or semantics.

Disentanglement

Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval

1 code implementation • 19 Dec 2023 • Zhihang Liu, Jun Li, Hongtao Xie, Pandeng Li, Jiannan Ge, Sun-Ao Liu, Guoqing Jin

In this paper, we introduce Modal-Enhanced Semantic Modeling (MESM), a novel framework for more balanced alignment through enhancing features at two levels.

Moment Retrieval Retrieval +1

CARIS: Context-Augmented Referring Image Segmentation

1 code implementation • ACM MM 2023 • Sun-Ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao

Technically, CARIS develops a context-aware mask decoder with sequential bidirectional cross-modal attention to integrate the linguistic features with visual context, which are then aligned with pixel-wise visual features.

Image Segmentation Segmentation +1

Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval

no code implementations • 12 Oct 2023 • Pandeng Li, Hongtao Xie, Jiannan Ge, Lei Zhang, Shaobo Min, Yongdong Zhang

Hence, we address this problem by decomposing video information into reconstruction-dependent and semantic-dependent information, which disentangles the semantic extraction from reconstruction constraint.

Retrieval Semantic Retrieval +3

Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition

1 code implementation • 8 Oct 2023 • Zixiao Wang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Boqiang Zhang, Yongdong Zhang

In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP) model in scene text recognition (STR), and establish a novel Symmetrical Linguistic Feature Distillation framework (named CLIP-OCR) to leverage both visual and linguistic knowledge in CLIP.

Optical Character Recognition (OCR) Scene Text Recognition

Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction

no code implementations • 20 Sep 2023 • Jie Wang, Hanzhu Chen, Qitan Lv, Zhihao Shi, Jiajun Chen, Huarui He, Hongtao Xie, Yongdong Zhang, Feng Wu

This implies the great potential of the semantic correlations for the entity-independent inductive link prediction task.

Inductive Link Prediction Knowledge Graphs

TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design

no code implementations • 9 Aug 2023 • Yifan Gao, Jinpeng Lin, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng Ge, Yuning Jiang

Specifically, TextPainter takes the global-local background image as a hint of style and guides the text image generation with visual harmony.

Image Generation Language Modelling +2

Balanced Classification: A Unified Framework for Long-Tailed Object Detection

1 code implementation • 4 Aug 2023 • Tianhao Qi, Hongtao Xie, Pandeng Li, Jiannan Ge, Yongdong Zhang

In this paper, we contend that the learning bias originates from two factors: 1) the unequal competition arising from the imbalanced distribution of foreground categories, and 2) the lack of sample diversity in tail categories.

Ranked #1 on Long-tailed Object Detection on LVIS v1.0 val

Hallucination Long-tailed Object Detection +1

MomentDiff: Generative Video Moment Retrieval from Random to Real

1 code implementation • NeurIPS 2023 • Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang

Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description.

Moment Retrieval Retrieval

TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition

1 code implementation • 9 May 2023 • Tianlun Zheng, Zhineng Chen, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang

In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism into text rectification for the first time.

Ranked #1 on Scene Text Recognition on SVT-P

Optical Character Recognition (OCR) Scene Text Recognition

Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition

1 code implementation • 9 May 2023 • Boqiang Zhang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Yongdong Zhang

Vision models have gained increasing attention due to their simplicity and efficiency in the Scene Text Recognition (STR) task.

Scene Text Recognition

Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation

1 code implementation • CVPR 2023 • Sun-Ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao

POP builds a set of orthogonal prototypes, each of which represents a semantic class, and makes the prediction for each class separately based on the features projected onto its prototype.
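
The projection idea in this summary can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`gram_schmidt`, `class_scores`), the toy vectors, and the use of Gram-Schmidt to obtain orthogonal prototypes are all assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Turn linearly independent vectors into an orthonormal set (hypothetical helper)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

def class_scores(feature, prototypes):
    """Score each class separately by the feature's projection onto that class's prototype."""
    return [dot(feature, p) for p in prototypes]

# Toy data: three raw class vectors and one feature (assumed values).
raw = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
prototypes = gram_schmidt(raw)
feature = [2.0, 2.1, 0.1]
scores = class_scores(feature, prototypes)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

Because the prototypes are mutually orthogonal, the score for one class does not change when another class's prototype is added or adjusted within its own direction, which is the property the per-class prediction relies on in this sketch.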

Ranked #1 on Generalized Few-Shot Semantic Segmentation on COCO-20i (1-shot)

Generalized Few-Shot Semantic Segmentation

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval

1 code implementation • ICCV 2023 • Pandeng Li, Chen-Wei Xie, Liming Zhao, Hongtao Xie, Jiannan Ge, Yun Zheng, Deli Zhao, Yongdong Zhang

In the event-sentence prototype matching phase, we design a temporal prototype generation mechanism to associate intra-frame objects and interact inter-frame temporal relations.

Object Retrieval +2

Exploring Stroke-Level Modifications for Scene Text Editing

1 code implementation • 5 Dec 2022 • Yadong Qu, Qingfeng Tan, Hongtao Xie, Jianjun Xu, Yuxin Wang, Yongdong Zhang

Moreover, two new datasets (Tamper-Syn2k and Tamper-Scene) are proposed to fill the gap in public evaluation datasets.

Attribute Scene Text Editing

ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting

1 code implementation • 19 Nov 2022 • Shancheng Fang, Zhendong Mao, Hongtao Xie, Yuxin Wang, Chenggang Yan, Yongdong Zhang

In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language models with noisy input.

Ranked #4 on Text Spotting on SCUT-CTW1500

Blocking Language Modelling +2

Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets

1 code implementation • 12 Oct 2022 • Zhiying Lu, Hongtao Xie, Chuanbin Liu, Yongdong Zhang

On the channel aspect, we introduce a dynamic feature aggregation module in the MLP and a brand-new "head token" design in the multi-head self-attention module to help re-calibrate channel representations and make different channel group representations interact with each other.

Inductive Bias

Geometry Aligned Variational Transformer for Image-conditioned Layout Generation

no code implementations • 2 Sep 2022 • Yunning Cao, Ye Ma, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng Ge, Yuning Jiang

First, a self-attention mechanism is adopted to model the contextual relationships among layout elements, while a cross-attention mechanism is used to fuse the visual information of conditional images.
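
The difference between the two attention modes named here can be shown with a minimal sketch of scaled dot-product attention. It is a generic illustration, not the paper's model: there are no learned projections, and the `layout` and `image` token values are made-up assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query returns a weighted mix of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        logits = [dot_qk / math.sqrt(d)
                  for dot_qk in (sum(a * b for a, b in zip(q, k)) for k in keys)]
        weights = softmax(logits)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Toy tokens (assumed values): two layout elements, three image features.
layout = [[1.0, 0.0], [0.0, 1.0]]
image = [[1.0, 1.0], [0.5, -0.5], [-1.0, 0.0]]

contextual = attention(layout, layout, layout)  # self-attention: layout attends to layout
fused = attention(layout, image, image)         # cross-attention: layout attends to image
```

In self-attention the queries, keys and values all come from the layout elements, while in cross-attention the layout elements supply only the queries and the image features supply the keys and values.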

Layout Design Object Localization

REMOT: A Region-to-Whole Framework for Realistic Human Motion Transfer

no code implementations • 1 Sep 2022 • Quanwei Yang, Xinchen Liu, Wu Liu, Hongtao Xie, Xiaoyan Gu, Lingyun Yu, Yongdong Zhang

Human Video Motion Transfer (HVMT) aims to generate, from an image of a source person, a video of that person imitating the motion of the driving person.

Partial Class Activation Attention for Semantic Segmentation

1 code implementation • CVPR 2022 • Sun-Ao Liu, Hongtao Xie, Hai Xu, Yongdong Zhang, Qi Tian

Current attention-based methods for semantic segmentation mainly model pixel relation through pairwise affinity and coarse segmentation.

Relation Segmentation +1

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

2 code implementations • 22 Nov 2021 • Tianlun Zheng, Zhineng Chen, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang

In this paper, we propose a novel module called Multi-Domain Character Distance Perception (MDCDP) to establish a visually and semantically related position embedding.

Ranked #11 on Scene Text Recognition on ICDAR2015

Position Scene Text Recognition

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

4 code implementations • ICCV 2021 • Yuxin Wang, Hongtao Xie, Shancheng Fang, Jing Wang, Shenggao Zhu, Yongdong Zhang

Such an operation guides the vision model to use not only the visual texture of characters, but also the linguistic information in visual context for recognition when the visual cues are confused (e.g., occlusion, noise, etc.).

Language Modelling Scene Text Recognition

Look Back Again: Dual Parallel Attention Network for Accurate and Robust Scene Text Recognition

2 code implementations • ICMR 2021 • Zilong Fu, Guoqing Jin, Hongtao Xie, Junbo Guo

To tackle this issue, in this paper, we propose a dual parallel attention network (DPAN), in which a newly designed parallel context attention module (PCAM) is cascaded with the original PPAM, using linguistic contextual information to compensate for the information inconsistency between queries and keys.

Ranked #12 on Scene Text Recognition on ICDAR2013

Language Modelling Position +1

PERT: A Progressively Region-based Network for Scene Text Removal

1 code implementation • 24 Jun 2021 • Yuxin Wang, Hongtao Xie, Shancheng Fang, Yadong Qu, Yongdong Zhang

However, there exist two problems: 1) the implicit erasure guidance causes excessive erasure of non-text areas; 2) the one-stage erasure does not exhaustively remove text regions.

Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning

no code implementations • 13 Jun 2021 • Shaobo Min, Qi Dai, Hongtao Xie, Chuang Gan, Yongdong Zhang, Jingdong Wang

Cross-modal correlation provides an inherent supervision for video unsupervised representation learning.

Contrastive Learning Representation Learning

Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval

no code implementations • 29 Mar 2021 • Rui Zhao, Kecheng Zheng, Zheng-Jun Zha, Hongtao Xie, Jiebo Luo

The cross-modal memory module is employed to record the instance embeddings of all the datasets for global negative mining.

Retrieval Text Retrieval +1

Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection

no code implementations • CVPR 2021 • Jiaming Li, Hongtao Xie, Jiahong Li, Zhongyuan Wang, Yongdong Zhang

Face forgery detection is raising ever-increasing interest in computer vision since facial manipulation technologies cause serious worries.

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

3 code implementations • CVPR 2021 • Shancheng Fang, Hongtao Xie, Yuxin Wang, Zhendong Mao, Yongdong Zhang

Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively.

Language Modelling Scene Text Recognition

Hierarchical Granularity Transfer Learning

no code implementations • NeurIPS 2020 • Shaobo Min, Hongtao Xie, Hantao Yao, Xuran Deng, Zheng-Jun Zha, Yongdong Zhang

In this paper, we introduce a new task, named Hierarchical Granularity Transfer Learning (HGTL), to recognize sub-level categories with basic-level annotations and semantic descriptions for hierarchical categories.

Transfer Learning

Curriculum Learning for Natural Language Understanding

no code implementations • ACL 2020 • Benfeng Xu, Licheng Zhang, Zhendong Mao, Quan Wang, Hongtao Xie, Yongdong Zhang

With the great success of pre-trained language models, the pretrain-finetune paradigm has become the undoubtedly dominant solution for natural language understanding (NLU) tasks.

Natural Language Understanding

ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection

1 code implementation • CVPR 2020 • Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Mengting Xing, Zilong Fu, Yongdong Zhang

Then a novel Local Orthogonal Texture-aware Module (LOTM) models the local texture information of proposal features in two orthogonal directions and represents text region with a set of contour points.

Region Proposal Scene Text Detection +1

Graph Structured Network for Image-Text Matching

1 code implementation • CVPR 2020 • Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang

The GSMN explicitly models object, relation and attribute as a structured phrase, which not only allows learning the correspondence of object, relation and attribute separately, but also benefits learning the fine-grained correspondence of structured phrases.

Ranked #16 on Cross-Modal Retrieval on Flickr30k

Attribute Image-text matching +3

Multi-Objective Matrix Normalization for Fine-grained Visual Recognition

1 code implementation • 30 Mar 2020 • Shaobo Min, Hantao Yao, Hongtao Xie, Zheng-Jun Zha, Yongdong Zhang

In this paper, we propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation in terms of square-root, low-rank, and sparsity.

Fine-Grained Visual Recognition

Domain-aware Visual Bias Eliminating for Generalized Zero-Shot Learning

1 code implementation • CVPR 2020 • Shaobo Min, Hantao Yao, Hongtao Xie, Chaoqun Wang, Zheng-Jun Zha, Yongdong Zhang

Recent methods focus on learning a unified semantic-aligned visual representation to transfer knowledge between two domains, while ignoring the effect of semantic-free visual representation in alleviating the biased recognition problem.

Generalized Zero-Shot Learning

ACE-Net: Biomedical Image Segmentation with Augmented Contracting and Expansive Paths

no code implementations • 23 Aug 2019 • Yanhao Zhu, Zhineng Chen, Shuai Zhao, Hongtao Xie, Wenming Guo, Yongdong Zhang

Nowadays U-net-like FCNs predominate in various biomedical image segmentation applications and attain promising performance, largely due to their elegant architectures, e.g., symmetric contracting and expansive paths as well as lateral skip-connections.

Image Segmentation Segmentation +1

Domain-Specific Embedding Network for Zero-Shot Recognition

1 code implementation • 12 Aug 2019 • Shaobo Min, Hantao Yao, Hongtao Xie, Zheng-Jun Zha, Yongdong Zhang

In contrast to previous methods, the DSEN decomposes the domain-shared projection function into one domain-invariant and two domain-specific sub-functions to explore the similarities and differences between two domains.

Zero-Shot Learning

CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification

no code implementations • 19 Nov 2018 • Jiawei Liu, Zheng-Jun Zha, Hongtao Xie, Zhiwei Xiong, Yongdong Zhang

An appearance network is developed to learn appearance features from the full body, horizontal and vertical body parts of pedestrians with spatial dependencies among body parts.

Attribute Multi-Task Learning +1
