Search Results for author: Xiang Bai

Found 153 papers, 94 papers with code

Intra-class Feature Variation Distillation for Semantic Segmentation

1 code implementation ECCV 2020 Yukang Wang, Wei Zhou, Tao Jiang, Xiang Bai, Yongchao Xu

In this paper, unlike previous methods that perform knowledge distillation on dense pairwise relations, we propose a novel intra-class feature variation distillation (IFVD) to transfer the intra-class feature variation (IFV) of the cumbersome model (teacher) to the compact model (student).

Knowledge Distillation Segmentation +1
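Here is a minimal NumPy sketch of the IFV idea. It is illustrative only. It assumes that the IFV at each pixel is the cosine similarity between that pixel's feature and the mean feature (prototype) of its class, and that the student is trained to match the teacher's IFV map under a mean squared error. The function names are hypothetical, and this is not the authors' code.

```python
import numpy as np

def ifv_map(features, labels, eps=1e-8):
    """Cosine similarity between each pixel feature and its class prototype.

    features: (C, H, W) feature map; labels: (H, W) integer class map,
    assumed to be already resized to the feature resolution.
    """
    c, h, w = features.shape
    feats = features.reshape(c, -1).T          # (H*W, C), one row per pixel
    labs = labels.reshape(-1)                  # (H*W,)
    sim = np.zeros(h * w)
    for k in np.unique(labs):
        mask = labs == k
        f = feats[mask]
        proto = f.mean(axis=0)                 # class prototype = mean feature
        den = np.linalg.norm(f, axis=1) * np.linalg.norm(proto) + eps
        sim[mask] = (f @ proto) / den
    return sim.reshape(h, w)

def ifv_distillation_loss(student_feats, teacher_feats, labels):
    """Mean squared error between the student's and the teacher's IFV maps."""
    diff = ifv_map(student_feats, labels) - ifv_map(teacher_feats, labels)
    return float(np.mean(diff ** 2))
```

Because both IFV maps have shape (H, W), the student and teacher may use different channel counts. Only the per-pixel variation pattern is transferred, not the raw features.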

Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models

no code implementations 28 Nov 2023 Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai

We contend that one main limitation of existing generation methods is the insufficient integration of foreground text with the background.

Image Generation Scene Text Detection +1

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

1 code implementation 11 Nov 2023 Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai

Additionally, experiments on 18 datasets further demonstrate that Monkey surpasses existing LMMs in many tasks like Image Captioning and various Visual Question Answering formats.

Image Captioning Question Answering +2

SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing

no code implementations 12 Oct 2023 Zijie Wu, Chaohui Yu, Zhen Zhu, Fan Wang, Xiang Bai

To utilize the abundant visual priors in off-the-shelf T2I models, a series of methods try to invert an image into a proper embedding that aligns with the semantic space of the T2I model.

Image Generation Novel View Synthesis

A Discrepancy Aware Framework for Robust Anomaly Detection

1 code implementation 11 Oct 2023 Yuxuan Cai, Dingkang Liang, Dongliang Luo, Xinwei He, Xin Yang, Xiang Bai

To alleviate this issue, we present a Discrepancy Aware Framework (DAF), which consistently demonstrates robust performance with simple and cheap strategies across different anomaly detection benchmarks.

Anomaly Detection Defect Detection +2

Semantic Graph Representation Learning for Handwritten Mathematical Expression Recognition

no code implementations 21 Aug 2023 Zhuang Liu, Ye Yuan, Zhilong Ji, Jingfeng Bai, Xiang Bai

Then we design a semantic aware module (SAM), which projects the visual and classification features into the semantic space.

Graph Representation Learning

Turning a CLIP Model into a Scene Text Spotter

1 code implementation 21 Aug 2023 Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun, Xiang Bai

Utilizing only 10% of the supervised data, FastTCM-CR50 improves performance by an average of 26.5% and 5.5% for text detection and spotting tasks, respectively.

object-detection Object Detection +3

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

3 code implementations ICCV 2023 Mingxin Huang, Jiaxin Zhang, Dezhi Peng, Hao Lu, Can Huang, Yuliang Liu, Xiang Bai, Lianwen Jin

To this end, we introduce a new model named Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter), which achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.

Text Detection Text Spotting

SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth

1 code implementation 8 Jun 2023 Zelin Liu, Xinggang Wang, Cheng Wang, Wenyu Liu, Xiang Bai

By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack.

Ranked #1 on Multi-Object Tracking on MOT20 (using extra training data)

Depth Estimation Multi-Object Tracking +1
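The pseudo-depth idea can be illustrated with a short NumPy sketch. It assumes that pseudo-depth is the vertical distance from a box's bottom edge to the image's bottom edge, so objects lower in the frame count as nearer. It also assumes that boxes are grouped into equal-width depth levels, from near to far, before association. Both function names are hypothetical. The level division in SparseTrack may differ, and the cascaded matching step of DCM is not shown.

```python
import numpy as np

def pseudo_depth(boxes, image_height):
    """Pseudo-depth as the distance from each box's bottom edge to the image bottom.

    boxes: (N, 4) array of [x1, y1, x2, y2] in pixels (y grows downward).
    Smaller values mean the object is assumed to be closer to the camera.
    """
    return image_height - boxes[:, 3]

def split_by_depth(boxes, image_height, num_levels):
    """Group box indices into `num_levels` equal-width pseudo-depth bins, near to far."""
    depth = pseudo_depth(boxes, image_height)
    if len(depth) == 0:
        return [[] for _ in range(num_levels)]
    edges = np.linspace(depth.min(), depth.max(), num_levels + 1)
    # Digitizing against the inner edges maps each depth to a level in [0, num_levels - 1].
    level = np.digitize(depth, edges[1:-1], right=True)
    return [np.flatnonzero(level == i).tolist() for i in range(num_levels)]
```

A tracker built on this would match detections and tracks within the nearest level first and then move to farther levels. This keeps heavily overlapping people at different depths in separate matching rounds.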

SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model

1 code implementation 4 Jun 2023 Dingyuan Zhang, Dingkang Liang, Hongcheng Yang, Zhikang Zou, Xiaoqing Ye, Zhe Liu, Xiang Bai

In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, has been proposed recently and presents strong zero-shot ability on many downstream 2D tasks.

3D Object Detection Image Segmentation +2

Multi-Modal 3D Object Detection by Box Matching

1 code implementation 12 May 2023 Zhe Liu, Xiaoqing Ye, Zhikang Zou, Xinwei He, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai

Extensive experiments on the nuScenes dataset demonstrate that our method is much more stable in dealing with challenging cases such as asynchronous sensors, misaligned sensor placement, and degenerated camera images than existing fusion methods.

3D Object Detection Autonomous Driving +1

Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

1 code implementation 12 May 2023 Jianfeng Kuang, Wei Hua, Dingkang Liang, Mingkun Yang, Deqiang Jiang, Bo Ren, Xiang Bai

We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that the performance of these methods drops noticeably from SROIE (a widely used English dataset) to our proposed dataset due to the larger variance in layout and entities.

Contrastive Learning Optical Character Recognition (OCR)

A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension

1 code implementation 5 May 2023 Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Hong Zhou, Mike Zheng Shou, Xiang Bai

Most existing cross-modal language-to-video retrieval (VR) research focuses on single-modal input from video, i.e., visual representation, while text is omnipresent in human environments and frequently critical to understanding video.

Reading Comprehension Retrieval +1

ICDAR 2023 Competition on Reading the Seal Title

no code implementations 24 Apr 2023 Wenwen Yu, MingYu Liu, Mingrui Chen, Ning Lu, Yinlong Wen, Yuliang Liu, Dimosthenis Karatzas, Xiang Bai

To promote research in this area, we organized the ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2).

Optical Character Recognition (OCR) Text Detection

SOOD: Towards Semi-Supervised Oriented Object Detection

1 code implementation CVPR 2023 Wei Hua, Dingkang Liang, Jingyu Li, Xiaolong Liu, Zhikang Zou, Xiaoqing Ye, Xiang Bai

Semi-Supervised Object Detection (SSOD), aiming to explore unlabeled data for boosting object detectors, has become an active task in recent years.

object-detection Object Detection +3

ICDAR 2023 Video Text Reading Competition for Dense and Small Text

no code implementations 10 Apr 2023 Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai

In this competition report, we establish a video text reading benchmark, DSText, which focuses on the challenges of reading dense and small text in videos across various scenarios.

Text Detection Text Spotting +1

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

no code implementations CVPR 2023 Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, Xiang Bai, Cong Yao

As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common.

Text Spotting

InstMove: Instance Motion for Object-centric Video Segmentation

1 code implementation CVPR 2023 Qihao Liu, Junfeng Wu, Yi Jiang, Xiang Bai, Alan Yuille, Song Bai

A common solution is to use optical flow to provide motion information, but essentially it only considers pixel-level motion, which still relies on appearance similarity and hence is often inaccurate under occlusion and fast movement.

Optical Flow Estimation Segmentation +2

Turning a CLIP Model into a Scene Text Detector

1 code implementation CVPR 2023 Wenwen Yu, Yuliang Liu, Wei Hua, Deqiang Jiang, Bo Ren, Xiang Bai

Recently, pretraining approaches based on vision language models have made effective progress in the field of text detection.

Domain Adaptation Scene Text Detection +1

Side Adapter Network for Open-Vocabulary Semantic Segmentation

3 code implementations CVPR 2023 Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai

A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is applied in the CLIP model to recognize the class of masks.

Language Modelling Open Vocabulary Semantic Segmentation +3

StereoDistill: Pick the Cream from LiDAR for Distilling Stereo-based 3D Object Detection

no code implementations 4 Jan 2023 Zhe Liu, Xiaoqing Ye, Xiao Tan, Errui Ding, Xiang Bai

In this paper, we propose a cross-modal distillation method named StereoDistill to narrow the gap between the stereo and LiDAR-based approaches via distilling the stereo detectors from the superior LiDAR model at the response level, which is usually overlooked in 3D object detection distillation.

3D Object Detection object-detection

SPTS v2: Single-Point Scene Text Spotting

3 code implementations 4 Jan 2023 Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.

Text Detection Text Spotting

The Runner-up Solution for YouTube-VIS Long Video Challenge 2022

no code implementations 18 Nov 2022 Junfeng Wu, Yi Jiang, Qihao Liu, Xiang Bai, Song Bai

This technical report describes our 2nd-place solution for the ECCV 2022 YouTube-VIS Long Video Challenge.

Contrastive Learning Instance Segmentation +2

Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation

1 code implementation 12 Nov 2022 Tianyi Shi, Xiaohuan Ding, Wei Zhou, Feng Pan, Zengqiang Yan, Xiang Bai, Xin Yang

Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms.

Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition

1 code implementation 31 Jul 2022 Xudong Xie, Ling Fu, Zhifei Zhang, Zhaowen Wang, Xiang Bai

Thirdly, we utilize a Transformer to learn global features at the image level and model the global relationship of the corner points, with the assistance of a corner-query cross-attention mechanism.

Scene Text Recognition

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

1 code implementation 23 Jul 2022 Bohan Li, Ye Yuan, Dingkang Liang, Xiao Liu, Zhilong Ji, Jinfeng Bai, Wenyu Liu, Xiang Bai

Recently, most handwritten mathematical expression recognition (HMER) methods adopt encoder-decoder networks, which directly predict the markup sequences from formula images with the attention mechanism.

Optical Character Recognition (OCR)

In Defense of Online Models for Video Instance Segmentation

1 code implementation 21 Jul 2022 Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models have gradually attracted less attention, possibly due to their inferior performance.

Ranked #6 on Video Instance Segmentation on YouTube-VIS validation (using extra training data)

Contrastive Learning Instance Segmentation +5

CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer

1 code implementation 11 Jul 2022 Zijie Wu, Zhen Zhu, Junping Du, Xiang Bai

CCPL can preserve the coherence of the content source during style transfer without degrading stylization.

Image-to-Image Translation Style Transfer +1

Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition

1 code implementation 1 Jul 2022 Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai

Inspired by the observation that humans learn to recognize text through both reading and writing, we propose to learn discrimination and generation by integrating contrastive learning and masked image modeling in our self-supervised method.

Contrastive Learning Scene Text Recognition

Vision-Language Pre-Training for Boosting Scene Text Detectors

2 code implementations CVPR 2022 Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao

In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language.

Contrastive Learning Language Modelling +4

GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation

no code implementations 16 Apr 2022 Shi Gong, Xiaoqing Ye, Xiao Tan, Jingdong Wang, Errui Ding, Yu Zhou, Xiang Bai

Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving for its powerful spatial representation ability.

Autonomous Driving Image Segmentation +2

An Empirical Study of End-to-End Temporal Action Detection

1 code implementation CVPR 2022 Xiaolong Liu, Song Bai, Xiang Bai

Rather than end-to-end learning, most existing methods adopt a head-only learning paradigm, where the video encoder is pre-trained for action classification, and only the detection head upon the encoder is optimized for TAD.

Action Classification Action Detection +2

Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection

no code implementations CVPR 2022 Jingqun Tang, Wenqing Zhang, Hongye Liu, Mingkun Yang, Bo Jiang, Guanglong Hu, Xiang Bai

Different from previous approaches that learn robust deep representations of scene text in a holistic manner, our method performs scene text detection based on a few representative features, which avoids the disturbance by background and reduces the computational cost.

Ranked #17 on Object Detection In Aerial Images on DOTA (using extra training data)

object-detection Object Detection In Aerial Images +2

Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition

no code implementations 23 Mar 2022 Wondimu Dikubab, Dingkang Liang, Minghui Liao, Xiang Bai

Ethiopic/Amharic script is one of the oldest African writing systems, which serves at least 23 languages (e.g., Amharic, Tigrinya) in East Africa for more than 120 million people.

Benchmarking Scene Text Detection +1

Syntax-Aware Network for Handwritten Mathematical Expression Recognition

2 code implementations CVPR 2022 Ye Yuan, Xiao Liu, Wondimu Dikubab, Hui Liu, Zhilong Ji, Zhongqin Wu, Xiang Bai

In this paper, we propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.

An End-to-End Transformer Model for Crowd Localization

1 code implementation 26 Feb 2022 Dingkang Liang, Wei Xu, Xiang Bai

Crowd localization, predicting head positions, is a more practical and high-level task than simply counting.

Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion

5 code implementations 21 Feb 2022 Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai

By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.

Binarization Model Optimization +3
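DB's key step is to replace the hard threshold of segmentation-based detectors with a differentiable step function. Below is a minimal NumPy sketch of that step. It assumes the form B̂ = 1 / (1 + exp(-k(P - T))), where P is the probability map, T is the learned threshold map, and the amplification factor is k = 50, as in the original DB paper. The function name is hypothetical, and ASF is not shown.

```python
import numpy as np

def approximate_binarization(prob_map, threshold_map, k=50.0):
    """Differentiable approximation of the hard step B = (P >= T).

    B_hat = 1 / (1 + exp(-k * (P - T))). A large k makes the curve close to a
    hard step, while it stays smooth, so gradients reach both P and T.
    """
    return 1.0 / (1.0 + np.exp(-k * (prob_map - threshold_map)))
```

Where P and T differ by more than about 0.1, the output is effectively 0 or 1. Near the threshold, the smooth transition lets the threshold map be learned jointly with the segmentation network.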

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model

2 code implementations 29 Dec 2021 Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Han Hu, Xiang Bai

However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation processes pixels, while CLIP processes whole images.

Image Classification Language Modelling +8

EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection

1 code implementation 21 Dec 2021 Zhe Liu, Tengteng Huang, Bingling Li, Xiwu Chen, Xi Wang, Xiang Bai

Recently, fusing the LiDAR point cloud and camera image to improve the performance and robustness of 3D object detection has received more and more attention, as these two modalities naturally possess strong complementarity.

3D Object Detection object-detection

SPTS: Single-Point Text Spotting

1 code implementation 15 Dec 2021 Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single point for each instance.

Language Modelling Text Detection +1

SeqFormer: Sequential Transformer for Video Instance Segmentation

2 code implementations 15 Dec 2021 Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai

Nevertheless, we observe that a stand-alone instance query suffices for capturing a time sequence of instances in a video, but attention mechanisms should be applied to each frame independently.

Instance Segmentation Semantic Segmentation +1

PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis

1 code implementation 9 Dec 2021 Silin Cheng, Xiwu Chen, Xinwei He, Zhe Liu, Xiang Bai

Learning intra-region contexts and inter-region relations are two effective strategies to strengthen feature representations for point cloud analysis.

3D Point Cloud Classification Keypoint Estimation

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

no code implementations 15 Nov 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.

Instance Segmentation Object Recognition +3

Video Text Tracking With a Spatio-Temporal Complementary Model

1 code implementation 9 Nov 2021 Yuzhe Gao, Xing Li, Jiajian Zhang, Yu Zhou, Dian Jin, Jing Wang, Shenggao Zhu, Xiang Bai

We leverage a Siamese Complementary Module to fully exploit the continuity characteristic of the text instances in the temporal dimension, which effectively alleviates the missed detection of the text instances, and hence ensures the completeness of each text trajectory.

text similarity

Bootstrap Your Object Detector via Mixed Training

1 code implementation NeurIPS 2021 Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Stephen Lin, Han Hu, Xiang Bai

We introduce MixTraining, a new training paradigm for object detection that can improve the performance of existing detectors for free.

Data Augmentation object-detection +1

Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship

no code implementations CVPR 2021 Jing Wang, Jinhui Tang, Mingkun Yang, Xiang Bai, Jiebo Luo

Under the guidance of the geometrical relationship between OCR tokens, our LSTM-R capitalizes on a newly-devised relation-aware pointer network to select OCR tokens from the scene text for OCR-based image captioning.

Image Captioning Optical Character Recognition (OCR)

End-to-end Temporal Action Detection with Transformer

1 code implementation 18 Jun 2021 Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Shiwei Zhang, Song Bai, Xiang Bai

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video.

Action Detection Temporal Action Localization +1

End-to-End Semi-Supervised Object Detection with Soft Teacher

8 code implementations ICCV 2021 Mengde Xu, Zheng Zhang, Han Hu, JianFeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, Zicheng Liu

This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.

Instance Segmentation object-detection +4

TransCrowd: weakly-supervised crowd counting with transformers

1 code implementation 19 Apr 2021 Dingkang Liang, Xiwu Chen, Wei Xu, Yu Zhou, Xiang Bai

Current weakly-supervised counting methods adopt CNNs to regress a total count of the crowd by an image-to-count paradigm.

Crowd Counting

Scene Text Retrieval via Joint Text Detection and Similarity Learning

1 code implementation CVPR 2021 Hao Wang, Xiang Bai, Mingkun Yang, Shenggao Zhu, Jing Wang, Wenyu Liu

Such a task is usually realized by matching a query text to the recognized words, outputted by an end-to-end scene text spotter.

Retrieval Scene Text Detection +3
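The snippet describes the conventional pipeline that this work moves away from: spot words in each image, then match the query against them. Below is a minimal sketch of that baseline. It assumes that matching uses a case-insensitive, normalized Levenshtein distance and that each image is scored by its best-matching word. All names are hypothetical. The paper itself learns a cross-modal similarity, and this sketch does not reproduce it.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def image_score(query, recognized_words):
    """Best normalized similarity between the query and any word spotted in one image."""
    q = query.lower()
    best = 0.0
    for w in recognized_words:
        w = w.lower()
        denom = max(len(q), len(w)) or 1
        best = max(best, 1.0 - edit_distance(q, w) / denom)
    return best

def rank_images(query, spotted):
    """spotted: dict mapping image id -> list of recognized words; best match first."""
    return sorted(spotted, key=lambda img: image_score(query, spotted[img]), reverse=True)
```

This baseline inherits every recognition error of the spotter. A misread word can only be recovered through its edit distance to the query, which is one motivation for learning the similarity jointly with detection instead.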

MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

no code implementations CVPR 2021 Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai

Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to detect text in various challenging scenarios.

Scene Text Detection Text Detection

Progressive and Aligned Pose Attention Transfer for Person Image Generation

1 code implementation 22 Mar 2021 Zhen Zhu, Tengteng Huang, Mengde Xu, Baoguang Shi, Wenqing Cheng, Xiang Bai

This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose.

Data Augmentation Person Re-Identification +1

ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction

1 code implementation 18 Mar 2021 Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar

In this competition, we set up three tasks, namely, Scanned Receipt Text Localisation (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3).

Key Information Extraction Optical Character Recognition (OCR)

FaceController: Controllable Attribute Editing for Face in the Wild

no code implementations 23 Feb 2021 Zhiliang Xu, Xiyu Yu, Zhibin Hong, Zhen Zhu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai

By simply employing some existing and easily obtainable prior information, our method can control, transfer, and edit diverse attributes of faces in the wild.

Ranked #1 on Face Swapping on FaceForensics++ (FID metric)

Disentanglement Face Swapping

Occluded Video Instance Segmentation: A Benchmark

2 code implementations 2 Feb 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.

Instance Segmentation Segmentation +3

WDNet: Watermark-Decomposition Network for Visible Watermark Removal

1 code implementation 14 Dec 2020 Yang Liu, Zhen Zhu, Xiang Bai

Visible watermarks are widely used in images to protect copyright ownership.

Image-to-Image Translation

Scene Text Detection with Scribble Lines

no code implementations 9 Dec 2020 Wenqing Zhang, Yang Qiu, Minghui Liao, Rui Zhang, Xiaolin Wei, Xiang Bai

Scribble-line annotation is a general labeling method for text of various shapes and requires low labeling costs.

Scene Text Detection Text Detection

Affinity Space Adaptation for Semantic Segmentation Across Domains

1 code implementation 26 Sep 2020 Wei Zhou, Yukang Wang, Jiajia Chu, Jiehua Yang, Xiang Bai, Yongchao Xu

Specifically, we perform domain adaptation on the affinity relationship between adjacent pixels, termed the affinity space, of the source and target domains.

Segmentation Semantic Segmentation +1

FedOCR: Communication-Efficient Federated Learning for Scene Text Recognition

no code implementations 22 Jul 2020 Wenqing Zhang, Yang Qiu, Song Bai, Rui Zhang, Xiaolin Wei, Xiang Bai

In this paper, we study how to make use of decentralized datasets for training a robust scene text recognizer while keeping them on local devices.

Federated Learning Privacy Preserving +1

EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection

1 code implementation ECCV 2020 Tengteng Huang, Zhe Liu, Xiwu Chen, Xiang Bai

In this paper, we aim at addressing two critical issues in the 3D detection task, including the exploitation of multiple sensors (namely LiDAR point cloud and camera image), as well as the inconsistency between the localization and classification confidence.

3D Object Detection General Classification +1

Maximum Entropy Regularization and Chinese Text Recognition

no code implementations 9 Jul 2020 Changxu Cheng, Wuheng Xu, Xiang Bai, Bin Feng, Wenyu Liu

Chinese text recognition is more challenging than Latin text recognition due to the large number of fine-grained Chinese characters and the severe class imbalance, which cause a serious overfitting problem.

Fine-Grained Image Classification
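As the title suggests, one way to counter this overfitting is to penalize over-confident predictions by rewarding output entropy. Below is a minimal NumPy sketch. It assumes the common form L = CE - λ·H(p), where CE is cross-entropy and H(p) is the entropy of the softmax output, averaged over the batch. The function names and the default weight λ = 0.1 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def max_entropy_loss(logits, targets, lam=0.1, eps=1e-12):
    """Batch mean of cross-entropy minus lam times the prediction entropy.

    logits: (N, C) scores; targets: (N,) integer class indices.
    Subtracting the entropy rewards less peaked predictions, which discourages
    over-confident fits to rare, fine-grained classes.
    """
    p = softmax(logits)
    n = logits.shape[0]
    ce = -np.log(p[np.arange(n), targets] + eps)
    entropy = -(p * np.log(p + eps)).sum(axis=1)
    return float(np.mean(ce - lam * entropy))
```

With λ = 0, the loss reduces to plain cross-entropy. Entropy is never negative, so increasing λ can only lower the loss for a given prediction. The regularizer therefore never rewards sharper outputs.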

Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation

1 code implementation CVPR 2020 Jianqiang Wan, Yang Liu, Donglai Wei, Xiang Bai, Yongchao Xu

In this paper, we propose a fast image segmentation method based on a novel super boundary-to-pixel direction (super-BPD) and a customized segmentation algorithm with super-BPD.

Image Segmentation Segmentation +2

Semantically Multi-modal Image Synthesis

1 code implementation CVPR 2020 Zhen Zhu, Zhiliang Xu, Ansheng You, Xiang Bai

Experiments on several challenging datasets demonstrate the superiority of GroupDNet on performing the SMIS task.

Image Generation

AutoSTR: Efficient Backbone Search for Scene Text Recognition

1 code implementation ECCV 2020 Hui Zhang, Quanming Yao, Mingkun Yang, Yongchao Xu, Xiang Bai

In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR) to search data-dependent backbones to boost text recognition performance.

Deblurring Neural Architecture Search +1

Progressive Object Transfer Detection

no code implementations 12 Feb 2020 Hao Chen, Yali Wang, Guoyou Wang, Xiang Bai, Yu Qiao

Inspired by this procedure of learning to detect, we propose a novel Progressive Object Transfer Detection (POTD) framework.

object-detection Object Detection

AutoScale: Learning to Scale for Crowd Counting and Localization

2 code implementations 20 Dec 2019 Chenfeng Xu, Dingkang Liang, Yongchao Xu, Song Bai, Wei Zhan, Xiang Bai, Masayoshi Tomizuka

A major issue is that the density map on dense regions usually accumulates density values from a number of nearby Gaussian blobs, yielding large and widely varying density values on a small set of pixels.

Crowd Counting Model Optimization
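The accumulation effect described above is easy to reproduce. Below is a minimal NumPy sketch of a standard crowd-counting density map. It assumes one fixed-width Gaussian per annotated head (σ = 2 pixels here, chosen for illustration), with each Gaussian normalized to unit mass. The function name is hypothetical, and this is not the AutoScale method itself.

```python
import numpy as np

def density_map(points, height, width, sigma=2.0):
    """Sum of unit-mass 2D Gaussians centred on annotated head positions.

    points: iterable of (row, col) pixel coordinates. Each Gaussian is
    normalised over the image grid, so the whole map sums to the head count.
    """
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    dmap = np.zeros((height, width))
    for r, c in points:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
        dmap += g / g.sum()          # normalise so each head contributes exactly 1
    return dmap

sparse = density_map([(5, 5), (5, 25), (25, 15)], 32, 32)
dense = density_map([(15, 15), (15, 16), (16, 15)], 32, 32)
```

Both maps sum to 3, so a count obtained by integrating the map is unaffected. However, the dense map's peak is several times higher because the three blobs overlap. This is the kind of large, uneven value range on a few pixels that makes dense regions hard to regress.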

ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard

no code implementations 20 Dec 2019 Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, Xiang Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar

21 teams submitted results for Task 1, 23 teams for Task 2, 24 teams for Task 3, and 13 teams for Task 4.

Line Detection

TANet: Robust 3D Object Detection from Point Clouds with Triple Attention

1 code implementation 11 Dec 2019 Zhe Liu, Xin Zhao, Tengteng Huang, Ruolan Hu, Yu Zhou, Xiang Bai

In this paper, we focus on exploring the robustness of 3D object detection in point clouds, which has rarely been discussed in existing approaches.

object-detection Robust 3D Object Detection

Patch Aggregator for Scene Text Script Identification

no code implementations 9 Dec 2019 Changxu Cheng, Qiuhui Huang, Xiang Bai, Bin Feng, Wenyu Liu

Script identification in the wild is of great importance in a multi-lingual robust-reading system.


All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting

no code implementations 21 Nov 2019 Hao Wang, Pu Lu, Hui Zhang, Mingkun Yang, Xiang Bai, Yongchao Xu, Mengchao He, Yongpan Wang, Wenyu Liu

Recently, end-to-end text spotting, which aims to detect and recognize text from cluttered images simultaneously, has received growing interest in computer vision.

Instance Segmentation Scene Text Detection +3

Gliding vertex on the horizontal bounding box for multi-oriented object detection

1 code implementation 21 Nov 2019 Yongchao Xu, Mingtao Fu, Qimeng Wang, Yukang Wang, Kai Chen, Gui-Song Xia, Xiang Bai

Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as objects in aerial images and scene texts.

Ranked #38 on Object Detection In Aerial Images on DOTA (using extra training data)

object-detection Object Detection In Aerial Images +4
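As the title says, the alternative to the plain horizontal box is to keep that box and let one vertex glide along each of its four sides. Below is a minimal sketch of decoding such a representation. It assumes each of the four ratios α1..α4 lies in [0, 1] and measures how far a vertex slides along its side from a fixed starting corner. The assignment of sides and starting corners in the code is an assumption made for illustration. The area ratio between the quadrilateral and its box is computed as a simple obliquity measure. All function names are hypothetical.

```python
def glide_vertices(box, alphas):
    """Decode a quadrilateral from a horizontal box and four gliding ratios.

    box: (x1, y1, x2, y2) horizontal bounding box, with y pointing down.
    alphas: (a1, a2, a3, a4) in [0, 1]. Assumed convention: each vertex slides
      v1 along the top side from the top-left corner,
      v2 along the right side from the top-right corner,
      v3 along the bottom side from the bottom-right corner,
      v4 along the left side from the bottom-left corner.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    a1, a2, a3, a4 = alphas
    return [
        (x1 + a1 * w, y1),   # v1 on the top side
        (x2, y1 + a2 * h),   # v2 on the right side
        (x2 - a3 * w, y2),   # v3 on the bottom side
        (x1, y2 - a4 * h),   # v4 on the left side
    ]

def polygon_area(pts):
    """Shoelace formula for a simple polygon given as a list of (x, y) points."""
    s = 0.0
    for (xa, ya), (xb, yb) in zip(pts, pts[1:] + pts[:1]):
        s += xa * yb - xb * ya
    return abs(s) / 2.0

def obliquity(box, alphas):
    """Area of the decoded quadrilateral divided by the area of its horizontal box."""
    x1, y1, x2, y2 = box
    return polygon_area(glide_vertices(box, alphas)) / ((x2 - x1) * (y2 - y1))
```

All-zero ratios reproduce the horizontal box itself (ratio 1.0). Ratios of 0.5 give the inscribed diamond (ratio 0.5). A ratio near 1 therefore signals a nearly axis-aligned object, for which the horizontal box alone is adequate.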

Real-time Scene Text Detection with Differentiable Binarization

15 code implementations 20 Nov 2019 Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai

Recently, segmentation-based methods have become quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes, such as curved text.

Binarization Optical Character Recognition (OCR) +3

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

7 code implementations 7 Oct 2019 Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, Rong Xiao, Xiang Bai

Attention-based scene text recognizers have achieved great success; they leverage a more compact intermediate representation to learn 1D or 2D attention with an RNN-based encoder-decoder architecture.

Scene Text Recognition

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation ECCV 2018 Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.

Scene Text Recognition Semantic Segmentation +2

Asymmetric Non-local Neural Networks for Semantic Segmentation

5 code implementations ICCV 2019 Zhen Zhu, Mengde Xu, Song Bai, Tengteng Huang, Xiang Bai

The non-local module is a particularly useful technique for semantic segmentation, but it has been criticized for its prohibitive computation and GPU memory occupation.

Segmentation Semantic Segmentation +1
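The cost problem of the non-local module can be made concrete with a back-of-the-envelope count. A standard non-local block compares every position with every other position, which gives an N × N affinity matrix for N = H·W. An asymmetric variant draws keys and values from a small set of S sampled positions instead. Below is a minimal sketch that assumes keys and values are pyramid-pooled to sizes (1, 3, 6, 8), which gives S = 110. The 96 × 96 × 256 feature map and the function name are illustrative assumptions.

```python
def nonlocal_cost(height, width, channels, pooled_sizes=None):
    """Affinity-matrix entries and multiply-adds for one query-key product."""
    n = height * width                          # number of query positions
    if pooled_sizes is None:
        s = n                                   # standard non-local: keys at all positions
    else:
        s = sum(p * p for p in pooled_sizes)    # keys/values from pyramid pooling
    return {"affinity_entries": n * s, "matmul_macs": n * s * channels}

full = nonlocal_cost(96, 96, 256)
asym = nonlocal_cost(96, 96, 256, pooled_sizes=(1, 3, 6, 8))
```

For this feature map, the full affinity matrix has about 85 million entries, while the sampled version has about 1 million. This is a reduction of roughly 84 times, and the saving grows linearly with N.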

Editing Text in the Wild

2 code implementations 8 Aug 2019 Liang Wu, Chengquan Zhang, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai

Specifically, we propose an end-to-end trainable style retention network (SRNet) that consists of three modules: text conversion module, background inpainting module and fusion module.

Image Inpainting Image-to-Image Translation +1

View N-gram Network for 3D Object Retrieval

no code implementations ICCV 2019 Xinwei He, Tengteng Huang, Song Bai, Xiang Bai

By doing so, spatial information across multiple views is captured, which helps to learn a discriminative global embedding for each 3D object.

3D Object Retrieval 3D Shape Classification +2

Symmetry-constrained Rectification Network for Scene Text Recognition

no code implementations ICCV 2019 Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai

Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.

Scene Text Recognition

Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

no code implementations ICCV 2019 Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai

Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels.

Crowd Counting Density Estimation
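The counting principle in the snippet above (count = integral of the density map over the pixels) can be illustrated with a minimal NumPy sketch. This is not the paper's multipolar normalization method. The fixed-width Gaussian kernel, `sigma=2.0`, and the helper names are illustrative assumptions.

```python
import numpy as np

def count_from_density(density: np.ndarray) -> float:
    """Estimate the crowd count by integrating (summing) a per-pixel density map."""
    # On a discrete pixel grid the integral reduces to a sum over all pixels.
    return float(density.sum())

def gaussian_density(points, shape, sigma=2.0):
    """Hypothetical toy ground truth: one normalized Gaussian per head annotation."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=np.float64)
    for (y, x) in points:
        g = np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma ** 2))
        density += g / g.sum()  # each kernel integrates to exactly one person
    return density

heads = [(10, 12), (20, 30), (40, 8)]
dmap = gaussian_density(heads, (64, 64))
print(round(count_from_density(dmap), 3))  # 3.0
```

Normalizing each kernel to unit mass keeps the integral equal to the number of annotated heads, even when a kernel is clipped at the image border.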

2D-CTC for Scene Text Recognition

no code implementations 23 Jul 2019 Zhaoyi Wan, Fengming Xie, Yibo Liu, Xiang Bai, Cong Yao

Scene text recognition has been an important, active research topic in computer vision for years.

Scene Text Recognition speech-recognition +1

SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds

1 code implementation 13 Jul 2019 Minghui Liao, Boyu Song, Shangbang Long, Minghang He, Cong Yao, Xiang Bai

Unlike previous methods, which paste rendered text onto static 2D images, our method renders the 3D virtual scene and the text instances as a whole.

Image Generation Scene Text Detection +1

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images

3 code implementations 30 May 2019 Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai

Compared to existing small-scale instance segmentation datasets of aerial images, iSAID contains 15× the number of object categories and 5× the number of instances.

Instance Segmentation object-detection +3

Progressive Pose Attention Transfer for Person Image Generation

2 code implementations CVPR 2019 Zhen Zhu, Tengteng Huang, Baoguang Shi, Miao Yu, Bofei Wang, Xiang Bai

This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose.

Person Re-Identification Pose Transfer

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection

1 code implementation 4 Dec 2018 Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, Xiang Bai

Experimental results show that the proposed TextField outperforms state-of-the-art methods by large margins on two curved text datasets, Total-Text and CTW1500 (28% and 8%, respectively). It also achieves very competitive performance on the multi-oriented datasets ICDAR 2015 and MSRA-TD500.

Scene Text Detection Text Detection

DeepFlux for Skeletons in the Wild

2 code implementations CVPR 2019 Yukang Wang, Yongchao Xu, Stavros Tsogkas, Xiang Bai, Sven Dickinson, Kaleem Siddiqi

In the present article, we depart from this strategy by training a CNN to predict a two-dimensional vector field, which maps each scene point to a candidate skeleton pixel, in the spirit of flux-based skeletonization algorithms.

Edge Detection Object Skeleton Detection +2

Scene Text Recognition from Two-Dimensional Perspective

no code implementations 18 Sep 2018 Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai

Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.

Scene Text Recognition Semantic Segmentation +4

Hard-Aware Point-to-Set Deep Metric for Person Re-identification

1 code implementation ECCV 2018 Rui Yu, Zhiyong Dou, Song Bai, Zhao-Xiang Zhang, Yongchao Xu, Xiang Bai

Person re-identification (re-ID) is a highly challenging task due to large variations of pose, viewpoint, illumination, and occlusion.

Metric Learning Person Re-Identification

Adaptively Transforming Graph Matching

no code implementations ECCV 2018 Fu-Dong Wang, Nan Xue, Yi-Peng Zhang, Xiang Bai, Gui-Song Xia

Thanks to an efficient optimization strategy based on the Frank-Wolfe method, we can handle graphs with hundreds to thousands of nodes within an acceptable amount of time.

Domain Adaptation Graph Matching

PCL: Proposal Cluster Learning for Weakly Supervised Object Detection

4 code implementations 9 Jul 2018 Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, Alan Yuille

The iterative instance classifier refinement is implemented online using multiple streams in a convolutional neural network: the first stream is an MIL network, and each subsequent stream performs instance classifier refinement supervised by the preceding one.

Multiple Instance Learning object-detection +2

Non-Stationary Texture Synthesis by Adversarial Expansion

1 code implementation 11 May 2018 Yang Zhou, Zhen Zhu, Xiang Bai, Dani Lischinski, Daniel Cohen-Or, Hui Huang

We demonstrate that this conceptually simple approach is highly effective for capturing large-scale structures, as well as other non-stationary attributes of the input exemplar.

Texture Synthesis

Triplet-Center Loss for Multi-View 3D Object Retrieval

1 code implementation CVPR 2018 Xinwei He, Yang Zhou, Zhichao Zhou, Song Bai, Xiang Bai

Most existing 3D object recognition algorithms focus on leveraging the strong discriminative power of deep learning models trained with softmax loss to classify 3D data. Learning discriminative features with deep metric learning for 3D object retrieval has been largely neglected.

3D Object Recognition 3D Object Retrieval +5

TextBoxes++: A Single-Shot Oriented Scene Text Detector

3 code implementations 9 Jan 2018 Minghui Liao, Baoguang Shi, Xiang Bai

In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass.

object-detection Object Detection +3

Deep-Person: Learning Discriminative Deep Features for Person Re-Identification

1 code implementation 29 Nov 2017 Xiang Bai, Mingkun Yang, Tengteng Huang, Zhiyong Dou, Rui Yu, Yongchao Xu

Many recent person re-identification (Re-ID) methods rely on part-based feature representations to learn a discriminative pedestrian descriptor.

Person Re-Identification Re-Ranking

Ensemble Diffusion for Retrieval

no code implementations ICCV 2017 Song Bai, Zhichao Zhou, Jingdong Wang, Xiang Bai, Longin Jan Latecki, Qi Tian

This has stimulated great research interest in similarity fusion within the framework of the diffusion process (i.e., fusion with diffusion) for robust retrieval.

3D Shape Classification 3D Shape Retrieval +2

Auto-Encoder Guided GAN for Chinese Calligraphy Synthesis

no code implementations 27 Jun 2017 Pengyuan Lyu, Xiang Bai, Cong Yao, Zhen Zhu, Tengteng Huang, Wenyu Liu

In this paper, we investigate the Chinese calligraphy synthesis problem: synthesizing Chinese calligraphy images in a specified style from a standard font (e.g. …

Image-to-Image Translation Translation

Deep Patch Learning for Weakly Supervised Object Classification and Discovery

1 code implementation 6 May 2017 Peng Tang, Xinggang Wang, Zilong Huang, Xiang Bai, Wenyu Liu

Patch-level image representation is very important for object classification and detection, since it is robust to spatial transformation, scale variation, and cluttered background.

Classification General Classification +3

Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification

no code implementations 15 Apr 2017 Xiang Bai, Mingkun Yang, Pengyuan Lyu, Yongchao Xu, Jiebo Luo

Then, we combine the word embedding of the recognized words and the deep visual features into a single representation, which is optimized by a convolutional neural network for fine-grained image classification.

Classification Fine-Grained Image Classification +2

Multiple Instance Detection Network with Online Instance Classifier Refinement

4 code implementations CVPR 2017 Peng Tang, Xinggang Wang, Xiang Bai, Wenyu Liu

We propose a novel online instance classifier refinement algorithm to integrate MIL and the instance classifier refinement procedure into a single deep network, and train the network end-to-end with only image-level supervision, i.e., without object location information.

Multiple Instance Learning object-detection +2

Scalable Person Re-identification on Supervised Smoothed Manifold

no code implementations CVPR 2017 Song Bai, Xiang Bai, Qi Tian

Most existing person re-identification algorithms either extract robust visual features or learn discriminative metrics for person images.

Person Re-Identification

Detecting Oriented Text in Natural Images by Linking Segments

6 code implementations CVPR 2017 Baoguang Shi, Xiang Bai, Serge Belongie

It achieves an f-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin.

Curved Text Detection Text Detection
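The f-measure reported above is conventionally the harmonic mean of precision and recall. The short sketch below shows that computation. It assumes precision and recall have already been obtained by matching detections to ground truth; the matching rule is not shown. The input values are placeholders, not results from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.75, 0.75))  # 0.75
```

Because it is a harmonic mean, the f-measure is pulled toward the smaller of the two values. A detector cannot score well by maximizing recall alone.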

Anisotropic-Scale Junction Detection and Matching for Indoor Images

no code implementations 16 Mar 2017 Nan Xue, Gui-Song Xia, Xiang Bai, Liangpei Zhang, Weiming Shen

This paper presents a novel approach to junction detection and characterization that exploits the locally anisotropic geometries of a junction and estimates the scales of these geometries using an a contrario model.

Junction Detection

Image Stitching by Line-guided Local Warping with Global Similarity Constraint

no code implementations 25 Feb 2017 Tian-Zhu Xiang, Gui-Song Xia, Xiang Bai, Liangpei Zhang

On the one hand, the line features are integrated into a local warping model through a designed weight function.

Image Stitching

Texture Characterization by Using Shape Co-occurrence Patterns

no code implementations 10 Feb 2017 Gui-Song Xia, Gang Liu, Xiang Bai, Liangpei Zhang

In contrast with existing works, the proposed method not only inherits the strong ability to depict geometrical aspects of textures and the high robustness to variations of imaging conditions from the shape-based method, but also provides a flexible way to consider shape relationships and to compute high-order statistics on the tree.

Descriptive Texture Classification

TextBoxes: A Fast Text Detector with a Single Deep Neural Network

4 code implementations 21 Nov 2016 Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu

This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, requiring no post-processing except standard non-maximum suppression.
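TextBoxes' only post-processing step is standard non-maximum suppression (NMS). Below is a minimal greedy NMS sketch. The axis-aligned (x1, y1, x2, y2) box format, the 0.5 IoU threshold, and the function names are illustrative assumptions; this is not taken from the TextBoxes code.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the top-scoring box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # survivors for the next round
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed. The third box does not overlap either and is kept. TextBoxes++ detects oriented text, which would need a polygon IoU instead of this axis-aligned one.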

Revisiting Multiple Instance Neural Networks

no code implementations 8 Oct 2016 Xinggang Wang, Yongluan Yan, Peng Tang, Xiang Bai, Wenyu Liu

We propose a new multiple instance neural network that learns bag representations, unlike existing multiple instance neural networks, which focus on estimating instance labels.

Multiple Instance Learning Weakly-supervised Learning
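The distinction between learning bag representations and estimating instance labels can be illustrated by pooling instance features into a single bag vector, which a bag-level classifier would then consume. The pooling operators here (plain max or mean) and the helper name are assumptions for illustration. They are not presented as the network from the paper.

```python
import numpy as np

def bag_representation(instance_features: np.ndarray, mode: str = "max") -> np.ndarray:
    """Pool a bag of instance features (n_instances x dim) into one bag-level vector."""
    if mode == "max":
        return instance_features.max(axis=0)  # strongest response per feature dimension
    if mode == "mean":
        return instance_features.mean(axis=0)  # average response per feature dimension
    raise ValueError(f"unknown pooling mode: {mode}")

bag = np.array([[0.1, 0.9], [0.7, 0.2], [0.3, 0.4]])
print(bag_representation(bag, "max"))  # [0.7 0.9]
```

The pooled vector has a fixed size regardless of how many instances the bag contains. This lets a standard classifier be trained with only bag-level labels.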

DeepSkeleton: Learning Multi-task Scale-associated Deep Side Outputs for Object Skeleton Extraction in Natural Images

1 code implementation 13 Sep 2016 Wei Shen, Kai Zhao, Yuan Jiang, Yan Wang, Xiang Bai, Alan Yuille

By observing the relationship between the receptive field sizes of the different layers in the network and the skeleton scales they can capture, we introduce two scale-associated side outputs to each stage of the network.

Multi-Task Learning object-detection +2

Deep FisherNet for Object Classification

no code implementations 31 Jul 2016 Peng Tang, Xinggang Wang, Baoguang Shi, Xiang Bai, Wenyu Liu, Zhuowen Tu

Our proposed FisherNet combines convolutional neural network training and Fisher Vector encoding in a single end-to-end structure.

Classification General Classification +1

Scene Text Detection via Holistic, Multi-Channel Prediction

no code implementations 29 Jun 2016 Cong Yao, Xiang Bai, Nong Sang, Xinyu Zhou, Shuchang Zhou, Zhimin Cao

Recently, scene text detection has become an active research topic in computer vision and document analysis, owing to its great importance and the significant challenges it poses.

Scene Text Detection Semantic Segmentation +1

Multidimensional Scaling on Multiple Input Distance Matrices

no code implementations 1 May 2016 Song Bai, Xiang Bai, Longin Jan Latecki, Qi Tian

To the best of our knowledge, how to perform multidimensional scaling on multiple input distance matrices remains an unsolved problem.

Object Skeleton Extraction in Natural Images by Fusing Scale-associated Deep Side Outputs

no code implementations CVPR 2016 Wei Shen, Kai Zhao, Yuan Jiang, Yan Wang, Zhijiang Zhang, Xiang Bai

The object skeleton is a useful cue for object detection, complementary to the object contour, as it provides a structural representation that describes the relationships among object parts.

object-detection Object Detection

Relaxed Multiple-Instance SVM with Application to Object Discovery

no code implementations ICCV 2015 Xinggang Wang, Zhuotun Zhu, Cong Yao, Xiang Bai

Multiple-instance learning (MIL) has served as an important tool for a wide range of vision applications, such as image classification, object detection, and visual tracking.

General Classification Image Classification +5

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

82 code implementations 21 Jul 2015 Baoguang Shi, Xiang Bai, Cong Yao

In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition.

Optical Character Recognition (OCR) Scene Text Recognition