Intra-class Feature Variation Distillation for Semantic Segmentation

1 code implementation ECCV 2020 Yukang Wang, Wei Zhou, Tao Jiang, Xiang Bai, Yongchao Xu

In this paper, different from previous methods performing knowledge distillation for densely pairwise relations, we propose a novel intra-class feature variation distillation (IFVD) to transfer the intra-class feature variation (IFV) of the cumbersome model (teacher) to the compact model (student).

Knowledge Distillation Semantic Segmentation
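The snippet does not say how IFV is computed. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a pixel's intra-class variation is the cosine similarity between its feature vector and the mean feature (class prototype) of all pixels with the same label. It also assumes that distillation minimizes the mean squared difference between the teacher's and the student's similarity maps. Pixels are flattened into plain lists, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def ifv_map(features, labels):
    """Hypothetical IFV map: similarity of each pixel feature to its class prototype.

    features: list of per-pixel feature vectors (pixels flattened into one list)
    labels:   list of class ids, one per pixel
    """
    dim = len(features[0])
    sums, counts = {}, {}
    for f, c in zip(features, labels):
        acc = sums.setdefault(c, [0.0] * dim)
        for k in range(dim):
            acc[k] += f[k]
        counts[c] = counts.get(c, 0) + 1
    # Assumed class prototype: mean feature of all pixels carrying that label.
    protos = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
    return [cosine(f, protos[c]) for f, c in zip(features, labels)]

def ifv_distillation_loss(teacher_feats, student_feats, labels):
    """Assumed loss: mean squared difference between teacher and student IFV maps."""
    t = ifv_map(teacher_feats, labels)
    s = ifv_map(student_feats, labels)
    return sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)
```

In a real segmentation network, the features would come from a feature map, with labels downsampled to the same resolution. This term would then be added to the usual segmentation loss. Both of these points are also assumptions here.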

Vision-Language Pre-Training for Boosting Scene Text Detectors

no code implementations 29 Apr 2022 Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao

In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language.

Contrastive Learning Language Modelling +3

GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation

no code implementations 16 Apr 2022 Shi Gong, Xiaoqing Ye, Xiao Tan, Jingdong Wang, Errui Ding, Yu Zhou, Xiang Bai

Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving for its powerful spatial representation ability.

Autonomous Driving Semantic Segmentation

An Empirical Study of End-to-End Temporal Action Detection

1 code implementation 6 Apr 2022 Xiaolong Liu, Song Bai, Xiang Bai

Rather than end-to-end learning, most existing methods adopt a head-only learning paradigm, where the video encoder is pre-trained for action classification, and only the detection head upon the encoder is optimized for TAD.

Action Classification Action Detection +2

Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection

no code implementations 29 Mar 2022 Jingqun Tang, Wenqing Zhang, Hongye Liu, Mingkun Yang, Bo Jiang, Guanglong Hu, Xiang Bai

Different from previous approaches that learn robust deep representations of scene text in a holistic manner, our method performs scene text detection based on a few representative features, which avoids interference from the background and reduces the computational cost.

Object Detection In Aerial Images Scene Text Detection

Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition

no code implementations 23 Mar 2022 Wondimu Dikubab, Dingkang Liang, Minghui Liao, Xiang Bai

Ethiopic/Amharic script is one of the oldest African writing systems; it serves at least 23 languages (e.g., Amharic, Tigrinya) spoken by more than 120 million people in East Africa.

Scene Text Detection

Syntax-Aware Network for Handwritten Mathematical Expression Recognition

1 code implementation 3 Mar 2022 Ye Yuan, Xiao Liu, Wondimu Dikubab, Hui Liu, Zhilong Ji, Zhongqin Wu, Xiang Bai

In this paper, we propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.

An End-to-End Transformer Model for Crowd Localization

no code implementations 26 Feb 2022 Dingkang Liang, Wei Xu, Xiang Bai

Crowd localization, predicting head positions, is a more practical and high-level task than simply counting.

Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion

1 code implementation 21 Feb 2022 Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai

By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.

Binarization Scene Text Detection

A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model

1 code implementation 29 Dec 2021 Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Han Hu, Xiang Bai

Recently, zero-shot image classification by vision-language pre-training has achieved remarkable results: the model can classify arbitrary categories without seeing additional annotated images of those categories.

Image Classification Language Modelling +4

EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection

no code implementations 21 Dec 2021 Zhe Liu, Tengteng Huang, Bingling Li, Xiwu Chen, Xi Wang, Xiang Bai

Recently, fusing LiDAR point clouds and camera images to improve the performance and robustness of 3D object detection has received increasing attention, as these two modalities are naturally complementary.

3D Object Detection

SPTS: Single-Point Text Spotting

no code implementations 15 Dec 2021 Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

For the first time, we demonstrate that scene text spotting models can be trained with extremely low-cost annotations: a single point for each instance.

Language Modelling Text Spotting

SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation

1 code implementation 15 Dec 2021 Junfeng Wu, Yi Jiang, Wenqing Zhang, Xiang Bai, Song Bai

Nevertheless, we observe that a stand-alone instance query suffices for capturing a time sequence of instances in a video, but attention mechanisms should be applied to each frame independently.

Ranked #2 on Video Instance Segmentation on YouTube-VIS validation (using extra training data)

Frame Instance Segmentation +2

PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis

1 code implementation 9 Dec 2021 Silin Cheng, Xiwu Chen, Xinwei He, Zhe Liu, Xiang Bai

Learning intra-region contexts and inter-region relations are two effective strategies to strengthen feature representations for point cloud analysis.

3D Point Cloud Classification

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

no code implementations 15 Nov 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.

Instance Segmentation Object Recognition +3

Video Text Tracking With a Spatio-Temporal Complementary Model

1 code implementation 9 Nov 2021 Yuzhe Gao, Xing Li, Jiajian Zhang, Yu Zhou, Dian Jin, Jing Wang, Shenggao Zhu, Xiang Bai

We leverage a Siamese Complementary Module to fully exploit the continuity of text instances in the temporal dimension, which effectively alleviates missed detections of text instances and hence ensures the completeness of each text trajectory.

Frame text similarity

Bootstrap Your Object Detector via Mixed Training

1 code implementation NeurIPS 2021 Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Stephen Lin, Han Hu, Xiang Bai

We introduce MixTraining, a new training paradigm for object detection that can improve the performance of existing detectors for free.

Data Augmentation Object Detection

CAP-Net: Correspondence-Aware Point-view Fusion Network for 3D Shape Analysis

no code implementations 3 Sep 2021 Xinwei He, Silin Cheng, Song Bai, Xiang Bai

The core element of CAP-Net is a module named Correspondence-Aware Fusion (CAF) which integrates the local features of the two modalities based on their correspondence scores.

3D Object Classification

Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship

no code implementations CVPR 2021 Jing Wang, Jinhui Tang, Mingkun Yang, Xiang Bai, Jiebo Luo

Under the guidance of the geometrical relationship between OCR tokens, our LSTM-R capitalizes on a newly-devised relation-aware pointer network to select OCR tokens from the scene text for OCR-based image captioning.

Image Captioning Optical Character Recognition

End-to-end Temporal Action Detection with Transformer

1 code implementation 18 Jun 2021 Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Song Bai, Xiang Bai

Temporal action detection (TAD) aims to determine the semantic label and the boundaries of every action instance in an untrimmed video.

Action Detection Temporal Action Localization +1

TransCrowd: Weakly-Supervised Crowd Counting with Transformer

1 code implementation 19 Apr 2021 Dingkang Liang, Xiwu Chen, Wei Xu, Yu Zhou, Xiang Bai

Current weakly-supervised counting methods adopt a CNN to regress the total count of the crowd in an image-to-count paradigm.

Crowd Counting

Scene Text Retrieval via Joint Text Detection and Similarity Learning

1 code implementation CVPR 2021 Hao Wang, Xiang Bai, Mingkun Yang, Shenggao Zhu, Jing Wang, Wenyu Liu

Such a task is usually realized by matching a query text to the recognized words output by an end-to-end scene text spotter.

Scene Text Detection Text Spotting

MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

no code implementations CVPR 2021 Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai

Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to detect text in various challenging scenarios.

Scene Text Detection

InsertGNN: Can Graph Neural Networks Outperform Humans in TOEFL Sentence Insertion Problem?

no code implementations 28 Mar 2021 Fang Wu, Xiang Bai

To the best of our knowledge, this is the first recorded attempt to apply a supervised graph-structured model in sentence insertion.

Question Answering Sentence Ordering

Progressive and Aligned Pose Attention Transfer for Person Image Generation

1 code implementation 22 Mar 2021 Zhen Zhu, Tengteng Huang, Mengde Xu, Baoguang Shi, Wenqing Cheng, Xiang Bai

This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose.

Data Augmentation Person Re-Identification +1

ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction

1 code implementation 18 Mar 2021 Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar

In this competition, we set up three tasks, namely, Scanned Receipt Text Localisation (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3).

Key information extraction Optical Character Recognition

FaceController: Controllable Attribute Editing for Face in the Wild

no code implementations 23 Feb 2021 Zhiliang Xu, Xiyu Yu, Zhibin Hong, Zhen Zhu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai

By simply employing some existing and easily obtainable prior information, our method can control, transfer, and edit diverse attributes of faces in the wild.

Ranked #1 on Face Swapping on FaceForensics++ (FID metric)

Disentanglement Face Swapping

Occluded Video Instance Segmentation: A Benchmark

1 code implementation 2 Feb 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.

Instance Segmentation Semantic Segmentation +2

WDNet: Watermark-Decomposition Network for Visible Watermark Removal

1 code implementation 14 Dec 2020 Yang Liu, Zhen Zhu, Xiang Bai

Visible watermarks are widely used in images to protect copyright ownership.

Image-to-Image Translation

Scene Text Detection with Scribble Lines

no code implementations 9 Dec 2020 Wenqing Zhang, Yang Qiu, Minghui Liao, Rui Zhang, Xiaolin Wei, Xiang Bai

It is a general labeling method for texts with various shapes and requires low labeling costs.

Scene Text Detection

Affinity Space Adaptation for Semantic Segmentation Across Domains

1 code implementation 26 Sep 2020 Wei Zhou, Yukang Wang, Jiajia Chu, Jiehua Yang, Xiang Bai, Yongchao Xu

Specifically, we perform domain adaptation on the affinity relationship between adjacent pixels, termed the affinity space, of the source and target domains.

Semantic Segmentation Unsupervised Domain Adaptation

FedOCR: Communication-Efficient Federated Learning for Scene Text Recognition

no code implementations 22 Jul 2020 Wenqing Zhang, Yang Qiu, Song Bai, Rui Zhang, Xiaolin Wei, Xiang Bai

In this paper, we study how to make use of decentralized datasets for training a robust scene text recognizer while keeping them on local devices.

Federated Learning Scene Text Recognition

Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting

1 code implementation ECCV 2020 Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, Xiang Bai

Recent end-to-end trainable methods for scene text spotting, which integrate detection and recognition, have shown much progress.

Region Proposal Text Spotting

EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection

1 code implementation ECCV 2020 Tengteng Huang, Zhe Liu, Xiwu Chen, Xiang Bai

In this paper, we aim to address two critical issues in the 3D detection task: the exploitation of multiple sensors (namely LiDAR point cloud and camera image) and the inconsistency between the localization and classification confidence.

3D Object Detection General Classification

Maximum Entropy Regularization and Chinese Text Recognition

no code implementations 9 Jul 2020 Changxu Cheng, Wuheng Xu, Xiang Bai, Bin Feng, Wenyu Liu

Chinese text recognition is more challenging than Latin text recognition due to the large number of fine-grained Chinese characters and the severe class imbalance, which cause a serious overfitting problem.

Fine-Grained Image Classification
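The title's maximum entropy regularization targets the overfitting described in this snippet. Below is a minimal sketch of the generic form of such a regularizer, also known as a confidence penalty. It assumes, without confirmation from the snippet, that the loss is cross-entropy minus `beta` times the entropy of the predicted distribution, with `beta = 0.1` chosen arbitrarily. Subtracting the entropy term rewards less peaked predictions, which discourages overconfident outputs.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def max_entropy_regularized_loss(logits, target, beta=0.1):
    """Assumed generic form: cross-entropy minus beta * entropy of the prediction."""
    probs = softmax(logits)
    ce = -math.log(probs[target])
    return ce - beta * entropy(probs)
```

With `beta=0.0`, the function reduces to plain cross-entropy. Any positive `beta` lowers the loss for predictions that spread some probability over other classes.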

Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation

1 code implementation CVPR 2020 Jianqiang Wan, Yang Liu, Donglai Wei, Xiang Bai, Yongchao Xu

In this paper, we propose a fast image segmentation method based on a novel super boundary-to-pixel direction (super-BPD) and a customized segmentation algorithm with super-BPD.

BSDS500 Semantic Segmentation +1

Semantically Multi-modal Image Synthesis

1 code implementation CVPR 2020 Zhen Zhu, Zhiliang Xu, Ansheng You, Xiang Bai

Experiments on several challenging datasets demonstrate the superiority of GroupDNet in performing the SMIS task.

Image Generation

AutoSTR: Efficient Backbone Search for Scene Text Recognition

1 code implementation ECCV 2020 Hui Zhang, Quanming Yao, Mingkun Yang, Yongchao Xu, Xiang Bai

In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR) to search data-dependent backbones to boost text recognition performance.

Deblurring Neural Architecture Search +1

Progressive Object Transfer Detection

no code implementations 12 Feb 2020 Hao Chen, Yali Wang, Guoyou Wang, Xiang Bai, Yu Qiao

Inspired by this procedure of learning to detect, we propose a novel Progressive Object Transfer Detection (POTD) framework.

Object Detection

AutoScale: Learning to Scale for Crowd Counting and Localization

2 code implementations 20 Dec 2019 Chenfeng Xu, Dingkang Liang, Yongchao Xu, Song Bai, Wei Zhan, Xiang Bai, Masayoshi Tomizuka

A major issue is that the density map in dense regions usually accumulates density values from a number of nearby Gaussian blobs, yielding large and widely varying density values on a small set of pixels.

Crowd Counting

ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard

no code implementations 20 Dec 2019 Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, Xiang Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar

21 teams submitted results for Task 1, 23 teams for Task 2, 24 teams for Task 3, and 13 teams for Task 4.

Line Detection

TANet: Robust 3D Object Detection from Point Clouds with Triple Attention

1 code implementation 11 Dec 2019 Zhe Liu, Xin Zhao, Tengteng Huang, Ruolan Hu, Yu Zhou, Xiang Bai

In this paper, we focus on exploring the robustness of the 3D object detection in point clouds, which has been rarely discussed in existing approaches.

3D Object Detection

Patch Aggregator for Scene Text Script Identification

no code implementations 9 Dec 2019 Changxu Cheng, Qiuhui Huang, Xiang Bai, Bin Feng, Wenyu Liu

Script identification in the wild is of great importance in a multi-lingual robust-reading system.

Gliding vertex on the horizontal bounding box for multi-oriented object detection

1 code implementation 21 Nov 2019 Yongchao Xu, Mingtao Fu, Qimeng Wang, Yukang Wang, Kai Chen, Gui-Song Xia, Xiang Bai

Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as objects in aerial images and scene texts.

Object Detection In Aerial Images Pedestrian Detection +1

All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting

no code implementations 21 Nov 2019 Hao Wang, Pu Lu, Hui Zhang, Mingkun Yang, Xiang Bai, Yongchao Xu, Mengchao He, Yongpan Wang, Wenyu Liu

Recently, end-to-end text spotting, which aims to detect and recognize text in cluttered images simultaneously, has received growing interest in computer vision.

Instance Segmentation Scene Text Detection +2

Real-time Scene Text Detection with Differentiable Binarization

11 code implementations 20 Nov 2019 Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai

Recently, segmentation-based methods have become quite popular in scene text detection, as segmentation results can more accurately describe scene text of various shapes, such as curved text.

Ranked #5 on Scene Text Detection on SCUT-CTW1500 (F-Measure metric)

Binarization Optical Character Recognition +1

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

5 code implementations 7 Oct 2019 Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, Rong Xiao, Xiang Bai

Attention-based scene text recognizers have achieved great success; they leverage a more compact intermediate representation to learn 1D or 2D attention with an RNN-based encoder-decoder architecture.

Scene Text Recognition

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation ECCV 2018 Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.

Scene Text Recognition Semantic Segmentation +1

Asymmetric Non-local Neural Networks for Semantic Segmentation

5 code implementations ICCV 2019 Zhen Zhu, Mengde Xu, Song Bai, Tengteng Huang, Xiang Bai

The non-local module is a particularly useful technique for semantic segmentation, but it has been criticized for its prohibitive computation and GPU memory occupation.

Semantic Segmentation

Editing Text in the Wild

2 code implementations 8 Aug 2019 Liang Wu, Chengquan Zhang, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai

Specifically, we propose an end-to-end trainable style retention network (SRNet) that consists of three modules: text conversion module, background inpainting module and fusion module.

Image Inpainting Image-to-Image Translation +1

View N-gram Network for 3D Object Retrieval

no code implementations ICCV 2019 Xinwei He, Tengteng Huang, Song Bai, Xiang Bai

By doing so, spatial information across multiple views is captured, which helps to learn a discriminative global embedding for each 3D object.

3D Object Retrieval 3D Shape Classification +1

Symmetry-constrained Rectification Network for Scene Text Recognition

no code implementations ICCV 2019 Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai

Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.

Scene Text Recognition

Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

1 code implementation ICCV 2019 Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai

Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels.

Crowd Counting Density Estimation
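A toy example can check the snippet's point that a count is the integral of a density map over image pixels. This minimal sketch assumes the common construction in which each annotated head contributes one normalized Gaussian blob; the `sigma` value and grid sizes are arbitrary choices. Summing the map recovers the number of people, while overlapping blobs in a crowded region pile up into larger peak values.

```python
import math

def density_map(heads, height, width, sigma=1.5):
    """Place one normalized Gaussian per head position (row, col) on a height x width grid."""
    dmap = [[0.0] * width for _ in range(height)]
    for (hr, hc) in heads:
        blob = [[math.exp(-((r - hr) ** 2 + (c - hc) ** 2) / (2 * sigma ** 2))
                 for c in range(width)] for r in range(height)]
        total = sum(map(sum, blob))
        # Normalize on the grid so each person contributes exactly 1 to the integral.
        for r in range(height):
            for c in range(width):
                dmap[r][c] += blob[r][c] / total
    return dmap

def count_from_density(dmap):
    """Crowd count = discrete integral (sum) of the density map over all pixels."""
    return sum(map(sum, dmap))

sparse = density_map([(5, 5), (5, 25)], 12, 32)
dense = density_map([(5, 15), (5, 16), (6, 15), (6, 16)], 12, 32)
print(round(count_from_density(sparse), 6), round(count_from_density(dense), 6))
print(max(map(max, sparse)), max(map(max, dense)))  # dense peak is much higher
```

Because each blob is normalized on the grid, the sum stays exact even when a blob is clipped at the image border. Without that step, heads near the border would be slightly undercounted.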

2D-CTC for Scene Text Recognition

no code implementations 23 Jul 2019 Zhaoyi Wan, Fengming Xie, Yibo Liu, Xiang Bai, Cong Yao

Scene text recognition has been an important, active research topic in computer vision for years.

Scene Text Recognition Speech Recognition

SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds

1 code implementation 13 Jul 2019 Minghui Liao, Boyu Song, Shangbang Long, Minghang He, Cong Yao, Xiang Bai

Different from previous methods that paste rendered text onto static 2D images, our method can render the 3D virtual scene and text instances as a whole.

Image Generation Scene Text Detection

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images

3 code implementations 30 May 2019 Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai

Compared to existing small-scale aerial image based instance segmentation datasets, iSAID contains 15× the number of object categories and 5× the number of instances.

Instance Segmentation Object Detection +1

Progressive Pose Attention Transfer for Person Image Generation

2 code implementations CVPR 2019 Zhen Zhu, Tengteng Huang, Baoguang Shi, Miao Yu, Bofei Wang, Xiang Bai

This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose.

Person Re-Identification Pose Transfer

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection

1 code implementation 4 Dec 2018 Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, Xiang Bai

Experimental results show that the proposed TextField outperforms the state-of-the-art methods by a large margin (28% and 8%) on two curved text datasets: Total-Text and CTW1500, respectively, and also achieves very competitive performance on multi-oriented datasets: ICDAR 2015 and MSRA-TD500.

Scene Text Detection

DeepFlux for Skeletons in the Wild

1 code implementation CVPR 2019 Yukang Wang, Yongchao Xu, Stavros Tsogkas, Xiang Bai, Sven Dickinson, Kaleem Siddiqi

In the present article, we depart from this strategy by training a CNN to predict a two-dimensional vector field, which maps each scene point to a candidate skeleton pixel, in the spirit of flux-based skeletonization algorithms.

Edge Detection Frame +3

Scene Text Recognition from Two-Dimensional Perspective

no code implementations 18 Sep 2018 Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai

Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.

Scene Text Recognition Semantic Segmentation +1

Hard-Aware Point-to-Set Deep Metric for Person Re-identification

1 code implementation ECCV 2018 Rui Yu, Zhiyong Dou, Song Bai, Zhao-Xiang Zhang, Yongchao Xu, Xiang Bai

Person re-identification (re-ID) is a highly challenging task due to large variations of pose, viewpoint, illumination, and occlusion.

Metric Learning Person Re-Identification

Adaptively Transforming Graph Matching

no code implementations ECCV 2018 Fu-Dong Wang, Nan Xue, Yi-Peng Zhang, Xiang Bai, Gui-Song Xia

Thanks to an efficient optimization strategy based on the Frank-Wolfe method, we can handle graphs with hundreds or thousands of nodes within an acceptable amount of time.

Domain Adaptation Graph Matching

PCL: Proposal Cluster Learning for Weakly Supervised Object Detection

3 code implementations 9 Jul 2018 Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, Alan Yuille

The iterative instance classifier refinement is implemented online using multiple streams in convolutional neural networks, where the first is an MIL network and the others are for instance classifier refinement supervised by the preceding one.

Multiple Instance Learning Object Recognition +1

Non-Stationary Texture Synthesis by Adversarial Expansion

1 code implementation 11 May 2018 Yang Zhou, Zhen Zhu, Xiang Bai, Dani Lischinski, Daniel Cohen-Or, Hui Huang

We demonstrate that this conceptually simple approach is highly effective for capturing large-scale structures, as well as other non-stationary attributes of the input exemplar.

Texture Synthesis

Triplet-Center Loss for Multi-View 3D Object Retrieval

1 code implementation CVPR 2018 Xinwei He, Yang Zhou, Zhichao Zhou, Song Bai, Xiang Bai

Most existing 3D object recognition algorithms focus on leveraging the strong discriminative power of deep learning models with softmax loss for the classification of 3D data, while learning discriminative features with deep metric learning for 3D object retrieval has been largely neglected.

3D Object Recognition 3D Object Retrieval +4

TextBoxes++: A Single-Shot Oriented Scene Text Detector

3 code implementations 9 Jan 2018 Minghui Liao, Baoguang Shi, Xiang Bai

In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass.

Object Detection Scene Text Detection +1

Deep-Person: Learning Discriminative Deep Features for Person Re-Identification

1 code implementation 29 Nov 2017 Xiang Bai, Mingkun Yang, Tengteng Huang, Zhiyong Dou, Rui Yu, Yongchao Xu

Recently, many methods of person re-identification (Re-ID) rely on part-based feature representation to learn a discriminative pedestrian descriptor.

Person Re-Identification Re-Ranking

Ensemble Diffusion for Retrieval

no code implementations ICCV 2017 Song Bai, Zhichao Zhou, Jingdong Wang, Xiang Bai, Longin Jan Latecki, Qi Tian

This has stimulated great research interest in similarity fusion within the framework of diffusion processes (i.e., fusion with diffusion) for robust retrieval.

3D Shape Classification 3D Shape Retrieval +1

Auto-Encoder Guided GAN for Chinese Calligraphy Synthesis

no code implementations 27 Jun 2017 Pengyuan Lyu, Xiang Bai, Cong Yao, Zhen Zhu, Tengteng Huang, Wenyu Liu

In this paper, we investigate the Chinese calligraphy synthesis problem: synthesizing Chinese calligraphy images with a specified style from a standard font (e.g.

Image-to-Image Translation Translation

Deep Patch Learning for Weakly Supervised Object Classification and Discovery

1 code implementation 6 May 2017 Peng Tang, Xinggang Wang, Zilong Huang, Xiang Bai, Wenyu Liu

Patch-level image representation is very important for object classification and detection, since it is robust to spatial transformation, scale variation, and cluttered background.

Classification General Classification +2

Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification

no code implementations 15 Apr 2017 Xiang Bai, Mingkun Yang, Pengyuan Lyu, Yongchao Xu, Jiebo Luo

Then, we combine the word embedding of the recognized words and the deep visual features into a single representation, which is optimized by a convolutional neural network for fine-grained image classification.

Classification Fine-Grained Image Classification +1

Multiple Instance Detection Network with Online Instance Classifier Refinement

3 code implementations CVPR 2017 Peng Tang, Xinggang Wang, Xiang Bai, Wenyu Liu

We propose a novel online instance classifier refinement algorithm to integrate MIL and the instance classifier refinement procedure into a single deep network, and train the network end-to-end with only image-level supervision, i.e., without object location information.

Multiple Instance Learning Object Recognition +1

Scalable Person Re-identification on Supervised Smoothed Manifold

no code implementations CVPR 2017 Song Bai, Xiang Bai, Qi Tian

Most existing person re-identification algorithms either extract robust visual features or learn discriminative metrics for person images.

Person Re-Identification

Detecting Oriented Text in Natural Images by Linking Segments

6 code implementations CVPR 2017 Baoguang Shi, Xiang Bai, Serge Belongie

It achieves an F-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin.

Curved Text Detection

Anisotropic-Scale Junction Detection and Matching for Indoor Images

no code implementations 16 Mar 2017 Nan Xue, Gui-Song Xia, Xiang Bai, Liangpei Zhang, Weiming Shen

This paper presents a novel approach to junction detection and characterization that exploits the locally anisotropic geometries of a junction and estimates the scales of these geometries using an a contrario model.

Junction Detection

Image Stitching by Line-guided Local Warping with Global Similarity Constraint

no code implementations 25 Feb 2017 Tian-Zhu Xiang, Gui-Song Xia, Xiang Bai, Liangpei Zhang

On one hand, the line features are integrated into a local warping model through a designed weight function.

Image Stitching

Texture Characterization by Using Shape Co-occurrence Patterns

no code implementations 10 Feb 2017 Gui-Song Xia, Gang Liu, Xiang Bai, Liangpei Zhang

In contrast with existing works, the proposed method not only inherits the strong ability to depict geometrical aspects of textures and the high robustness to variations of imaging conditions from the shape-based method, but also provides a flexible way to consider shape relationships and to compute high-order statistics on the tree.

Texture Classification

TextBoxes: A Fast Text Detector with a Single Deep Neural Network

4 code implementations 21 Nov 2016 Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu

This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-processing except for standard non-maximum suppression.
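Standard non-maximum suppression is the only post-processing step named here. Below is a generic greedy sketch of it, not TextBoxes' own code. It assumes axis-aligned boxes in (x1, y1, x2, y2) format and an IoU threshold of 0.5. Boxes are visited in descending score order, and any box that overlaps an already kept box by more than the threshold is discarded.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much, repeat.

    Returns the indices of the kept boxes in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(10, 10, 50, 30), (12, 11, 52, 31), (100, 100, 140, 120)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU of about 0.82
```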

Revisiting Multiple Instance Neural Networks

no code implementations 8 Oct 2016 Xinggang Wang, Yongluan Yan, Peng Tang, Xiang Bai, Wenyu Liu

We propose a new multiple instance neural network to learn bag representations, which differs from existing multiple instance neural networks that focus on estimating instance labels.

Multiple Instance Learning

DeepSkeleton: Learning Multi-task Scale-associated Deep Side Outputs for Object Skeleton Extraction in Natural Images

1 code implementation 13 Sep 2016 Wei Shen, Kai Zhao, Yuan Jiang, Yan Wang, Xiang Bai, Alan Yuille

By observing the relationship between the receptive field sizes of the different layers in the network and the skeleton scales they can capture, we introduce two scale-associated side outputs to each stage of the network.

Multi-Task Learning Object Detection +1

Deep FisherNet for Object Classification

no code implementations 31 Jul 2016 Peng Tang, Xinggang Wang, Baoguang Shi, Xiang Bai, Wenyu Liu, Zhuowen Tu

Our proposed FisherNet combines convolutional neural network training and Fisher Vector encoding in a single end-to-end structure.

Classification General Classification +1

Scene Text Detection via Holistic, Multi-Channel Prediction

no code implementations 29 Jun 2016 Cong Yao, Xiang Bai, Nong Sang, Xinyu Zhou, Shuchang Zhou, Zhimin Cao

Recently, scene text detection has become an active research topic in computer vision and document analysis, because of its great importance and significant challenge.

Scene Text Detection Semantic Segmentation

Multidimensional Scaling on Multiple Input Distance Matrices

no code implementations 1 May 2016 Song Bai, Xiang Bai, Longin Jan Latecki, Qi Tian

To the best of our knowledge, how to perform multidimensional scaling on multiple input distance matrices is still an unsolved problem.

Object Skeleton Extraction in Natural Images by Fusing Scale-associated Deep Side Outputs

no code implementations CVPR 2016 Wei Shen, Kai Zhao, Yuan Jiang, Yan Wang, Zhijiang Zhang, Xiang Bai

Object skeleton is a useful cue for object detection, complementary to the object contour, as it provides a structural representation to describe the relationship among object parts.

Object Detection

Relaxed Multiple-Instance SVM with Application to Object Discovery

no code implementations ICCV 2015 Xinggang Wang, Zhuotun Zhu, Cong Yao, Xiang Bai

Multiple-instance learning (MIL) has served as an important tool for a wide range of vision applications, for instance, image classification, object detection, and visual tracking.

General Classification Image Classification +4

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

76 code implementations 21 Jul 2015 Baoguang Shi, Xiang Bai, Cong Yao

In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition.

Optical Character Recognition Scene Text Recognition

Symmetry-Based Text Line Detection in Natural Scenes

no code implementations CVPR 2015 Zheng Zhang, Wei Shen, Cong Yao, Xiang Bai

Recently, a variety of real-world applications have triggered huge demand for techniques that can extract textual information from natural scenes.

Line Detection Scene Text Detection

Automatic Script Identification in the Wild

no code implementations 12 May 2015 Baoguang Shi, Cong Yao, Chengquan Zhang, Xiaowei Guo, Feiyue Huang, Xiang Bai

With the rapid increase of transnational communication and cooperation, people frequently encounter multilingual scenarios in various situations.

General Classification Image Classification

Deep Learning Representation using Autoencoder for 3D Shape Retrieval

no code implementations 25 Sep 2014 Zhuotun Zhu, Xinggang Wang, Song Bai, Cong Yao, Xiang Bai

By combining the global deep learning representation and the local descriptor representation, our method obtains state-of-the-art performance on 3D shape retrieval benchmarks.

3D Shape Classification 3D Shape Recognition +3

Deep Regression for Face Alignment

no code implementations 18 Sep 2014 Baoguang Shi, Xiang Bai, Wenyu Liu, Jingdong Wang

In this paper, we present a deep regression approach for face alignment.

Face Alignment

Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition

no code implementations CVPR 2014 Cong Yao, Xiang Bai, Baoguang Shi, Wenyu Liu

Driven by the wide range of applications, scene text detection and recognition have become active research topics in computer vision.

Scene Text Detection Scene Text Recognition

Fusion with Diffusion for Robust Visual Tracking

no code implementations NeurIPS 2012 Yu Zhou, Xiang Bai, Wenyu Liu, Longin J. Latecki

A key feature of our approach is that the time complexity of the diffusion on the TPG is the same as that of the diffusion process on each of the original graphs. Moreover, it is not necessary to explicitly construct the TPG in our framework.

Frame Visual Tracking
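The claim that the TPG never has to be built explicitly rests on a standard Kronecker-product identity. This minimal sketch uses hypothetical affinity matrices `W1` (n×n) and `W2` (m×m) and an n×m matrix `X` flattened row by row. Multiplying the flattened `X` by the explicit TPG matrix `kron(W1, W2)` gives the same vector as computing `W1 · X · W2ᵀ`. The explicit matrix has (n·m)² entries, while the implicit form only multiplies matrices of the original graphs' sizes. This illustrates the identity only; it is not the paper's tracking algorithm, which would repeat such a step during diffusion.

```python
def matmul(A, B):
    """Plain matrix product of list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Matrix transpose."""
    return [list(row) for row in zip(*A)]

def kron(A, B):
    """Explicit Kronecker product: the (n*m) x (n*m) TPG affinity matrix."""
    return [[A[i][k] * B[j][l] for k in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for j in range(len(B))]

def flatten(X):
    """Row-major vectorization of a matrix."""
    return [x for row in X for x in row]

def tpg_step_explicit(W1, W2, X):
    """One propagation step using the explicitly constructed TPG matrix."""
    K = kron(W1, W2)
    v = flatten(X)
    return [sum(K[r][c] * v[c] for c in range(len(v))) for r in range(len(K))]

def tpg_step_implicit(W1, W2, X):
    """The same step without building the TPG: W1 @ X @ W2^T, then flatten."""
    return flatten(matmul(matmul(W1, X), transpose(W2)))

W1 = [[1, 2], [3, 4]]
W2 = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
X = [[1, 0, 2], [0, 3, 1]]
print(tpg_step_explicit(W1, W2, X) == tpg_step_implicit(W1, W2, X))  # True
```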

Multiscale Random Fields with Application to Contour Grouping

no code implementations NeurIPS 2008 Longin J. Latecki, Chengen Lu, Marc Sobel, Xiang Bai

It is based on a new operator, called append, that combines sets of random variables (RVs) into single RVs.

