Search Results for author: Zhanghui Kuang

Found 17 papers, 14 papers with code

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

1 code implementation CVPR 2018 Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei zhang

In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach.

Action Recognition In Videos Optical Flow Estimation +1

Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos

no code implementations15 Aug 2018 Zhaoyang Zhang, Zhanghui Kuang, Ping Luo, Litong Feng, Wei zhang

Secondly, TSD significantly reduces the computations to run video action recognition with compressed frames on the cloud, while maintaining high recognition accuracies.

Action Recognition In Videos Temporal Action Localization

Learning Efficient Detector with Semi-supervised Adaptive Distillation

1 code implementation2 Jan 2019 Shitao Tang, Litong Feng, Wenqi Shao, Zhanghui Kuang, Wei zhang, Yimin Chen

ADL enlarges the distillation loss for hard-to-learn and hard-to-mimic samples and reduces distillation loss for the dominant easy samples, enabling distillation to work on the single-stage detector first time, even if the student and the teacher are identical.

Image Classification Knowledge Distillation +1

Data-Driven Neuron Allocation for Scale Aggregation Networks

1 code implementation CVPR 2019 Yi Li, Zhanghui Kuang, Yimin Chen, Wayne Zhang

The most informative output neurons in each block are preserved while others are discarded, and thus neurons for multiple scales are competitively and adaptively allocated.

Image Classification object-detection +1

Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid

no code implementations ICCV 2019 Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang

To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales.

Image Retrieval Retrieval

RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition

4 code implementations ECCV 2020 Xiaoyu Yue, Zhanghui Kuang, Chenhao Lin, Hongbin Sun, Wayne Zhang

Theoretically, our proposed method, dubbed \emph{RobustScanner}, decodes individual characters with dynamic ratio between context and positional clues, and utilizes more positional ones when the decoding sequences with scarce context, and thus is robust and practical.

Irregular Text Recognition Position +1

Context-Aware RCNN: A Baseline for Action Detection in Videos

3 code implementations ECCV 2020 Jianchao Wu, Zhanghui Kuang, Li-Min Wang, Wayne Zhang, Gangshan Wu

In this work, we first empirically find the recognition accuracy is highly correlated with the bounding box size of an actor, and thus higher resolution of actors contributes to better performance.

Action Detection Action Recognition

Spatial Dual-Modality Graph Reasoning for Key Information Extraction

2 code implementations26 Mar 2021 Hongbin Sun, Zhanghui Kuang, Xiaoyu Yue, Chenhao Lin, Wayne Zhang

In order to roundly evaluate our proposed method as well as boost the future research, we release a new dataset named WildReceipt, which is collected and annotated tailored for the evaluation of key information extraction from document images of unseen templates in the wild.

Key Information Extraction Template Matching

Fourier Contour Embedding for Arbitrary-Shaped Text Detection

8 code implementations CVPR 2021 2021 Yiqin Zhu, Jianyong Chen, Lingyu Liang, Zhanghui Kuang, Lianwen Jin, Wayne Zhang

One of the main challenges for arbitrary-shaped text detection is to design a good text instance representation that allows networks to learn diverse text geometry variances.

Scene Text Detection Text Detection

Vision Transformer with Progressive Sampling

1 code implementation ICCV 2021 Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin

As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture on image classification, by simply splitting images into tokens with a fixed length, and employing transformers to learn relations between these tokens.

Image Classification

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

2 code implementations14 Aug 2021 Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin

We present MMOCR-an open-source toolbox which provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction.

Key Information Extraction named-entity-recognition +4

Pseudo-mask Matters in Weakly-supervised Semantic Segmentation

2 code implementations ICCV 2021 Yi Li, Zhanghui Kuang, Liyang Liu, Yimin Chen, Wayne Zhang

For these matters, we propose the following designs to push the performance to new state-of-art: (i) Coefficient of Variation Smoothing to smooth the CAMs adaptively; (ii) Proportional Pseudo-mask Generation to project the expanded CAMs to pseudo-mask based on a new metric indicating the importance of each class on each location, instead of the scores trained from binary classifiers.

Segmentation Weakly supervised Semantic Segmentation +1

Cannot find the paper you are looking for? You can Submit a new open access paper.