Search Results for author: Chunhua Shen

Found 367 papers, 157 papers with code

Instance-Aware Embedding for Point Cloud Instance Segmentation

no code implementations ECCV 2020 Tong He, Yifan Liu, Chunhua Shen, Xinlong Wang, Changming Sun

However, these methods are unaware of the instance context and fail to realize the boundary and geometric information of an instance, which are critical to separate adjacent objects.

Instance Segmentation Semantic Segmentation

3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models

no code implementations17 Mar 2024 Yongtao Ge, Wenjia Wang, Yongfan Chen, Hao Chen, Chunhua Shen

In this work, we show that synthetic data created by generative models is complementary to computer graphics (CG) rendered data for achieving remarkable generalization performance on diverse real-world scenes for 3D human pose and shape estimation (HPS).

Diffusion Models Trained with Large Data Are Transferable Visual Models

no code implementations10 Mar 2024 Guangkai Xu, Yongtao Ge, MingYu Liu, Chengxiang Fan, Kangyang Xie, Zhiyue Zhao, Hao Chen, Chunhua Shen

We show that, simply initializing image understanding models using a pre-trained UNet (or transformer) of diffusion models, it is possible to achieve remarkable transferable performance on fundamental vision perception tasks using a moderate amount of target data (even synthetic data only), including monocular depth, surface normal, image segmentation, matting, human pose estimation, among virtually many others.

Image Matting Image Segmentation +2

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

1 code implementation1 Mar 2024 Xiangxiang Chu, Jianlin Su, Bo Zhang, Chunhua Shen

Large language models are built on top of a transformer-based architecture to process textual inputs.

Image Classification Image Generation +2

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

1 code implementation6 Feb 2024 Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen

We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance.

AutoML Language Modelling

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

1 code implementation24 Nov 2023 Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang

In this paper, we introduce an information-enriched diffusion model for paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation.

Image Generation Language Modelling +1

DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

1 code implementation22 Nov 2023 Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, DaCheng Tao, Tianyou Chai

To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features.

Representation Learning Segmentation +2

AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

no code implementations19 Nov 2023 Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, Chunhua Shen

We empirically find that sparse control conditions, such as bounding boxes, are suitable for layout planning, while dense control conditions, e. g., sketches and keypoints, are suitable for generating high-quality image content.

Image Generation Story Visualization

Object-aware Inversion and Reassembly for Image Editing

no code implementations18 Oct 2023 Zhen Yang, Ganggui Ding, Wen Wang, Hao Chen, Bohan Zhuang, Chunhua Shen

Subsequently, we propose an additional reassembly step to seamlessly integrate the respective editing results and the non-editing region to obtain the final edited image.

Benchmarking Denoising +1

RGM: A Robust Generalizable Matching Model

1 code implementation18 Oct 2023 Songyan Zhang, Xinyu Sun, Hao Chen, Bo Li, Chunhua Shen

Finding corresponding pixels within a pair of images is a fundamental computer vision task with various applications.

Optical Flow Estimation

De novo protein design using geometric vector field networks

no code implementations18 Oct 2023 Weian Mao, Muzhi Zhu, Zheng Sun, Shuaike Shen, Lin Yuanbo Wu, Hao Chen, Chunhua Shen

Most prior encoders rely on atom-wise features, such as angles and distances between atoms, which are not available in this context.

Protein Design

Self-Supervised 3D Scene Flow Estimation and Motion Prediction using Local Rigidity Prior

no code implementations17 Oct 2023 Ruibo Li, Chi Zhang, Zhe Wang, Chunhua Shen, Guosheng Lin

By rigidly aligning each region with its potential counterpart in the target point cloud, we obtain a region-specific rigid transformation to generate its pseudo flow labels.

Motion Estimation motion prediction +2

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

2 code implementations12 Oct 2023 Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.

Ranked #2 on Semantic Segmentation on ScanNet (using extra training data)

3D Object Detection 3D Reconstruction +5

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

no code implementations ICCV 2023 Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Joey Tianyi Zhou, Chunhua Shen

In this paper, we propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.

Monocular Depth Estimation

Target before Shooting: Accurate Anomaly Detection and Localization under One Millisecond via Cascade Patch Retrieval

1 code implementation13 Aug 2023 Hanxi Li, Jianfei Hu, Bo Li, Hao Chen, Yongbin Zheng, Chunhua Shen

In this framework, the anomaly detection problem is solved via a cascade patch retrieval procedure that retrieves the nearest neighbors for each test image patch in a coarse-to-fine fashion.

Supervised Anomaly Detection

SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning

1 code implementation ICCV 2023 Muzhi Zhu, Hengtao Li, Hao Chen, Chengxiang Fan, Weian Mao, Chenchen Jing, Yifan Liu, Chunhua Shen

In this work, we propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability for both known and unknown categories.

Open-World Instance Segmentation Segmentation +1

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

1 code implementation NeurIPS 2023 Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.

Depth Estimation Domain Generalization +5

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

1 code implementation ICCV 2023 Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen

State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity.

Ranked #15 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Image Reconstruction Monocular Depth Estimation +1

Generative Prompt Model for Weakly Supervised Object Localization

1 code implementation ICCV 2023 Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings.

 Ranked #1 on Weakly-Supervised Object Localization on CUB-200-2011 (Top-1 Localization Accuracy metric, using extra training data)

Image Denoising Language Modelling +2

SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

1 code implementation9 Jun 2023 BoWen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen, Yifan Liu

This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework and introduces \textbf{SegViTv2}.

Continual Learning Continual Semantic Segmentation +2

A Geometric Perspective on Diffusion Models

no code implementations31 May 2023 Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang

Recent years have witnessed significant progress in developing effective training and fast sampling techniques for diffusion models.

Denoising

Learning Conditional Attributes for Compositional Zero-Shot Learning

1 code implementation CVPR 2023 Qingsheng Wang, Lingqiao Liu, Chenchen Jing, Hao Chen, Guoqiang Liang, Peng Wang, Chunhua Shen

Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations.

Attribute Compositional Zero-Shot Learning

LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

no code implementations28 May 2023 Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang

This is due to their utilization of unstructured pruning on LPMs, impeding the merging of LoRA weights, or their dependence on the gradients of pre-trained weights to guide pruning, which can impose significant memory overhead.

Model Compression Network Pruning

Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

1 code implementation22 May 2023 Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, Chunhua Shen

In this work, we present Matcher, a novel perception paradigm that utilizes off-the-shelf vision foundation models to address various perception tasks.

Segmentation Semantic Segmentation

SegGPT: Segmenting Everything In Context

1 code implementation6 Apr 2023 Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang

We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.

 Ranked #1 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot) (using extra training data)

Few-Shot Semantic Segmentation In-Context Learning +5

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

1 code implementation30 Mar 2023 Wen Wang, Yan Jiang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen

Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video.

Image Generation Video Alignment +1

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

1 code implementation ICCV 2023 Wenjia Wang, Yongtao Ge, Haiyi Mei, Zhongang Cai, Qingping Sun, Yanjun Wang, Chunhua Shen, Lei Yang, Taku Komura

As it is hard to calibrate single-view RGB images in the wild, existing 3D human mesh reconstruction (3DHMR) methods either use a constant large focal length or estimate one based on the background environment context, which can not tackle the problem of the torso, limb, hand or face distortion caused by perspective camera projection when the camera is close to the human body.

3D Human Pose Estimation 3D Reconstruction

Background Matters: Enhancing Out-of-distribution Detection with Domain Features

no code implementations15 Mar 2023 Choubo Ding, Guansong Pang, Chunhua Shen

To this end, we propose a novel generic framework that can learn the domain features from the ID training samples by a dense prediction approach, with which different existing semantic-feature-based OOD detection methods can be seamlessly combined to jointly learn the in-distribution features from both the semantic and domain dimensions.

Object Recognition Out-of-Distribution Detection

Traffic Scene Parsing through the TSP6K Dataset

no code implementations6 Mar 2023 Peng-Tao Jiang, YuQi Yang, Yang Cao, Qibin Hou, Ming-Ming Cheng, Chunhua Shen

Traffic scene parsing is one of the most important tasks to achieve intelligent cities.

Scene Parsing

A Survey on Efficient Training of Transformers

no code implementations2 Feb 2023 Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen

Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources.

SPTS v2: Single-Point Scene Text Spotting

3 code implementations4 Jan 2023 Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.

Text Detection Text Spotting

SegGPT: Towards Segmenting Everything in Context

no code implementations ICCV 2023 Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang

We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.

Few-Shot Semantic Segmentation In-Context Learning +4

Images Speak in Images: A Generalist Painter for In-Context Visual Learning

1 code implementation CVPR 2023 Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, Tiejun Huang

In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and specify task prompts as also images.

In-Context Learning Keypoint Detection +2

FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

1 code implementation1 Dec 2022 Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen

Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between web domain and real-world domain.

Contrastive Learning Representation Learning

Learning from partially labeled data for multi-organ and tumor segmentation

1 code implementation13 Nov 2022 Yutong Xie, Jianpeng Zhang, Yong Xia, Chunhua Shen

To address this, we propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple partially labeled datasets.

Image Segmentation Medical Image Segmentation +4

Adv-Attribute: Inconspicuous and Transferable Adversarial Attack on Face Recognition

no code implementations13 Oct 2022 Shuai Jia, Bangjie Yin, Taiping Yao, Shouhong Ding, Chunhua Shen, Xiaokang Yang, Chao Ma

For face recognition attacks, existing methods typically generate the l_p-norm perturbations on pixels, however, resulting in low attack transferability and high vulnerability to denoising defense models.

Adversarial Attack Attribute +2

Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval

no code implementations27 Sep 2022 Chengzhi Lin, AnCong Wu, Junwei Liang, Jun Zhang, Wenhang Ge, Wei-Shi Zheng, Chunhua Shen

To address this problem, we propose a Text-Adaptive Multiple Visual Prototype Matching model, which automatically captures multiple prototypes to describe a video by adaptive aggregation of video token features.

Cross-Modal Retrieval Retrieval +2

Multi-dataset Training of Transformers for Robust Action Recognition

1 code implementation26 Sep 2022 Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen

We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition.

Action Recognition Temporal Action Localization

Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image

1 code implementation28 Aug 2022 Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Simon Chen, Yifan Liu, Chunhua Shen

To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes.

Depth Estimation Depth Prediction

Towards Domain-agnostic Depth Completion

1 code implementation29 Jul 2022 Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Simon Chen, Chunhua Shen

Our method leverages a data driven prior in the form of a single image depth prediction network trained on large-scale datasets, the output of which is used as an input to our model.

Depth Completion Depth Estimation +2

Real-time End-to-End Video Text Spotter with Contrastive Representation Learning

1 code implementation18 Jul 2022 Wejia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Hong Zhou, Size Li, Zhongyuan Wang, Ping Luo

Our contributions are three-fold: 1) CoText simultaneously address the three tasks (e. g., text detection, tracking, recognition) in a real-time end-to-end trainable framework.

Contrastive Learning Representation Learning +2

Efficient Decoder-free Object Detection with Transformers

2 code implementations14 Jun 2022 Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen

A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective, with the price of bringing considerable computation burden for inference.

Object Object Detection

Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images

no code implementations27 May 2022 Zhi Tian, Xiangxiang Chu, Xiaoming Wang, Xiaolin Wei, Chunhua Shen

In this work, we tackle this challenging issue with a novel range view projection mechanism, and for the first time demonstrate the benefits of fusing multi-frame point clouds for a range-view based detector.

3D Object Detection Autonomous Driving +2

Super Vision Transformer

1 code implementation23 May 2022 Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

Experimental results on ImageNet demonstrate that our SuperViT can considerably reduce the computational costs of ViT models with even performance increase.

PointInst3D: Segmenting 3D Instances by Points

no code implementations25 Apr 2022 Tong He, Wei Yin, Chunhua Shen, Anton Van Den Hengel

The current state-of-the-art methods in 3D instance segmentation typically involve a clustering step, despite the tendency towards heuristics, greedy algorithms, and a lack of robustness to the changes in data statistics.

3D Instance Segmentation Clustering +2

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

3 code implementations CVPR 2022 Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen

Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.

Segmentation Semantic Segmentation

Improving Monocular Visual Odometry Using Learned Depth

no code implementations4 Apr 2022 Libo Sun, Wei Yin, Enze Xie, Zhengrong Li, Changming Sun, Chunhua Shen

The core of our framework is a monocular depth estimation module with a strong generalization capability for diverse scenes.

Monocular Depth Estimation Monocular Visual Odometry

Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection

1 code implementation CVPR 2022 Choubo Ding, Guansong Pang, Chunhua Shen

Despite most existing anomaly detection studies assume the availability of normal training samples only, a few labeled anomaly examples are often available in many real-world applications, such as defect samples identified during random quality inspection, lesion images confirmed by radiologists in daily medical screening, etc.

Ranked #4 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Supervised Anomaly Detection

End-to-End Video Text Spotting with Transformer

1 code implementation20 Mar 2022 Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Hong Zhou, Ping Luo

Recent video text spotting methods usually require the three-staged pipeline, i. e., detecting text in individual images, recognizing localized text, tracking text streams with post-processing to generate final results.

Text Detection Text Spotting

PointAttN: You Only Need Attention for Point Cloud Completion

1 code implementation16 Mar 2022 Jun Wang, Ying Cui, Dongyan Guo, Junxia Li, Qingshan Liu, Chunhua Shen

To solve the problems, we leverage the cross-attention and self-attention mechanisms to design novel neural network for processing point cloud in a per-point manner to eliminate kNNs.

Point Cloud Completion

Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching

2 code implementations13 Mar 2022 Xiaojie Chu, Yongtao Wang, Chunhua Shen, Jingdong Chen, Wei Chu

The development of scene text recognition (STR) in the era of deep learning has been mainly focused on novel architectures of STR models.

Scene Text Recognition

FreeSOLO: Learning to Segment Objects without Annotations

1 code implementation CVPR 2022 Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez

FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9. 8% AP when fine-tuning instance segmentation with only 5% COCO masks.

Instance Segmentation object-detection +4

The devil is in the labels: Semantic segmentation from sentences

no code implementations4 Feb 2022 Wei Yin, Yifan Liu, Chunhua Shen, Anton Van Den Hengel, Baichuan Sun

The resulting merged semantic segmentation dataset of over 2 Million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets, despite not using any images therefrom.

Instance Segmentation Monocular Depth Estimation +2

Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth

no code implementations3 Feb 2022 Guangkai Xu, Wei Yin, Hao Chen, Chunhua Shen, Kai Cheng, Feng Wu, Feng Zhao

However, in some video-based scenarios such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift residing in per-frame prediction may cause the depth inconsistency.

3D Scene Reconstruction Depth Completion +1

DENSE: Data-Free One-Shot Federated Learning

1 code implementation23 Dec 2021 Jie Zhang, Chen Chen, Bo Li, Lingjuan Lyu, Shuang Wu, Shouhong Ding, Chunhua Shen, Chao Wu

One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round.

Federated Learning

SPTS: Single-Point Text Spotting

1 code implementation15 Dec 2021 Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single-point for each instance.

Language Modelling Text Detection +1

NAS-FCOS: Efficient Search for Object Detection Architectures

1 code implementation24 Oct 2021 Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang

Neural Architecture Search (NAS) has shown great potential in effectively reducing manual effort in network design by automatically discovering optimal architectures.

Neural Architecture Search Object +2

TSGB: Target-Selective Gradient Backprop for Probing CNN Visual Saliency

1 code implementation11 Oct 2021 Lin Cheng, Pengfei Fang, Yanjie Liang, Liao Zhang, Chunhua Shen, Hanzi Wang

Inspired by those observations, we propose a novel visual saliency method, termed Target-Selective Gradient Backprop (TSGB), which leverages rectification operations to effectively emphasize target classes and further efficiently propagate the saliency to the image space, thereby generating target-selective and fine-grained saliency maps.

Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning

no code implementations ICCV 2021 Chi Zhang, Henghui Ding, Guosheng Lin, Ruibo Li, Changhu Wang, Chunhua Shen

Inspired by the recent success in Automated Machine Learning literature (AutoML), in this paper, we present Meta Navigator, a framework that attempts to solve the aforementioned limitation in few-shot learning by seeking a higher-level strategy and proffer to automate the selection from various few-shot learning designs.

AutoML Few-Shot Learning

Explainable Deep Few-shot Anomaly Detection with Deviation Networks

1 code implementation1 Aug 2021 Guansong Pang, Choubo Ding, Chunhua Shen, Anton Van Den Hengel

Here, we study the problem of few-shot anomaly detection, in which we aim at using a few labeled anomaly examples to train sample-efficient discriminative detection models.

Ranked #5 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Multiple Instance Learning Supervised Anomaly Detection +1

Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation

1 code implementation NeurIPS 2021 BoWen Zhang, Yifan Liu, Zhi Tian, Chunhua Shen

This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.

Segmentation Semantic Segmentation +1

Dynamic Convolution for 3D Point Cloud Instance Segmentation

1 code implementation18 Jul 2021 Tong He, Chunhua Shen, Anton Van Den Hengel

The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.

Instance Segmentation Semantic Segmentation

SOLO: A Simple Framework for Instance Segmentation

no code implementations30 Jun 2021 Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI

Besides instance segmentation, our method yields state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation.

Image Matting Instance Segmentation +4

Learning Spatial-Semantic Relationship for Facial Attribute Recognition With Limited Labeled Data

no code implementations CVPR 2021 Ying Shu, Yan Yan, Si Chen, Jing-Hao Xue, Chunhua Shen, Hanzi Wang

First, three auxiliary tasks, consisting of a Patch Rotation Task (PRT), a Patch Segmentation Task (PST), and a Patch Classification Task (PCT), are jointly developed to learn the spatial-semantic relationship from large-scale unlabeled facial data.

Attribute Facial Attribute Classification +1

Unsupervised Scale-consistent Depth Learning from Video

2 code implementations25 May 2021 Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Zhichao Li, Le Zhang, Chunhua Shen, Ming-Ming Cheng, Ian Reid

We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training and enables the scale-consistent prediction at inference time.

Monocular Depth Estimation Monocular Visual Odometry +1

ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting

1 code implementation8 May 2021 Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen

Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.

Text Spotting

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

1 code implementation2 May 2021 Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.

Scene Text Detection Text Detection +1

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

8 code implementations NeurIPS 2021 Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen

Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed and they show that the design of spatial attention is critical to their success in these tasks.

Image Classification Semantic Segmentation

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

2 code implementations19 Apr 2021 Yuting Gao, Jia-Xin Zhuang, Shaohui Lin, Hao Cheng, Xing Sun, Ke Li, Chunhua Shen

Specifically, we find the final embedding obtained by the mainstream SSL methods contains the most fruitful information, and propose to distill the final embedding to maximally transmit a teacher's knowledge to a lightweight model by constraining the last embedding of the student to be consistent with that of the teacher.

Contrastive Learning Representation Learning +1

A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation

1 code implementation ICCV 2021 Jianlong Yuan, Yifan Liu, Chunhua Shen, Zhibin Wang, Hao Li

Previous works [3, 27] fail to employ strong augmentation in pseudo label learning efficiently, as the large distribution change caused by strong augmentation harms the batch normalisation statistics.

Data Augmentation Image Classification +3

An Adversarial Human Pose Estimation Network Injected with Graph Structure

no code implementations29 Mar 2021 Lei Tian, Guoqiang Liang, Peng Wang, Chunhua Shen

Because of the invisible human keypoints in images caused by illumination, occlusion and overlap, it is likely to produce unreasonable human pose prediction for most of the current human pose estimation methods.

Generative Adversarial Network Pose Estimation +1

Generic Perceptual Loss for Modeling Structured Output Dependencies

no code implementations CVPR 2021 Yifan Liu, Hao Chen, Yu Chen, Wei Yin, Chunhua Shen

We hope that this simple, extended perceptual loss may serve as a generic structured-output loss that is applicable to most structured output learning tasks.

Depth Estimation Image Generation +4

FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation

2 code implementations8 Mar 2021 Lingtong Kong, Chunhua Shen, Jie Yang

Experiments on both synthetic Sintel data and real-world KITTI datasets demonstrate the effectiveness of the proposed approach, which needs only 1/10 computation of comparable networks to achieve on par accuracy.

Optical Flow Estimation

CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation

1 code implementation4 Mar 2021 Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia

Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation.

Image Segmentation Inductive Bias +4

Instance and Panoptic Segmentation Using Conditional Convolutions

no code implementations5 Feb 2021 Zhi Tian, BoWen Zhang, Hao Chen, Chunhua Shen

In the literature, top-performing instance segmentation methods typically follow the paradigm of Mask R-CNN and rely on ROI operations (typically ROIAlign) to attend to each instance.

Instance Segmentation Panoptic Segmentation +1

Object Detection Made Simpler by Eliminating Heuristic NMS

no code implementations28 Jan 2021 Qiang Zhou, Chaohui Yu, Chunhua Shen, Zhibin Wang, Hao Li

On the COCO dataset, our simple design achieves superior performance compared to both the FCOS baseline detector with NMS post-processing and the recent end-to-end NMS-free detectors.

Object object-detection +1

Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline

no code implementations24 Jan 2021 Hu Wang, Hao Chen, Qi Wu, Congbo Ma, Yidong Li, Chunhua Shen

To address these issues, in this work we carefully design our settings and propose a new dataset including both synthetic and real traffic data in more complex scenarios.

Single-path Bit Sharing for Automatic Loss-aware Model Compression

no code implementations13 Jan 2021 Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan

By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined.

Model Compression Network Pruning +1

Occluded Person Re-Identification With Single-Scale Global Representations

no code implementations ICCV 2021 Cheng Yan, Guansong Pang, Jile Jiao, Xiao Bai, Xuetao Feng, Chunhua Shen

However, real-world ReID applications typically have highly diverse occlusions and involve a hybrid of occluded and non-occluded pedestrians.

Graph Matching Person Re-Identification +1

BV-Person: A Large-Scale Dataset for Bird-View Person Re-Identification

no code implementations ICCV 2021 Cheng Yan, Guansong Pang, Lei Wang, Jile Jiao, Xuetao Feng, Chunhua Shen, Jingjing Li

In this work we introduce a new ReID task, bird-view person ReID, which aims at searching for a person in a gallery of horizontal-view images with the query images taken from a bird's-eye view, i. e., an elevated view of an object from above.

Person Re-Identification

Memory-Efficient Hierarchical Neural Architecture Search for Image Restoration

1 code implementation24 Dec 2020 Haokui Zhang, Ying Li, Hao Chen, Chengrong Gong, Zongwen Bai, Chunhua Shen

For the inner search space, we propose a layer-wise architecture sharing strategy (LWAS), resulting in more flexible architectures and better performance.

Image Denoising Image Restoration +2

Learning to Recover 3D Scene Shape from a Single Image

1 code implementation CVPR 2021 Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen

Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length.

 Ranked #1 on Indoor Monocular Depth Estimation on DIODE (using extra training data)

3D Scene Reconstruction Depth Prediction +3

Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer Learning

2 code implementations7 Dec 2020 Haokui Zhang, Ying Li, Yenan Jiang, Peng Wang, Qiang Shen, Chunhua Shen

In contrast to previous approaches, we do not impose restrictions over the source data sets, in which they do not have to be collected by the same sensors as the target data sets.

Classification General Classification +1

End-to-End Video Instance Segmentation with Transformers

2 code implementations CVPR 2021 Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia

Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.

Instance Segmentation Segmentation +3

Fully Quantized Image Super-Resolution Networks

1 code implementation29 Nov 2020 Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen

With the rising popularity of intelligent mobile devices, it is of great practical significance to develop accurate, realtime and energy-efficient image Super-Resolution (SR) inference methods.

Image Super-Resolution Quantization

Learning Affinity-Aware Upsampling for Deep Image Matting

1 code implementation CVPR 2021 Yutong Dai, Hao Lu, Chunhua Shen

By looking at existing upsampling operators from a unified mathematical perspective, we generalize them into a second-order form and introduce Affinity-Aware Upsampling (A2U) where upsampling kernels are generated using a light-weight lowrank bilinear model and are conditioned on second-order features.

Image Matting Image Reconstruction

Channel-wise Knowledge Distillation for Dense Prediction

3 code implementations ICCV 2021 Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen

Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogue to class activation mapping), we propose to align features channel-wise between the student and teacher networks.

Knowledge Distillation Segmentation +1

DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution

1 code implementation CVPR 2021 Tong He, Chunhua Shen, Anton Van Den Hengel

Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions.

Instance Segmentation Semantic Segmentation

PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation

no code implementations25 Nov 2020 Yutong Xie, Jianpeng Zhang, Zehui Liao, Yong Xia, Chunhua Shen

In this paper, we propose a PriorGuided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.

Image Segmentation Medical Image Segmentation +3

Graph Attention Tracking

no code implementations CVPR 2021 Dongyan Guo, Yanyan Shao, Ying Cui, Zhenhua Wang, Liyan Zhang, Chunhua Shen

We propose to establish part-to-part correspondence between the target and the search region with a complete bipartite graph, and apply the graph attention mechanism to propagate target information from the template feature to the search feature.

Graph Attention Object Tracking +1

Robust Data Hiding Using Inverse Gradient Attention

1 code implementation21 Nov 2020 Honglei Zhang, Hu Wang, Yuanzhouhan Cao, Chunhua Shen, Yidong Li

In deep data hiding models, to maximize the encoding capacity, each pixel of the cover image ought to be treated differently since they have different sensitivities w. r. t.

DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets

1 code implementation CVPR 2021 Jianpeng Zhang, Yutong Xie, Yong Xia, Chunhua Shen

To address this, we propose a dynamic on-demand network (DoDNet) that learns to segment multiple organs and tumors on partially labeled datasets.

Image Segmentation Medical Image Segmentation +4

Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 Convolutions

no code implementations19 Nov 2020 Hao Chen, Chunhua Shen, Zhi Tian

To our knowledge, DR1Mask is the first panoptic segmentation framework that exploits a shared feature map for both instance and semantic segmentation by considering both efficacy and efficiency.

Instance Segmentation Multi-Task Learning +2

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

6 code implementations CVPR 2021 Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI

Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin.

Contrastive Learning Image Classification +7

FATNN: Fast and Accurate Ternary Neural Networks

no code implementations ICCV 2021 Peng Chen, Bohan Zhuang, Chunhua Shen

Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.

Image Classification Quantization

Representative Graph Neural Network

no code implementations ECCV 2020 Changqian Yu, Yifan Liu, Changxin Gao, Chunhua Shen, Nong Sang

In this paper, we present a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy.

object-detection Object Detection +1

Pairwise Relation Learning for Semi-supervised Gland Segmentation

no code implementations6 Aug 2020 Yutong Xie, Jianpeng Zhang, Zhibin Liao, Chunhua Shen, Johan Verjans, Yong Xia

In this paper, we propose the pairwise relation-based semi-supervised (PRS^2) model for gland segmentation on histology images.

Relation Relation Network +1

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations ECCV 2020 Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling Sentence +2

Improving Generative Adversarial Networks with Local Coordinate Coding

1 code implementation28 Jul 2020 Jiezhang Cao, Yong Guo, Qingyao Wu, Chunhua Shen, Junzhou Huang, Mingkui Tan

In this paper, rather than sampling from the predefined prior distribution, we propose an LCCGAN model with local coordinate coding (LCC) to improve the performance of generating data.

Soft Expert Reward Learning for Vision-and-Language Navigation

no code implementations ECCV 2020 Hu Wang, Qi Wu, Chunhua Shen

In this paper, we introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering designing and generalisation problems of the VLN task.

Reinforcement Learning (RL) Vision and Language Navigation

AQD: Towards Accurate Fully-Quantized Object Detection

1 code implementation CVPR 2021 Peng Chen, Jing Liu, Bohan Zhuang, Mingkui Tan, Chunhua Shen

Network quantization allows inference to be conducted using low-precision arithmetic for improved inference efficiency of deep neural networks on edge devices.

Image Classification Object +3

Deep Learning for Anomaly Detection: A Review

no code implementations6 Jul 2020 Guansong Pang, Chunhua Shen, Longbing Cao, Anton Van Den Hengel

This paper surveys the research of deep anomaly detection with a comprehensive taxonomy, covering advancements in three high-level categories and 11 fine-grained categories of the methods.

Anomaly Detection Novelty Detection +1

FCOS: A simple and strong anchor-free object detector

no code implementations14 Jun 2020 Zhi Tian, Chunhua Shen, Hao Chen, Tong He

In computer vision, object detection is one of most important tasks, which underpins a few instance-level recognition tasks and many downstream applications.

Object Object Detection +1

A Robust Attentional Framework for License Plate Recognition in the Wild

no code implementations6 Jun 2020 Linjiang Zhang, Peng Wang, Hui Li, Zhen Li, Chunhua Shen, Yanning Zhang

On the other hand, the 2D attentional based license plate recognizer with an Xception-based CNN encoder is capable of recognizing license plates with different patterns under various scenarios accurately and robustly.

Image Generation License Plate Recognition

Auto-Rectify Network for Unsupervised Indoor Depth Estimation

1 code implementation4 Jun 2020 Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Tat-Jun Chin, Chunhua Shen, Ian Reid

However, excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices.

Monocular Depth Estimation Self-Supervised Learning +1

Scope Head for Accurate Localization in Object Detection

no code implementations11 May 2020 Geng Zhan, Dan Xu, Guo Lu, Wei Wu, Chunhua Shen, Wanli Ouyang

Existing anchor-based and anchor-free object detectors in multi-stage or one-stage pipelines have achieved very promising detection performance.

Object object-detection +2

BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

7 code implementations5 Apr 2020 Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang

We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for realtime semantic segmentation.

Real-Time Semantic Segmentation Segmentation

Context Prior for Scene Segmentation

2 code implementations CVPR 2020 Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang

Given an input image and corresponding ground truth, Affinity Loss constructs an ideal affinity map to supervise the learning of Context Prior.

Scene Segmentation Scene Understanding +1

Segmenting Transparent Objects in the Wild

1 code implementation ECCV 2020 Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo

To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10, 428 images of real scenarios with carefully manual annotations, which are 10 times larger than the existing datasets.

Segmentation Semantic Segmentation +1

Viral Pneumonia Screening on Chest X-ray Images Using Confidence-Aware Anomaly Detection

1 code implementation27 Mar 2020 Jianpeng Zhang, Yutong Xie, Guansong Pang, Zhibin Liao, Johan Verjans, Wenxin Li, Zongji Sun, Jian He, Yi Li, Chunhua Shen, Yong Xia

In this paper, we formulate the task of differentiating viral pneumonia from non-viral pneumonia and healthy controls into an one-class classification-based anomaly detection problem, and thus propose the confidence-aware anomaly detection (CAAD) model, which consists of a shared feature extractor, an anomaly detection module, and a confidence prediction module.

Binary Classification Classification +2

SOLOv2: Dynamic and Fast Instance Segmentation

18 code implementations NeurIPS 2020 Xinlong Wang, Rufeng Zhang, Tao Kong, Lei LI, Chunhua Shen

Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location.

object-detection Object Detection +4

DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning

5 code implementations15 Mar 2020 Chi Zhang, Yujun Cai, Guosheng Lin, Chunhua Shen

We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance.

Classification Few-Shot Image Classification +4

Self-trained Deep Ordinal Regression for End-to-End Video Anomaly Detection

no code implementations CVPR 2020 Guansong Pang, Cheng Yan, Chunhua Shen, Anton Van Den Hengel, Xiao Bai

Video anomaly detection is of critical practical importance to a variety of real applications because it allows human attention to be focused on events that are likely to be of interest, in spite of an otherwise overwhelming volume of video.

Anomaly Detection regression +2

Conditional Convolutions for Instance Segmentation

7 code implementations ECCV 2020 Zhi Tian, Chunhua Shen, Hao Chen

We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation).

Instance Segmentation Segmentation +1

Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes

no code implementations11 Mar 2020 Genshun Dong, Yan Yan, Chunhua Shen, Hanzi Wang

Meanwhile, a Spatial detail-Preserving Network (SPN) with shallow convolutional layers is designed to generate high-resolution feature maps preserving the detailed spatial information.

Image Segmentation Segmentation +2

Efficient Semantic Video Segmentation with Per-frame Inference

1 code implementation ECCV 2020 Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

For semantic segmentation, most existing real-time deep models trained with each frame independently may produce inconsistent results for a video sequence.

Knowledge Distillation Optical Flow Estimation +4

Joint Deep Learning of Facial Expression Synthesis and Recognition

no code implementations6 Feb 2020 Yan Yan, Ying Huang, Si Chen, Chunhua Shen, Hanzi Wang

Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.

Facial Expression Recognition Facial Expression Recognition (FER) +1

DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data

2 code implementations3 Feb 2020 Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin

Compared with previous learning objectives, i. e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes.

Depth Estimation Depth Prediction

Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild

no code implementations13 Jan 2020 Canjie Luo, Qingxiang Lin, Yuliang Liu, Lianwen Jin, Chunhua Shen

Furthermore, to tackle the issue of lacking paired training samples, we design an interactive joint training scheme, which shares attention masks from the recognizer to the discriminator, and enables the discriminator to extract the features of each character for further adversarial training.

Style Transfer

Memorizing Comprehensively to Learn Adaptively: Unsupervised Cross-Domain Person Re-ID with Multi-level Memory

no code implementations13 Jan 2020 Xin-Yu Zhang, Dong Gong, Jiewei Cao, Chunhua Shen

Due to the lack of supervision in the target domain, it is crucial to identify the underlying similarity-and-dissimilarity relationships among the unlabelled samples in the target domain.

Person Re-Identification

From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting

3 code implementations7 Jan 2020 Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Chunhua Shen, Zhiguo Cao

Visual counting, a task that aims to estimate the number of objects from an image/video, is an open-set problem by nature, i. e., the number of population can vary in [0, inf) in theory.

Object Counting

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

9 code implementations CVPR 2020 Hao Chen, Kunyang Sun, Zhi Tian, Chunhua Shen, Yongming Huang, Youliang Yan

The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference.

Real-time Instance Segmentation Segmentation +1

Ordered or Orderless: A Revisit for Video based Person Re-Identification

no code implementations24 Dec 2019 Le Zhang, Zenglin Shi, Joey Tianyi Zhou, Ming-Ming Cheng, Yun Liu, Jia-Wang Bian, Zeng Zeng, Chunhua Shen

Specifically, with a diagnostic analysis, we show that the recurrent structure may not be effective to learn temporal dependencies than what we expected and implicitly yields an orderless representation.

Video-Based Person Re-Identification

Unsupervised Representation Learning by Predicting Random Distances

2 code implementations22 Dec 2019 Hu Wang, Guansong Pang, Chunhua Shen, Congbo Ma

To enable unsupervised learning on those domains, in this work we propose to learn features without using any labelled data by training neural networks to predict data distances in a randomly projected space.

Anomaly Detection Clustering +1

Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection

1 code implementation20 Dec 2019 Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin

More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.

Scene Text Detection Text Detection

To Balance or Not to Balance: A Simple-yet-Effective Approach for Learning with Long-Tailed Distributions

no code implementations10 Dec 2019 Jun-Jie Zhang, Lingqiao Liu, Peng Wang, Chunhua Shen

Such imbalanced distribution causes a great challenge for learning a deep neural network, which can be boiled down into a dilemma: on the one hand, we prefer to increase the exposure of tail class samples to avoid the excessive dominance of head classes in the classifier training.

Auxiliary Learning Self-Supervised Learning

Unified Multifaceted Feature Learning for Person Re-Identification

no code implementations20 Nov 2019 Cheng Yan, Guansong Pang, Xiao Bai, Chunhua Shen

The loss structures the augmented images resulted by the two types of image erasing in a two-level hierarchy and enforces multifaceted attention to different parts.

Person Re-Identification

Deep Anomaly Detection with Deviation Networks

6 code implementations19 Nov 2019 Guansong Pang, Chunhua Shen, Anton Van Den Hengel

Instead of representation learning, our method fulfills an end-to-end learning of anomaly scores by a neural deviation learning, in which we leverage a few (e. g., multiple to dozens) labeled anomalies and a prior probability to enforce statistically significant deviations of the anomaly scores of anomalies from that of normal data objects in the upper tail.

Anomaly Detection Cyber Attack Detection +3

DirectPose: Direct End-to-End Multi-Person Pose Estimation

8 code implementations18 Nov 2019 Zhi Tian, Hao Chen, Chunhua Shen

We propose the first direct end-to-end multi-person pose estimation framework, termed DirectPose.

Multi-Person Pose Estimation

Multi-marginal Wasserstein GAN

3 code implementations NeurIPS 2019 Jiezhang Cao, Langyuan Mo, Yifan Zhang, Kui Jia, Chunhua Shen, Mingkui Tan

Multiple marginal matching problem aims at learning mappings to match a source domain to multiple target domains and it has attracted great attention in many applications, such as multi-domain image translation.

Image Generation Translation

Deep Weakly-supervised Anomaly Detection

3 code implementations30 Oct 2019 Guansong Pang, Chunhua Shen, Huidong Jin, Anton Van Den Hengel

To detect both seen and unseen anomalies, we introduce a novel deep weakly-supervised approach, namely Pairwise Relation prediction Network (PReNet), that learns pairwise relation features and anomaly scores by predicting the relation of any two randomly sampled training instances, in which the pairwise relation can be anomaly-anomaly, anomaly-unlabeled, or unlabeled-unlabeled.

Relation Semi-supervised Anomaly Detection +3

PolarMask: Single Shot Instance Segmentation with Polar Representation

2 code implementations CVPR 2020 Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo

In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.

Distance regression Instance Segmentation +4

Structured Binary Neural Networks for Image Recognition

no code implementations22 Sep 2019 Bohan Zhuang, Chunhua Shen, Mingkui Tan, Peng Chen, Lingqiao Liu, Ian Reid

Experiments on both classification, semantic segmentation and object detection tasks demonstrate the superior performance of the proposed methods over various quantized networks in the literature.

object-detection Object Detection +2

Task-Aware Monocular Depth Estimation for 3D Object Detection

1 code implementation17 Sep 2019 Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei LI, Chunhua Shen

In this paper, we first analyse the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground depth and background depth using separate optimization objectives and depth decoders.

3D Object Detection 3D Object Recognition +4

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

1 code implementation16 Sep 2019 Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo

Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images.

Scene Text Recognition Super-Resolution

Auxiliary Learning for Deep Multi-task Learning

no code implementations5 Sep 2019 Yifan Liu, Bohan Zhuang, Chunhua Shen, Hao Chen, Wei Yin

The most current methods can be categorized as either: (i) hard parameter sharing where a subset of the parameters is shared among tasks while other parameters are task-specific; or (ii) soft parameter sharing where all parameters are task-specific but they are jointly regularized.

Auxiliary Learning Depth Estimation +3

Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

2 code implementations NeurIPS 2019 Jia-Wang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid

To the best of our knowledge, this is the first work to show that deep networks trained using unlabelled monocular videos can predict globally scale-consistent camera trajectories over a long video sequence.

Depth And Camera Motion Monocular Depth Estimation +1

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

6 code implementations ICCV 2019 Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen

Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.

Scene Text Detection Segmentation +1

MobileFAN: Transferring Deep Hidden Representation for Face Alignment

no code implementations11 Aug 2019 Yang Zhao, Yifan Liu, Chunhua Shen, Yongsheng Gao, Shengwu Xiong

To this end, we propose an effective lightweight model, namely Mobile Face Alignment Network (MobileFAN), using a simple backbone MobileNetV2 as the encoder and three deconvolutional layers as the decoder.

Face Alignment Facial Landmark Detection

Index Network

6 code implementations11 Aug 2019 Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu

By viewing the indices as a function of the feature map, we introduce the concept of "learning to index", and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the downsampling and upsampling stages, without extra training supervision.

Grayscale Image Denoising Image Denoising +3

Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations

no code implementations10 Aug 2019 Bohan Zhuang, Jing Liu, Mingkui Tan, Lingqiao Liu, Ian Reid, Chunhua Shen

Furthermore, we propose a second progressive quantization scheme which gradually decreases the bit-width from high-precision to low-precision during training.

Knowledge Distillation Quantization

Regularizing Proxies with Multi-Adversarial Training for Unsupervised Domain-Adaptive Semantic Segmentation

1 code implementation29 Jul 2019 Tong Shen, Dong Gong, Wei zhang, Chunhua Shen, Tao Mei

To tackle the unsupervised domain adaptation problem, we explore the possibilities to generate high-quality labels as proxy labels to supervise the training on target data.

Semantic Segmentation Unsupervised Domain Adaptation

V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices

no code implementations29 Jul 2019 Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced.

Visual Reasoning

Towards End-to-End Text Spotting in Natural Scenes

no code implementations14 Jun 2019 Peng Wang, Hui Li, Chunhua Shen

Text spotting in natural scene images is of great importance for many image understanding tasks.

Image Cropping Text Detection +1

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

1 code implementation CVPR 2020 Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.

Referring Expression Vision and Language Navigation

Template-Based Automatic Search of Compact Semantic Segmentation Architectures

1 code implementation4 Apr 2019 Vladimir Nekrasov, Chunhua Shen, Ian Reid

Automatic search of neural architectures for various vision and natural language tasks is becoming a prominent tool as it allows to discover high-performing structures on any dataset of interest.

General Classification Holdout Set +1

FCOS: Fully Convolutional One-Stage Object Detection

85 code implementations ICCV 2019 Zhi Tian, Chunhua Shen, Hao Chen, Tong He

By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training.

Object Object Detection +2

Training Quantized Neural Networks with a Full-precision Auxiliary Module

no code implementations CVPR 2020 Bohan Zhuang, Lingqiao Liu, Mingkui Tan, Chunhua Shen, Ian Reid

In this paper, we seek to tackle a challenge in training low-precision networks: the notorious difficulty in propagating gradient through a low-precision network due to the non-differentiable quantization function.

Image Classification object-detection +2

Knowledge Adaptation for Efficient Semantic Segmentation

1 code implementation CVPR 2019 Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan

To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of the compact FCNs with large overall stride.

Knowledge Distillation Segmentation +1

Structured Knowledge Distillation for Dense Prediction

1 code implementation CVPR 2019 Yifan Liu, Changyong Shun, Jingdong Wang, Chunhua Shen

Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem.

Depth Estimation General Classification +7

A Mutual Bootstrapping Model for Automated Skin Lesion Segmentation and Classification

1 code implementation8 Mar 2019 Yutong Xie, Jianpeng Zhang, Yong Xia, Chunhua Shen

Our results suggest that it is possible to boost the performance of skin lesion segmentation and classification simultaneously via training a unified model to perform both tasks in a mutual bootstrapping way.

Classification General Classification +3

Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation

no code implementations CVPR 2019 Zhi Tian, Tong He, Chunhua Shen, Youliang Yan

In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear, which takes advantages of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs.

Segmentation Semantic Segmentation

Associatively Segmenting Instances and Semantics in Point Clouds

3 code implementations CVPR 2019 Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia

A 3D point cloud describes the real scene precisely and intuitively. To date how to segment diversified elements in such an informative 3D scene is rarely discussed.

Ranked #15 on 3D Instance Segmentation on S3DIS (mRec metric)

3D Instance Segmentation 3D Semantic Segmentation +1

Salient Object Detection with Lossless Feature Reflection and Weighted Structural Loss

no code implementations21 Jan 2019 Pingping Zhang, Wei Liu, Huchuan Lu, Chunhua Shen

Inspired by the intrinsic reflection of natural images, in this paper we propose a novel feature learning framework for large-scale salient object detection.

Object object-detection +3

Cannot find the paper you are looking for? You can Submit a new open access paper.