Search Results for author: Chunhua Shen

Found 369 papers, 159 papers with code

Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

7 code implementations 2 Nov 2018 Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang

Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion.

Irregular Text Recognition Scene Text Recognition

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

8 code implementations NeurIPS 2021 Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen

Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed and they show that the design of spatial attention is critical to their success in these tasks.

Image Classification Semantic Segmentation

FCOS: Fully Convolutional One-Stage Object Detection

86 code implementations ICCV 2019 Zhi Tian, Chunhua Shen, Hao Chen, Tong He

By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training.

Object Object Detection +2

SOLOv2: Dynamic and Fast Instance Segmentation

18 code implementations NeurIPS 2020 Xinlong Wang, Rufeng Zhang, Tao Kong, Lei LI, Chunhua Shen

Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location.

object-detection Object Detection +4

BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

7 code implementations 5 Apr 2020 Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang

We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for realtime semantic segmentation.

Real-Time Semantic Segmentation Segmentation

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

6 code implementations ICCV 2019 Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen

Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.

Scene Text Detection Segmentation +1

DirectPose: Direct End-to-End Multi-Person Pose Estimation

8 code implementations 18 Nov 2019 Zhi Tian, Hao Chen, Chunhua Shen

We propose the first direct end-to-end multi-person pose estimation framework, termed DirectPose.

Multi-Person Pose Estimation

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

9 code implementations CVPR 2020 Hao Chen, Kunyang Sun, Zhi Tian, Chunhua Shen, Yongming Huang, Youliang Yan

The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference.

Real-time Instance Segmentation Segmentation +1

Conditional Convolutions for Instance Segmentation

7 code implementations ECCV 2020 Zhi Tian, Chunhua Shen, Hao Chen

We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation).

Instance Segmentation Segmentation +1

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

6 code implementations CVPR 2021 Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI

Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin.

Contrastive Learning Image Classification +7

Images Speak in Images: A Generalist Painter for In-Context Visual Learning

1 code implementation CVPR 2023 Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, Tiejun Huang

In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and to specify task prompts as images as well.

In-Context Learning Keypoint Detection +2

SegGPT: Segmenting Everything In Context

1 code implementation 6 Apr 2023 Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang

We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.

Ranked #1 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot) (using extra training data)

Few-Shot Semantic Segmentation In-Context Learning +5

Channel-wise Knowledge Distillation for Dense Prediction

3 code implementations ICCV 2021 Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen

Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogous to class activation mapping), we propose to align features channel-wise between the student and teacher networks.

Knowledge Distillation Segmentation +1

Knowledge Adaptation for Efficient Semantic Segmentation

1 code implementation CVPR 2019 Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan

To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of the compact FCNs with large overall stride.

Knowledge Distillation Segmentation +1

Learning to Recover 3D Scene Shape from a Single Image

1 code implementation CVPR 2021 Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen

Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length.

Ranked #1 on Indoor Monocular Depth Estimation on DIODE (using extra training data)

3D Scene Reconstruction Depth Prediction +3

Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image

1 code implementation 28 Aug 2022 Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Simon Chen, Yifan Liu, Chunhua Shen

To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes.

Depth Estimation Depth Prediction

PolarMask: Single Shot Instance Segmentation with Polar Representation

2 code implementations CVPR 2020 Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo

In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.

Distance regression Instance Segmentation +4

Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections

17 code implementations 29 Jun 2016 Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang

In this work, we propose a very deep fully convolutional auto-encoder network for image restoration, which is an encoding-decoding framework with symmetric convolutional-deconvolutional layers.

Image Denoising JPEG Artifact Correction +1

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

1 code implementation 6 Feb 2024 Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen

We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance.

AutoML Language Modelling

End-to-End Video Instance Segmentation with Transformers

2 code implementations CVPR 2021 Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia

Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.

Instance Segmentation Segmentation +3

Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

2 code implementations NeurIPS 2019 Jia-Wang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid

To the best of our knowledge, this is the first work to show that deep networks trained using unlabelled monocular videos can predict globally scale-consistent camera trajectories over a long video sequence.

Depth And Camera Motion Monocular Depth Estimation +1

Unsupervised Scale-consistent Depth Learning from Video

2 code implementations 25 May 2021 Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Zhichao Li, Le Zhang, Chunhua Shen, Ming-Ming Cheng, Ian Reid

We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training and enables the scale-consistent prediction at inference time.

Monocular Depth Estimation Monocular Visual Odometry +1

Structured Knowledge Distillation for Dense Prediction

1 code implementation CVPR 2019 Yifan Liu, Changyong Shu, Jingdong Wang, Chunhua Shen

Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem.

Depth Estimation General Classification +7

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

1 code implementation ICCV 2023 Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen

State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity.

Ranked #19 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Image Reconstruction Monocular Depth Estimation +1

Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

1 code implementation Under review for Transaction 2024 Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

Our method benefits various applications including in-the-wild metrology, monocular-SLAM, and 3D reconstruction, which highlight the versatility of Metric3D v2 models as geometric foundation models.

Ranked #1 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

3D Reconstruction Monocular Depth Estimation +3

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

13 code implementations CVPR 2017 Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid

Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense classification problems such as semantic segmentation.

3D Absolute Human Pose Estimation Semantic Segmentation +1

DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning

5 code implementations 15 Mar 2020 Chi Zhang, Yujun Cai, Guosheng Lin, Chunhua Shen

We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance.

Classification Few-Shot Image Classification +4

Visual Question Answering: A Survey of Methods and Datasets

1 code implementation 20 Jul 2016 Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton Van Den Hengel

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities.

General Knowledge Visual Question Answering

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

1 code implementation 2 May 2021 Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.

Scene Text Detection Text Detection +1

Auto-Rectify Network for Unsupervised Indoor Depth Estimation

1 code implementation 4 Jun 2020 Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Tat-Jun Chin, Chunhua Shen, Ian Reid

However, excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices.

Monocular Depth Estimation Self-Supervised Learning +1

Index Network

6 code implementations 11 Aug 2019 Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu

By viewing the indices as a function of the feature map, we introduce the concept of "learning to index", and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the downsampling and upsampling stages, without extra training supervision.

Grayscale Image Denoising Image Denoising +3

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

3 code implementations CVPR 2022 Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen

Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.

Segmentation Semantic Segmentation

Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

1 code implementation 22 May 2023 Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, Chunhua Shen

In this work, we present Matcher, a novel perception paradigm that utilizes off-the-shelf vision foundation models to address various perception tasks.

Segmentation Semantic Segmentation

Wider or Deeper: Revisiting the ResNet Model for Visual Recognition

3 code implementations 30 Nov 2016 Zifeng Wu, Chunhua Shen, Anton Van Den Hengel

As a result, we are able to derive a new, shallower, architecture of residual networks which significantly outperforms much deeper models such as ResNet-200 on the ImageNet classification dataset.

Semantic Segmentation

Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells

4 code implementations CVPR 2019 Vladimir Nekrasov, Hao Chen, Chunhua Shen, Ian Reid

While most results in this domain have been achieved on image classification and language modelling problems, here we concentrate on dense per-pixel tasks, in particular, semantic image segmentation using fully convolutional networks.

Depth Prediction Image Classification +8

An end-to-end TextSpotter with Explicit Alignment and Attention

2 code implementations CVPR 2018 Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun

This allows the two tasks to work collaboratively by sharing convolutional features, which is critical to identify challenging text instances.

Text Detection

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

1 code implementation 30 Mar 2023 Wen Wang, Yan Jiang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen

Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video.

Image Generation Video Alignment +1

FreeSOLO: Learning to Segment Objects without Annotations

1 code implementation CVPR 2022 Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez

FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9.8% AP when fine-tuning instance segmentation with only 5% COCO masks.

Instance Segmentation object-detection +4

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

1 code implementation 12 Oct 2023 Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.

Ranked #1 on 3D Semantic Segmentation on ScanNet++ (using extra training data)

3D Object Detection 3D Reconstruction +5

Efficient Semantic Video Segmentation with Per-frame Inference

1 code implementation ECCV 2020 Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

For semantic segmentation, most existing real-time deep models trained with each frame independently may produce inconsistent results for a video sequence.

Knowledge Distillation Optical Flow Estimation +4

Deep Weakly-supervised Anomaly Detection

3 code implementations 30 Oct 2019 Guansong Pang, Chunhua Shen, Huidong Jin, Anton Van Den Hengel

To detect both seen and unseen anomalies, we introduce a novel deep weakly-supervised approach, namely Pairwise Relation prediction Network (PReNet), that learns pairwise relation features and anomaly scores by predicting the relation of any two randomly sampled training instances, in which the pairwise relation can be anomaly-anomaly, anomaly-unlabeled, or unlabeled-unlabeled.

Relation Semi-supervised Anomaly Detection +3

Deep Anomaly Detection with Deviation Networks

6 code implementations 19 Nov 2019 Guansong Pang, Chunhua Shen, Anton Van Den Hengel

Instead of representation learning, our method fulfills an end-to-end learning of anomaly scores by a neural deviation learning, in which we leverage a few (e.g., multiple to dozens) labeled anomalies and a prior probability to enforce statistically significant deviations of the anomaly scores of anomalies from that of normal data objects in the upper tail.

Anomaly Detection Cyber Attack Detection +3

Unsupervised Representation Learning by Predicting Random Distances

2 code implementations 22 Dec 2019 Hu Wang, Guansong Pang, Chunhua Shen, Congbo Ma

To enable unsupervised learning on those domains, in this work we propose to learn features without using any labelled data by training neural networks to predict data distances in a randomly projected space.

Anomaly Detection Clustering +1

CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation

1 code implementation 4 Mar 2021 Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia

Convolutional neural networks (CNNs) have been the de facto standard for today's 3D medical image segmentation.

Image Segmentation Inductive Bias +4

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

1 code implementation 1 Mar 2024 Xiangxiang Chu, Jianlin Su, Bo Zhang, Chunhua Shen

Large language models are built on top of a transformer-based architecture to process textual inputs.

Image Classification Image Generation +2

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

1 code implementation NeurIPS 2023 Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.

Depth Estimation Domain Generalization +5

Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection

1 code implementation 20 Dec 2019 Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin

More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.

Scene Text Detection Text Detection

FSRNet: End-to-End Learning Face Super-Resolution with Facial Priors

4 code implementations CVPR 2018 Yu Chen, Ying Tai, Xiaoming Liu, Chunhua Shen, Jian Yang

We present a novel deep end-to-end trainable Face Super-Resolution Network (FSRNet), which makes full use of the geometry prior, i.e., facial landmark heatmaps and parsing maps, to super-resolve very low-resolution (LR) face images without well-aligned requirement.

Face Alignment Generative Adversarial Network +1

Associatively Segmenting Instances and Semantics in Point Clouds

3 code implementations CVPR 2019 Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia

A 3D point cloud describes the real scene precisely and intuitively. To date, how to segment diversified elements in such an informative 3D scene is rarely discussed.

Ranked #15 on 3D Instance Segmentation on S3DIS (mRec metric)

3D Instance Segmentation 3D Semantic Segmentation +1

Context Prior for Scene Segmentation

2 code implementations CVPR 2020 Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang

Given an input image and corresponding ground truth, Affinity Loss constructs an ideal affinity map to supervise the learning of Context Prior.

Scene Segmentation Scene Understanding +1

Repulsion Loss: Detecting Pedestrians in a Crowd

2 code implementations CVPR 2018 Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, Chunhua Shen

In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem.

Ranked #9 on Pedestrian Detection on Caltech (using extra training data)

Pedestrian Detection regression

FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation

2 code implementations 8 Mar 2021 Lingtong Kong, Chunhua Shen, Jie Yang

Experiments on both synthetic Sintel data and real-world KITTI datasets demonstrate the effectiveness of the proposed approach, which needs only 1/10 of the computation of comparable networks to achieve on-par accuracy.

Optical Flow Estimation

DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data

2 code implementations 3 Feb 2020 Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin

Compared with previous learning objectives, i.e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes.

Depth Estimation Depth Prediction

NAS-FCOS: Efficient Search for Object Detection Architectures

1 code implementation 24 Oct 2021 Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang

Neural Architecture Search (NAS) has shown great potential in effectively reducing manual effort in network design by automatically discovering optimal architectures.

Neural Architecture Search Object +2

SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

1 code implementation 9 Jun 2023 BoWen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen, Yifan Liu

This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework and introduces SegViTv2.

Continual Learning Continual Semantic Segmentation +2

DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets

1 code implementation CVPR 2021 Jianpeng Zhang, Yutong Xie, Yong Xia, Chunhua Shen

To address this, we propose a dynamic on-demand network (DoDNet) that learns to segment multiple organs and tumors on partially labeled datasets.

Image Segmentation Medical Image Segmentation +4

Learning from partially labeled data for multi-organ and tumor segmentation

1 code implementation 13 Nov 2022 Yutong Xie, Jianpeng Zhang, Yong Xia, Chunhua Shen

To address this, we propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple partially labeled datasets.

Image Segmentation Medical Image Segmentation +4

Template-Based Automatic Search of Compact Semantic Segmentation Architectures

1 code implementation 4 Apr 2019 Vladimir Nekrasov, Chunhua Shen, Ian Reid

Automatic search of neural architectures for various vision and natural language tasks is becoming a prominent tool as it allows the discovery of high-performing structures on any dataset of interest.

General Classification Holdout Set +1

From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting

3 code implementations 7 Jan 2020 Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Chunhua Shen, Zhiguo Cao

Visual counting, a task that aims to estimate the number of objects from an image/video, is an open-set problem by nature, i.e., the object count can vary in [0, ∞) in theory.

Object Counting

SPTS: Single-Point Text Spotting

1 code implementation 15 Dec 2021 Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single-point for each instance.

Language Modelling Text Detection +1

SPTS v2: Single-Point Scene Text Spotting

3 code implementations 4 Jan 2023 Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.

Text Detection Text Spotting

DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution

1 code implementation CVPR 2021 Tong He, Chunhua Shen, Anton Van Den Hengel

Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions.

Instance Segmentation Semantic Segmentation

Dynamic Convolution for 3D Point Cloud Instance Segmentation

1 code implementation 18 Jul 2021 Tong He, Chunhua Shen, Anton Van Den Hengel

The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.

Instance Segmentation Semantic Segmentation

SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning

1 code implementation ICCV 2023 Muzhi Zhu, Hengtao Li, Hao Chen, Chengxiang Fan, Weian Mao, Chenchen Jing, Yifan Liu, Chunhua Shen

In this work, we propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability for both known and unknown categories.

Open-World Instance Segmentation Segmentation +1

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

1 code implementation CVPR 2020 Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.

Referring Expression Vision and Language Navigation

Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections

3 code implementations NeurIPS 2016 Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang

We propose to symmetrically link convolutional and de-convolutional layers with skip-layer connections, with which the training converges much faster and attains a higher-quality local optimum.

Denoising Image Restoration +1

End-to-End Video Text Spotting with Transformer

1 code implementation 20 Mar 2022 Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Hong Zhou, Ping Luo

Recent video text spotting methods usually require a three-stage pipeline, i.e., detecting text in individual images, recognizing localized text, and tracking text streams with post-processing to generate final results.

Text Detection Text Spotting

Segmenting Transparent Objects in the Wild

1 code implementation ECCV 2020 Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo

To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, which is 10 times larger than the existing datasets.

Segmentation Semantic Segmentation +1

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

2 code implementations 19 Apr 2021 Yuting Gao, Jia-Xin Zhuang, Shaohui Lin, Hao Cheng, Xing Sun, Ke Li, Chunhua Shen

Specifically, we find the final embedding obtained by the mainstream SSL methods contains the most fruitful information, and propose to distill the final embedding to maximally transmit a teacher's knowledge to a lightweight model by constraining the last embedding of the student to be consistent with that of the teacher.

Contrastive Learning Representation Learning +1

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

1 code implementation 16 Sep 2019 Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo

Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images.

Scene Text Recognition Super-Resolution

Deepfake Generation and Detection: A Benchmark and Survey

1 code implementation 26 Mar 2024 Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, DaCheng Tao

Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few.

Attribute Face Reenactment +2

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

1 code implementation 24 Nov 2023 Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang

In this paper, we introduce an information-enriched diffusion model for paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation.

Image Generation Language Modelling +1

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

1 code implementation ICCV 2023 Wenjia Wang, Yongtao Ge, Haiyi Mei, Zhongang Cai, Qingping Sun, Yanjun Wang, Chunhua Shen, Lei Yang, Taku Komura

As it is hard to calibrate single-view RGB images in the wild, existing 3D human mesh reconstruction (3DHMR) methods either use a constant large focal length or estimate one based on the background environment context, which cannot tackle the problem of the torso, limb, hand or face distortion caused by perspective camera projection when the camera is close to the human body.

3D Human Pose Estimation 3D Reconstruction

Efficient Decoder-free Object Detection with Transformers

2 code implementations 14 Jun 2022 Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen

A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective, with the price of bringing considerable computation burden for inference.

Object Object Detection

Explainable Deep Few-shot Anomaly Detection with Deviation Networks

1 code implementation 1 Aug 2021 Guansong Pang, Choubo Ding, Chunhua Shen, Anton Van Den Hengel

Here, we study the problem of few-shot anomaly detection, in which we aim at using a few labeled anomaly examples to train sample-efficient discriminative detection models.

Ranked #5 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Multiple Instance Learning Supervised Anomaly Detection +1

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations ECCV 2020 Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling Sentence +2

Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection

1 code implementation CVPR 2022 Choubo Ding, Guansong Pang, Chunhua Shen

Although most existing anomaly detection studies assume the availability of normal training samples only, a few labeled anomaly examples are often available in many real-world applications, such as defect samples identified during random quality inspection, lesion images confirmed by radiologists in daily medical screening, etc.

Ranked #4 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Supervised Anomaly Detection

RGM: A Robust Generalizable Matching Model

1 code implementation 18 Oct 2023 Songyan Zhang, Xinyu Sun, Hao Chen, Bo Li, Chunhua Shen

Finding corresponding pixels within a pair of images is a fundamental computer vision task with various applications.

Optical Flow Estimation

AQD: Towards Accurate Fully-Quantized Object Detection

1 code implementation CVPR 2021 Peng Chen, Jing Liu, Bohan Zhuang, Mingkui Tan, Chunhua Shen

Network quantization allows inference to be conducted using low-precision arithmetic for improved inference efficiency of deep neural networks on edge devices.

Image Classification Object +3

Task-Aware Monocular Depth Estimation for 3D Object Detection

1 code implementation 17 Sep 2019 Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei LI, Chunhua Shen

In this paper, we first analyse the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground depth and background depth using separate optimization objectives and depth decoders.

3D Object Detection 3D Object Recognition +4

When Unsupervised Domain Adaptation Meets Tensor Representations

1 code implementation ICCV 2017 Hao Lu, Lei Zhang, Zhiguo Cao, Wei Wei, Ke Xian, Chunhua Shen, Anton Van Den Hengel

Domain adaption (DA) allows machine learning methods trained on data sampled from one distribution to be applied to data sampled from another.

Unsupervised Domain Adaptation

Target before Shooting: Accurate Anomaly Detection and Localization under One Millisecond via Cascade Patch Retrieval

1 code implementation 13 Aug 2023 Hanxi Li, Jianfei Hu, Bo Li, Hao Chen, Yongbin Zheng, Chunhua Shen

In this framework, the anomaly detection problem is solved via a cascade patch retrieval procedure that retrieves the nearest neighbors for each test image patch in a coarse-to-fine fashion.

Supervised Anomaly Detection

A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation

1 code implementation ICCV 2021 Jianlong Yuan, Yifan Liu, Chunhua Shen, Zhibin Wang, Hao Li

Previous works [3, 27] fail to employ strong augmentation in pseudo label learning efficiently, as the large distribution change caused by strong augmentation harms the batch normalisation statistics.

Data Augmentation Image Classification +3

PointAttN: You Only Need Attention for Point Cloud Completion

1 code implementation 16 Mar 2022 Jun Wang, Ying Cui, Dongyan Guo, Junxia Li, Qingshan Liu, Chunhua Shen

To solve the problems, we leverage the cross-attention and self-attention mechanisms to design a novel neural network for processing point clouds in a per-point manner to eliminate kNNs.

Point Cloud Completion

Multi-marginal Wasserstein GAN

3 code implementations NeurIPS 2019 Jiezhang Cao, Langyuan Mo, Yifan Zhang, Kui Jia, Chunhua Shen, Mingkui Tan

Multiple marginal matching problem aims at learning mappings to match a source domain to multiple target domains and it has attracted great attention in many applications, such as multi-domain image translation.

Image Generation Translation

ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting

1 code implementation 8 May 2021 Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen

Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.

Text Spotting

Generative Prompt Model for Weakly Supervised Object Localization

1 code implementation ICCV 2023 Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings.

Ranked #1 on Weakly-Supervised Object Localization on CUB-200-2011 (Top-1 Localization Accuracy metric, using extra training data)

Image Denoising Language Modelling +2

A Mutual Bootstrapping Model for Automated Skin Lesion Segmentation and Classification

1 code implementation 8 Mar 2019 Yutong Xie, Jianpeng Zhang, Yong Xia, Chunhua Shen

Our results suggest that it is possible to boost the performance of skin lesion segmentation and classification simultaneously via training a unified model to perform both tasks in a mutual bootstrapping way.

Classification General Classification +3

Towards Effective Low-bitwidth Convolutional Neural Networks

2 code implementations CVPR 2018 Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid

This paper tackles the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations.

Quantization

Super Vision Transformer

1 code implementation 23 May 2022 Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

Experimental results on ImageNet demonstrate that our SuperViT can considerably reduce the computational costs of ViT models while even increasing performance.

Mining Mid-level Visual Patterns with Deep CNN Activations

1 code implementation 21 Jun 2015 Yao Li, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

The purpose of mid-level visual element discovery is to find clusters of image patches that are both representative and discriminative.

Learning Affinity-Aware Upsampling for Deep Image Matting

1 code implementation CVPR 2021 Yutong Dai, Hao Lu, Chunhua Shen

By looking at existing upsampling operators from a unified mathematical perspective, we generalize them into a second-order form and introduce Affinity-Aware Upsampling (A2U) where upsampling kernels are generated using a light-weight low-rank bilinear model and are conditioned on second-order features.

Image Matting Image Reconstruction

Reading Car License Plates Using Deep Convolutional Neural Networks and LSTMs

1 code implementation 21 Jan 2016 Hui Li, Chunhua Shen

Inspired by the success of deep neural networks (DNNs) in various vision applications, here we leverage DNNs to learn high-level features in a cascade framework, which lead to improved performance on both detection and recognition.

License Plate Detection Segmentation

Learning Deep Gradient Descent Optimization for Image Deconvolution

1 code implementation 10 Apr 2018 Dong Gong, Zhen Zhang, Qinfeng Shi, Anton Van Den Hengel, Chunhua Shen, Yanning Zhang

Extensive experiments on synthetic benchmarks and challenging real-world images demonstrate that the proposed deep optimization method is effective and robust to produce favorable results as well as practical for real-world image deblurring applications.

Blind Image Deblurring Image Deblurring +1

Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching

2 code implementations 13 Mar 2022 Xiaojie Chu, Yongtao Wang, Chunhua Shen, Jingdong Chen, Wei Chu

The development of scene text recognition (STR) in the era of deep learning has been mainly focused on novel architectures of STR models.

Scene Text Recognition

Viral Pneumonia Screening on Chest X-ray Images Using Confidence-Aware Anomaly Detection

1 code implementation 27 Mar 2020 Jianpeng Zhang, Yutong Xie, Guansong Pang, Zhibin Liao, Johan Verjans, Wenxin Li, Zongji Sun, Jian He, Yi Li, Chunhua Shen, Yong Xia

In this paper, we formulate the task of differentiating viral pneumonia from non-viral pneumonia and healthy controls into a one-class classification-based anomaly detection problem, and thus propose the confidence-aware anomaly detection (CAAD) model, which consists of a shared feature extractor, an anomaly detection module, and a confidence prediction module.

Binary Classification Classification +2

DENSE: Data-Free One-Shot Federated Learning

1 code implementation 23 Dec 2021 Jie Zhang, Chen Chen, Bo Li, Lingjuan Lyu, Shuang Wu, Shouhong Ding, Chunhua Shen, Chao Wu

One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round.

Federated Learning

Fully Quantized Image Super-Resolution Networks

1 code implementation 29 Nov 2020 Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen

With the rising popularity of intelligent mobile devices, it is of great practical significance to develop accurate, realtime and energy-efficient image Super-Resolution (SR) inference methods.

Image Super-Resolution Quantization

Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation

1 code implementation NeurIPS 2021 BoWen Zhang, Yifan Liu, Zhi Tian, Chunhua Shen

This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.

Segmentation Semantic Segmentation +1

Multi-dataset Training of Transformers for Robust Action Recognition

1 code implementation 26 Sep 2022 Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen

We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition.

Action Recognition Temporal Action Localization

Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer Learning

2 code implementations 7 Dec 2020 Haokui Zhang, Ying Li, Yenan Jiang, Peng Wang, Qiang Shen, Chunhua Shen

In contrast to previous approaches, we do not impose restrictions over the source data sets, in which they do not have to be collected by the same sensors as the target data sets.

Classification General Classification +1

Bootstrapping the Performance of Webly Supervised Semantic Segmentation

1 code implementation CVPR 2018 Tong Shen, Guosheng Lin, Chunhua Shen, Ian Reid

In this work, we focus on weak supervision, developing a method for training a high-quality pixel-level classifier for semantic segmentation, using only image-level class labels as the provided ground-truth.

Segmentation Transfer Learning +2

Memory-Efficient Hierarchical Neural Architecture Search for Image Restoration

1 code implementation 24 Dec 2020 Haokui Zhang, Ying Li, Hao Chen, Chengrong Gong, Zongwen Bai, Chunhua Shen

For the inner search space, we propose a layer-wise architecture sharing strategy (LWAS), resulting in more flexible architectures and better performance.

Image Denoising Image Restoration +2

Learning Conditional Attributes for Compositional Zero-Shot Learning

1 code implementation CVPR 2023 Qingsheng Wang, Lingqiao Liu, Chenchen Jing, Hao Chen, Guoqiang Liang, Peng Wang, Chunhua Shen

Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations.

Attribute Compositional Zero-Shot Learning

Robust Data Hiding Using Inverse Gradient Attention

1 code implementation 21 Nov 2020 Honglei Zhang, Hu Wang, Yuanzhouhan Cao, Chunhua Shen, Yidong Li

In deep data hiding models, to maximize the encoding capacity, each pixel of the cover image ought to be treated differently since they have different sensitivities w.r.t.

Learning Deep Representations Using Convolutional Auto-encoders with Symmetric Skip Connections

1 code implementation 28 Nov 2016 Jianfeng Dong, Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang

In this paper, we investigate convolutional denoising auto-encoders to show that unsupervised pre-training can still improve the performance of high-level image related tasks such as image classification and semantic segmentation.

Denoising General Classification +4

Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition

1 code implementation 4 Apr 2017 Qichang Hu, Huibing Wang, Teng Li, Chunhua Shen

By applying our method to several fine-grained car recognition data sets, we demonstrate that the proposed method can achieve better performance than recent approaches in the literature.

Fine-Grained Image Classification

Learning to predict crisp boundaries

1 code implementation ECCV 2018 Ruoxi Deng, Chunhua Shen, Shengjun Liu, Huibing Wang, Xinru Liu

Recent methods for boundary or edge detection built on Deep Convolutional Neural Networks (CNNs) typically suffer from the issue of predicted edges being thick and need post-processing to obtain crisp boundaries.

Boundary Detection Edge Detection

TSGB: Target-Selective Gradient Backprop for Probing CNN Visual Saliency

1 code implementation 11 Oct 2021 Lin Cheng, Pengfei Fang, Yanjie Liang, Liao Zhang, Chunhua Shen, Hanzi Wang

Inspired by those observations, we propose a novel visual saliency method, termed Target-Selective Gradient Backprop (TSGB), which leverages rectification operations to effectively emphasize target classes and further efficiently propagate the saliency to the image space, thereby generating target-selective and fine-grained saliency maps.

FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

1 code implementation 1 Dec 2022 Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen

Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between web domain and real-world domain.

Contrastive Learning Representation Learning

DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

1 code implementation 22 Nov 2023 Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, DaCheng Tao, Tianyou Chai

To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features.

Representation Learning Segmentation +2

Regularizing Proxies with Multi-Adversarial Training for Unsupervised Domain-Adaptive Semantic Segmentation

1 code implementation 29 Jul 2019 Tong Shen, Dong Gong, Wei zhang, Chunhua Shen, Tao Mei

To tackle the unsupervised domain adaptation problem, we explore the possibilities to generate high-quality labels as proxy labels to supervise the training on target data.

Semantic Segmentation Unsupervised Domain Adaptation

Piecewise classifier mappings: Learning fine-grained learners for novel categories with few examples

1 code implementation 11 May 2018 Xiu-Shen Wei, Peng Wang, Lingqiao Liu, Chunhua Shen, Jianxin Wu

To solve this problem, we propose an end-to-end trainable deep network which is inspired by the state-of-the-art fine-grained recognition model and is tailored for the FSFG task.

Few-Shot Learning Fine-Grained Image Recognition

Real-time End-to-End Video Text Spotter with Contrastive Representation Learning

1 code implementation 18 Jul 2022 Wejia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Hong Zhou, Size Li, Zhongyuan Wang, Ping Luo

Our contributions are three-fold: 1) CoText simultaneously addresses the three tasks (e.g., text detection, tracking, recognition) in a real-time end-to-end trainable framework.

Contrastive Learning Representation Learning +2

Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields

1 code implementation 26 Feb 2015 Fayao Liu, Chunhua Shen, Guosheng Lin, Ian Reid

Therefore, here we present a deep convolutional neural field model for estimating depths from single monocular images, aiming to jointly explore the capacity of deep CNN and continuous CRF.

Depth Estimation

The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification

1 code implementation CVPR 2015 Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

This paper, however, advocates that if used appropriately convolutional layer activations can be turned into a powerful image representation which enjoys many advantages over fully-connected layer activations.

General Classification Image Classification

Improving Generative Adversarial Networks with Local Coordinate Coding

1 code implementation 28 Jul 2020 Jiezhang Cao, Yong Guo, Qingyao Wu, Chunhua Shen, Junzhou Huang, Mingkui Tan

In this paper, rather than sampling from the predefined prior distribution, we propose an LCCGAN model with local coordinate coding (LCC) to improve the performance of generating data.

What value do explicit high level concepts have in vision to language problems?

1 code implementation CVPR 2016 Qi Wu, Chunhua Shen, Lingqiao Liu, Anthony Dick, Anton Van Den Hengel

Much of the recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

Image Captioning Question Answering +1

Adversarial Learning with Local Coordinate Coding

no code implementations ICML 2018 Jiezhang Cao, Yong Guo, Qingyao Wu, Chunhua Shen, Junzhou Huang, Mingkui Tan

Generative adversarial networks (GANs) aim to generate realistic data from some prior distribution (e.g., Gaussian noises).

Adaptive Importance Learning for Improving Lightweight Image Super-resolution Network

no code implementations 5 Jun 2018 Lei Zhang, Peng Wang, Chunhua Shen, Lingqiao Liu, Wei Wei, Yanning Zhang, Anton Van Den Hengel

In this study, we revisit this problem from an orthogonal view, and propose a novel learning strategy to maximize the pixel-wise fitting capacity of a given lightweight network architecture.

Image Super-Resolution

Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization

no code implementations 1 Nov 2017 Yu Chen, Chunhua Shen, Hao Chen, Xiu-Shen Wei, Lingqiao Liu, Jian Yang

In contrast, human vision is able to predict poses by exploiting geometric constraints of landmark point inter-connectivity.

Pose Estimation

Monocular Depth Estimation with Augmented Ordinal Depth Relationships

no code implementations 2 Jun 2018 Yuanzhouhan Cao, Tianqi Zhao, Ke Xian, Chunhua Shen, Zhiguo Cao, Shugong Xu

In this paper, we propose to improve the performance of metric depth estimation with relative depths collected from stereo movie videos using an existing stereo matching algorithm.

Depth Prediction Monocular Depth Estimation +2

Multi-label Learning Based Deep Transfer Neural Network for Facial Attribute Classification

no code implementations 3 May 2018 Ni Zhuang, Yan Yan, Si Chen, Hanzi Wang, Chunhua Shen

To address the above problem, we propose a novel deep transfer neural network method based on multi-label learning for facial attribute classification, termed FMTNet, which consists of three sub-networks: the Face detection Network (FNet), the Multi-label learning Network (MNet) and the Transfer learning Network (TNet).

Attribute Classification +6

Salient Object Detection by Lossless Feature Reflection

no code implementations 19 Feb 2018 Pingping Zhang, Wei Liu, Huchuan Lu, Chunhua Shen

Inspired by the intrinsic reflection of natural images, in this paper we propose a novel feature learning framework for large-scale salient object detection.

Object object-detection +3

HyperFusion-Net: Densely Reflective Fusion for Salient Object Detection

no code implementations 14 Apr 2018 Pingping Zhang, Huchuan Lu, Chunhua Shen

Salient object detection (SOD), which aims to find the most important region of interest and segment the relevant object/item in that area, is an important yet challenging vision task.

object-detection RGB Salient Object Detection +1

VITAL: VIsual Tracking via Adversarial Learning

no code implementations CVPR 2018 Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, WangMeng Zuo, Chunhua Shen, Rynson Lau, Ming-Hsuan Yang

To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively dropout input features to capture a variety of appearance changes.

General Classification Visual Tracking

Visual Question Answering with Memory-Augmented Networks

no code implementations CVPR 2018 Chao Ma, Chunhua Shen, Anthony Dick, Qi Wu, Peng Wang, Anton Van Den Hengel, Ian Reid

In this paper, we exploit a memory-augmented neural network to predict accurate answers to visual questions, even when those answers occur rarely in the training set.

Question Answering Visual Question Answering

Non-rigid Object Tracking via Deep Multi-scale Spatial-temporal Discriminative Saliency Maps

no code implementations 22 Feb 2018 Pingping Zhang, Wei Liu, Dong Wang, Yinjie Lei, Hongyu Wang, Chunhua Shen, Huchuan Lu

Extensive experiments demonstrate that the proposed algorithm achieves competitive performance in both saliency detection and visual tracking, especially outperforming other related trackers on the non-rigid object tracking datasets.

Object Object Tracking +2

Automatic Image Cropping for Visual Aesthetic Enhancement Using Deep Neural Networks and Cascaded Regression

no code implementations 25 Dec 2017 Guanjun Guo, Hanzi Wang, Chunhua Shen, Yan Yan, Hong-Yuan Mark Liao

The deep CNN model is then designed to extract features from several image cropping datasets, upon which the cropping bounding boxes are predicted by the proposed CCR method.

Image Cropping regression

Real-time Semantic Image Segmentation via Spatial Sparsity

no code implementations 1 Dec 2017 Zifeng Wu, Chunhua Shen, Anton Van Den Hengel

We propose an approach to semantic (image) segmentation that reduces the computational costs by a factor of 25 with limited impact on the quality of results.

Image Segmentation Segmentation +1

Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards

no code implementations 21 Nov 2017 Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton Van Den Hengel

Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images is proven to be an inscrutable challenge.

Informativeness Question Generation +2

Kill Two Birds with One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement

no code implementations 19 Nov 2017 Jun-Jie Zhang, Qi Wu, Jian Zhang, Chunhua Shen, Jianfeng Lu

These comments can be a description of the image, or some objects, attributes, scenes in it, which are normally used as the user-provided tags.

Retrieval TAG

Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries

no code implementations CVPR 2018 Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

To this end we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in variable length natural expression descriptions, from short phrases query to long multi-round dialogs.

Object Object Discovery +2

Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition

no code implementations 11 Jul 2017 Xinlong Wang, Zhipeng Man, Mingyu You, Chunhua Shen

Our experimental results on a few data sets demonstrate the effectiveness of using GAN images: an improvement of 7.5% over a strong baseline with moderate-sized real data being available.

Image Generation License Plate Recognition

Towards End-to-End Car License Plates Detection and Recognition with Deep Neural Networks

no code implementations 26 Sep 2017 Hui Li, Peng Wang, Chunhua Shen

In contrast to existing approaches which take license plate detection and recognition as two separate tasks and settle them step by step, our method jointly solves these two tasks by a single network.

License Plate Detection

FVQA: Fact-based Visual Question Answering

no code implementations 17 Jun 2016 Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel, Anthony Dick

We evaluate several baseline models on the FVQA dataset, and describe a novel model which is capable of reasoning about an image on the basis of supporting facts.

Common Sense Reasoning Question Answering +1

Weakly Supervised Semantic Segmentation Based on Web Image Co-segmentation

no code implementations 25 May 2017 Tong Shen, Guosheng Lin, Lingqiao Liu, Chunhua Shen, Ian Reid

Training a Fully Convolutional Network (FCN) for semantic segmentation requires a large number of masks with pixel level labelling, which involves a large amount of human labour and time for annotation.

Segmentation Weakly supervised Semantic Segmentation +1

Beyond Low Rank: A Data-Adaptive Tensor Completion Method

no code implementations 3 Aug 2017 Lei Zhang, Wei Wei, Qinfeng Shi, Chunhua Shen, Anton Van Den Hengel, Yanning Zhang

The prior for the non-low-rank structure is established based on a mixture of Gaussians which is shown to be flexible enough, and powerful enough, to inform the completion process for a variety of real tensor data.

Relative Depth Order Estimation Using Multi-scale Densely Connected Convolutional Networks

no code implementations 25 Jul 2017 Ruoxi Deng, Tianqi Zhao, Chunhua Shen, Shengjun Liu

We study the problem of estimating the relative depth order of point pairs in a monocular image.

Visually Aligned Word Embeddings for Improving Zero-shot Learning

no code implementations 18 Jul 2017 Ruizhi Qiao, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

To overcome this visual-semantic discrepancy, this work proposes an objective function to re-align the distributed word embeddings with visual information by learning a neural network to map it into a new representation called visually aligned word embedding (VAWE).

Semantic Similarity Semantic Textual Similarity +2

Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks

no code implementations ICCV 2017 Hui Li, Peng Wang, Chunhua Shen

In this work, we jointly address the problem of text detection and recognition in natural scene images based on convolutional recurrent neural networks.

Image Cropping Text Detection +1

TasselNet: Counting maize tassels in the wild via local counts regression network

no code implementations 7 Jul 2017 Hao Lu, Zhiguo Cao, Yang Xiao, Bohan Zhuang, Chunhua Shen

To our knowledge, this is the first time that a plant-related counting problem is considered using computer vision technologies under unconstrained field-based environment.

Plant Phenotyping regression

Care about you: towards large-scale human-centric visual relationship detection

no code implementations 28 May 2017 Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

In addressing this problem we first construct a large-scale human-centric visual relationship detection dataset (HCVRD), which provides many more types of relationship annotation (nearly 10K categories) than the previous released datasets.

Human-Object Interaction Detection Relationship Detection +1

Deep Descriptor Transforming for Image Co-Localization

no code implementations 8 May 2017 Xiu-Shen Wei, Chen-Lin Zhang, Yao Li, Chen-Wei Xie, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou

Reusable model design becomes desirable with the rapid expansion of machine learning applications.

Exploring Context with Deep Structured models for Semantic Segmentation

no code implementations 10 Mar 2016 Guosheng Lin, Chunhua Shen, Anton Van Den Hengel, Ian Reid

We formulate deep structured models by combining CNNs and Conditional Random Fields (CRFs) for learning the patch-patch context between image regions.

Image Segmentation Segmentation +1

Towards Context-aware Interaction Recognition

no code implementations 18 Mar 2017 Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid

Recognizing how objects interact with each other is a crucial task in visual recognition.

Robust Guided Image Filtering

no code implementations 28 Mar 2017 Wei Liu, Xiaogang Chen, Chunhua Shen, Jingyi Yu, Qiang Wu, Jie Yang

In this paper, we propose a general framework for Robust Guided Image Filtering (RGIF), which contains a data term and a smoothness term, to solve the two issues mentioned above.

Structured Learning of Tree Potentials in CRF for Image Segmentation

no code implementations 26 Mar 2017 Fayao Liu, Guosheng Lin, Ruizhi Qiao, Chunhua Shen

In this fashion, we easily achieve nonlinear learning of potential functions on both unary and pairwise terms in CRFs.

Image Segmentation Semantic Segmentation

Multi-Label Image Classification with Regional Latent Semantic Dependencies

no code implementations 4 Dec 2016 Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu

Recent state-of-the-art approaches to multi-label image classification exploit the label dependencies in an image, at global level, largely improving the labeling capacity.

Classification General Classification +1

Deep Learning Features at Scale for Visual Place Recognition

no code implementations 18 Jan 2017 Zetao Chen, Adam Jacobson, Niko Sunderhauf, Ben Upcroft, Lingqiao Liu, Chunhua Shen, Ian Reid, Michael Milford

The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks.

Visual Place Recognition

Compositional Model based Fisher Vector Coding for Image Classification

1 code implementation 16 Jan 2016 Lingqiao Liu, Peng Wang, Chunhua Shen, Lei Wang, Anton Van Den Hengel, Chao Wang, Heng Tao Shen

To handle this limitation, in this paper we break the convention which assumes that a local feature is drawn from one of few Gaussian distributions.

Classification General Classification +1

Cross-convolutional-layer Pooling for Image Recognition

no code implementations 4 Oct 2015 Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

Most of these studies adopt activations from a single DCNN layer, usually the fully-connected layer, as the image representation.

General Classification Image Classification

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

no code implementations CVPR 2017 Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel

To train a method to perform even one of these operations accurately from {image, question, answer} tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best.

BIG-bench Machine Learning Question Answering +1

From Motion Blur to Motion Flow: a Deep Learning Solution for Removing Heterogeneous Motion Blur

no code implementations CVPR 2017 Dong Gong, Jie Yang, Lingqiao Liu, Yanning Zhang, Ian Reid, Chunhua Shen, Anton Van Den Hengel, Qinfeng Shi

The critical observation underpinning our approach is thus that learning the motion flow instead allows the model to focus on the cause of the blur, irrespective of the image content.

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

no code implementations 6 Oct 2016 Yuanzhouhan Cao, Chunhua Shen, Heng Tao Shen

Augmenting RGB data with measured depth has been shown to improve the performance of a range of tasks in computer vision including object detection and semantic segmentation.

Depth Estimation Object +4

Fast Training of Triplet-based Deep Binary Embedding Networks

no code implementations CVPR 2016 Bohan Zhuang, Guosheng Lin, Chunhua Shen, Ian Reid

To solve the first stage, we design a large-scale high-order binary code inference algorithm that reduces the high-order objective to a standard binary quadratic problem, so that graph cuts can be used to efficiently infer the binary code, which serves as the label of each training datum.

Image Retrieval Multi-Label Classification +1

Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution

no code implementations 15 Mar 2016 Yao Li, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

More specifically, we observe that given a set of object proposals extracted from an image that contains the object of interest, an accurate strongly supervised object detector should give high scores to only a small minority of proposals, and low scores to most of them.

Object

Where to Focus: Query Adaptive Matching for Instance Retrieval Using Convolutional Feature Maps

no code implementations 22 Jun 2016 Jiewei Cao, Lingqiao Liu, Peng Wang, Zi Huang, Chunhua Shen, Heng Tao Shen

Instance retrieval requires one to search for images that contain a particular object within a large corpus.

Retrieval
