Search Results for author: Chunhua Shen

Found 369 papers, 159 papers with code

Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

7 code implementations • 2 Nov 2018 • Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang

Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion.

Ranked #26 on Scene Text Recognition on ICDAR2015

Irregular Text Recognition, Scene Text Recognition

38,490

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

8 code implementations • NeurIPS 2021 • Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen

Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed and they show that the design of spatial attention is critical to their success in these tasks.

Ranked #48 on Semantic Segmentation on ADE20K val

Image Classification, Semantic Segmentation

29,758

FCOS: Fully Convolutional One-Stage Object Detection

86 code implementations • ICCV 2019 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He

By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training.
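A minimal Python sketch of anchor-free regression targets, assuming the usual FCOS formulation: each feature-map location that falls inside a ground-truth box regresses its distances (l, t, r, b) to the four box sides, and a center-ness score down-weights locations far from the box center. The names `fcos_targets` and `centerness` are hypothetical; multi-level assignment and locations covered by several boxes are not handled. No anchor-box overlap is computed at any point.

```python
import math


def fcos_targets(x, y, box):
    """Hypothetical helper: regression targets for location (x, y) and box (x0, y0, x1, y1).

    Returns (l, t, r, b), the distances to the left, top, right and bottom sides,
    or None when the location lies outside the box (a negative sample).
    """
    x0, y0, x1, y1 = box
    l, t, r, b = x - x0, y - y0, x1 - x, y1 - y
    if min(l, t, r, b) <= 0:
        return None
    return l, t, r, b


def centerness(l, t, r, b):
    """Hypothetical helper: 1.0 at the box center, decaying toward the box edges."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
```

For a box (10, 20, 90, 60), the location (50, 40) sits at the exact center, so it gets targets (40, 20, 40, 20) and center-ness 1.0.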

Ranked #4 on Pedestrian Detection on TJU-Ped-campus

Object, Object Detection (+2 more)

27,790

NAS-FCOS: Fast Neural Architecture Search for Object Detection

3 code implementations • CVPR 2020 • Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang

The success of deep neural networks relies on significant architecture engineering.

Ranked #113 on Object Detection on COCO test-dev

Neural Architecture Search, Object (+2 more)

27,790

SOLO: Segmenting Objects by Locations

24 code implementations • ECCV 2020 • Xinlong Wang, Tao Kong, Chunhua Shen, Yuning Jiang, Lei LI

We present a new, embarrassingly simple approach to instance segmentation in images.

Ranked #67 on Instance Segmentation on COCO test-dev

Clustering, General Classification (+3 more)

27,790

SOLOv2: Dynamic and Fast Instance Segmentation

18 code implementations • NeurIPS 2020 • Xinlong Wang, Rufeng Zhang, Tao Kong, Lei LI, Chunhua Shen

Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location.

Ranked #10 on Real-time Instance Segmentation on MSCOCO

object-detection, Object Detection (+4 more)

27,790

BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

7 code implementations • 5 Apr 2020 • Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang

We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for real-time semantic segmentation.

Ranked #1 on Real-Time Semantic Segmentation on COCO-Stuff

Real-Time Semantic Segmentation, Segmentation

8,256

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

6 code implementations • ICCV 2019 • Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen

Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.

Ranked #8 on Scene Text Detection on SCUT-CTW1500

Scene Text Detection, Segmentation (+1 more)

4,075

DirectPose: Direct End-to-End Multi-Person Pose Estimation

8 code implementations • 18 Nov 2019 • Zhi Tian, Hao Chen, Chunhua Shen

We propose the first direct end-to-end multi-person pose estimation framework, termed DirectPose.

Ranked #13 on Keypoint Detection on COCO test-dev

Multi-Person Pose Estimation

3,324

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

9 code implementations • CVPR 2020 • Hao Chen, Kunyang Sun, Zhi Tian, Chunhua Shen, Yongming Huang, Youliang Yan

The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference.

Ranked #13 on Real-time Instance Segmentation on MSCOCO

Real-time Instance Segmentation, Segmentation (+1 more)

3,324

ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

15 code implementations • CVPR 2020 • Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang

Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve.

Ranked #9 on Text Spotting on Inverse-Text

Scene Text Detection, Text Detection (+1 more)

3,324

Conditional Convolutions for Instance Segmentation

7 code implementations • ECCV 2020 • Zhi Tian, Chunhua Shen, Hao Chen

We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation).

Instance Segmentation, Segmentation (+1 more)

3,324

Mask Encoding for Single Shot Instance Segmentation

7 code implementations • CVPR 2020 • Rufeng Zhang, Zhi Tian, Chunhua Shen, Mingyu You, Youliang Yan

To date, instance segmentation is dominated by two-stage methods, as pioneered by Mask R-CNN.

Instance Segmentation, Segmentation (+1 more)

3,324

BoxInst: High-Performance Instance Segmentation with Box Annotations

2 code implementations • CVPR 2021 • Zhi Tian, Chunhua Shen, Xinlong Wang, Hao Chen

We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training.

Ranked #2 on Box-supervised Instance Segmentation on PASCAL VOC 2012 val

Box-supervised Instance Segmentation, Segmentation (+3 more)

3,324

FCPose: Fully Convolutional Multi-Person Pose Estimation with Dynamic Instance-Aware Convolutions

3 code implementations • CVPR 2021 • Weian Mao, Zhi Tian, Xinlong Wang, Chunhua Shen

We propose a fully convolutional multi-person pose estimation framework using dynamic instance-aware convolutions, termed FCPose.

Keypoint Estimation, Multi-Person Pose Estimation

3,324

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

6 code implementations • CVPR 2021 • Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI

Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin.

Contrastive Learning, Image Classification (+7 more)

3,083

Images Speak in Images: A Generalist Painter for In-Context Visual Learning

1 code implementation • CVPR 2023 • Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, Tiejun Huang

In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and specify task prompts as also images.

Ranked #6 on Personalized Segmentation on PerSeg

In-Context Learning, Keypoint Detection (+2 more)

2,424

SegGPT: Segmenting Everything In Context

1 code implementation • 6 Apr 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang

We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.

Ranked #1 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot) (using extra training data)

Few-Shot Semantic Segmentation, In-Context Learning (+5 more)

2,424

Channel-wise Knowledge Distillation for Dense Prediction

3 code implementations • ICCV 2021 • Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen

Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogous to class activation mapping), we propose to align features channel-wise between the student and teacher networks.
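The channel-wise alignment can be sketched as follows, assuming each channel's activation map is turned into a distribution over spatial locations with a temperature-scaled softmax, and the student is penalized with the KL divergence from the teacher's distribution, averaged over channels. The function names and the default temperature are illustrative assumptions, and both networks are assumed to already have the same number of channels.

```python
import math


def spatial_softmax(channel, tau):
    """Softmax over the H*W locations of one channel (max subtracted for stability)."""
    m = max(channel)
    exps = [math.exp((v - m) / tau) for v in channel]
    s = sum(exps)
    return [e / s for e in exps]


def channel_wise_kd_loss(teacher, student, tau=1.0):
    """Hypothetical helper. teacher, student: lists of C channels, each a flat list of H*W activations."""
    assert len(teacher) == len(student)
    total = 0.0
    for t_c, s_c in zip(teacher, student):
        p_t = spatial_softmax(t_c, tau)
        p_s = spatial_softmax(s_c, tau)
        # KL(teacher || student) over the spatial locations of this channel
        total += sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    # tau**2 keeps gradient magnitudes comparable across temperatures
    return tau ** 2 * total / len(teacher)
```

Because the softmax is taken over locations, adding a constant to a whole student channel leaves the loss unchanged; only where the activations concentrate matters.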

Knowledge Distillation, Segmentation (+1 more)

1,365

Knowledge Adaptation for Efficient Semantic Segmentation

1 code implementation • CVPR 2019 • Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan

To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of the compact FCNs with large overall stride.

Knowledge Distillation, Segmentation (+1 more)

1,272

Enforcing geometric constraints of virtual normal for depth prediction

3 code implementations • ICCV 2019 • Wei Yin, Yifan Liu, Chunhua Shen, Youliang Yan

Monocular depth prediction plays a crucial role in understanding 3D scene geometry.

Ranked #10 on Depth Estimation on NYU-Depth V2

Depth Prediction, Monocular Depth Estimation

1,028

Learning to Recover 3D Scene Shape from a Single Image

1 code implementation • CVPR 2021 • Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen

Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length.

Ranked #1 on Indoor Monocular Depth Estimation on DIODE (using extra training data)

3D Scene Reconstruction, Depth Prediction (+3 more)

1,028

Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction

3 code implementations • 7 Mar 2021 • Wei Yin, Yifan Liu, Chunhua Shen

In this work, we show the importance of the high-order 3D geometric constraints for depth prediction.
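The high-order constraint can be illustrated with a small sketch, assuming a pinhole camera with focal length `f` and principal point (`cx`, `cy`): three pixels are back-projected into 3D with the predicted and with the ground-truth depth, the normal of the plane through each triplet (the "virtual normal") is computed with a cross product, and the two normals are compared with an L1 distance. Triplet sampling and the filtering of degenerate (near-collinear) triplets are left out; all names are hypothetical.

```python
import math


def backproject(u, v, depth, f, cx, cy):
    """Pinhole back-projection; f, cx, cy are assumed camera intrinsics."""
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def virtual_normal(pa, pb, pc):
    """Unit normal of the plane through three 3D points (assumed non-collinear)."""
    n = cross(sub(pb, pa), sub(pc, pa))
    norm = math.sqrt(sum(x * x for x in n))
    return tuple(x / norm for x in n)


def virtual_normal_loss(pred_depth, gt_depth, triplets, f, cx, cy):
    """Hypothetical helper. pred_depth, gt_depth: dicts mapping (u, v) -> depth.
    triplets: list of triples of (u, v) pixel coordinates."""
    total = 0.0
    for tri in triplets:
        n_pred = virtual_normal(*[backproject(u, v, pred_depth[(u, v)], f, cx, cy) for u, v in tri])
        n_gt = virtual_normal(*[backproject(u, v, gt_depth[(u, v)], f, cx, cy) for u, v in tri])
        total += sum(abs(a - b) for a, b in zip(n_pred, n_gt))  # L1 distance between normals
    return total / len(triplets)
```

A virtual normal depends on the relative positions of three points rather than on individual depth values, so a global rescaling of the predicted depth leaves this loss at zero, while a tilted surface does not.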

Depth Prediction, Monocular Depth Estimation

1,028

Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image

1 code implementation • 28 Aug 2022 • Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Simon Chen, Yifan Liu, Chunhua Shen

To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes.

Depth Estimation, Depth Prediction

1,028

PolarMask: Single Shot Instance Segmentation with Polar Representation

2 code implementations • CVPR 2020 • Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo

In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.

Ranked #100 on Instance Segmentation on COCO test-dev

Distance regression, Instance Segmentation (+4 more)

869

Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections

17 code implementations • 29 Jun 2016 • Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang

In this work, we propose a very deep fully convolutional auto-encoder network for image restoration, which is an encoding-decoding framework with symmetric convolutional-deconvolutional layers.

Ranked #2 on Grayscale Image Denoising on BSD200 sigma10

Image Denoising, JPEG Artifact Correction (+1 more)

826

MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices

1 code implementation • 28 Dec 2023 • Xiangxiang Chu, Limeng Qiao, Xinyang Lin, Shuang Xu, Yang Yang, Yiming Hu, Fei Wei, Xinyu Zhang, Bo Zhang, Xiaolin Wei, Chunhua Shen

We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices.

AutoML, Language Modelling

771

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

1 code implementation • 6 Feb 2024 • Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen

We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance.

AutoML, Language Modelling

771

End-to-End Video Instance Segmentation with Transformers

2 code implementations • CVPR 2021 • Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia

Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.

Ranked #33 on Video Instance Segmentation on YouTube-VIS validation

Instance Segmentation, Segmentation (+3 more)

734

Light-Weight RefineNet for Real-Time Semantic Segmentation

2 code implementations • 8 Oct 2018 • Vladimir Nekrasov, Chunhua Shen, Ian Reid

We consider the important task of effective and efficient semantic image segmentation.

Ranked #2 on Real-Time Semantic Segmentation on NYU Depth v2

Image Segmentation, Real-Time Semantic Segmentation (+1 more)

726

Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

2 code implementations • NeurIPS 2019 • Jia-Wang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid

To the best of our knowledge, this is the first work to show that deep networks trained using unlabelled monocular videos can predict globally scale-consistent camera trajectories over a long video sequence.

Ranked #61 on Monocular Depth Estimation on KITTI Eigen split

Depth And Camera Motion, Monocular Depth Estimation (+1 more)

714

Unsupervised Scale-consistent Depth Learning from Video

2 code implementations • 25 May 2021 • Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Zhichao Li, Le Zhang, Chunhua Shen, Ming-Ming Cheng, Ian Reid

We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training and enables the scale-consistent prediction at inference time.

Ranked #6 on Monocular Depth Estimation on NYU-Depth V2 self-supervised

Monocular Depth Estimation, Monocular Visual Odometry (+1 more)

714

Structured Knowledge Distillation for Dense Prediction

1 code implementation • CVPR 2019 • Yifan Liu, Changyong Shun, Jingdong Wang, Chunhua Shen

Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem.
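One form of structured distillation is pair-wise similarity distillation: the student is asked to reproduce the teacher's similarity between every pair of spatial locations, rather than each location's features individually. The sketch below assumes cosine similarity between per-location feature vectors and a mean squared difference; it covers only this pair-wise idea, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pairwise_similarity_loss(student_feats, teacher_feats):
    """Hypothetical helper. Each argument: list of N per-location feature vectors."""
    n = len(student_feats)
    assert n == len(teacher_feats)
    total = 0.0
    for i in range(n):
        for j in range(n):
            a_s = cosine(student_feats[i], student_feats[j])
            a_t = cosine(teacher_feats[i], teacher_feats[j])
            total += (a_s - a_t) ** 2  # match the pair's similarity, not the features
    return total / (n * n)
```

Since only similarities are compared, the student's feature dimension can differ from the teacher's.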

Depth Estimation, General Classification (+7 more)

687

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

1 code implementation • ICCV 2023 • Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen

State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity.

Ranked #19 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Image Reconstruction, Monocular Depth Estimation (+1 more)

604

Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

1 code implementation • Under review for Transaction 2024 • Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

Our method benefits various applications, including in-the-wild metrology, monocular SLAM, and 3D reconstruction, which highlights the versatility of Metric3D v2 models as geometric foundation models.

Ranked #1 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

3D Reconstruction, Monocular Depth Estimation (+3 more)

604

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

13 code implementations • CVPR 2017 • Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid

Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense classification problems such as semantic segmentation.

Ranked #13 on Semantic Segmentation on Trans10K

3D Absolute Human Pose Estimation, Semantic Segmentation (+1 more)

585

DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning

5 code implementations • 15 Mar 2020 • Chi Zhang, Yujun Cai, Guosheng Lin, Chunhua Shen

We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance.
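As a minimal sketch of an EMD between two sets of local features, assume uniform weights on equally sized sets and a ground cost of 1 minus cosine similarity. Under those assumptions an optimal transport plan is a permutation, so brute force over all assignments gives the exact EMD for tiny sets. The general case with non-uniform node weights requires solving a linear program, which is not shown; the name `emd_uniform` is hypothetical.

```python
import itertools
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def emd_uniform(query, support):
    """Hypothetical helper: EMD between two equally sized sets of feature vectors, uniform weights.

    Exact only under the uniform-weight assumption; factorial cost, so tiny sets only.
    """
    assert len(query) == len(support)
    n = len(query)
    cost = [[1.0 - cosine(q, s) for s in support] for q in query]
    best = min(sum(cost[i][p[i]] for i in range(n)) for p in itertools.permutations(range(n)))
    return best / n  # each node carries mass 1/n
```

A smaller EMD means the two images' local features can be matched more cheaply, i.e. the images are more relevant to each other.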

Classification, Few-Shot Image Classification (+4 more)

561

StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation

2 code implementations • 30 May 2023 • Chi Zhang, YiWen Chen, Yijun Fu, Zhenglin Zhou, Gang Yu, Billzb Wang, Bin Fu, Tao Chen, Guosheng Lin, Chunhua Shen

The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models.

3D Generation, Attribute (+1 more)

499

Visual Question Answering: A Survey of Methods and Datasets

1 code implementation • 20 Jul 2016 • Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton Van Den Hengel

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities.

General Knowledge, Visual Question Answering

436

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

1 code implementation • 2 May 2021 • Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.

Scene Text Detection, Text Detection (+1 more)

433

Scene Text Image Super-Resolution in the Wild

4 code implementations • ECCV 2020 • Wenjia Wang, Enze Xie, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai

For example, it outperforms LapSRN by over 5% and 8% on the recognition accuracy of ASTER and CRNN.

Image Super-Resolution

415

Auto-Rectify Network for Unsupervised Indoor Depth Estimation

1 code implementation • 4 Jun 2020 • Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Tat-Jun Chin, Chunhua Shen, Ian Reid

However, excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices.

Ranked #62 on Monocular Depth Estimation on NYU-Depth V2

Monocular Depth Estimation, Self-Supervised Learning (+1 more)

394

SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes

2 code implementations • 7 Nov 2022 • Libo Sun, Jia-Wang Bian, Huangying Zhan, Wei Yin, Ian Reid, Chunhua Shen

Self-supervised monocular depth estimation has shown impressive results in static scenes.

Indoor Monocular Depth Estimation, Monocular Depth Estimation (+1 more)

394

Indices Matter: Learning to Index for Deep Image Matting

1 code implementation • ICCV 2019 • Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu

We show that existing upsampling operators can be unified with the notion of the index function.

Ranked #4 on Semantic Image Matting on Semantic Image Matting Dataset

Semantic Image Matting

382

Index Network

6 code implementations • 11 Aug 2019 • Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu

By viewing the indices as a function of the feature map, we introduce the concept of "learning to index", and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the downsampling and upsampling stages, without extra training supervision.
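For reference, the fixed, hand-designed case that "learning to index" generalizes is max-pooling indices as used for unpooling in SegNet-style encoder-decoders: the encoder records where each maximum came from, and the decoder places values back at those positions. The sketch below shows only that fixed case (2x2 windows, stride 2) on plain nested lists; it does not implement the learned index blocks, and the function names are hypothetical.

```python
def max_pool_with_indices(x):
    """Hypothetical helper: 2x2, stride-2 max pooling on an H x W nested list (H, W even).

    Returns the pooled values and, per output cell, the (row, col) of its maximum.
    """
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h, 2):
        prow, irow = [], []
        for j in range(0, w, 2):
            window = [(x[i + di][j + dj], (i + di, j + dj)) for di in (0, 1) for dj in (0, 1)]
            value, pos = max(window, key=lambda item: item[0])  # first maximum wins ties
            prow.append(value)
            irow.append(pos)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool(pooled, indices, h, w):
    """Place each pooled value back at its recorded position; every other cell is zero."""
    out = [[0] * w for _ in range(h)]
    for prow, irow in zip(pooled, indices):
        for value, (r, c) in zip(prow, irow):
            out[r][c] = value
    return out
```

Here the indices are a fixed function of the feature map (argmax per window); the paper's proposal is to learn that function from data instead.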

Ranked #2 on Grayscale Image Denoising on Set12 sigma30

Grayscale Image Denoising, Image Denoising (+3 more)

382

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

3 code implementations • CVPR 2022 • Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen

Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.

Segmentation, Semantic Segmentation

373

Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

1 code implementation • 22 May 2023 • Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, Chunhua Shen

In this work, we present Matcher, a novel perception paradigm that utilizes off-the-shelf vision foundation models to address various perception tasks.

Segmentation, Semantic Segmentation

358

Wider or Deeper: Revisiting the ResNet Model for Visual Recognition

3 code implementations • 30 Nov 2016 • Zifeng Wu, Chunhua Shen, Anton Van Den Hengel

As a result, we are able to derive a new, shallower, architecture of residual networks which significantly outperforms much deeper models such as ResNet-200 on the ImageNet classification dataset.

Ranked #11 on Semantic Segmentation on PASCAL VOC 2012 test

Semantic Segmentation

336

Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells

4 code implementations • CVPR 2019 • Vladimir Nekrasov, Hao Chen, Chunhua Shen, Ian Reid

While most results in this domain have been achieved on image classification and language modelling problems, here we concentrate on dense per-pixel tasks, in particular, semantic image segmentation using fully convolutional networks.

Ranked #13 on Semantic Segmentation on PASCAL VOC 2012 val

Depth Prediction, Image Classification (+8 more)

334

An end-to-end TextSpotter with Explicit Alignment and Attention

2 code implementations • CVPR 2018 • Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun

This allows the two tasks to work collaboratively by sharing convolutional features, which is critical to identify challenging text instances.

Text Detection

323

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

1 code implementation • 30 Mar 2023 • Wen Wang, Yan Jiang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen

Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video.

Image Generation, Video Alignment (+1 more)

319

FreeSOLO: Learning to Segment Objects without Annotations

1 code implementation • CVPR 2022 • Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez

FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9.8% AP when fine-tuning instance segmentation with only 5% COCO masks.

Instance Segmentation, object-detection (+4 more)

309

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

1 code implementation • 12 Oct 2023 • Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.

Ranked #1 on 3D Semantic Segmentation on ScanNet++ (using extra training data)

3D Object Detection, 3D Reconstruction (+5 more)

298

Efficient Semantic Video Segmentation with Per-frame Inference

1 code implementation • ECCV 2020 • Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

For semantic segmentation, most existing real-time deep models trained with each frame independently may produce inconsistent results for a video sequence.

Ranked #2 on Video Semantic Segmentation on CamVid

Knowledge Distillation, Optical Flow Estimation (+4 more)

293

Deep Weakly-supervised Anomaly Detection

3 code implementations • 30 Oct 2019 • Guansong Pang, Chunhua Shen, Huidong Jin, Anton Van Den Hengel

To detect both seen and unseen anomalies, we introduce a novel deep weakly-supervised approach, namely Pairwise Relation prediction Network (PReNet), that learns pairwise relation features and anomaly scores by predicting the relation of any two randomly sampled training instances, in which the pairwise relation can be anomaly-anomaly, anomaly-unlabeled, or unlabeled-unlabeled.
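The pair construction described above can be sketched like this: pairs are drawn at random from the labeled anomalies and the unlabeled data, and each pair gets a regression target that depends only on its type, ordered anomaly-anomaly > anomaly-unlabeled > unlabeled-unlabeled. The target values 8, 4 and 0 and all names are hypothetical choices for illustration; the network that regresses these targets from pair features is not shown.

```python
import random

# Hypothetical ordinal targets per pair type ("a" = labeled anomaly, "u" = unlabeled).
TARGETS = {("a", "a"): 8.0, ("a", "u"): 4.0, ("u", "u"): 0.0}


def sample_pairs(anomalies, unlabeled, n_pairs, seed=0):
    """Hypothetical helper: draw n_pairs (x1, x2, target) training triples."""
    rng = random.Random(seed)
    pools = {"a": anomalies, "u": unlabeled}
    pairs = []
    for _ in range(n_pairs):
        kind = rng.choice(list(TARGETS))  # pick a pair type uniformly
        x1 = rng.choice(pools[kind[0]])
        x2 = rng.choice(pools[kind[1]])
        pairs.append((x1, x2, TARGETS[kind]))
    return pairs
```

Because unlabeled data is mostly normal, the unlabeled-unlabeled pairs act as (noisy) normal-normal pairs.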

Relation, Semi-supervised Anomaly Detection (+3 more)

282

Deep Anomaly Detection with Deviation Networks

6 code implementations • 19 Nov 2019 • Guansong Pang, Chunhua Shen, Anton Van Den Hengel

Instead of representation learning, our method performs end-to-end learning of anomaly scores via neural deviation learning, in which we leverage a few (e.g., multiple to dozens) labeled anomalies and a prior probability to enforce statistically significant deviations of the anomaly scores of anomalies from those of normal data objects in the upper tail.
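A sketch of the deviation idea, assuming a standard Gaussian prior: reference scores are drawn from N(0, 1), a score's deviation is its z-score relative to them, unlabeled/normal examples are pulled toward the reference mean, and labeled anomalies are pushed at least a margin of `margin` standard deviations above it. The margin of 5.0, the 5,000 reference draws and the name `deviation_loss` are assumptions here.

```python
import random
import statistics


def deviation_loss(score, label, ref_scores, margin=5.0):
    """Hypothetical helper. score: anomaly score phi(x);
    label: 1 for a labeled anomaly, 0 for unlabeled/normal;
    ref_scores: reference scores drawn from the Gaussian prior."""
    mu = statistics.mean(ref_scores)
    sigma = statistics.stdev(ref_scores)
    dev = (score - mu) / sigma
    # Normal data is pulled toward the prior mean; anomalies must sit at least
    # `margin` standard deviations into the upper tail to incur no loss.
    return (1 - label) * abs(dev) + label * max(0.0, margin - dev)


rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
```

An anomaly scored exactly at the reference mean pays the full margin, while one scored ten standard deviations above it pays nothing.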

Ranked #1 on Network Intrusion Detection on NB15-Backdoor

Anomaly Detection, Cyber Attack Detection (+3 more)

282

Unsupervised Representation Learning by Predicting Random Distances

2 code implementations • 22 Dec 2019 • Hu Wang, Guansong Pang, Chunhua Shen, Congbo Ma

To enable unsupervised learning on those domains, in this work we propose to learn features without using any labelled data by training neural networks to predict data distances in a randomly projected space.
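A minimal sketch of the training signal, assuming the "distance" is the inner product between two samples after a fixed random Gaussian projection, and that a learnable map `phi` is trained to reproduce it with a squared error. The names and the inner-product choice are assumptions; any callable can stand in for the network.

```python
import random


def random_projection(dim_in, dim_out, seed=0):
    """Fixed (never trained) Gaussian projection matrix, dim_out x dim_in."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]


def project(matrix, x):
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_distance_loss(phi, eta_matrix, xi, xj):
    """Hypothetical helper: squared error between phi's inner product and the
    inner product of the two samples in the randomly projected space."""
    target = dot(project(eta_matrix, xi), project(eta_matrix, xj))
    pred = dot(phi(xi), phi(xj))
    return (pred - target) ** 2
```

No labels are involved: the target comes entirely from the fixed random projection, so any unlabeled pair of samples yields a training signal.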

Anomaly Detection, Clustering (+1 more)

282

CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation

1 code implementation • 4 Mar 2021 • Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia

Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.

Image Segmentation, Inductive Bias (+4 more)

282

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

1 code implementation • 1 Mar 2024 • Xiangxiang Chu, Jianlin Su, Bo Zhang, Chunhua Shen

Large language models are built on top of a transformer-based architecture to process textual inputs.

Image Classification, Image Generation (+2 more)

282

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

1 code implementation • NeurIPS 2023 • Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.

Depth Estimation, Domain Generalization (+5 more)

281

Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection

1 code implementation • 20 Dec 2019 • Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin

More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.

Ranked #3 on Scene Text Detection on ICDAR 2017 MLT

Scene Text Detection, Text Detection

271

FSRNet: End-to-End Learning Face Super-Resolution with Facial Priors

4 code implementations • CVPR 2018 • Yu Chen, Ying Tai, Xiaoming Liu, Chunhua Shen, Jian Yang

We present a novel deep end-to-end trainable Face Super-Resolution Network (FSRNet), which makes full use of the geometry prior, i.e., facial landmark heatmaps and parsing maps, to super-resolve very low-resolution (LR) face images without requiring well-aligned inputs.

Face Alignment, Generative Adversarial Network (+1 more)

255

Associatively Segmenting Instances and Semantics in Point Clouds

3 code implementations • CVPR 2019 • Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia

A 3D point cloud describes the real scene precisely and intuitively. To date, how to segment diversified elements in such an informative 3D scene has rarely been discussed.

Ranked #15 on 3D Instance Segmentation on S3DIS (mRec metric)

3D Instance Segmentation, 3D Semantic Segmentation (+1 more)

248

Context Prior for Scene Segmentation

2 code implementations • CVPR 2020 • Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang

Given an input image and corresponding ground truth, Affinity Loss constructs an ideal affinity map to supervise the learning of Context Prior.
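The ideal affinity map can be built directly from the labels: with a one-hot label matrix L (one row per pixel of the downsampled ground truth), A = L L^T has A[i][j] = 1 exactly when pixels i and j share a class. The sketch below assumes the labels are already downsampled to the feature resolution and flattened; the loss terms that compare the predicted prior map against A are not shown, and the name `ideal_affinity_map` is hypothetical.

```python
def ideal_affinity_map(labels, num_classes):
    """Hypothetical helper. labels: flat list of class ids for N (downsampled) pixels.

    Returns the N x N matrix A = L @ L^T, where L is the one-hot label matrix,
    so A[i][j] == 1 iff pixels i and j belong to the same class.
    """
    one_hot = [[1 if c == k else 0 for k in range(num_classes)] for c in labels]
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in one_hot] for row_i in one_hot]
```

For labels [0, 1, 0], pixels 0 and 2 share class 0, so A is [[1, 0, 1], [0, 1, 0], [1, 0, 1]].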

Ranked #1 on Scene Understanding on ADE20K val

Scene Segmentation, Scene Understanding (+1 more)

245

Repulsion Loss: Detecting Pedestrians in a Crowd

2 code implementations • CVPR 2018 • Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, Chunhua Shen

In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem.

Ranked #9 on Pedestrian Detection on Caltech (using extra training data)

Pedestrian Detection, regression

233

FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation

2 code implementations • 8 Mar 2021 • Lingtong Kong, Chunhua Shen, Jie Yang

Experiments on both synthetic Sintel data and real-world KITTI datasets demonstrate the effectiveness of the proposed approach, which needs only 1/10 computation of comparable networks to achieve on par accuracy.

Ranked #12 on Optical Flow Estimation on KITTI 2012

Optical Flow Estimation

232

DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data

2 code implementations • 3 Feb 2020 • Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin

Compared with previous learning objectives, i.e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes.
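Affine-invariant depth is defined only up to an unknown scale and shift, so comparing it with metric ground truth typically starts by fitting both. A common approach, sketched below as an assumption rather than as this paper's loss, solves the least-squares problem min over s, t of sum((s * p_i + t - g_i)^2) in closed form: s is the covariance of prediction and ground truth divided by the prediction variance, and t then aligns the means. The name `align_scale_shift` is hypothetical.

```python
def align_scale_shift(pred, gt):
    """Hypothetical helper: least-squares (s, t) minimizing sum((s * p + t - g) ** 2).

    Assumes pred is not constant (otherwise the scale is undetermined).
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    var = sum((p - mean_p) ** 2 for p in pred)
    s = cov / var
    t = mean_g - s * mean_p
    return s, t
```

If the ground truth is an exact affine function of the prediction, e.g. g = 2p + 0.5, the fit recovers s = 2.0 and t = 0.5, and the aligned error is zero.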

Depth Estimation, Depth Prediction

217

Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations

4 code implementations • 13 Sep 2018 • Vladimir Nekrasov, Thanuja Dharmasiri, Andrew Spek, Tom Drummond, Chunhua Shen, Ian Reid

Deployment of deep learning models in robotics as sensory information extractors can be a daunting task to handle, even using generic GPU cards.

Ranked #6 on Real-Time Semantic Segmentation on NYU Depth v2

Knowledge Distillation, Monocular Depth Estimation (+3 more)

197

NAS-FCOS: Efficient Search for Object Detection Architectures

1 code implementation • 24 Oct 2021 • Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang

Neural Architecture Search (NAS) has shown great potential in effectively reducing manual effort in network design by automatically discovering optimal architectures.

Neural Architecture Search, Object (+2 more)

187

CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning

1 code implementation • CVPR 2019 • Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, Chunhua Shen

Recent progress in semantic segmentation is driven by deep Convolutional Neural Networks and large-scale labeled image datasets.

Ranked #86 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot)

Few-Shot Semantic Segmentation Segmentation +1

185

Paper
Code

Conditional Positional Encodings for Vision Transformers

2 code implementations • 22 Feb 2021 • Xiangxiang Chu, Zhi Tian, Bo Zhang, Xinlong Wang, Chunhua Shen

Built on the positional encoding generator (PEG), we present the Conditional Position encoding Vision Transformer (CPVT).

AutoML Classification +4

177

Paper
Code
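
A PEG produces position encodings conditioned on the input tokens themselves. The sketch below assumes the commonly described form: reshape the patch tokens to their 2D grid, apply a 3x3 depthwise convolution with zero padding, and add the result back to the tokens. It is a pure-Python toy (no class token, no bias, hypothetical function name peg), not the CPVT implementation.

```python
def peg(tokens, h, w, kernels):
    """Position-encoding-generator sketch.

    tokens:  list of h*w token vectors (row-major), each with C channels
    kernels: per-channel 3x3 weights, kernels[ch][ki][kj] (depthwise conv)
    Returns tokens + depthwise_conv3x3(tokens) with zero padding.
    """
    if len(tokens) != h * w:
        raise ValueError("token count must equal h * w")
    c = len(tokens[0])
    out = []
    for i in range(h):
        for j in range(w):
            vec = []
            for ch in range(c):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        # Zero padding: neighbours outside the grid contribute nothing,
                        # so border tokens end up with different outputs than interior ones.
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += kernels[ch][di + 1][dj + 1] * tokens[ii * w + jj][ch]
                vec.append(tokens[i * w + j][ch] + acc)  # residual add
            out.append(vec)
    return out


# Identical input tokens on a 3x3 grid receive position-dependent outputs.
ones_kernel = [[[1.0] * 3 for _ in range(3)]]  # one channel, all-ones 3x3 kernel
result = peg([[1.0]] * 9, 3, 3, ones_kernel)
print([v[0] for v in result])  # [5.0, 7.0, 5.0, 7.0, 10.0, 7.0, 5.0, 7.0, 5.0]
```

Because the encoding is computed from a local neighbourhood rather than looked up in a fixed table, the same module applies to any grid size h x w.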

SegViT: Semantic Segmentation with Plain Vision Transformers

1 code implementation • 12 Oct 2022 • BoWen Zhang, Zhi Tian, Quan Tang, Xiangxiang Chu, Xiaolin Wei, Chunhua Shen, Yifan Liu

We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and propose SegViT.

Ranked #4 on Semantic Segmentation on COCO-Stuff test

Segmentation Semantic Segmentation

176

Paper
Code

SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

1 code implementation • 9 Jun 2023 • BoWen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen, Yifan Liu

This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework and introduces SegViTv2.

Ranked #16 on Semantic Segmentation on ADE20K

Continual Learning Continual Semantic Segmentation +2

176

Paper
Code

Poseur: Direct Human Pose Regression with Transformers

1 code implementation • 19 Jan 2022 • Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang, Anton Van Den Hengel

We propose a direct, regression-based approach to 2D human pose estimation from single images.

Ranked #2 on Keypoint Detection on MS COCO

2D Human Pose Estimation Keypoint Detection +1

168

Paper
Code

DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets

1 code implementation • CVPR 2021 • Jianpeng Zhang, Yutong Xie, Yong Xia, Chunhua Shen

To address this, we propose a dynamic on-demand network (DoDNet) that learns to segment multiple organs and tumors on partially labeled datasets.

Image Segmentation Medical Image Segmentation +4

164

Paper
Code

Learning from partially labeled data for multi-organ and tumor segmentation

1 code implementation • 13 Nov 2022 • Yutong Xie, Jianpeng Zhang, Yong Xia, Chunhua Shen

To address this, we propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple partially labeled datasets.

Image Segmentation Medical Image Segmentation +4

164

Paper
Code

Template-Based Automatic Search of Compact Semantic Segmentation Architectures

1 code implementation • 4 Apr 2019 • Vladimir Nekrasov, Chunhua Shen, Ian Reid

Automatic search of neural architectures for various vision and natural language tasks is becoming a prominent tool, as it allows discovering high-performing structures on any dataset of interest.

Ranked #13 on Semantic Segmentation on CamVid

General Classification Holdout Set +1

149

Paper
Code

Attention-guided Network for Ghost-free High Dynamic Range Imaging

5 code implementations • CVPR 2019 • Qingsen Yan, Dong Gong, Qinfeng Shi, Anton Van Den Hengel, Chunhua Shen, Ian Reid, Yanning Zhang

Ghosting artifacts caused by moving objects or misalignments are a key challenge in high dynamic range (HDR) imaging for dynamic scenes.

Optical Flow Estimation Vocal Bursts Intensity Prediction

143

Paper
Code

From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer

5 code implementations • ICCV 2019 • Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Zhiguo Cao, Chunhua Shen

A dense region can always be divided until sub-region counts are within the previously observed closed set.

Ranked #3 on Crowd Counting on TRANCOS

Crowd Counting

132

Paper
Code
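
The divide-and-conquer idea in the sentence above can be illustrated with a toy recursion: keep splitting a region into quadrants until its count falls inside the closed set [0, c_max]. In S-DC the counts at each level are predicted by a network; here, as a loud simplification, they are read off a given density map, and c_max, region_count, and divide_and_count are hypothetical names chosen for illustration.

```python
def region_count(dmap, r0, r1, c0, c1):
    """Sum of a density map over rows [r0, r1) and columns [c0, c1)."""
    return sum(dmap[r][c] for r in range(r0, r1) for c in range(c0, c1))


def divide_and_count(dmap, c_max, r0=0, r1=None, c0=0, c1=None):
    """Split a region into quadrants until every leaf count is <= c_max.

    Returns a list of ((r0, r1, c0, c1), count) leaves. A single cell that still
    exceeds c_max is kept as a leaf because it cannot be divided further.
    """
    r1 = len(dmap) if r1 is None else r1
    c1 = len(dmap[0]) if c1 is None else c1
    count = region_count(dmap, r0, r1, c0, c1)
    if count <= c_max or (r1 - r0 <= 1 and c1 - c0 <= 1):
        return [((r0, r1, c0, c1), count)]
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    leaves = []
    for a, b in ((r0, rm), (rm, r1)):
        for c, d in ((c0, cm), (cm, c1)):
            if a < b and c < d:  # skip empty halves of thin regions
                leaves.extend(divide_and_count(dmap, c_max, a, b, c, d))
    return leaves


dmap = [[5, 5, 0, 0],
        [5, 5, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1]]
leaves = divide_and_count(dmap, c_max=6)
print(len(leaves), sum(cnt for _, cnt in leaves))  # 7 22
```

Only the dense top-left quadrant is subdivided; sparse regions stop early, and the total is recovered by summing leaf counts that each stay within the range seen during training.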

From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting

3 code implementations • 7 Jan 2020 • Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Chunhua Shen, Zhiguo Cao

Visual counting, a task that aims to estimate the number of objects in an image or video, is by nature an open-set problem, i.e., the count can in theory vary in [0, ∞).

Object Counting

132

Paper
Code

SPTS: Single-Point Text Spotting

1 code implementation • 15 Dec 2021 • Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single point for each instance.

Ranked #3 on Text Spotting on SCUT-CTW1500

Language Modelling Text Detection +1

128

Paper
Code

SPTS v2: Single-Point Scene Text Spotting

3 code implementations • 4 Jan 2023 • Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.

Ranked #15 on Text Spotting on ICDAR 2015

Text Detection Text Spotting

128

Paper
Code

DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution

1 code implementation • CVPR 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel

Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions.

Instance Segmentation Semantic Segmentation

114

Paper
Code

Dynamic Convolution for 3D Point Cloud Instance Segmentation

1 code implementation • 18 Jul 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel

The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.

Instance Segmentation Semantic Segmentation

114

Paper
Code

SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning

1 code implementation • ICCV 2023 • Muzhi Zhu, Hengtao Li, Hao Chen, Chengxiang Fan, Weian Mao, Chenchen Jing, Yifan Liu, Chunhua Shen

In this work, we propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability for both known and unknown categories.

Open-World Instance Segmentation Segmentation +1

108

Paper
Code

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

1 code implementation • CVPR 2020 • Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.

Referring Expression Vision and Language Navigation

105

Paper
Code

Self-training with progressive augmentation for unsupervised cross-domain person re-identification

1 code implementation • ICCV 2019 • Xin-Yu Zhang, Jiewei Cao, Chunhua Shen, Mingyu You

In this work, we develop a self-training method with progressive augmentation framework (PAST) to promote the model performance progressively on the target dataset.

Ranked #11 on Unsupervised Domain Adaptation on Market to Duke

Person Re-Identification Unsupervised Domain Adaptation

Paper
Code

Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections

3 code implementations • NeurIPS 2016 • Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang

We propose to symmetrically link convolutional and de-convolutional layers with skip-layer connections, with which the training converges much faster and attains a higher-quality local optimum.

Ranked #37 on Image Super-Resolution on BSD100 - 4x upscaling

Denoising Image Restoration +1

Paper
Code

End-to-End Video Text Spotting with Transformer

1 code implementation • 20 Mar 2022 • Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Hong Zhou, Ping Luo

Recent video text spotting methods usually require a three-stage pipeline, i.e., detecting text in individual images, recognizing localized text, and tracking text streams with post-processing to generate final results.

Text Detection Text Spotting

Paper
Code

Segmenting Transparent Objects in the Wild

1 code implementation • ECCV 2020 • Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo

To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, which is 10 times larger than existing datasets.

Ranked #4 on Semantic Segmentation on Trans10K

Segmentation Semantic Segmentation +1

Paper
Code

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

2 code implementations • 19 Apr 2021 • Yuting Gao, Jia-Xin Zhuang, Shaohui Lin, Hao Cheng, Xing Sun, Ke Li, Chunhua Shen

Specifically, we find the final embedding obtained by the mainstream SSL methods contains the most fruitful information, and propose to distill the final embedding to maximally transmit a teacher's knowledge to a lightweight model by constraining the last embedding of the student to be consistent with that of the teacher.

Contrastive Learning Representation Learning +1

Paper
Code
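
The distillation constraint described above (the student's last embedding should agree with the teacher's) can be sketched as a per-sample distance between embeddings. This sketch assumes a mean squared distance between L2-normalized vectors of equal dimension; the exact loss and any projection used in DisCo may differ, and the helper names are hypothetical.

```python
import math


def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def embedding_consistency_loss(student, teacher):
    """Mean squared L2 distance between normalized student and teacher embeddings.

    student, teacher: lists of embedding vectors (one per image), same order and dim.
    The teacher is treated as fixed; in a real framework its output would be detached.
    """
    if len(student) != len(teacher) or not student:
        raise ValueError("need matching, non-empty batches")
    total = 0.0
    for s, t in zip(student, teacher):
        s_n, t_n = l2_normalize(s), l2_normalize(t)
        total += sum((a - b) ** 2 for a, b in zip(s_n, t_n))
    return total / len(student)


print(embedding_consistency_loss([[2.0, 0.0]], [[1.0, 0.0]]))  # 0.0 (same direction)
print(embedding_consistency_loss([[1.0, 0.0]], [[0.0, 1.0]]))  # 2.0 (orthogonal)
```

Normalizing first makes the loss depend only on embedding direction, which matches how contrastive SSL embeddings are usually compared.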

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

1 code implementation • 16 Sep 2019 • Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo

Nonetheless, most previous methods may not work well in recognizing low-resolution text, which is often seen in natural scene images.

Scene Text Recognition Super-Resolution

Paper
Code

Deepfake Generation and Detection: A Benchmark and Survey

1 code implementation • 26 Mar 2024 • Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, DaCheng Tao

Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few.

Attribute Face Reenactment +2

Paper
Code

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

1 code implementation • 24 Nov 2023 • Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang

In this paper, we introduce an information-enriched diffusion model for paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation.

Image Generation Language Modelling +1

Paper
Code

StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data

1 code implementation • 20 Aug 2023 • Yanda Li, Chi Zhang, Gang Yu, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei

However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models.

Ranked #53 on Visual Question Answering on MM-Vet

Visual Question Answering

Paper
Code

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

1 code implementation • ICCV 2023 • Wenjia Wang, Yongtao Ge, Haiyi Mei, Zhongang Cai, Qingping Sun, Yanjun Wang, Chunhua Shen, Lei Yang, Taku Komura

As it is hard to calibrate single-view RGB images in the wild, existing 3D human mesh reconstruction (3DHMR) methods either use a constant large focal length or estimate one based on the background environment context, which cannot tackle the torso, limb, hand, or face distortion caused by perspective camera projection when the camera is close to the human body.

Ranked #5 on 3D Human Pose Estimation on 3DPW

3D Human Pose Estimation 3D Reconstruction

Paper
Code

Efficient Decoder-free Object Detection with Transformers

2 code implementations • 14 Jun 2022 • Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen

A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective, with the price of bringing considerable computation burden for inference.

Object Object Detection

Paper
Code

Explainable Deep Few-shot Anomaly Detection with Deviation Networks

1 code implementation • 1 Aug 2021 • Guansong Pang, Choubo Ding, Chunhua Shen, Anton Van Den Hengel

Here, we study the problem of few-shot anomaly detection, in which we aim at using a few labeled anomaly examples to train sample-efficient discriminative detection models.

Ranked #5 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Multiple Instance Learning Supervised Anomaly Detection +1

Paper
Code
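
Deviation networks score anomalies against a reference distribution. The sketch below assumes the deviation-loss form associated with this line of work: reference scores drawn from N(0, 1), a z-score "deviation" for each predicted score, an absolute-deviation term for unlabeled (presumed normal) samples, and a hinge with margin a for labeled anomalies. The margin 5.0 and 5000 reference samples are values assumed here, not taken from the snippet.

```python
import random
import statistics


def deviation_loss(scores, labels, margin=5.0, n_ref=5000, seed=0):
    """Deviation-style loss for scalar anomaly scores.

    scores: predicted anomaly scores; labels: 0 = unlabeled/normal, 1 = labeled anomaly.
    Reference scores are drawn from a standard Gaussian prior.
    """
    rng = random.Random(seed)
    ref = [rng.gauss(0.0, 1.0) for _ in range(n_ref)]
    mu, sigma = statistics.fmean(ref), statistics.pstdev(ref)
    total = 0.0
    for s, y in zip(scores, labels):
        dev = (s - mu) / sigma  # z-score of the prediction w.r.t. the reference
        if y == 0:
            total += abs(dev)                # pull normal scores toward the prior mean
        else:
            total += max(0.0, margin - dev)  # push anomalies >= margin std devs above it
    return total / len(scores)


print(deviation_loss([0.1, 6.0], [0, 1]))
```

With only a few labeled anomalies, the Gaussian prior supplies the "normal" reference, so the many unlabeled samples can be used directly without labels of their own.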

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling Sentence +2

Paper
Code

Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection

1 code implementation • CVPR 2022 • Choubo Ding, Guansong Pang, Chunhua Shen

Although most existing anomaly detection studies assume the availability of normal training samples only, a few labeled anomaly examples are often available in many real-world applications, such as defect samples identified during random quality inspection or lesion images confirmed by radiologists in daily medical screening.

Ranked #4 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Supervised Anomaly Detection

Paper
Code

RGM: A Robust Generalizable Matching Model

1 code implementation • 18 Oct 2023 • Songyan Zhang, Xinyu Sun, Hao Chen, Bo Li, Chunhua Shen

Finding corresponding pixels within a pair of images is a fundamental computer vision task with various applications.

Optical Flow Estimation

Paper
Code

AQD: Towards Accurate Fully-Quantized Object Detection

1 code implementation • CVPR 2021 • Peng Chen, Jing Liu, Bohan Zhuang, Mingkui Tan, Chunhua Shen

Network quantization allows inference to be conducted using low-precision arithmetic for improved inference efficiency of deep neural networks on edge devices.

Image Classification Object +3

Paper
Code
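
To make "low-precision arithmetic" concrete, here is a generic post-training, symmetric, per-tensor uniform quantizer in which a dot product is accumulated entirely in integers and rescaled once at the end. This is a textbook scheme chosen for illustration, not AQD's method (which learns its quantizers and also quantizes the detection head); the function names are hypothetical.

```python
def quantize_symmetric(values, bits=8):
    """Map floats to signed ints in [-(2^(bits-1)-1), 2^(bits-1)-1] with one scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantized_dot(x, w, bits=8):
    """Dot product with integer multiply-accumulate, rescaled once at the end."""
    qx, sx = quantize_symmetric(x, bits)
    qw, sw = quantize_symmetric(w, bits)
    acc = sum(a * b for a, b in zip(qx, qw))  # integers only
    return acc * sx * sw


x, w = [1.0, -2.0, 3.0], [0.5, 0.25, -1.0]
print(sum(a * b for a, b in zip(x, w)))  # -3.0 (float reference)
print(quantized_dot(x, w))               # close to -3.0
```

The integer accumulation is what edge hardware can execute cheaply; the only floating-point work left is the single rescale by sx * sw.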

CTVIS: Consistent Training for Online Video Instance Segmentation

1 code implementation • ICCV 2023 • Kaining Ying, Qing Zhong, Weian Mao, Zhenhua Wang, Hao Chen, Lin Yuanbo Wu, Yifan Liu, Chengxiang Fan, Yunzhi Zhuge, Chunhua Shen

The discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS).

Ranked #2 on Video Instance Segmentation on Youtube-VIS 2022 Validation (using extra training data)

Instance Segmentation Semantic Segmentation +1

Paper
Code

Task-Aware Monocular Depth Estimation for 3D Object Detection

1 code implementation • 17 Sep 2019 • Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei LI, Chunhua Shen

In this paper, we first analyse the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground depth and background depth using separate optimization objectives and depth decoders.

3D Object Detection 3D Object Recognition +4

Paper
Code

When Unsupervised Domain Adaptation Meets Tensor Representations

1 code implementation • ICCV 2017 • Hao Lu, Lei Zhang, Zhiguo Cao, Wei Wei, Ke Xian, Chunhua Shen, Anton Van Den Hengel

Domain adaptation (DA) allows machine learning methods trained on data sampled from one distribution to be applied to data sampled from another.

Unsupervised Domain Adaptation

Paper
Code

Target before Shooting: Accurate Anomaly Detection and Localization under One Millisecond via Cascade Patch Retrieval

1 code implementation • 13 Aug 2023 • Hanxi Li, Jianfei Hu, Bo Li, Hao Chen, Yongbin Zheng, Chunhua Shen

In this framework, the anomaly detection problem is solved via a cascade patch retrieval procedure that retrieves the nearest neighbors for each test image patch in a coarse-to-fine fashion.

Ranked #1 on Supervised Anomaly Detection on BTAD

Supervised Anomaly Detection

Paper
Code
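
The coarse-to-fine retrieval described above can be sketched in two stages: first shortlist the training images whose global features are closest to the test image, then score every test patch by its distance to the nearest patch from that shortlist. This is a simplified two-level version under stated assumptions (features are already extracted, Euclidean distance, image score = max patch score); the paper's cascade and feature design are not reproduced, and cascade_patch_scores is a hypothetical name.

```python
import math


def cascade_patch_scores(test_global, test_patches, train_set, k=1):
    """Two-stage nearest-neighbour anomaly scoring.

    test_global:  global feature vector of the test image
    test_patches: list of patch feature vectors of the test image
    train_set:    list of (global_feature, [patch_features]) for normal training images
    Returns (per-patch scores, image score = max patch score).
    """
    # Coarse stage: keep the k training images whose global features are closest.
    nearest = sorted(train_set, key=lambda item: math.dist(test_global, item[0]))[:k]
    candidates = [p for _, patches in nearest for p in patches]
    # Fine stage: each test patch is scored by its distance to the nearest candidate patch.
    patch_scores = [min(math.dist(tp, cp) for cp in candidates) for tp in test_patches]
    return patch_scores, max(patch_scores)


train = [([0.0, 0.0], [[0.0, 0.0], [1.0, 1.0]]),
         ([10.0, 10.0], [[10.0, 10.0], [11.0, 11.0]])]
scores, image_score = cascade_patch_scores([0.0, 0.0], [[0.0, 0.0], [5.0, 5.0]], train)
print(scores, image_score)  # the patch at [5, 5] gets the high score
```

Restricting the fine search to a shortlist is what keeps patch-level nearest-neighbour search cheap enough for tight latency budgets.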

A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation

1 code implementation • ICCV 2021 • Jianlong Yuan, Yifan Liu, Chunhua Shen, Zhibin Wang, Hao Li

Previous works [3, 27] fail to employ strong augmentation in pseudo label learning efficiently, as the large distribution change caused by strong augmentation harms the batch normalisation statistics.

Ranked #12 on Semi-Supervised Semantic Segmentation on Cityscapes 25% labeled

Data Augmentation Image Classification +3

Paper
Code

PointAttN: You Only Need Attention for Point Cloud Completion

1 code implementation • 16 Mar 2022 • Jun Wang, Ying Cui, Dongyan Guo, Junxia Li, Qingshan Liu, Chunhua Shen

To solve these problems, we leverage the cross-attention and self-attention mechanisms to design a novel neural network that processes point clouds in a per-point manner, eliminating kNNs.

Point Cloud Completion

Paper
Code

Supervised Discrete Hashing

1 code implementation • CVPR 2015 • Fumin Shen, Chunhua Shen, Wei Liu, Heng Tao Shen

This paper has been withdrawn by the author.

Paper
Code

Multi-marginal Wasserstein GAN

3 code implementations • NeurIPS 2019 • Jiezhang Cao, Langyuan Mo, Yifan Zhang, Kui Jia, Chunhua Shen, Mingkui Tan

The multiple marginal matching problem aims at learning mappings to match a source domain to multiple target domains, and it has attracted great attention in many applications, such as multi-domain image translation.

Image Generation Translation

Paper
Code

ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting

1 code implementation • 8 May 2021 • Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen

Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.

Ranked #7 on Text Spotting on Inverse-Text

Text Spotting

Paper
Code

Generative Prompt Model for Weakly Supervised Object Localization

1 code implementation • ICCV 2023 • Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings.

Ranked #1 on Weakly-Supervised Object Localization on CUB-200-2011 (Top-1 Localization Accuracy metric, using extra training data)

Image Denoising Language Modelling +2

Paper
Code

A Mutual Bootstrapping Model for Automated Skin Lesion Segmentation and Classification

1 code implementation • 8 Mar 2019 • Yutong Xie, Jianpeng Zhang, Yong Xia, Chunhua Shen

Our results suggest that it is possible to boost the performance of skin lesion segmentation and classification simultaneously via training a unified model to perform both tasks in a mutual bootstrapping way.

Classification General Classification +3

Paper
Code

Towards Effective Low-bitwidth Convolutional Neural Networks

2 code implementations • CVPR 2018 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid

This paper tackles the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations.

Quantization

Paper
Code

Super Vision Transformer

1 code implementation • 23 May 2022 • Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

Experimental results on ImageNet demonstrate that our SuperViT can considerably reduce the computational costs of ViT models while even improving performance.

Paper
Code

Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation

2 code implementations • ICCV 2017 • Yu Chen, Chunhua Shen, Xiu-Shen Wei, Lingqiao Liu, Jian Yang

In contrast, human vision is able to predict poses by exploiting geometric constraints of joint inter-connectivity.

Ranked #15 on Pose Estimation on MPII Human Pose

Pose Estimation

Paper
Code

Mining Mid-level Visual Patterns with Deep CNN Activations

1 code implementation • 21 Jun 2015 • Yao Li, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

The purpose of mid-level visual element discovery is to find clusters of image patches that are both representative and discriminative.

Paper
Code

Learning Affinity-Aware Upsampling for Deep Image Matting

1 code implementation • CVPR 2021 • Yutong Dai, Hao Lu, Chunhua Shen

By looking at existing upsampling operators from a unified mathematical perspective, we generalize them into a second-order form and introduce Affinity-Aware Upsampling (A2U), where upsampling kernels are generated using a lightweight low-rank bilinear model and are conditioned on second-order features.

Image Matting Image Reconstruction

Paper
Code

Reading Car License Plates Using Deep Convolutional Neural Networks and LSTMs

1 code implementation • 21 Jan 2016 • Hui Li, Chunhua Shen

Inspired by the success of deep neural networks (DNNs) in various vision applications, here we leverage DNNs to learn high-level features in a cascade framework, which leads to improved performance on both detection and recognition.

License Plate Detection Segmentation

Paper
Code

Learning Deep Gradient Descent Optimization for Image Deconvolution

1 code implementation • 10 Apr 2018 • Dong Gong, Zhen Zhang, Qinfeng Shi, Anton Van Den Hengel, Chunhua Shen, Yanning Zhang

Extensive experiments on synthetic benchmarks and challenging real-world images demonstrate that the proposed deep optimization method is effective and robust to produce favorable results as well as practical for real-world image deblurring applications.

Blind Image Deblurring Image Deblurring +1

Paper
Code

Exploiting temporal consistency for real-time video depth estimation

2 code implementations • ICCV 2019 • Haokui Zhang, Chunhua Shen, Ying Li, Yuanzhouhan Cao, Yu Liu, Youliang Yan

The temporal consistency loss is combined with the spatial loss to update the model in an end-to-end fashion.

Ranked #5 on Monocular Depth Estimation on Mid-Air Dataset

Monocular Depth Estimation

Paper
Code

Part-Guided Attention Learning for Vehicle Instance Retrieval

1 code implementation • 13 Sep 2019 • Xin-Yu Zhang, Rufeng Zhang, Jiewei Cao, Dong Gong, Mingyu You, Chunhua Shen

Finally, we aggregate the global appearance and part features to improve the feature performance further.

Fine-Grained Image Classification Retrieval +1

Paper
Code

Weighing Counts: Sequential Crowd Counting by Reinforcement Learning

1 code implementation • ECCV 2020 • Liang Liu, Hao Lu, Hongwei Zou, Haipeng Xiong, Zhiguo Cao, Chunhua Shen

Inspired by scale weighing, we propose a novel 'counting scale' termed LibraNet where the count value is analogized by weight.

Crowd Counting reinforcement-learning +1

Paper
Code
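
The weighing analogy can be illustrated as a sequential decision process: at each step the model adds or removes a weight from the pan, and it stops when the scale balances. In LibraNet the choice is made by a learned reinforcement-learning policy over image features without knowing the true count; in this toy, a greedy oracle that sees an integer target count stands in for that policy, and the weight set WEIGHTS is hypothetical.

```python
# Hypothetical action set: weights that can be added to (or removed from) the pan.
WEIGHTS = (-10, -5, -2, -1, 1, 2, 5, 10)


def weigh(target, max_steps=50):
    """Greedily place weights until the pan balances an integer target count."""
    total, history = 0, []
    for _ in range(max_steps):
        gap = target - total
        if gap == 0:
            break  # the "end" action: the scale is balanced
        step = min(WEIGHTS, key=lambda wt: abs(gap - wt))
        total += step
        history.append(step)
    return total, history


print(weigh(23))  # (23, [10, 10, 2, 1])
```

Coarse weights close most of the gap early and small weights refine it, which is the intuition behind counting as a sequence of adjustments rather than a single regression.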

Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching

2 code implementations • 13 Mar 2022 • Xiaojie Chu, Yongtao Wang, Chunhua Shen, Jingdong Chen, Wei Chu

The development of scene text recognition (STR) in the era of deep learning has been mainly focused on novel architectures of STR models.

Scene Text Recognition

Paper
Code

Viral Pneumonia Screening on Chest X-ray Images Using Confidence-Aware Anomaly Detection

1 code implementation • 27 Mar 2020 • Jianpeng Zhang, Yutong Xie, Guansong Pang, Zhibin Liao, Johan Verjans, Wenxin Li, Zongji Sun, Jian He, Yi Li, Chunhua Shen, Yong Xia

In this paper, we formulate the task of differentiating viral pneumonia from non-viral pneumonia and healthy controls as a one-class classification-based anomaly detection problem, and thus propose the confidence-aware anomaly detection (CAAD) model, which consists of a shared feature extractor, an anomaly detection module, and a confidence prediction module.

Binary Classification Classification +2

Paper
Code

Toward Deep Supervised Anomaly Detection: Reinforcement Learning from Partially Labeled Anomaly Data

2 code implementations • 15 Sep 2020 • Guansong Pang, Anton Van Den Hengel, Chunhua Shen, Longbing Cao

We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Code

DENSE: Data-Free One-Shot Federated Learning

1 code implementation • 23 Dec 2021 • Jie Zhang, Chen Chen, Bo Li, Lingjuan Lyu, Shuang Wu, Shouhong Ding, Chunhua Shen, Chao Wu

One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round.

Federated Learning

Paper
Code

Fully Quantized Image Super-Resolution Networks

1 code implementation • 29 Nov 2020 • Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen

With the rising popularity of intelligent mobile devices, it is of great practical significance to develop accurate, real-time, and energy-efficient image Super-Resolution (SR) inference methods.

Image Super-Resolution Quantization

Paper
Code

Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation

1 code implementation • NeurIPS 2021 • BoWen Zhang, Yifan Liu, Zhi Tian, Chunhua Shen

This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.

Segmentation Semantic Segmentation +1

Paper
Code

Multi-dataset Training of Transformers for Robust Action Recognition

1 code implementation • 26 Sep 2022 • Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen

We study the task of learning robust feature representations that generalize well across multiple datasets for action recognition.

Action Recognition Temporal Action Localization

Paper
Code

Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer Learning

2 code implementations • 7 Dec 2020 • Haokui Zhang, Ying Li, Yenan Jiang, Peng Wang, Qiang Shen, Chunhua Shen

In contrast to previous approaches, we do not impose restrictions over the source data sets, in which they do not have to be collected by the same sensors as the target data sets.

Classification General Classification +1

Paper
Code

Bootstrapping the Performance of Webly Supervised Semantic Segmentation

1 code implementation • CVPR 2018 • Tong Shen, Guosheng Lin, Chunhua Shen, Ian Reid

In this work, we focus on weak supervision, developing a method for training a high-quality pixel-level classifier for semantic segmentation, using only image-level class labels as the provided ground-truth.

Segmentation Transfer Learning +2

Paper
Code

Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising

1 code implementation • CVPR 2020 • Haokui Zhang, Ying Li, Hao Chen, Chunhua Shen

We also present analysis on the architectures found by NAS.

Image Denoising Image Restoration +1

Paper
Code

Memory-Efficient Hierarchical Neural Architecture Search for Image Restoration

1 code implementation • 24 Dec 2020 • Haokui Zhang, Ying Li, Hao Chen, Chengrong Gong, Zongwen Bai, Chunhua Shen

For the inner search space, we propose a layer-wise architecture sharing strategy (LWAS), resulting in more flexible architectures and better performance.

Image Denoising Image Restoration +2

Paper
Code

Learning Conditional Attributes for Compositional Zero-Shot Learning

1 code implementation • CVPR 2023 • Qingsheng Wang, Lingqiao Liu, Chenchen Jing, Hao Chen, Guoqiang Liang, Peng Wang, Chunhua Shen

Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations.

Ranked #1 on Compositional Zero-Shot Learning on MIT-States

Attribute Compositional Zero-Shot Learning

Paper
Code

Robust Data Hiding Using Inverse Gradient Attention

1 code implementation • 21 Nov 2020 • Honglei Zhang, Hu Wang, Yuanzhouhan Cao, Chunhua Shen, Yidong Li

In deep data hiding models, to maximize the encoding capacity, each pixel of the cover image ought to be treated differently since they have different sensitivities w.r.t.

Paper
Code

Learning Deep Representations Using Convolutional Auto-encoders with Symmetric Skip Connections

1 code implementation • 28 Nov 2016 • Jianfeng Dong, Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang

In this paper, we investigate convolutional denoising auto-encoders to show that unsupervised pre-training can still improve the performance of high-level image related tasks such as image classification and semantic segmentation.

Denoising General Classification +4

Paper
Code

Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition

1 code implementation • 4 Apr 2017 • Qichang Hu, Huibing Wang, Teng Li, Chunhua Shen

By applying our method to several fine-grained car recognition data sets, we demonstrate that the proposed method can achieve better performance than recent approaches in the literature.

Ranked #1 on Fine-Grained Image Classification on CarFlag-563

Fine-Grained Image Classification

Paper
Code

Traffic Scene Parsing through the TSP6K Dataset

1 code implementation • 6 Mar 2023 • Peng-Tao Jiang, YuQi Yang, Yang Cao, Qibin Hou, Ming-Ming Cheng, Chunhua Shen

To date, most existing datasets focus on autonomous driving scenes.

Autonomous Driving Domain Adaptation +3

Paper
Code

Learning to predict crisp boundaries

1 code implementation • ECCV 2018 • Ruoxi Deng, Chunhua Shen, Shengjun Liu, Huibing Wang, Xinru Liu

Recent methods for boundary or edge detection built on Deep Convolutional Neural Networks (CNNs) typically suffer from the issue of predicted edges being thick and need post-processing to obtain crisp boundaries.

Boundary Detection Edge Detection

Paper
Code

TSGB: Target-Selective Gradient Backprop for Probing CNN Visual Saliency

1 code implementation • 11 Oct 2021 • Lin Cheng, Pengfei Fang, Yanjie Liang, Liao Zhang, Chunhua Shen, Hanzi Wang

Inspired by those observations, we propose a novel visual saliency method, termed Target-Selective Gradient Backprop (TSGB), which leverages rectification operations to effectively emphasize target classes and further efficiently propagate the saliency to the image space, thereby generating target-selective and fine-grained saliency maps.

Paper
Code

FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

1 code implementation • 1 Dec 2022 • Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen

Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between web domain and real-world domain.

Contrastive Learning Representation Learning

Paper
Code

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

1 code implementation • ICCV 2023 • Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen

In contrast, synthetic data can be obtained freely from a generative model (e.g., DALL-E, Stable Diffusion).

Image Generation Semantic Segmentation

Paper
Code

DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

1 code implementation • 22 Nov 2023 • Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, DaCheng Tao, Tianyou Chai

To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features.

Representation Learning Segmentation +2

Paper
Code

Regularizing Proxies with Multi-Adversarial Training for Unsupervised Domain-Adaptive Semantic Segmentation

1 code implementation • 29 Jul 2019 • Tong Shen, Dong Gong, Wei zhang, Chunhua Shen, Tao Mei

To tackle the unsupervised domain adaptation problem, we explore the possibility of generating high-quality labels to serve as proxy labels that supervise training on target data.

Semantic Segmentation Unsupervised Domain Adaptation

Paper
Code

Piecewise classifier mappings: Learning fine-grained learners for novel categories with few examples

1 code implementation • 11 May 2018 • Xiu-Shen Wei, Peng Wang, Lingqiao Liu, Chunhua Shen, Jianxin Wu

To solve this problem, we propose an end-to-end trainable deep network which is inspired by the state-of-the-art fine-grained recognition model and is tailored for the few-shot fine-grained (FSFG) recognition task.

Few-Shot Learning Fine-Grained Image Recognition

Paper
Code

Real-time End-to-End Video Text Spotter with Contrastive Representation Learning

1 code implementation • 18 Jul 2022 • Wejia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Hong Zhou, Size Li, Zhongyuan Wang, Ping Luo

Our contributions are three-fold: 1) CoText simultaneously addresses the three tasks (i.e., text detection, tracking, and recognition) in a real-time end-to-end trainable framework.

Contrastive Learning Representation Learning +2

Paper
Code

Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields

1 code implementation • 26 Feb 2015 • Fayao Liu, Chunhua Shen, Guosheng Lin, Ian Reid

Therefore, here we present a deep convolutional neural field model for estimating depths from single monocular images, aiming to jointly explore the capacity of deep CNN and continuous CRF.

Depth Estimation

Paper
Code

The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification

1 code implementation • CVPR 2015 • Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

This paper, however, advocates that, if used appropriately, convolutional layer activations can be turned into a powerful image representation that enjoys many advantages over fully-connected layer activations.

General Classification Image Classification

Paper
Code

Improving Generative Adversarial Networks with Local Coordinate Coding

1 code implementation • 28 Jul 2020 • Jiezhang Cao, Yong Guo, Qingyao Wu, Chunhua Shen, Junzhou Huang, Mingkui Tan

In this paper, rather than sampling from the predefined prior distribution, we propose an LCCGAN model with local coordinate coding (LCC) to improve data generation performance.

Paper
Code

What value do explicit high level concepts have in vision to language problems?

1 code implementation • CVPR 2016 • Qi Wu, Chunhua Shen, Lingqiao Liu, Anthony Dick, Anton Van Den Hengel

Much of the recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

Image Captioning Question Answering +1

Paper
Code

Adversarial Learning with Local Coordinate Coding

no code implementations • ICML 2018 • Jiezhang Cao, Yong Guo, Qingyao Wu, Chunhua Shen, Junzhou Huang, Mingkui Tan

Generative adversarial networks (GANs) aim to generate realistic data from some prior distribution (e.g., Gaussian noise).

Paper

Adaptive Importance Learning for Improving Lightweight Image Super-resolution Network

no code implementations • 5 Jun 2018 • Lei Zhang, Peng Wang, Chunhua Shen, Lingqiao Liu, Wei Wei, Yanning Zhang, Anton Van Den Hengel

In this study, we revisit this problem from an orthogonal view and propose a novel learning strategy to maximize the pixel-wise fitting capacity of a given lightweight network architecture.

Image Super-Resolution

Paper

Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization

no code implementations • 1 Nov 2017 • Yu Chen, Chunhua Shen, Hao Chen, Xiu-Shen Wei, Lingqiao Liu, Jian Yang

In contrast, human vision is able to predict poses by exploiting geometric constraints of landmark point inter-connectivity.

Pose Estimation

Paper

Monocular Depth Estimation with Augmented Ordinal Depth Relationships

no code implementations • 2 Jun 2018 • Yuanzhouhan Cao, Tianqi Zhao, Ke Xian, Chunhua Shen, Zhiguo Cao, Shugong Xu

In this paper, we propose to improve the performance of metric depth estimation with relative depths collected from stereo movie videos using an existing stereo matching algorithm.

Depth Prediction Monocular Depth Estimation +2

Paper

Multi-label Learning Based Deep Transfer Neural Network for Facial Attribute Classification

no code implementations • 3 May 2018 • Ni Zhuang, Yan Yan, Si Chen, Hanzi Wang, Chunhua Shen

To address the above problem, we propose a novel deep transfer neural network method based on multi-label learning for facial attribute classification, termed FMTNet, which consists of three sub-networks: the Face detection Network (FNet), the Multi-label learning Network (MNet) and the Transfer learning Network (TNet).

Attribute Classification +6

Paper

Salient Object Detection by Lossless Feature Reflection

no code implementations • 19 Feb 2018 • Pingping Zhang, Wei Liu, Huchuan Lu, Chunhua Shen

Inspired by the intrinsic reflection of natural images, in this paper we propose a novel feature learning framework for large-scale salient object detection.

Object object-detection +3

Paper

HyperFusion-Net: Densely Reflective Fusion for Salient Object Detection

no code implementations • 14 Apr 2018 • Pingping Zhang, Huchuan Lu, Chunhua Shen

Salient object detection (SOD), which aims to find the most important region of interest and segment the relevant object/item in that area, is an important yet challenging vision task.

object-detection RGB Salient Object Detection +1

Paper

VITAL: VIsual Tracking via Adversarial Learning

no code implementations • CVPR 2018 • Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, WangMeng Zuo, Chunhua Shen, Rynson Lau, Ming-Hsuan Yang

To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively drop out input features to capture a variety of appearance changes.

General Classification Visual Tracking

Paper

Visual Question Answering with Memory-Augmented Networks

no code implementations • CVPR 2018 • Chao Ma, Chunhua Shen, Anthony Dick, Qi Wu, Peng Wang, Anton Van Den Hengel, Ian Reid

In this paper, we exploit a memory-augmented neural network to predict accurate answers to visual questions, even when those answers occur rarely in the training set.

Question Answering Visual Question Answering

Paper

Decoupled Spatial Neural Attention for Weakly Supervised Semantic Segmentation

no code implementations • 7 Mar 2018 • Tianyi Zhang, Guosheng Lin, Jianfei Cai, Tong Shen, Chunhua Shen, Alex C. Kot

In our work, we focus on weakly supervised semantic segmentation with image label annotations.

Image Captioning Segmentation +2

Paper

Non-rigid Object Tracking via Deep Multi-scale Spatial-temporal Discriminative Saliency Maps

no code implementations • 22 Feb 2018 • Pingping Zhang, Wei Liu, Dong Wang, Yinjie Lei, Hongyu Wang, Chunhua Shen, Huchuan Lu

Extensive experiments demonstrate that the proposed algorithm achieves competitive performance in both saliency detection and visual tracking, especially outperforming other related trackers on the non-rigid object tracking datasets.

Object Object Tracking +2

Paper

Agile Amulet: Real-Time Salient Object Detection with Contextual Attention

no code implementations • 20 Feb 2018 • Pingping Zhang, Luyao Wang, Dong Wang, Huchuan Lu, Chunhua Shen

This paper proposes an Agile Aggregating Multi-Level feaTure framework (Agile Amulet) for salient object detection.

object-detection RGB Salient Object Detection +1

Paper

Automatic Image Cropping for Visual Aesthetic Enhancement Using Deep Neural Networks and Cascaded Regression

no code implementations • 25 Dec 2017 • Guanjun Guo, Hanzi Wang, Chunhua Shen, Yan Yan, Hong-Yuan Mark Liao

The deep CNN model is then designed to extract features from several image cropping datasets, upon which the cropping bounding boxes are predicted by the proposed CCR method.

Image Cropping regression

Paper

Real-time Semantic Image Segmentation via Spatial Sparsity

no code implementations • 1 Dec 2017 • Zifeng Wu, Chunhua Shen, Anton Van Den Hengel

We propose an approach to semantic (image) segmentation that reduces the computational costs by a factor of 25 with limited impact on the quality of results.

Image Segmentation Segmentation +1

Paper

Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards

no code implementations • 21 Nov 2017 • Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton Van Den Hengel

Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images has proven to be a formidable challenge.

Informativeness Question Generation +2

Paper

Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning

no code implementations • CVPR 2018 • Qi Wu, Peng Wang, Chunhua Shen, Ian Reid, Anton Van Den Hengel

The Visual Dialogue task requires an agent to engage in a conversation about an image with a human.

Ranked #4 on Visual Dialog on VisDial v0.9 val

Question Answering Visual Dialog +1

Paper

Kill Two Birds with One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement

no code implementations • 19 Nov 2017 • Jun-Jie Zhang, Qi Wu, Jian Zhang, Chunhua Shen, Jianfeng Lu

These comments can describe the image as a whole or the objects, attributes, and scenes in it, and are normally used as user-provided tags.

Retrieval TAG

Paper

Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries

no code implementations • CVPR 2018 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

To this end, we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in variable-length natural-language descriptions, from short phrase queries to long multi-round dialogs.

Object Object Discovery +2

Paper

Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition

no code implementations • 11 Jul 2017 • Xinlong Wang, Zhipeng Man, Mingyu You, Chunhua Shen

Our experimental results on a few data sets demonstrate the effectiveness of using GAN images: an improvement of 7.5% over a strong baseline when moderate-sized real data are available.

Image Generation License Plate Recognition

Paper

Towards End-to-End Car License Plates Detection and Recognition with Deep Neural Networks

no code implementations • 26 Sep 2017 • Hui Li, Peng Wang, Chunhua Shen

In contrast to existing approaches, which treat license plate detection and recognition as two separate tasks and solve them step by step, our method jointly solves both tasks with a single network.

License Plate Detection

Paper

Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks

no code implementations • 8 May 2016 • Yuanzhouhan Cao, Zifeng Wu, Chunhua Shen

Then we train fully convolutional deep residual networks to predict the depth label of each pixel.

Depth Estimation General Classification +1

Paper

FVQA: Fact-based Visual Question Answering

no code implementations • 17 Jun 2016 • Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel, Anthony Dick

We evaluate several baseline models on the FVQA dataset, and describe a novel model which is capable of reasoning about an image on the basis of supporting facts.

Ranked #2 on Visual Question Answering (VQA) on F-VQA

Common Sense Reasoning Question Answering +1

Paper

Weakly Supervised Semantic Segmentation Based on Web Image Co-segmentation

no code implementations • 25 May 2017 • Tong Shen, Guosheng Lin, Lingqiao Liu, Chunhua Shen, Ian Reid

Training a Fully Convolutional Network (FCN) for semantic segmentation requires a large number of masks with pixel level labelling, which involves a large amount of human labour and time for annotation.

Segmentation Weakly supervised Semantic Segmentation +1

Paper

Beyond Low Rank: A Data-Adaptive Tensor Completion Method

no code implementations • 3 Aug 2017 • Lei Zhang, Wei Wei, Qinfeng Shi, Chunhua Shen, Anton Van Den Hengel, Yanning Zhang

The prior for the non-low-rank structure is established based on a mixture of Gaussians which is shown to be flexible enough, and powerful enough, to inform the completion process for a variety of real tensor data.

Paper

Relative Depth Order Estimation Using Multi-scale Densely Connected Convolutional Networks

no code implementations • 25 Jul 2017 • Ruoxi Deng, Tianqi Zhao, Chunhua Shen, Shengjun Liu

We study the problem of estimating the relative depth order of point pairs in a monocular image.

Paper

Unsupervised Object Discovery and Co-Localization by Deep Descriptor Transforming

no code implementations • 20 Jul 2017 • Xiu-Shen Wei, Chen-Lin Zhang, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou

Reusable model design becomes desirable with the rapid expansion of computer vision and machine learning applications.

Ranked #11 on Single-object discovery on COCO_20k

Object object-detection +4

Paper

Visually Aligned Word Embeddings for Improving Zero-shot Learning

no code implementations • 18 Jul 2017 • Ruizhi Qiao, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

To overcome this visual-semantic discrepancy, this work proposes an objective function to re-align the distributed word embeddings with visual information by learning a neural network to map them into a new representation called visually aligned word embedding (VAWE).

Paper

Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks

no code implementations • ICCV 2017 • Hui Li, Peng Wang, Chunhua Shen

In this work, we jointly address the problem of text detection and recognition in natural scene images based on convolutional recurrent neural networks.

Image Cropping Text Detection +1

Paper

TasselNet: Counting maize tassels in the wild via local counts regression network

no code implementations • 7 Jul 2017 • Hao Lu, Zhiguo Cao, Yang Xiao, Bohan Zhuang, Chunhua Shen

To our knowledge, this is the first time that a plant-related counting problem has been addressed with computer vision technologies in an unconstrained field-based environment.

Plant Phenotyping regression

Paper

Care about you: towards large-scale human-centric visual relationship detection

no code implementations • 28 May 2017 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

In addressing this problem, we first construct a large-scale human-centric visual relationship detection dataset (HCVRD), which provides many more types of relationship annotation (nearly 10K categories) than previously released datasets.

Human-Object Interaction Detection Relationship Detection +1

Paper

Deep Descriptor Transforming for Image Co-Localization

no code implementations • 8 May 2017 • Xiu-Shen Wei, Chen-Lin Zhang, Yao Li, Chen-Wei Xie, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou

Reusable model design becomes desirable with the rapid expansion of machine learning applications.

Paper

Exploring Context with Deep Structured models for Semantic Segmentation

no code implementations • 10 Mar 2016 • Guosheng Lin, Chunhua Shen, Anton Van Den Hengel, Ian Reid

We formulate deep structured models by combining CNNs and Conditional Random Fields (CRFs) for learning the patch-patch context between image regions.

Image Segmentation Segmentation +1

Paper

Towards Context-aware Interaction Recognition

no code implementations • 18 Mar 2017 • Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid

Recognizing how objects interact with each other is a crucial task in visual recognition.

Paper

Robust Guided Image Filtering

no code implementations • 28 Mar 2017 • Wei Liu, Xiaogang Chen, Chunhua Shen, Jingyi Yu, Qiang Wu, Jie Yang

In this paper, we propose a general framework for Robust Guided Image Filtering (RGIF), which contains a data term and a smoothness term, to solve the two issues mentioned above.

Paper

Structured Learning of Tree Potentials in CRF for Image Segmentation

no code implementations • 26 Mar 2017 • Fayao Liu, Guosheng Lin, Ruizhi Qiao, Chunhua Shen

In this fashion, we easily achieve nonlinear learning of potential functions on both unary and pairwise terms in CRFs.

Image Segmentation Semantic Segmentation

Paper

Multi-Label Image Classification with Regional Latent Semantic Dependencies

no code implementations • 4 Dec 2016 • Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu

Recent state-of-the-art approaches to multi-label image classification exploit the label dependencies in an image at the global level, largely improving the labeling capacity.

Classification General Classification +1

Paper

Learning Multi-level Region Consistency with Dense Multi-label Networks for Semantic Segmentation

no code implementations • 25 Jan 2017 • Tong Shen, Guosheng Lin, Chunhua Shen, Ian Reid

Semantic image segmentation is a fundamental task in image understanding.

Image Segmentation Segmentation +1

Paper

Deep Learning Features at Scale for Visual Place Recognition

no code implementations • 18 Jan 2017 • Zetao Chen, Adam Jacobson, Niko Sunderhauf, Ben Upcroft, Lingqiao Liu, Chunhua Shen, Ian Reid, Michael Milford

The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks.

Visual Place Recognition

Paper

Compositional Model based Fisher Vector Coding for Image Classification

1 code implementation • 16 Jan 2016 • Lingqiao Liu, Peng Wang, Chunhua Shen, Lei Wang, Anton Van Den Hengel, Chao Wang, Heng Tao Shen

To handle this limitation, in this paper we break the convention which assumes that a local feature is drawn from one of a few Gaussian distributions.

Classification General Classification +1

Paper
Code

Cross-convolutional-layer Pooling for Image Recognition

no code implementations • 4 Oct 2015 • Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

Most of these studies adopt activations from a single DCNN layer, usually the fully-connected layer, as the image representation.

General Classification Image Classification

Paper

Image Captioning and Visual Question Answering Based on Attributes and External Knowledge

no code implementations • 9 Mar 2016 • Qi Wu, Chunhua Shen, Anton Van Den Hengel, Peng Wang, Anthony Dick

Much recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

Ranked #9 on Visual Question Answering (VQA) on COCO Visual Question Answering (VQA) real images 1.0 open ended

General Knowledge Image Captioning +2

Paper

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

no code implementations • CVPR 2017 • Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel

To train a method to perform even one of these operations accurately from {image, question, answer} tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best.

BIG-bench Machine Learning Question Answering +1

Paper

From Motion Blur to Motion Flow: a Deep Learning Solution for Removing Heterogeneous Motion Blur

no code implementations • CVPR 2017 • Dong Gong, Jie Yang, Lingqiao Liu, Yanning Zhang, Ian Reid, Chunhua Shen, Anton Van Den Hengel, Qinfeng Shi

The critical observation underpinning our approach is thus that learning the motion flow instead allows the model to focus on the cause of the blur, irrespective of the image content.

Paper

Sequential Person Recognition in Photo Albums with a Recurrent Network

no code implementations • CVPR 2017 • Yao Li, Guosheng Lin, Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

In this work, we propose to model the relational information between people as a sequence prediction task.

Person Recognition

Paper

Attend in groups: a weakly-supervised deep learning framework for learning from web data

no code implementations • CVPR 2017 • Bohan Zhuang, Lingqiao Liu, Yao Li, Chunhua Shen, Ian Reid

Large-scale datasets have driven the rapid development of deep neural networks for visual recognition.

Paper

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

no code implementations • 6 Oct 2016 • Yuanzhouhan Cao, Chunhua Shen, Heng Tao Shen

Augmenting RGB data with measured depth has been shown to improve the performance of a range of tasks in computer vision including object detection and semantic segmentation.

Depth Estimation Object +4

Paper

Fast Training of Triplet-based Deep Binary Embedding Networks

no code implementations • CVPR 2016 • Bohan Zhuang, Guosheng Lin, Chunhua Shen, Ian Reid

To solve the first stage, we design a large-scale high-order binary codes inference algorithm to reduce the high-order objective to a standard binary quadratic problem such that graph cuts can be used to efficiently infer the binary code that serves as the label of each training datum.

Image Retrieval Multi-Label Classification +1

Paper

Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution

no code implementations • 15 Mar 2016 • Yao Li, Linqiao Liu, Chunhua Shen, Anton Van Den Hengel

More specifically, we observe that given a set of object proposals extracted from an image that contains the object of interest, an accurate strongly supervised object detector should give high scores to only a small minority of proposals, and low scores to most of them.

Object

Paper

Where to Focus: Query Adaptive Matching for Instance Retrieval Using Convolutional Feature Maps

no code implementations • 22 Jun 2016 • Jiewei Cao, Lingqiao Liu, Peng Wang, Zi Huang, Chunhua Shen, Heng Tao Shen

Instance retrieval requires one to search for images that contain a particular object within a large corpus.

Retrieval

Paper
