7 code implementations • 2 Nov 2018 • Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang
Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion.
Ranked #26 on Scene Text Recognition on ICDAR2015
8 code implementations • NeurIPS 2021 • Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen
Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed and they show that the design of spatial attention is critical to their success in these tasks.
Ranked #48 on Semantic Segmentation on ADE20K val
86 code implementations • ICCV 2019 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He
By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training.
Ranked #4 on Pedestrian Detection on TJU-Ped-campus
3 code implementations • CVPR 2020 • Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang
The success of deep neural networks relies on significant architecture engineering.
Ranked #113 on Object Detection on COCO test-dev
24 code implementations • ECCV 2020 • Xinlong Wang, Tao Kong, Chunhua Shen, Yuning Jiang, Lei LI
We present a new, embarrassingly simple approach to instance segmentation in images.
Ranked #67 on Instance Segmentation on COCO test-dev
18 code implementations • NeurIPS 2020 • Xinlong Wang, Rufeng Zhang, Tao Kong, Lei LI, Chunhua Shen
Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location.
Ranked #10 on Real-time Instance Segmentation on MSCOCO
7 code implementations • 5 Apr 2020 • Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang
We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for realtime semantic segmentation.
Ranked #1 on Real-Time Semantic Segmentation on COCO-Stuff
6 code implementations • ICCV 2019 • Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen
Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.
Ranked #8 on Scene Text Detection on SCUT-CTW1500
8 code implementations • 18 Nov 2019 • Zhi Tian, Hao Chen, Chunhua Shen
We propose the first direct end-to-end multi-person pose estimation framework, termed DirectPose.
Ranked #13 on Keypoint Detection on COCO test-dev
9 code implementations • CVPR 2020 • Hao Chen, Kunyang Sun, Zhi Tian, Chunhua Shen, Yongming Huang, Youliang Yan
The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference.
Ranked #13 on Real-time Instance Segmentation on MSCOCO
15 code implementations • CVPR 2020 • Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang
Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve.
Ranked #9 on Text Spotting on Inverse-Text
7 code implementations • ECCV 2020 • Zhi Tian, Chunhua Shen, Hao Chen
We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation).
7 code implementations • CVPR 2020 • Rufeng Zhang, Zhi Tian, Chunhua Shen, Mingyu You, Youliang Yan
To date, instance segmentation is dominated by twostage methods, as pioneered by Mask R-CNN.
2 code implementations • CVPR 2021 • Zhi Tian, Chunhua Shen, Xinlong Wang, Hao Chen
We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training.
3 code implementations • CVPR 2021 • Weian Mao, Zhi Tian, Xinlong Wang, Chunhua Shen
We propose a fully convolutional multi-person pose estimation framework using dynamic instance-aware convolutions, termed FCPose.
6 code implementations • CVPR 2021 • Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin.
1 code implementation • CVPR 2023 • Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, Tiejun Huang
In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and specify task prompts as also images.
Ranked #6 on Personalized Segmentation on PerSeg
1 code implementation • 6 Apr 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang
We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.
Ranked #1 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot) (using extra training data)
3 code implementations • ICCV 2021 • Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen
Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogue to class activation mapping), we propose to align features channel-wise between the student and teacher networks.
1 code implementation • CVPR 2019 • Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan
To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of the compact FCNs with large overall stride.
3 code implementations • ICCV 2019 • Wei Yin, Yifan Liu, Chunhua Shen, Youliang Yan
Monocular depth prediction plays a crucial role in understanding 3D scene geometry.
Ranked #10 on Depth Estimation on NYU-Depth V2
1 code implementation • CVPR 2021 • Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen
Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length.
Ranked #1 on Indoor Monocular Depth Estimation on DIODE (using extra training data)
3 code implementations • 7 Mar 2021 • Wei Yin, Yifan Liu, Chunhua Shen
In this work, we show the importance of the high-order 3D geometric constraints for depth prediction.
1 code implementation • 28 Aug 2022 • Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Simon Chen, Yifan Liu, Chunhua Shen
To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes.
2 code implementations • CVPR 2020 • Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo
In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.
Ranked #100 on Instance Segmentation on COCO test-dev
17 code implementations • 29 Jun 2016 • Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang
In this work, we propose a very deep fully convolutional auto-encoder network for image restoration, which is a encoding-decoding framework with symmetric convolutional-deconvolutional layers.
Ranked #2 on Grayscale Image Denoising on BSD200 sigma10
1 code implementation • 28 Dec 2023 • Xiangxiang Chu, Limeng Qiao, Xinyang Lin, Shuang Xu, Yang Yang, Yiming Hu, Fei Wei, Xinyu Zhang, Bo Zhang, Xiaolin Wei, Chunhua Shen
We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices.
1 code implementation • 6 Feb 2024 • Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen
We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance.
2 code implementations • CVPR 2021 • Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia
Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.
Ranked #33 on Video Instance Segmentation on YouTube-VIS validation
2 code implementations • 8 Oct 2018 • Vladimir Nekrasov, Chunhua Shen, Ian Reid
We consider an important task of effective and efficient semantic image segmentation.
Ranked #2 on Real-Time Semantic Segmentation on NYU Depth v2
2 code implementations • NeurIPS 2019 • Jia-Wang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid
To the best of our knowledge, this is the first work to show that deep networks trained using unlabelled monocular videos can predict globally scale-consistent camera trajectories over a long video sequence.
Ranked #61 on Monocular Depth Estimation on KITTI Eigen split
2 code implementations • 25 May 2021 • Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Zhichao Li, Le Zhang, Chunhua Shen, Ming-Ming Cheng, Ian Reid
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training and enables the scale-consistent prediction at inference time.
1 code implementation • CVPR 2019 • Yifan Liu, Changyong Shun, Jingdong Wang, Chunhua Shen
Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem.
1 code implementation • ICCV 2023 • Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen
State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity.
Ranked #19 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
1 code implementation • Under review for Transaction 2024 • Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen
Our method benefits various applications including in-the-wild metrology monocular-SLAM, and 3D reconstruction, which highlight the versatility of Metric3D v2 models as geometric foundation models.
Ranked #1 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
13 code implementations • CVPR 2017 • Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid
Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense classification problems such as semantic segmentation.
Ranked #13 on Semantic Segmentation on Trans10K
5 code implementations • 15 Mar 2020 • Chi Zhang, Yujun Cai, Guosheng Lin, Chunhua Shen
We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance.
2 code implementations • 30 May 2023 • Chi Zhang, YiWen Chen, Yijun Fu, Zhenglin Zhou, Gang Yu, Billzb Wang, Bin Fu, Tao Chen, Guosheng Lin, Chunhua Shen
The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models.
1 code implementation • 20 Jul 2016 • Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton Van Den Hengel
Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities.
1 code implementation • 2 May 2021 • Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen
By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.
4 code implementations • ECCV 2020 • Wenjia Wang, Enze Xie, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai
For example, it outperforms LapSRN by over 5% and 8%on the recognition accuracy of ASTER and CRNN.
1 code implementation • 4 Jun 2020 • Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Tat-Jun Chin, Chunhua Shen, Ian Reid
However, excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices.
Ranked #62 on Monocular Depth Estimation on NYU-Depth V2
2 code implementations • 7 Nov 2022 • Libo Sun, Jia-Wang Bian, Huangying Zhan, Wei Yin, Ian Reid, Chunhua Shen
Self-supervised monocular depth estimation has shown impressive results in static scenes.
Indoor Monocular Depth Estimation Monocular Depth Estimation +1
1 code implementation • ICCV 2019 • Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu
We show that existing upsampling operators can be unified with the notion of the index function.
6 code implementations • 11 Aug 2019 • Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu
By viewing the indices as a function of the feature map, we introduce the concept of "learning to index", and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the downsampling and upsampling stages, without extra training supervision.
Ranked #2 on Grayscale Image Denoising on Set12 sigma30
3 code implementations • CVPR 2022 • Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen
Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.
1 code implementation • 22 May 2023 • Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, Chunhua Shen
In this work, we present Matcher, a novel perception paradigm that utilizes off-the-shelf vision foundation models to address various perception tasks.
3 code implementations • 30 Nov 2016 • Zifeng Wu, Chunhua Shen, Anton Van Den Hengel
As a result, we are able to derive a new, shallower, architecture of residual networks which significantly outperforms much deeper models such as ResNet-200 on the ImageNet classification dataset.
Ranked #11 on Semantic Segmentation on PASCAL VOC 2012 test
4 code implementations • CVPR 2019 • Vladimir Nekrasov, Hao Chen, Chunhua Shen, Ian Reid
While most results in this domain have been achieved on image classification and language modelling problems, here we concentrate on dense per-pixel tasks, in particular, semantic image segmentation using fully convolutional networks.
Ranked #13 on Semantic Segmentation on PASCAL VOC 2012 val
2 code implementations • CVPR 2018 • Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun
This allows the two tasks to work collaboratively by shar- ing convolutional features, which is critical to identify challenging text instances.
1 code implementation • 30 Mar 2023 • Wen Wang, Yan Jiang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen
Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video.
1 code implementation • CVPR 2022 • Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez
FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9. 8% AP when fine-tuning instance segmentation with only 5% COCO masks.
1 code implementation • 12 Oct 2023 • Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang
In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.
Ranked #1 on 3D Semantic Segmentation on ScanNet++ (using extra training data)
1 code implementation • ECCV 2020 • Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang
For semantic segmentation, most existing real-time deep models trained with each frame independently may produce inconsistent results for a video sequence.
Ranked #2 on Video Semantic Segmentation on CamVid
3 code implementations • 30 Oct 2019 • Guansong Pang, Chunhua Shen, Huidong Jin, Anton Van Den Hengel
To detect both seen and unseen anomalies, we introduce a novel deep weakly-supervised approach, namely Pairwise Relation prediction Network (PReNet), that learns pairwise relation features and anomaly scores by predicting the relation of any two randomly sampled training instances, in which the pairwise relation can be anomaly-anomaly, anomaly-unlabeled, or unlabeled-unlabeled.
6 code implementations • 19 Nov 2019 • Guansong Pang, Chunhua Shen, Anton Van Den Hengel
Instead of representation learning, our method fulfills an end-to-end learning of anomaly scores by a neural deviation learning, in which we leverage a few (e. g., multiple to dozens) labeled anomalies and a prior probability to enforce statistically significant deviations of the anomaly scores of anomalies from that of normal data objects in the upper tail.
Ranked #1 on Network Intrusion Detection on NB15-Backdoor
2 code implementations • 22 Dec 2019 • Hu Wang, Guansong Pang, Chunhua Shen, Congbo Ma
To enable unsupervised learning on those domains, in this work we propose to learn features without using any labelled data by training neural networks to predict data distances in a randomly projected space.
1 code implementation • 4 Mar 2021 • Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia
Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation.
1 code implementation • 1 Mar 2024 • Xiangxiang Chu, Jianlin Su, Bo Zhang, Chunhua Shen
Large language models are built on top of a transformer-based architecture to process textual inputs.
1 code implementation • NeurIPS 2023 • Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen
To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.
1 code implementation • 20 Dec 2019 • Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin
More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.
Ranked #3 on Scene Text Detection on ICDAR 2017 MLT
4 code implementations • CVPR 2018 • Yu Chen, Ying Tai, Xiaoming Liu, Chunhua Shen, Jian Yang
We present a novel deep end-to-end trainable Face Super-Resolution Network (FSRNet), which makes full use of the geometry prior, i. e., facial landmark heatmaps and parsing maps, to super-resolve very low-resolution (LR) face images without well-aligned requirement.
3 code implementations • CVPR 2019 • Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia
A 3D point cloud describes the real scene precisely and intuitively. To date how to segment diversified elements in such an informative 3D scene is rarely discussed.
Ranked #15 on 3D Instance Segmentation on S3DIS (mRec metric)
2 code implementations • CVPR 2020 • Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang
Given an input image and corresponding ground truth, Affinity Loss constructs an ideal affinity map to supervise the learning of Context Prior.
Ranked #1 on Scene Understanding on ADE20K val
2 code implementations • CVPR 2018 • Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, Chunhua Shen
In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem.
Ranked #9 on Pedestrian Detection on Caltech (using extra training data)
2 code implementations • 8 Mar 2021 • Lingtong Kong, Chunhua Shen, Jie Yang
Experiments on both synthetic Sintel data and real-world KITTI datasets demonstrate the effectiveness of the proposed approach, which needs only 1/10 computation of comparable networks to achieve on par accuracy.
Ranked #12 on Optical Flow Estimation on KITTI 2012
2 code implementations • 3 Feb 2020 • Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin
Compared with previous learning objectives, i. e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes.
4 code implementations • 13 Sep 2018 • Vladimir Nekrasov, Thanuja Dharmasiri, Andrew Spek, Tom Drummond, Chunhua Shen, Ian Reid
Deployment of deep learning models in robotics as sensory information extractors can be a daunting task to handle, even using generic GPU cards.
Ranked #6 on Real-Time Semantic Segmentation on NYU Depth v2
1 code implementation • 24 Oct 2021 • Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang
Neural Architecture Search (NAS) has shown great potential in effectively reducing manual effort in network design by automatically discovering optimal architectures.
1 code implementation • CVPR 2019 • Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, Chunhua Shen
Recent progress in semantic segmentation is driven by deep Convolutional Neural Networks and large-scale labeled image datasets.
Ranked #86 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot)
2 code implementations • 22 Feb 2021 • Xiangxiang Chu, Zhi Tian, Bo Zhang, Xinlong Wang, Chunhua Shen
Built on PEG, we present Conditional Position encoding Vision Transformer (CPVT).
1 code implementation • 12 Oct 2022 • BoWen Zhang, Zhi Tian, Quan Tang, Xiangxiang Chu, Xiaolin Wei, Chunhua Shen, Yifan Liu
We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and propose the SegVit.
Ranked #4 on Semantic Segmentation on COCO-Stuff test
1 code implementation • 9 Jun 2023 • BoWen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen, Yifan Liu
This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework and introduces \textbf{SegViTv2}.
Ranked #16 on Semantic Segmentation on ADE20K
1 code implementation • 19 Jan 2022 • Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang, Anton Van Den Hengel
We propose a direct, regression-based approach to 2D human pose estimation from single images.
Ranked #2 on Keypoint Detection on MS COCO
1 code implementation • CVPR 2021 • Jianpeng Zhang, Yutong Xie, Yong Xia, Chunhua Shen
To address this, we propose a dynamic on-demand network (DoDNet) that learns to segment multiple organs and tumors on partially labeled datasets.
1 code implementation • 13 Nov 2022 • Yutong Xie, Jianpeng Zhang, Yong Xia, Chunhua Shen
To address this, we propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple partially labeled datasets.
1 code implementation • 4 Apr 2019 • Vladimir Nekrasov, Chunhua Shen, Ian Reid
Automatic search of neural architectures for various vision and natural language tasks is becoming a prominent tool as it allows to discover high-performing structures on any dataset of interest.
Ranked #13 on Semantic Segmentation on CamVid
5 code implementations • CVPR 2019 • Qingsen Yan, Dong Gong, Qinfeng Shi, Anton Van Den Hengel, Chunhua Shen, Ian Reid, Yanning Zhang
Ghosting artifacts caused by moving objects or misalignments is a key challenge in high dynamic range (HDR) imaging for dynamic scenes.
5 code implementations • ICCV 2019 • Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Zhiguo Cao, Chunhua Shen
A dense region can always be divided until sub-region counts are within the previously observed closed set.
Ranked #3 on Crowd Counting on TRANCOS
3 code implementations • 7 Jan 2020 • Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Chunhua Shen, Zhiguo Cao
Visual counting, a task that aims to estimate the number of objects from an image/video, is an open-set problem by nature, i. e., the number of population can vary in [0, inf) in theory.
1 code implementation • 15 Dec 2021 • Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single-point for each instance.
Ranked #3 on Text Spotting on SCUT-CTW1500
3 code implementations • 4 Jan 2023 • Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.
Ranked #15 on Text Spotting on ICDAR 2015
1 code implementation • CVPR 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel
Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions.
1 code implementation • 18 Jul 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
1 code implementation • ICCV 2023 • Muzhi Zhu, Hengtao Li, Hao Chen, Chengxiang Fan, Weian Mao, Chenchen Jing, Yifan Liu, Chunhua Shen
In this work, we propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability for both known and unknown categories.
1 code implementation • CVPR 2020 • Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel
One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.
1 code implementation • ICCV 2019 • Xin-Yu Zhang, Jiewei Cao, Chunhua Shen, Mingyu You
In this work, we develop a self-training method with progressive augmentation framework (PAST) to promote the model performance progressively on the target dataset.
Ranked #11 on Unsupervised Domain Adaptation on Market to Duke
3 code implementations • NeurIPS 2016 • Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang
We propose to symmetrically link convolutional and de-convolutional layers with skip-layer connections, with which the training converges much faster and attains a higher-quality local optimum.
Ranked #37 on Image Super-Resolution on BSD100 - 4x upscaling
1 code implementation • 20 Mar 2022 • Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Hong Zhou, Ping Luo
Recent video text spotting methods usually require the three-staged pipeline, i. e., detecting text in individual images, recognizing localized text, tracking text streams with post-processing to generate final results.
1 code implementation • ECCV 2020 • Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo
To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10, 428 images of real scenarios with carefully manual annotations, which are 10 times larger than the existing datasets.
Ranked #4 on Semantic Segmentation on Trans10K
2 code implementations • 19 Apr 2021 • Yuting Gao, Jia-Xin Zhuang, Shaohui Lin, Hao Cheng, Xing Sun, Ke Li, Chunhua Shen
Specifically, we find the final embedding obtained by the mainstream SSL methods contains the most fruitful information, and propose to distill the final embedding to maximally transmit a teacher's knowledge to a lightweight model by constraining the last embedding of the student to be consistent with that of the teacher.
1 code implementation • 16 Sep 2019 • Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo
Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images.
1 code implementation • 26 Mar 2024 • Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, DaCheng Tao
Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few.
1 code implementation • 24 Nov 2023 • Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang
In this paper, we introduce an information-enriched diffusion model for paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation.
1 code implementation • 20 Aug 2023 • Yanda Li, Chi Zhang, Gang Yu, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei
However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models.
Ranked #53 on Visual Question Answering on MM-Vet
1 code implementation • ICCV 2023 • Wenjia Wang, Yongtao Ge, Haiyi Mei, Zhongang Cai, Qingping Sun, Yanjun Wang, Chunhua Shen, Lei Yang, Taku Komura
As it is hard to calibrate single-view RGB images in the wild, existing 3D human mesh reconstruction (3DHMR) methods either use a constant large focal length or estimate one based on the background environment context, which can not tackle the problem of the torso, limb, hand or face distortion caused by perspective camera projection when the camera is close to the human body.
Ranked #5 on 3D Human Pose Estimation on 3DPW
2 code implementations • 14 Jun 2022 • Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen
A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective, with the price of bringing considerable computation burden for inference.
1 code implementation • 1 Aug 2021 • Guansong Pang, Choubo Ding, Chunhua Shen, Anton Van Den Hengel
Here, we study the problem of few-shot anomaly detection, in which we aim at using a few labeled anomaly examples to train sample-efficient discriminative detection models.
Ranked #5 on Supervised Anomaly Detection on MVTec AD (using extra training data)
2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo
Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.
1 code implementation • CVPR 2022 • Choubo Ding, Guansong Pang, Chunhua Shen
Despite most existing anomaly detection studies assume the availability of normal training samples only, a few labeled anomaly examples are often available in many real-world applications, such as defect samples identified during random quality inspection, lesion images confirmed by radiologists in daily medical screening, etc.
Ranked #4 on Supervised Anomaly Detection on MVTec AD (using extra training data)
1 code implementation • 18 Oct 2023 • Songyan Zhang, Xinyu Sun, Hao Chen, Bo Li, Chunhua Shen
Finding corresponding pixels within a pair of images is a fundamental computer vision task with various applications.
1 code implementation • CVPR 2021 • Peng Chen, Jing Liu, Bohan Zhuang, Mingkui Tan, Chunhua Shen
Network quantization allows inference to be conducted using low-precision arithmetic for improved inference efficiency of deep neural networks on edge devices.
1 code implementation • ICCV 2023 • Kaining Ying, Qing Zhong, Weian Mao, Zhenhua Wang, Hao Chen, Lin Yuanbo Wu, Yifan Liu, Chengxiang Fan, Yunzhi Zhuge, Chunhua Shen
The discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS).
Ranked #2 on Video Instance Segmentation on Youtube-VIS 2022 Validation (using extra training data)
1 code implementation • 17 Sep 2019 • Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei LI, Chunhua Shen
In this paper, we first analyse the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground depth and background depth using separate optimization objectives and depth decoders.
1 code implementation • ICCV 2017 • Hao Lu, Lei Zhang, Zhiguo Cao, Wei Wei, Ke Xian, Chunhua Shen, Anton Van Den Hengel
Domain adaption (DA) allows machine learning methods trained on data sampled from one distribution to be applied to data sampled from another.
1 code implementation • 13 Aug 2023 • Hanxi Li, Jianfei Hu, Bo Li, Hao Chen, Yongbin Zheng, Chunhua Shen
In this framework, the anomaly detection problem is solved via a cascade patch retrieval procedure that retrieves the nearest neighbors for each test image patch in a coarse-to-fine fashion.
Ranked #1 on Supervised Anomaly Detection on BTAD
1 code implementation • ICCV 2021 • Jianlong Yuan, Yifan Liu, Chunhua Shen, Zhibin Wang, Hao Li
Previous works [3, 27] fail to employ strong augmentation in pseudo label learning efficiently, as the large distribution change caused by strong augmentation harms the batch normalisation statistics.
1 code implementation • 16 Mar 2022 • Jun Wang, Ying Cui, Dongyan Guo, Junxia Li, Qingshan Liu, Chunhua Shen
To solve the problems, we leverage the cross-attention and self-attention mechanisms to design novel neural network for processing point cloud in a per-point manner to eliminate kNNs.
1 code implementation • CVPR 2015 • Fumin Shen, Chunhua Shen, Wei Liu, Heng Tao Shen
This paper has been withdrawn by the authour.
3 code implementations • NeurIPS 2019 • Jiezhang Cao, Langyuan Mo, Yifan Zhang, Kui Jia, Chunhua Shen, Mingkui Tan
Multiple marginal matching problem aims at learning mappings to match a source domain to multiple target domains and it has attracted great attention in many applications, such as multi-domain image translation.
1 code implementation • 8 May 2021 • Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen
Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.
Ranked #7 on Text Spotting on Inverse-Text
1 code implementation • ICCV 2023 • Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan
During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings.
Ranked #1 on Weakly-Supervised Object Localization on CUB-200-2011 (Top-1 Localization Accuracy metric, using extra training data)
1 code implementation • 8 Mar 2019 • Yutong Xie, Jianpeng Zhang, Yong Xia, Chunhua Shen
Our results suggest that it is possible to boost the performance of skin lesion segmentation and classification simultaneously via training a unified model to perform both tasks in a mutual bootstrapping way.
2 code implementations • CVPR 2018 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid
This paper tackles the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations.
1 code implementation • 23 May 2022 • Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao
Experimental results on ImageNet demonstrate that our SuperViT can considerably reduce the computational costs of ViT models with even performance increase.
2 code implementations • ICCV 2017 • Yu Chen, Chunhua Shen, Xiu-Shen Wei, Lingqiao Liu, Jian Yang
In contrast, human vision is able to predict poses by exploiting geometric constraints of joint inter-connectivity.
Ranked #15 on Pose Estimation on MPII Human Pose
1 code implementation • 21 Jun 2015 • Yao Li, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel
The purpose of mid-level visual element discovery is to find clusters of image patches that are both representative and discriminative.
1 code implementation • CVPR 2021 • Yutong Dai, Hao Lu, Chunhua Shen
By looking at existing upsampling operators from a unified mathematical perspective, we generalize them into a second-order form and introduce Affinity-Aware Upsampling (A2U) where upsampling kernels are generated using a light-weight lowrank bilinear model and are conditioned on second-order features.
1 code implementation • 21 Jan 2016 • Hui Li, Chunhua Shen
Inspired by the success of deep neural networks (DNNs) in various vision applications, here we leverage DNNs to learn high-level features in a cascade framework, which lead to improved performance on both detection and recognition.
1 code implementation • 10 Apr 2018 • Dong Gong, Zhen Zhang, Qinfeng Shi, Anton Van Den Hengel, Chunhua Shen, Yanning Zhang
Extensive experiments on synthetic benchmarks and challenging real-world images demonstrate that the proposed deep optimization method is effective and robust to produce favorable results as well as practical for real-world image deblurring applications.
2 code implementations • ICCV 2019 • Haokui Zhang, Chunhua Shen, Ying Li, Yuanzhouhan Cao, Yu Liu, Youliang Yan
The temporal consistency loss is combined with the spatial loss to update the model in an end-to-end fashion.
Ranked #5 on Monocular Depth Estimation on Mid-Air Dataset
1 code implementation • 13 Sep 2019 • Xin-Yu Zhang, Rufeng Zhang, Jiewei Cao, Dong Gong, Mingyu You, Chunhua Shen
Finally, we aggregate the global appearance and part features to improve the feature performance further.
1 code implementation • ECCV 2020 • Liang Liu, Hao Lu, Hongwei Zou, Haipeng Xiong, Zhiguo Cao, Chunhua Shen
Inspired by scale weighing, we propose a novel 'counting scale' termed LibraNet where the count value is analogized by weight.
2 code implementations • 13 Mar 2022 • Xiaojie Chu, Yongtao Wang, Chunhua Shen, Jingdong Chen, Wei Chu
The development of scene text recognition (STR) in the era of deep learning has been mainly focused on novel architectures of STR models.
1 code implementation • 27 Mar 2020 • Jianpeng Zhang, Yutong Xie, Guansong Pang, Zhibin Liao, Johan Verjans, Wenxin Li, Zongji Sun, Jian He, Yi Li, Chunhua Shen, Yong Xia
In this paper, we formulate the task of differentiating viral pneumonia from non-viral pneumonia and healthy controls into an one-class classification-based anomaly detection problem, and thus propose the confidence-aware anomaly detection (CAAD) model, which consists of a shared feature extractor, an anomaly detection module, and a confidence prediction module.
2 code implementations • 15 Sep 2020 • Guansong Pang, Anton Van Den Hengel, Chunhua Shen, Longbing Cao
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.
1 code implementation • 23 Dec 2021 • Jie Zhang, Chen Chen, Bo Li, Lingjuan Lyu, Shuang Wu, Shouhong Ding, Chunhua Shen, Chao Wu
One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round.
1 code implementation • 29 Nov 2020 • Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen
With the rising popularity of intelligent mobile devices, it is of great practical significance to develop accurate, realtime and energy-efficient image Super-Resolution (SR) inference methods.
1 code implementation • NeurIPS 2021 • BoWen Zhang, Yifan Liu, Zhi Tian, Chunhua Shen
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
1 code implementation • 26 Sep 2022 • Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen
We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition.
2 code implementations • 7 Dec 2020 • Haokui Zhang, Ying Li, Yenan Jiang, Peng Wang, Qiang Shen, Chunhua Shen
In contrast to previous approaches, we do not impose restrictions over the source data sets, in which they do not have to be collected by the same sensors as the target data sets.
1 code implementation • CVPR 2018 • Tong Shen, Guosheng Lin, Chunhua Shen, Ian Reid
In this work, we focus on weak supervision, developing a method for training a high-quality pixel-level classifier for semantic segmentation, using only image-level class labels as the provided ground-truth.
1 code implementation • CVPR 2020 • Haokui Zhang, Ying Li, Hao Chen, Chunhua Shen
We also present analysis on the architectures found by NAS.
1 code implementation • 24 Dec 2020 • Haokui Zhang, Ying Li, Hao Chen, Chengrong Gong, Zongwen Bai, Chunhua Shen
For the inner search space, we propose a layer-wise architecture sharing strategy (LWAS), resulting in more flexible architectures and better performance.
1 code implementation • CVPR 2023 • Qingsheng Wang, Lingqiao Liu, Chenchen Jing, Hao Chen, Guoqiang Liang, Peng Wang, Chunhua Shen
Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations.
Ranked #1 on Compositional Zero-Shot Learning on MIT-States
1 code implementation • 21 Nov 2020 • Honglei Zhang, Hu Wang, Yuanzhouhan Cao, Chunhua Shen, Yidong Li
In deep data hiding models, to maximize the encoding capacity, each pixel of the cover image ought to be treated differently since they have different sensitivities w. r. t.
1 code implementation • 28 Nov 2016 • Jianfeng Dong, Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang
In this paper, we investigate convolutional denoising auto-encoders to show that unsupervised pre-training can still improve the performance of high-level image related tasks such as image classification and semantic segmentation.
1 code implementation • 4 Apr 2017 • Qichang Hu, Huibing Wang, Teng Li, Chunhua Shen
By applying our method to several fine-grained car recognition data sets, we demonstrate that the proposed method can achieve better performance than recent approaches in the literature.
Ranked #1 on Fine-Grained Image Classification on CarFlag-563
1 code implementation • 6 Mar 2023 • Peng-Tao Jiang, YuQi Yang, Yang Cao, Qibin Hou, Ming-Ming Cheng, Chunhua Shen
To date, most existing datasets focus on autonomous driving scenes.
1 code implementation • ECCV 2018 • Ruoxi Deng, Chunhua Shen, Shengjun Liu, Huibing Wang, Xinru Liu
Recent methods for boundary or edge detection built on Deep Convolutional Neural Networks (CNNs) typically suffer from the issue of predicted edges being thick and need post-processing to obtain crisp boundaries.
1 code implementation • 11 Oct 2021 • Lin Cheng, Pengfei Fang, Yanjie Liang, Liao Zhang, Chunhua Shen, Hanzi Wang
Inspired by those observations, we propose a novel visual saliency method, termed Target-Selective Gradient Backprop (TSGB), which leverages rectification operations to effectively emphasize target classes and further efficiently propagate the saliency to the image space, thereby generating target-selective and fine-grained saliency maps.
1 code implementation • 1 Dec 2022 • Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen
Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between web domain and real-world domain.
1 code implementation • ICCV 2023 • Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen
In contrast, synthetic data can be freely available using a generative model (e. g., DALL-E, Stable Diffusion).
1 code implementation • 22 Nov 2023 • Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, DaCheng Tao, Tianyou Chai
To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features.
1 code implementation • 29 Jul 2019 • Tong Shen, Dong Gong, Wei zhang, Chunhua Shen, Tao Mei
To tackle the unsupervised domain adaptation problem, we explore the possibilities to generate high-quality labels as proxy labels to supervise the training on target data.
1 code implementation • 11 May 2018 • Xiu-Shen Wei, Peng Wang, Lingqiao Liu, Chunhua Shen, Jianxin Wu
To solve this problem, we propose an end-to-end trainable deep network which is inspired by the state-of-the-art fine-grained recognition model and is tailored for the FSFG task.
1 code implementation • 18 Jul 2022 • Wejia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Hong Zhou, Size Li, Zhongyuan Wang, Ping Luo
Our contributions are three-fold: 1) CoText simultaneously address the three tasks (e. g., text detection, tracking, recognition) in a real-time end-to-end trainable framework.
1 code implementation • 26 Feb 2015 • Fayao Liu, Chunhua Shen, Guosheng Lin, Ian Reid
Therefore, here we present a deep convolutional neural field model for estimating depths from single monocular images, aiming to jointly explore the capacity of deep CNN and continuous CRF.
1 code implementation • CVPR 2015 • Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel
This paper, however, advocates that if used appropriately convolutional layer activations can be turned into a powerful image representation which enjoys many advantages over fully-connected layer activations.
1 code implementation • 28 Jul 2020 • Jiezhang Cao, Yong Guo, Qingyao Wu, Chunhua Shen, Junzhou Huang, Mingkui Tan
In this paper, rather than sampling from the predefined prior distribution, we propose an LCCGAN model with local coordinate coding (LCC) to improve the performance of generating data.
1 code implementation • CVPR 2016 • Qi Wu, Chunhua Shen, Lingqiao Liu, Anthony Dick, Anton Van Den Hengel
Much of the recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
no code implementations • ICML 2018 • Jiezhang Cao, Yong Guo, Qingyao Wu, Chunhua Shen, Junzhou Huang, Mingkui Tan
Generative adversarial networks (GANs) aim to generate realistic data from some prior distribution (e. g., Gaussian noises).
no code implementations • 5 Jun 2018 • Lei Zhang, Peng Wang, Chunhua Shen, Lingqiao Liu, Wei Wei, Yanning Zhang, Anton Van Den Hengel
In this study, we revisit this problem from an orthog- onal view, and propose a novel learning strategy to maxi- mize the pixel-wise fitting capacity of a given lightweight network architecture.
no code implementations • 1 Nov 2017 • Yu Chen, Chunhua Shen, Hao Chen, Xiu-Shen Wei, Lingqiao Liu, Jian Yang
In contrast, human vision is able to predict poses by exploiting geometric constraints of landmark point inter-connectivity.
no code implementations • 2 Jun 2018 • Yuanzhouhan Cao, Tianqi Zhao, Ke Xian, Chunhua Shen, Zhiguo Cao, Shugong Xu
In this paper, we propose to improve the performance of metric depth estimation with relative depths collected from stereo movie videos using existing stereo matching algorithm.
no code implementations • 3 May 2018 • Ni Zhuang, Yan Yan, Si Chen, Hanzi Wang, Chunhua Shen
To address the above problem, we propose a novel deep transfer neural network method based on multi-label learning for facial attribute classification, termed FMTNet, which consists of three sub-networks: the Face detection Network (FNet), the Multi-label learning Network (MNet) and the Transfer learning Network (TNet).
no code implementations • 19 Feb 2018 • Pingping Zhang, Wei Liu, Huchuan Lu, Chunhua Shen
Inspired by the intrinsic reflection of natural images, in this paper we propose a novel feature learning framework for large-scale salient object detection.
no code implementations • 14 Apr 2018 • Pingping Zhang, Huchuan Lu, Chunhua Shen
Salient object detection (SOD), which aims to find the most important region of interest and segment the relevant object/item in that area, is an important yet challenging vision task.
no code implementations • CVPR 2018 • Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, WangMeng Zuo, Chunhua Shen, Rynson Lau, Ming-Hsuan Yang
To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively dropout input features to capture a variety of appearance changes.
no code implementations • CVPR 2018 • Chao Ma, Chunhua Shen, Anthony Dick, Qi Wu, Peng Wang, Anton Van Den Hengel, Ian Reid
In this paper, we exploit a memory-augmented neural network to predict accurate answers to visual questions, even when those answers occur rarely in the training set.
no code implementations • 7 Mar 2018 • Tianyi Zhang, Guosheng Lin, Jianfei Cai, Tong Shen, Chunhua Shen, Alex C. Kot
In our work, we focus on the weakly supervised semantic segmentation with image label annotations.
no code implementations • 22 Feb 2018 • Pingping Zhang, Wei Liu, Dong Wang, Yinjie Lei, Hongyu Wang, Chunhua Shen, Huchuan Lu
Extensive experiments demonstrate that the proposed algorithm achieves competitive performance in both saliency detection and visual tracking, especially outperforming other related trackers on the non-rigid object tracking datasets.
no code implementations • 20 Feb 2018 • Pingping Zhang, Luyao Wang, Dong Wang, Huchuan Lu, Chunhua Shen
This paper proposes an Agile Aggregating Multi-Level feaTure framework (Agile Amulet) for salient object detection.
no code implementations • 25 Dec 2017 • Guanjun Guo, Hanzi Wang, Chunhua Shen, Yan Yan, Hong-Yuan Mark Liao
The deep CNN model is then designed to extract features from several image cropping datasets, upon which the cropping bounding boxes are predicted by the proposed CCR method.
no code implementations • 1 Dec 2017 • Zifeng Wu, Chunhua Shen, Anton Van Den Hengel
We propose an approach to semantic (image) segmentation that reduces the computational costs by a factor of 25 with limited impact on the quality of results.
no code implementations • 21 Nov 2017 • Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton Van Den Hengel
Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images is proven to be an inscrutable challenge.
no code implementations • CVPR 2018 • Qi Wu, Peng Wang, Chunhua Shen, Ian Reid, Anton Van Den Hengel
The Visual Dialogue task requires an agent to engage in a conversation about an image with a human.
Ranked #4 on Visual Dialog on VisDial v0.9 val
no code implementations • 19 Nov 2017 • Jun-Jie Zhang, Qi Wu, Jian Zhang, Chunhua Shen, Jianfeng Lu
These comments can be a description of the image, or some objects, attributes, scenes in it, which are normally used as the user-provided tags.
no code implementations • CVPR 2018 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel
To this end we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in variable length natural expression descriptions, from short phrases query to long multi-round dialogs.
no code implementations • 11 Jul 2017 • Xinlong Wang, Zhipeng Man, Mingyu You, Chunhua Shen
Our experimental results on a few data sets demonstrate the effectiveness of using GAN images: an improvement of 7. 5% over a strong baseline with moderate-sized real data being available.
no code implementations • 26 Sep 2017 • Hui Li, Peng Wang, Chunhua Shen
In contrast to existing approaches which take license plate detection and recognition as two separate tasks and settle them step by step, our method jointly solves these two tasks by a single network.
no code implementations • 8 May 2016 • Yuanzhouhan Cao, Zifeng Wu, Chunhua Shen
Then we train fully convolutional deep residual networks to predict the depth label of each pixel.
no code implementations • 17 Jun 2016 • Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel, Anthony Dick
We evaluate several baseline models on the FVQA dataset, and describe a novel model which is capable of reasoning about an image on the basis of supporting facts.
Ranked #2 on Visual Question Answering (VQA) on F-VQA
no code implementations • 25 May 2017 • Tong Shen, Guosheng Lin, Lingqiao Liu, Chunhua Shen, Ian Reid
Training a Fully Convolutional Network (FCN) for semantic segmentation requires a large number of masks with pixel level labelling, which involves a large amount of human labour and time for annotation.
no code implementations • 3 Aug 2017 • Lei Zhang, Wei Wei, Qinfeng Shi, Chunhua Shen, Anton Van Den Hengel, Yanning Zhang
The prior for the non-low-rank structure is established based on a mixture of Gaussians which is shown to be flexible enough, and powerful enough, to inform the completion process for a variety of real tensor data.
no code implementations • 25 Jul 2017 • Ruoxi Deng, Tianqi Zhao, Chunhua Shen, Shengjun Liu
We study the problem of estimating the relative depth order of point pairs in a monocular image.
no code implementations • 20 Jul 2017 • Xiu-Shen Wei, Chen-Lin Zhang, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou
Reusable model design becomes desirable with the rapid expansion of computer vision and machine learning applications.
Ranked #11 on Single-object discovery on COCO_20k
no code implementations • 18 Jul 2017 • Ruizhi Qiao, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel
To overcome this visual-semantic discrepancy, this work proposes an objective function to re-align the distributed word embeddings with visual information by learning a neural network to map it into a new representation called visually aligned word embedding (VAWE).
no code implementations • ICCV 2017 • Hui Li, Peng Wang, Chunhua Shen
In this work, we jointly address the problem of text detection and recognition in natural scene images based on convolutional recurrent neural networks.
no code implementations • 7 Jul 2017 • Hao Lu, Zhiguo Cao, Yang Xiao, Bohan Zhuang, Chunhua Shen
To our knowledge, this is the first time that a plant-related counting problem is considered using computer vision technologies under unconstrained field-based environment.
no code implementations • 28 May 2017 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel
In addressing this problem we first construct a large-scale human-centric visual relationship detection dataset (HCVRD), which provides many more types of relationship annotation (nearly 10K categories) than the previous released datasets.
Human-Object Interaction Detection Relationship Detection +1
no code implementations • 8 May 2017 • Xiu-Shen Wei, Chen-Lin Zhang, Yao Li, Chen-Wei Xie, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou
Reusable model design becomes desirable with the rapid expansion of machine learning applications.
no code implementations • 10 Mar 2016 • Guosheng Lin, Chunhua Shen, Anton Van Den Hengel, Ian Reid
We formulate deep structured models by combining CNNs and Conditional Random Fields (CRFs) for learning the patch-patch context between image regions.
no code implementations • 18 Mar 2017 • Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid
Recognizing how objects interact with each other is a crucial task in visual recognition.
no code implementations • 28 Mar 2017 • Wei Liu, Xiaogang Chen, Chunhua Shen, Jingyi Yu, Qiang Wu, Jie Yang
In this paper, we propose a general framework for Robust Guided Image Filtering (RGIF), which contains a data term and a smoothness term, to solve the two issues mentioned above.
no code implementations • 26 Mar 2017 • Fayao Liu, Guosheng Lin, Ruizhi Qiao, Chunhua Shen
In this fashion, we easily achieve nonlinear learning of potential functions on both unary and pairwise terms in CRFs.
no code implementations • 4 Dec 2016 • Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu
Recent state-of-the-art approaches to multi-label image classification exploit the label dependencies in an image, at global level, largely improving the labeling capacity.
no code implementations • 25 Jan 2017 • Tong Shen, Guosheng Lin, Chunhua Shen, Ian Reid
Semantic image segmentation is a fundamental task in image understanding.
no code implementations • 18 Jan 2017 • Zetao Chen, Adam Jacobson, Niko Sunderhauf, Ben Upcroft, Lingqiao Liu, Chunhua Shen, Ian Reid, Michael Milford
The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks.
1 code implementation • 16 Jan 2016 • Lingqiao Liu, Peng Wang, Chunhua Shen, Lei Wang, Anton Van Den Hengel, Chao Wang, Heng Tao Shen
To handle this limitation, in this paper we break the convention which assumes that a local feature is drawn from one of few Gaussian distributions.
no code implementations • 4 Oct 2015 • Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel
Most of these studies adopt activations from a single DCNN layer, usually the fully-connected layer, as the image representation.
no code implementations • 9 Mar 2016 • Qi Wu, Chunhua Shen, Anton Van Den Hengel, Peng Wang, Anthony Dick
Much recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
no code implementations • CVPR 2017 • Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel
To train a method to perform even one of these operations accurately from {image, question, answer} tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best.
no code implementations • CVPR 2017 • Dong Gong, Jie Yang, Lingqiao Liu, Yanning Zhang, Ian Reid, Chunhua Shen, Anton Van Den Hengel, Qinfeng Shi
The critical observation underpinning our approach is thus that learning the motion flow instead allows the model to focus on the cause of the blur, irrespective of the image content.
no code implementations • CVPR 2017 • Yao Li, Guosheng Lin, Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel
In this work, we propose to model the relational information between people as a sequence prediction task.
no code implementations • CVPR 2017 • Bohan Zhuang, Lingqiao Liu, Yao Li, Chunhua Shen, Ian Reid
Large-scale datasets have driven the rapid development of deep neural networks for visual recognition.
no code implementations • 6 Oct 2016 • Yuanzhouhan Cao, Chunhua Shen, Heng Tao Shen
Augmenting RGB data with measured depth has been shown to improve the performance of a range of tasks in computer vision including object detection and semantic segmentation.
no code implementations • CVPR 2016 • Bohan Zhuang, Guosheng Lin, Chunhua Shen, Ian Reid
To solve the first stage, we design a large-scale high-order binary codes inference algorithm to reduce the high-order objective to a standard binary quadratic problem such that graph cuts can be used to efficiently infer the binary code which serve as the label of each training datum.
no code implementations • 15 Mar 2016 • Yao Li, Linqiao Liu, Chunhua Shen, Anton Van Den Hengel
More specifically, we observe that given a set of object proposals extracted from an image that contains the object of interest, an accurate strongly supervised object detector should give high scores to only a small minority of proposals, and low scores to most of them.
no code implementations • 22 Jun 2016 • Jiewei Cao, Lingqiao Liu, Peng Wang, Zi Huang, Chunhua Shen, Heng Tao Shen
Instance retrieval requires one to search for images that contain a particular object within a large corpus.