no code implementations • ECCV 2020 • Guo-Sen Xie, Li Liu, Fan Zhu, Fang Zhao, Zheng Zhang, Yazhou Yao, Jie Qin, Ling Shao
To exploit the progressive interactions among these regions, we represent them as a region graph, on which the parts relation reasoning is performed with graph convolutions, thus leading to our PRR branch.
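As a rough illustration of reasoning over a region graph with graph convolution, the sketch below applies one layer of the commonly used normalized propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) to a toy graph of three region features. The adjacency, features and weights are hypothetical, and this is not the PRR branch itself, only the generic graph-convolution step it builds on.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each region keeps its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Symmetric degree normalization of the adjacency.
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU

# Hypothetical region graph: region 0 is linked to regions 1 and 2.
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
feats = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
weight = [[1.0, 0.0],
          [0.0, 1.0]]  # identity weight keeps the example easy to inspect

print(gcn_layer(adj, feats, weight))
```

Each region's output mixes its own feature with those of its neighbours, weighted by the normalized adjacency; stacking such layers lets information flow between regions that are not directly connected.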
no code implementations • ECCV 2020 • Jiaxin Chen, Jie Qin, Yuming Shen, Li Liu, Fan Zhu, Ling Shao
This paper proposes a novel method for 3D shape representation learning, namely Hyperbolic Embedded Attentive Representation (HEAR).
no code implementations • 23 Feb 2022 • Fan Zhu, Ruixing Jia, Lei Yang, Youcan Yan, Zheng Wang, Jia Pan, Wenping Wang
We propose a deep visuo-tactile model for real-time, proprioceptive estimation of the liquid inside a deformable container. We fuse two sensory modalities, i.e., the raw visual inputs from the RGB camera and the tactile cues from our dedicated tactile sensor, without any extra sensor calibration. The robotic system is controlled and adjusted in real time based on the estimation model.
no code implementations • 22 Jun 2021 • Hao Huang, Boulbaba Ben Amor, Xichan Lin, Fan Zhu, Yi Fang
Our ResNet-TW (Deep Residual Network for Time Warping) tackles the alignment problem by compositing a flow of incremental diffeomorphic mappings.
no code implementations • 22 Jun 2021 • Hao Huang, Boulbaba Ben Amor, Xichan Lin, Fan Zhu, Yi Fang
In this work, we introduce a joint geometric-neural networks approach for comparing, deforming and generating 3D protein structures.
no code implementations • 2 May 2021 • Xin Xu, Yu Dong, Fan Zhu
For example, humans are good at interactive tasks (while autonomous driving systems usually are not), but we are often poor at tasks with strict precision demands.
1 code implementation • CVPR 2020 • Yichao Yan, Jie Qin, Jiaxin Chen, Li Liu, Fan Zhu, Ying Tai, Ling Shao
In each hypergraph, different temporal granularities are captured by hyperedges that connect a set of graph nodes (i.e., part-based features) across different temporal ranges.
Ranked #6 on Person Re-Identification on iLIDS-VID
1 code implementation • 29 Apr 2021 • Yichao Yan, Jie Qin, Bingbing Ni, Jiaxin Chen, Li Liu, Fan Zhu, Wei-Shi Zheng, Xiaokang Yang, Ling Shao
Extensive experiments on the novel dataset as well as three existing datasets clearly demonstrate the effectiveness of the proposed framework for both group-based re-id tasks.
1 code implementation • CVPR 2021 • Yichao Yan, Jinpeng Li, Jie Qin, Song Bai, Shengcai Liao, Li Liu, Fan Zhu, Ling Shao
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images, which can be regarded as the unified task of pedestrian detection and person re-identification (re-id).
Ranked #10 on Person Search on CUHK-SYSU
1 code implementation • ICCV 2021 • Bing Wang, Changhao Chen, Zhaopeng Cui, Jie Qin, Chris Xiaoxuan Lu, Zhengdi Yu, Peijun Zhao, Zhen Dong, Fan Zhu, Niki Trigoni, Andrew Markham
Accurately describing and detecting 2D and 3D keypoints is crucial to establishing correspondences across images and point clouds.
1 code implementation • CVPR 2021 • Lei Huang, Yi Zhou, Li Liu, Fan Zhu, Ling Shao
Results show that group whitening (GW) consistently improves the performance of different architectures, with absolute gains of $1.02\%$ to $1.49\%$ in top-1 accuracy on ImageNet and $1.82\%$ to $3.21\%$ in bounding box AP on COCO.
no code implementations • 27 Sep 2020 • Lei Huang, Jie Qin, Yi Zhou, Fan Zhu, Li Liu, Ling Shao
Normalization techniques are essential for accelerating the training and improving the generalization of deep neural networks (DNNs), and have successfully been used in various applications.
3 code implementations • 20 Jun 2020 • Ionut Cosmin Duta, Li Liu, Fan Zhu, Ling Shao
This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales.
Ranked #69 on Semantic Segmentation on ADE20K val
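The published PyConv uses 2D kernels of increasing spatial size, with different group counts at each pyramid level. The sketch below only illustrates the underlying idea of processing the same input at several filter scales in parallel and stacking the results as channels, using 1D averaging kernels of hypothetical sizes 3, 5 and 7 in place of learned weights.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding so the output length equals the input length."""
    k = len(kernel)
    pad = k // 2  # assumes an odd kernel size
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k)) for i in range(len(signal))]

def pyramid_conv1d(signal, kernel_sizes=(3, 5, 7)):
    """Apply one kernel per scale and stack the outputs as channels."""
    outputs = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # hypothetical fixed averaging weights; learned in a real network
        outputs.append(conv1d_same(signal, kernel))
    return outputs  # shape: (num_scales, len(signal))

x = [0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0]  # a single spike
for k, channel in zip((3, 5, 7), pyramid_conv1d(x)):
    print(k, [round(v, 2) for v in channel])
```

Larger kernels spread the spike over a wider neighbourhood, so each output channel sees the input with a different receptive field; this is the multi-scale behaviour the pyramid is meant to capture.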
2 code implementations • 10 Apr 2020 • Ionut Cosmin Duta, Li Liu, Fan Zhu, Ling Shao
We successfully train a 404-layer deep CNN on the ImageNet dataset and a 3002-layer network on CIFAR-10 and CIFAR-100, while the baseline is not able to converge at such extreme depths.
1 code implementation • CVPR 2020 • Lei Huang, Li Liu, Fan Zhu, Diwen Wan, Zehuan Yuan, Bo Li, Ling Shao
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1 and reduce redundancy in representation.
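For a linear layer y = Wx the Jacobian with respect to x is W itself, so an orthogonal W (WᵀW = I) has all singular values equal to 1 and preserves the norm of every input. The worked check below uses a 2×2 rotation matrix as a stand-in for an orthogonal weight; the angle and input vector are arbitrary choices for illustration, and nothing here reproduces the paper's orthogonalization method.

```python
import math

def rotation(theta):
    """A 2x2 rotation matrix, which is orthogonal for every angle theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def apply(w, x):
    """Matrix-vector product W x."""
    return [sum(w[i][j] * x[j] for j in range(len(x))) for i in range(len(w))]

def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in x))

w = rotation(0.7)
# W^T W should be (numerically) the identity matrix.
wtw = [[sum(w[k][i] * w[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
x = [3.0, 4.0]
print(wtw, norm(x), norm(apply(w, x)))
```

Because the same W also multiplies backpropagated gradients (transposed), gradient norms are preserved as well, which is why orthogonal weights help keep signals from exploding or vanishing in deep networks.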
1 code implementation • CVPR 2020 • Lei Huang, Lei Zhao, Yi Zhou, Fan Zhu, Li Liu, Ling Shao
Our work originates from the observation that while various whitening transformations equivalently improve the conditioning, they show significantly different behaviors in discriminative scenarios and in training Generative Adversarial Networks (GANs).
2 code implementations • CVPR 2020 • Yuming Shen, Jie Qin, Jiaxin Chen, Mengyang Yu, Li Liu, Fan Zhu, Fumin Shen, Ling Shao
One bottleneck (i.e., binary codes) conveys the high-level intrinsic data structure captured by the code-driven graph to the other (i.e., continuous variables for low-level detail information), which in turn propagates the updated network feedback for the encoder to learn more discriminative binary codes.
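Binary codes of this kind are typically used for retrieval by Hamming distance. The sketch below shows only that generic retrieval step: sign-thresholding hypothetical continuous embeddings into bits, then ranking a small database by Hamming distance to a query. It does not reproduce the auto-encoder, the two bottlenecks or the code-driven graph; the embedding values and image names are made up for illustration.

```python
def binarize(vec):
    """Map a continuous embedding to a binary code via the sign of each entry."""
    return tuple(1 if v >= 0 else 0 for v in vec)

def hamming(a, b):
    """Number of bit positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 4-dimensional embeddings produced by some encoder.
database = {
    "img_a": [0.9, -0.2, 0.4, -0.7],
    "img_b": [-0.5, 0.3, 0.8, 0.1],
    "img_c": [0.7, -0.1, -0.3, -0.6],
}
query = [0.8, -0.4, 0.5, -0.2]

codes = {name: binarize(v) for name, v in database.items()}
q = binarize(query)
# Rank database items from nearest to farthest in Hamming space.
ranking = sorted(codes, key=lambda name: hamming(q, codes[name]))
print(q, ranking)
```

Comparing bits instead of floating-point vectors is what makes binary codes attractive for large-scale search: distances reduce to counting differing positions, which hardware can do with XOR and popcount.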
no code implementations • ECCV 2020 • Lei Huang, Jie Qin, Li Liu, Fan Zhu, Ling Shao
To this end, we propose layer-wise conditioning analysis, which explores the optimization landscape with respect to each layer independently.
no code implementations • NeurIPS 2019 • Lizhong Ding, Mengyang Yu, Li Liu, Fan Zhu, Yong Liu, Yu Li, Ling Shao
DEAN can be interpreted as a goodness-of-fit (GOF) game between two generative networks: one explicit generative network learns an energy-based distribution that fits the real data, and the other, implicit generative network is trained by minimizing a GOF test statistic between the energy-based distribution and the generated data, so that the underlying distribution of the generated data approaches the energy-based distribution.
no code implementations • 16 Sep 2019 • Huan Xiong, Mengyang Yu, Li Liu, Fan Zhu, Fumin Shen, Ling Shao
Binary optimization, a representative subclass of discrete optimization, plays an important role in mathematical optimization and has various applications in computer vision and machine learning.
1 code implementation • 26 Aug 2019 • Yuming Shen, Jie Qin, Jiaxin Chen, Li Liu, Fan Zhu
Recent binary representation learning models usually require sophisticated binary optimization, similarity measure or even generative models as auxiliaries.
2 code implementations • ICCV 2019 • Ziqin Wang, Jun Xu, Li Liu, Fan Zhu, Ling Shao
Specifically, to integrate the insights of matching-based and propagation-based methods, we employ an encoder-decoder framework to learn pixel-level similarity and segmentation in an end-to-end manner.
1 code implementation • 17 Jun 2019 • Jun Xu, Yuan Huang, Ming-Ming Cheng, Li Liu, Fan Zhu, Zhou Xu, Ling Shao
A simple but useful observation about our NAC (Noisy-As-Clean) approach is that, as long as the noise is weak, it is feasible to learn a self-supervised network from the corrupted image alone, approximating the optimal parameters of a supervised network trained on pairs of noisy and clean images.
1 code implementation • 17 Jun 2019 • Yingkun Hou, Jun Xu, Mingxia Liu, Guanghai Liu, Li Liu, Fan Zhu, Ling Shao
This is motivated by the fact that, in natural images, finding closely similar pixels is more feasible than finding similar patches, and such pixels can be used to enhance image denoising performance.
1 code implementation • 16 Jun 2019 • Jun Xu, Yingkun Hou, Dongwei Ren, Li Liu, Fan Zhu, Mengyang Yu, Haoqian Wang, Ling Shao
A novel Structure and Texture Aware Retinex (STAR) model is further proposed for illumination and reflectance decomposition of a single image.
no code implementations • 5 Jun 2019 • Hongyu Li, Fan Zhu, Junhua Qiu
First, since document image quality assessment is mainly concerned with text, we propose a text-line-based framework to estimate document image quality, which is composed of three stages: text line detection, text line quality prediction, and overall quality assessment.
3 code implementations • 30 May 2019 • Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai
Compared to existing small-scale aerial image based instance segmentation datasets, iSAID contains 15$\times$ the number of object categories and 5$\times$ the number of instances.
Ranked #1 on Object Detection on iSAID
no code implementations • 27 May 2019 • Yazhou Yao, Zeren Sun, Fumin Shen, Li Liu, Li-Min Wang, Fan Zhu, Lizhong Ding, Gangshan Wu, Ling Shao
To address this issue, we present an adaptive multi-model framework that resolves polysemy by visual disambiguation.
5 code implementations • CVPR 2019 • Lei Huang, Yi Zhou, Fan Zhu, Li Liu, Ling Shao
With the support of SND, we provide natural explanations for several phenomena from the perspective of optimization, e.g., why group-wise whitening of DBN generally outperforms full whitening and why the accuracy of BN degenerates with reduced batch sizes.
Ranked #8 on Robust Object Detection on DWD
no code implementations • ECCV 2018 • Zheng Zhang, Li Liu, Jie Qin, Fan Zhu, Fumin Shen, Yong Xu, Ling Shao, Heng Tao Shen
How to economically cluster large-scale multi-view images is a long-standing problem in computer vision.
1 code implementation • ECCV 2018 • Diwen Wan, Fumin Shen, Li Liu, Fan Zhu, Jie Qin, Ling Shao, Heng Tao Shen
Despite the remarkable success of Convolutional Neural Networks (CNNs) on general visual tasks, high computational and memory costs restrict their widespread application on consumer electronics (e.g., portable or smart wearable devices).
1 code implementation • ECCV 2018 • Jingyi Zhang, Fumin Shen, Li Liu, Fan Zhu, Mengyang Yu, Ling Shao, Heng Tao Shen, Luc van Gool
The generative model learns a mapping under which the distribution of sketches becomes indistinguishable from the distribution of natural images, using an adversarial loss, and simultaneously learns an inverse mapping based on a cycle-consistency loss to enhance this indistinguishability.
1 code implementation • 30 Aug 2018 • Fan Zhu, Lin Ma, Xin Xu, Dingfeng Guo, Xiao Cui, Qi Kong
Since manual calibration becomes unsustainable once vehicles enter the mass-production stage, we introduce a machine-learning-based auto-calibration system for autonomous driving vehicles.
1 code implementation • 20 Jul 2018 • Haoyang Fan, Fan Zhu, Changchun Liu, Liangliang Zhang, Li Zhuang, Dong Li, Weicheng Zhu, Jiangtao Hu, Hongye Li, Qi Kong
In this manuscript, we introduce a real-time motion planning system based on the Baidu Apollo (open source) autonomous driving platform.
no code implementations • 11 Jul 2018 • Hongyu Li, Fan Zhu, Junhua Qiu
Document image quality assessment (DIQA) is an important and challenging problem in real applications.
no code implementations • 22 Aug 2017 • Yazhou Yao, Jian Zhang, Fumin Shen, Li Liu, Fan Zhu, Dongxiang Zhang, Heng-Tao Shen
To eliminate manual annotation, in this work, we propose a novel image dataset construction framework by employing multiple textual queries.
no code implementations • CVPR 2017 • Jin Xie, Guoxian Dai, Fan Zhu, Yi Fang
For 3D shapes, we then compute the Wasserstein barycenters of deep features of multiple projections to form a barycentric representation.
no code implementations • CVPR 2015 • Yi Fang, Jin Xie, Guoxian Dai, Meng Wang, Fan Zhu, Tiantian Xu, Edward Wong
A shape descriptor is a concise yet informative representation that identifies a 3D object as a member of some category.
no code implementations • CVPR 2015 • Jin Xie, Yi Fang, Fan Zhu, Edward Wong
Then, by imposing the Fisher discrimination criterion on the neurons in the hidden layer, we developed a novel discriminative deep auto-encoder for shape feature learning.
no code implementations • CVPR 2014 • Fan Zhu, Zhuolin Jiang, Ling Shao
We present a novel object recognition framework based on multiple figure-ground hypotheses with a large object spatial support, generated by bottom-up processes and mid-level cues in an unsupervised manner.