Search Results for author: Hongsheng Li

Found 134 papers, 63 papers with code

RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax

1 code implementation ECCV 2020 Xiao Zhang, Rui Zhao, Yu Qiao, Hongsheng Li

To address this problem, this paper introduces novel Radial Basis Function (RBF) distances to replace the commonly used inner products in the softmax loss function, such that it can adaptively assign losses to regularize the intra-class and inter-class distances by reshaping the relative differences, thus creating more representative class prototypes to improve optimization.
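To make the idea of swapping inner products for RBF distances concrete, here is a minimal sketch. It uses a generic Gaussian RBF kernel over squared Euclidean distances to class prototypes; the names `gamma` and `scale` and the exact kernel form are assumptions for illustration and are not taken from the paper.

```python
import math

def rbf_logits(feature, prototypes, gamma=1.0, scale=1.0):
    """Score each class by an RBF kernel of the distance to its prototype.

    gamma and scale are hypothetical hyperparameters for this sketch.
    """
    logits = []
    for proto in prototypes:
        # Squared Euclidean distance replaces the usual inner product.
        sq_dist = sum((f - p) ** 2 for f, p in zip(feature, proto))
        logits.append(scale * math.exp(-sq_dist / gamma))
    return logits

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(feature, prototypes, label, gamma=1.0, scale=1.0):
    """Softmax cross-entropy computed on the RBF-based logits."""
    probs = softmax(rbf_logits(feature, prototypes, gamma, scale))
    return -math.log(probs[label])
```

Because the kernel value lies in (0, 1], the logits are bounded by `scale`, which is one way such a loss could limit how far any single sample dominates the gradient.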

Container: Context Aggregation Networks

1 code implementation NeurIPS 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Instance Segmentation Object Detection +2

DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks

1 code implementation NeurIPS 2021 Wei Sun, Aojun Zhou, Sander Stuijk, Rob Wijnhoven, Andrew Oakleigh Nelson, Hongsheng Li, Henk Corporaal

However, the existing N:M algorithms only address the challenge of how to train N:M sparse neural networks in a uniform fashion (i.e., every layer has the same N:M sparsity) and suffer from a significant accuracy drop for high sparsity (i.e., when sparsity > 80%).

Network Pruning
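As an illustration of what an N:M sparsity pattern means, the sketch below keeps at most N non-zero weights in every group of M consecutive weights. Selecting by magnitude is a common pruning heuristic assumed here for illustration; it is not the search procedure of the paper, and the function names are hypothetical.

```python
def nm_sparsify(weights, n, m):
    """Keep the n largest-magnitude values in each group of m; zero the rest."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    out = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest-magnitude entries in this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        out.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return out

def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)

row = [0.9, -0.1, 0.3, -0.7, 0.2, 0.05, -0.6, 0.4]
pruned_2_4 = nm_sparsify(row, n=2, m=4)  # 2:4 pattern, 50% sparsity
pruned_1_8 = nm_sparsify(row, n=1, m=8)  # 1:8 pattern, 87.5% sparsity
```

A layer-wise scheme would call such a routine with a different (N, M) pair per layer, whereas a uniform scheme uses the same pair everywhere.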

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

1 code implementation 6 Nov 2021 Renrui Zhang, Rongyao Fang, Wei Zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

To further enhance CLIP's few-shot capability, CLIP-Adapter proposes fine-tuning a lightweight residual feature adapter, which significantly improves few-shot classification performance.

Fine-tuning Language Modelling +1

Rethinking Noise Synthesis and Modeling in Raw Denoising

1 code implementation ICCV 2021 Yi Zhang, Hongwei Qin, Xiaogang Wang, Hongsheng Li

However, the real raw image noise is contributed by many noise sources and varies greatly among different sensors.

Image Denoising

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

1 code implementation 9 Oct 2021 Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao

Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning.

Fine-tuning Representation Learning

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP

no code implementations 8 Oct 2021 Jihao Liu, Hongsheng Li, Guanglu Song, Xin Huang, Yu Liu

Recently, transformer and multi-layer perceptron (MLP) architectures have achieved impressive results on various vision tasks.

Object Detection Semantic Segmentation

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

1 code implementation ICCV 2021 Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li

On the contrary, the soft composition operates by stitching different patches into a whole feature map where pixels in overlapping regions are summed up.

Tokenization Video Inpainting
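The soft composition described in the FuseFormer entry above (stitching overlapping patches and summing the overlaps) can be sketched in one dimension. This is a simplified illustration on a 1D signal rather than a 2D feature map, and the function name and parameters are hypothetical.

```python
def soft_composite_1d(patches, stride, length):
    """Stitch overlapping 1D patches into one signal.

    Values that fall on the same position are summed.
    """
    out = [0.0] * length
    for k, patch in enumerate(patches):
        start = k * stride  # left edge of the k-th patch
        for j, value in enumerate(patch):
            out[start + j] += value
    return out

# Three patches of size 3 placed with stride 2 overlap at positions 2 and 4.
patches = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
result = soft_composite_1d(patches, stride=2, length=7)
print(result)  # [1.0, 1.0, 3.0, 2.0, 5.0, 3.0, 3.0]
```

Positions covered by two patches receive contributions from both, so information from neighbouring patches is mixed at the seams.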

Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

1 code implementation ICCV 2021 Ziniu Wan, Zhengjia Li, Maoqing Tian, Jianbo Liu, Shuai Yi, Hongsheng Li

To this end, we propose Multi-level Attention Encoder-Decoder Network (MAED), including a Spatial-Temporal Encoder (STE) and a Kinematic Topology Decoder (KTD) to model multi-level attentions in a unified framework.

Pose Estimation

Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition

no code implementations 19 Aug 2021 Ning Wang, Guangming Zhu, Liang Zhang, Peiyi Shen, Hongsheng Li, Cong Hua

With the effective spatio-temporal relationship modeling, it is possible not only to uncover contextual information in each frame but also to directly capture inter-time dependencies.

Human-Object Interaction Detection

LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector

no code implementations ICCV 2021 Xiaoyang Guo, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

Compared with the state-of-the-art stereo detector, our method improves the 3D detection performance for cars, pedestrians, and cyclists by 10.44%, 5.69%, and 5.97% mAP, respectively, on the official KITTI benchmark.

Stereo Matching

LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation

no code implementations 17 Aug 2021 Lin Zhao, Hui Zhou, Xinge Zhu, Xiao Song, Hongsheng Li, Wenbing Tao

However, two major issues of the fusion between camera and LiDAR hinder its performance, i.e., how to effectively fuse these two modalities and how to precisely align them (suffering from the weak spatiotemporal synchronization problem).

Autonomous Driving LIDAR Semantic Segmentation +2

ST3D++: Denoised Self-training for Unsupervised Domain Adaptation on 3D Object Detection

no code implementations 15 Aug 2021 Jihan Yang, Shaoshuai Shi, Zhe Wang, Hongsheng Li, Xiaojuan Qi

These specific designs enable the detector to be trained on meticulously refined pseudo labeled target data with denoised training signals, and thus effectively facilitate adapting an object detector to a target domain without requiring annotations.

3D Object Detection Data Augmentation +2

Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization

1 code implementation ICCV 2021 Linjiang Huang, Liang Wang, Hongsheng Li

In this paper, we present a framework named FAC-Net based on the I3D backbone, on which three branches are appended, named class-wise foreground classification branch, class-agnostic attention branch and multiple instance learning branch.

Multiple Instance Learning Video Understanding +2

Instance-weighted Central Similarity for Multi-label Image Retrieval

no code implementations 11 Aug 2021 Zhiwei Zhang, Allen Peng, Hongsheng Li

Deep hashing has been widely applied to large-scale image retrieval by encoding high-dimensional data points into binary codes for efficient retrieval.

Multi-Label Image Retrieval

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency

no code implementations ICCV 2021 Zhipeng Luo, Zhongang Cai, Changqing Zhou, Gongjie Zhang, Haiyu Zhao, Shuai Yi, Shijian Lu, Hongsheng Li, Shanghang Zhang, Ziwei Liu

In addition, existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world.

3D Object Detection Autonomous Driving

Hybrid Supervision Learning for Pathology Whole Slide Image Classification

no code implementations 2 Jul 2021 Jiahui Li, Wen Chen, Xiaodi Huang, Zhiqiang Hu, Qi Duan, Hongsheng Li, Dimitris N. Metaxas, Shaoting Zhang

To handle this problem, we propose a hybrid supervision learning framework for this kind of high resolution images with sufficient image-level coarse annotations and a few pixel-level fine labels.

Classification Image Classification +2

Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification

no code implementations CVPR 2021 Xiao Zhang, Yixiao Ge, Yu Qiao, Hongsheng Li

Unsupervised object re-identification aims at learning discriminative representations for object retrieval without any annotations.

Scalable Transformers for Neural Machine Translation

no code implementations 4 Jun 2021 Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li

In this paper, we propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales that share parameters.

Machine Translation Translation

Container: Context Aggregation Network

2 code implementations 2 Jun 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Image Classification Instance Segmentation +3

FNAS: Uncertainty-Aware Fast Neural Architecture Search

no code implementations 25 May 2021 Jihao Liu, Ming Zhang, Yangting Sun, Boxiao Liu, Guanglu Song, Yu Liu, Hongsheng Li

Further, an architecture knowledge pool together with a block similarity function is proposed to utilize parameter knowledge, reducing the search time by a factor of 2.

Fairness Neural Architecture Search

VS-Net: Voting with Segmentation for Visual Localization

1 code implementation CVPR 2021 Zhaoyang Huang, Han Zhou, Yijin Li, Bangbang Yang, Yan Xu, Xiaowei Zhou, Hujun Bao, Guofeng Zhang, Hongsheng Li

To address this problem, we propose a novel visual localization framework that establishes 2D-to-3D correspondences between the query image and the 3D map with a series of learnable scene-specific landmarks.

Semantic Segmentation Visual Localization

Inverting Generative Adversarial Renderer for Face Reconstruction

no code implementations CVPR 2021 Jingtan Piao, Keqiang Sun, KwanYee Lin, Quan Wang, Hongsheng Li

Since the GAR learns to model complicated real-world images instead of relying on simplified graphics rules, it is capable of producing realistic images, which essentially inhibits the domain-shift noise in training and optimization.

Face Reconstruction

Decoupled Spatial-Temporal Transformer for Video Inpainting

1 code implementation 14 Apr 2021 Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li

Seamless combination of these two novel designs forms a better spatial-temporal attention scheme, and our proposed model achieves better performance than state-of-the-art video inpainting approaches with significantly boosted efficiency.

Video Inpainting

Semantic Scene Completion via Integrating Instances and Scene in-the-Loop

no code implementations CVPR 2021 Yingjie Cai, Xuesong Chen, Chao Zhang, Kwan-Yee Lin, Xiaogang Wang, Hongsheng Li

The key insight is that we decouple the instances from a coarsely completed semantic scene instead of a raw input image to guide the reconstruction of instances and the overall scene.

indoor scene understanding Scene Understanding

LIFE: Lighting Invariant Flow Estimation

no code implementations 7 Apr 2021 Zhaoyang Huang, Xiaokun Pan, Runsen Xu, Yan Xu, Ka Chun Cheung, Guofeng Zhang, Hongsheng Li

However, local image contents are inevitably ambiguous and error-prone during the cross-image feature matching process, which hinders downstream tasks.

Structure from Motion

Fixing the Teacher-Student Knowledge Discrepancy in Distillation

no code implementations 31 Mar 2021 Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li, Xiaogang Wang

To solve this problem, in this paper, we propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student and provides the most suitable knowledge to different student networks for distillation.

Knowledge Distillation Object Detection

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks

no code implementations 25 Mar 2021 Hao Li, Tianwen Fu, Jifeng Dai, Hongsheng Li, Gao Huang, Xizhou Zhu

However, the automatic design of loss functions for generic tasks with various evaluation metrics remains under-investigated.

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network

1 code implementation CVPR 2021 Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li

Conditional generative adversarial networks (cGANs) aim at synthesizing diverse images given the input conditions and latent codes, but unfortunately, they usually suffer from the issue of mode collapse.

Contrastive Learning Image Generation

ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection

1 code implementation CVPR 2021 Jihan Yang, Shaoshuai Shi, Zhe Wang, Hongsheng Li, Xiaojuan Qi

Then, the detector is iteratively improved on the target domain by alternately conducting two steps: pseudo label updating with the developed quality-aware triplet memory bank, and model training with curriculum data augmentation.

3D Object Detection Data Augmentation +1

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

2 code implementations ICLR 2021 Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network, which can maintain the advantages of both unstructured fine-grained sparsity and structured coarse-grained sparsity simultaneously on specifically designed GPUs.

Fast Convergence of DETR with Spatially Modulated Co-Attention

2 code implementations 19 Jan 2021 Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

The recently proposed Detection Transformer (DETR) model successfully applies Transformers to object detection and achieves performance comparable to two-stage object detection frameworks, such as Faster-RCNN.

Object Detection

Progressive Correspondence Pruning by Consensus Learning

no code implementations ICCV 2021 Chen Zhao, Yixiao Ge, Feng Zhu, Rui Zhao, Hongsheng Li, Mathieu Salzmann

Correspondence selection aims to correctly select the consistent matches (inliers) from an initial set of putative correspondences.

Denoising Pose Estimation

Self-supervised Temporal Learning

no code implementations 1 Jan 2021 Hao Shao, Yu Liu, Hongsheng Li

Inspired by spatial-based contrastive SSL, we show that significant improvement can be achieved by a proposed temporal-based contrastive learning approach, which includes three novel and efficient modules: temporal augmentations, temporal memory bank and SSTL loss.

Contrastive Learning Self-Supervised Learning +2

Towards Overcoming False Positives in Visual Relationship Detection

no code implementations 23 Dec 2020 Daisheng Jin, Xiao Ma, Chongzhi Zhang, Yizhuo Zhou, Jiashu Tao, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Zhoujun Li, Xianglong Liu, Hongsheng Li

We observe that during training, the relationship proposal distribution is highly imbalanced: most of the negative relationship proposals are easy to identify, e.g., inaccurate object detections, which leads to the under-fitting of low-frequency difficult proposals.

Graph Attention Human-Object Interaction Detection +2

A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection

no code implementations 18 Dec 2020 Jianbo Liu, Sijie Ren, Yuanjie Zheng, Xiaogang Wang, Hongsheng Li

With the proposed holistically-guided decoder, we implement the EfficientFCN architecture for semantic segmentation and HGD-FPN for object detection and instance segmentation.

Instance Segmentation Object Detection +2

REFINE: Prediction Fusion Network for Panoptic Segmentation

no code implementations 15 Dec 2020 Jiawei Ren, Cunjun Yu, Zhongang Cai, Mingyuan Zhang, Chongsong Chen, Haiyu Zhao, Shuai Yi, Hongsheng Li

Panoptic segmentation aims at generating pixel-wise class and instance predictions for each pixel in the input image, which is a challenging task and far more complicated than naively fusing the semantic and instance segmentation results.

Instance Segmentation Panoptic Segmentation

End-to-End Object Detection with Adaptive Clustering Transformer

1 code implementation 18 Nov 2020 Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong

In this paper, a novel transformer variant named Adaptive Clustering Transformer (ACT) is proposed to reduce the computation cost for high-resolution input.

Object Detection

SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks

no code implementations 19 Oct 2020 Yan Xu, Zhaoyang Huang, Kwan-Yee Lin, Xinge Zhu, Jianping Shi, Hujun Bao, Guofeng Zhang, Hongsheng Li

To suit our network to self-supervised learning, we design several novel loss functions that utilize the inherent properties of LiDAR point clouds.

Self-Supervised Learning

PV-RCNN: The Top-Performing LiDAR-only Solutions for 3D Detection / 3D Tracking / Domain Adaptation of Waymo Open Dataset Challenges

1 code implementation 28 Aug 2020 Shaoshuai Shi, Chaoxu Guo, Jihan Yang, Hongsheng Li

In this technical report, we present the top-performing LiDAR-only solutions for the three tracks of 3D detection, 3D tracking, and domain adaptation in the Waymo Open Dataset Challenges 2020.

3D Object Detection Domain Adaptation

EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

no code implementations ECCV 2020 Jianbo Liu, Junjun He, Jiawei Zhang, Jimmy S. Ren, Hongsheng Li

State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN), which adopt dilated convolutions in the backbone networks to extract high-resolution feature maps for achieving high segmentation performance.

Semantic Segmentation

Multi-organ Segmentation via Co-training Weight-averaged Models from Few-organ Datasets

no code implementations 17 Aug 2020 Rui Huang, Yuanjie Zheng, Zhiqiang Hu, Shaoting Zhang, Hongsheng Li

In most scenarios, one might obtain annotations of a single or a few organs from one training set, and obtain annotations of the other organs from another set of training images.

Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions

1 code implementation ECCV 2020 Xihui Liu, Zhe Lin, Jianming Zhang, Handong Zhao, Quan Tran, Xiaogang Wang, Hongsheng Li

We propose a novel algorithm, named Open-Edit, which is the first attempt at open-domain image manipulation with open-vocabulary instructions.

Image Manipulation

Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation

2 code implementations 4 Aug 2020 Hui Zhou, Xinge Zhu, Xiao Song, Yuexin Ma, Zhe Wang, Hongsheng Li, Dahua Lin

A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.

3D Semantic Segmentation LIDAR Semantic Segmentation

Balanced Meta-Softmax for Long-Tailed Visual Recognition

1 code implementation NeurIPS 2020 Jiawei Ren, Cunjun Yu, Shunan Sheng, Xiao Ma, Haiyu Zhao, Shuai Yi, Hongsheng Li

In our experiments, we demonstrate that Balanced Meta-Softmax outperforms state-of-the-art long-tailed classification solutions on both visual recognition and instance segmentation tasks.

General Classification Instance Segmentation +2

Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020

no code implementations 20 Jul 2020 Haisheng Su, Jinyuan Feng, Hao Shao, Zhenyu Jiang, Manyuan Zhang, Wei Wu, Yu Liu, Hongsheng Li, Junjie Yan

Specifically, in order to generate high-quality proposals, we consider several factors including the video feature encoder, the proposal generator, the proposal-proposal relations, the scale imbalance, and ensemble strategy.

Temporal Action Localization

Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation

2 code implementations ECCV 2020 Xiaokang Chen, Kwan-Yee Lin, Jingbo Wang, Wayne Wu, Chen Qian, Hongsheng Li, Gang Zeng

Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images for providing a geometric counterpart to the RGB representation.

Semantic Segmentation

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers

no code implementations 8 Jul 2020 Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian

Given an input video, its associated audio, and a brief caption, the audio-visual scene aware dialog (AVSD) task requires an agent to engage in a question-answer dialog with a human about the audio-visual content.

Graph Representation Learning

1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020

1 code implementation 16 Jun 2020 Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu

This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.

Spatio-Temporal Action Localization Temporal Action Localization

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

1 code implementation CVPR 2021 Junting Pan, Siyu Chen, Mike Zheng Shou, Yu Liu, Jing Shao, Hongsheng Li

We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.

Action Detection Action Recognition +2

Self-supervising Fine-grained Region Similarities for Large-scale Image Localization

1 code implementation ECCV 2020 Yixiao Ge, Haibo Wang, Feng Zhu, Rui Zhao, Hongsheng Li

The task of large-scale retrieval-based image localization is to estimate the geographical location of a query image by recognizing its nearest reference images from a city-scale dataset.

Image Retrieval

3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior

2 code implementations CVPR 2020 Xiaokang Chen, Kwan-Yee Lin, Chen Qian, Gang Zeng, Hongsheng Li

To this end, we first propose a novel 3D sketch-aware feature embedding to explicitly encode geometric information effectively and efficiently.

Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID

2 code implementations 14 Mar 2020 Yixiao Ge, Feng Zhu, Rui Zhao, Hongsheng Li

An improved pseudo-label-based encoder can therefore be obtained by jointly training the source-to-target translated images with ground-truth identities and target-domain images with pseudo identities.

Translation Unsupervised Domain Adaptation +1

MagnifierNet: Towards Semantic Adversary and Fusion for Person Re-identification

1 code implementation 25 Feb 2020 Yushi Lan, Yu-An Liu, Maoqing Tian, Xinchi Zhou, Xuesen Zhang, Shuai Yi, Hongsheng Li

Meanwhile, we introduce "Semantic Fusion Branch" to filter out irrelevant noises by selectively fusing semantic region information sequentially.

Person Re-Identification

Structure-Feature based Graph Self-adaptive Pooling

no code implementations 30 Jan 2020 Liang Zhang, Xudong Wang, Hongsheng Li, Guangming Zhu, Peiyi Shen, Ping Li, Xiaoyuan Lu, Syed Afaq Ali Shah, Mohammed Bennamoun

To solve the problems mentioned above, we propose a novel graph self-adaptive pooling method with the following objectives: (1) to construct a reasonable pooled graph topology, structure and feature information of the graph are considered simultaneously, which provides additional veracity and objectivity in node selection; and (2) to make the pooled nodes contain sufficiently effective graph information, node feature information is aggregated before discarding the unimportant nodes; thus, the selected nodes contain information from neighbor nodes, which can enhance the use of features of the unselected nodes.

Graph Classification

Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification

2 code implementations ICLR 2020 Yixiao Ge, Dapeng Chen, Hongsheng Li

In order to mitigate the effects of noisy pseudo labels, we propose to softly refine the pseudo labels in the target domain with an unsupervised framework, Mutual Mean-Teaching (MMT), which learns better features from the target domain via off-line refined hard pseudo labels and on-line refined soft pseudo labels in an alternating training manner.

Unsupervised Domain Adaptation Unsupervised Person Re-Identification

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

5 code implementations CVPR 2020 Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li

We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds.

3D Object Detection

Depth Completion from Sparse LiDAR Data with Depth-Normal Constraints

no code implementations ICCV 2019 Yan Xu, Xinge Zhu, Jianping Shi, Guofeng Zhang, Hujun Bao, Hongsheng Li

Most existing methods directly train a network to learn a mapping from sparse depth inputs to dense depth maps, which has difficulties in utilizing the 3D geometric constraints and handling practical sensor noise.

Autonomous Driving Depth Completion

Multi-modality Latent Interaction Network for Visual Question Answering

no code implementations ICCV 2019 Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li

The proposed module learns the cross-modality relationships between latent visual and language summarizations, which summarize visual regions and question into a small number of latent representations to avoid modeling uninformative individual region-word relations.

Language Modelling Question Answering +1

FocusNet: Imbalanced Large and Small Organ Segmentation with an End-to-End Deep Neural Network for Head and Neck CT Images

no code implementations 28 Jul 2019 Yunhe Gao, Rui Huang, Ming Chen, Zhe Wang, Jincheng Deng, YuanYuan Chen, Yiwei Yang, Jie Zhang, Chanjuan Tao, Hongsheng Li

In this paper, we propose an end-to-end deep neural network for solving the problem of imbalanced large and small organ segmentation in head and neck (HaN) CT images.

Signet Ring Cell Detection With a Semi-supervised Learning Framework

1 code implementation 9 Jul 2019 Jiahui Li, Shuang Yang, Xiaodi Huang, Qian Da, Xiaoqun Yang, Zhiqiang Hu, Qi Duan, Chaofu Wang, Hongsheng Li

Our framework achieves accurate signet ring cell detection and can be readily applied in clinical trials.

From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network

4 code implementations 8 Jul 2019 Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li

3D object detection from LiDAR point cloud is a challenging problem in 3D scene understanding and has many practical applications.

3D Object Detection Scene Understanding

Generalizing Monocular 3D Human Pose Estimation in the Wild

1 code implementation 11 Apr 2019 Luyang Wang, Yan Chen, Zhenhua Guo, Keyuan Qian, Mude Lin, Hongsheng Li, Jimmy S. Ren

We observe that recent innovation in this area mainly focuses on new techniques that explicitly address the generalization issue when using this dataset, because this database is constructed in a highly controlled environment with limited human subjects and background variations.

3D Pose Estimation Monocular 3D Human Pose Estimation

Conditional Adversarial Generative Flow for Controllable Image Synthesis

no code implementations CVPR 2019 Rui Liu, Yu Liu, Xinyu Gong, Xiaogang Wang, Hongsheng Li

Flow-based generative models show great potential in image synthesis due to their reversible pipeline and exact log-likelihood target, yet they suffer from a weak ability for conditional image synthesis, especially for multi-label or unaware conditions.

Image Generation

Group-wise Correlation Stereo Network

1 code implementation CVPR 2019 Xiaoyang Guo, Kai Yang, Wukui Yang, Xiaogang Wang, Hongsheng Li

Previous works build cost volumes with cross-correlation or concatenation of left and right features across all disparity levels, and then utilize a 2D or 3D convolutional neural network to regress the disparity maps.

Autonomous Driving Stereo Matching +1

Unsupervised Cross-spectral Stereo Matching by Learning to Synthesize

1 code implementation 4 Mar 2019 Mingyang Liang, Xiaoyang Guo, Hongsheng Li, Xiaogang Wang, You Song

Unsupervised cross-spectral stereo matching aims at recovering disparity given cross-spectral image pairs without any supervision in the form of ground truth disparity or depth.

Image-to-Image Translation Stereo Matching +2

Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing

no code implementations CVPR 2019 Xihui Liu, ZiHao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li

Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from the visual and textual domains, such as visual attributes, location and interactions with surrounding regions.

A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes

no code implementations 3 Jan 2019 Kui Xu, Zhe Wang, Jiangping Shi, Hongsheng Li, Qiangfeng Cliff Zhang

Constructing molecular structural models from Cryo-Electron Microscopy (Cryo-EM) density volumes is the critical last step of structure determination by Cryo-EM technologies.

Electron Microscopy Pose Estimation +1

Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering

no code implementations 13 Dec 2018 Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li

It can robustly capture the high-level interactions between language and vision domains, thus significantly improving the performance of visual question answering.

Question Answering Visual Question Answering

FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification

2 code implementations NeurIPS 2018 Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li

Our proposed FD-GAN achieves state-of-the-art performance on three person reID datasets, which demonstrates the effectiveness and robust feature distilling capability of the proposed FD-GAN.

Person Re-Identification

Learning Monocular Depth by Distilling Cross-domain Stereo Networks

1 code implementation ECCV 2018 Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, Xiaogang Wang

Monocular depth estimation aims at estimating a pixelwise depth map for a single image, which has wide applications in scene understanding and autonomous driving.

Autonomous Driving Monocular Depth Estimation +3

Question-Guided Hybrid Convolution for Visual Question Answering

no code implementations ECCV 2018 Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang

Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolve with visual features for capturing the textual and visual relationship in the early stage.

Question Answering Visual Question Answering

Generative Adversarial Frontal View to Bird View Synthesis

no code implementations 1 Aug 2018 Xinge Zhu, Zhichao Yin, Jianping Shi, Hongsheng Li, Dahua Lin

Due to the large gap and severe deformation between the frontal view and bird view, generating a bird view image from a single frontal view is challenging.

Bird View Synthesis Homography Estimation +1

Deep Group-shuffling Random Walk for Person Re-identification

1 code implementation CVPR 2018 Yantao Shen, Hongsheng Li, Tong Xiao, Shuai Yi, Dapeng Chen, Xiaogang Wang

Person re-identification aims at finding a person of interest in an image gallery by comparing the probe image of this person with all the gallery images.

Person Re-Identification

Person Re-identification with Deep Similarity-Guided Graph Neural Network

no code implementations ECCV 2018 Yantao Shen, Hongsheng Li, Shuai Yi, Dapeng Chen, Xiaogang Wang

However, existing person re-identification models mostly estimate the similarities of different image pairs of probe and gallery images independently while ignoring the relationship information between different probe-gallery pairs.

Person Re-Identification

Video Person Re-Identification With Competitive Snippet-Similarity Aggregation and Co-Attentive Snippet Embedding

no code implementations CVPR 2018 Dapeng Chen, Hongsheng Li, Tong Xiao, Shuai Yi, Xiaogang Wang

The attention weights are obtained based on a query feature, which is learned from the whole probe snippet by an LSTM network, making the resulting embeddings less affected by noisy frames.

Video-Based Person Re-Identification

Group Consistent Similarity Learning via Deep CRF for Person Re-Identification

no code implementations CVPR 2018 Dapeng Chen, Dan Xu, Hongsheng Li, Nicu Sebe, Xiaogang Wang

Extensive experiments demonstrate the effectiveness of our model that combines DNN and CRF for learning robust multi-scale local similarities.

Person Re-Identification

Learnable Histogram: Statistical Context Features for Deep Neural Networks

no code implementations 25 Apr 2018 Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Statistical features, such as histogram, Bag-of-Words (BoW) and Fisher Vector, were commonly used with hand-crafted features in conventional classification methods, but attract less attention since the popularity of deep learning methods.

General Classification Object Detection +1

3D Human Pose Estimation in the Wild by Adversarial Learning

no code implementations CVPR 2018 Wei Yang, Wanli Ouyang, Xiaolong Wang, Jimmy Ren, Hongsheng Li, Xiaogang Wang

Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground-truth, which helps to enforce the pose estimator to generate anthropometrically valid poses even with images in the wild.

Monocular 3D Human Pose Estimation

Single View Stereo Matching

1 code implementation CVPR 2018 Yue Luo, Jimmy Ren, Mude Lin, Jiahao Pang, Wenxiu Sun, Hongsheng Li, Liang Lin

The resulting model outperforms all the previous monocular depth estimation methods as well as the stereo block matching method in the challenging KITTI dataset by only using a small number of real training data.

Ranked #11 on Monocular Depth Estimation on KITTI Eigen split (using extra training data)

Monocular Depth Estimation Stereo Matching +1

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

16 code implementations 19 Oct 2017 Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas

In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aiming at generating high-resolution photo-realistic images.

Text-to-Image Generation

Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection

no code implementations 14 Jun 2017 Zhe Wang, Yanxin Yin, Jianping Shi, Wei Fang, Hongsheng Li, Xiaogang Wang

We propose a convolution neural network based algorithm for simultaneously diagnosing diabetic retinopathy and highlighting suspicious regions.

Diabetic Retinopathy Detection

Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision

no code implementations 8 Jun 2017 Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

The experiments show that our proposed method makes deep models learn more discriminative feature representations without increasing model size or complexity.

Scene Labeling

Object Detection in Videos with Tubelet Proposal Networks

no code implementations CVPR 2017 Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang

Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset.

Object Detection Object Tracking

Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification

2 code implementations CVPR 2017 Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, Xiaogang Wang

Analysis of the learned SRN model demonstrates that it can effectively capture both semantic and spatial relations of labels for improving classification performance.

Classification General Classification +2

Person Search with Natural Language Description

1 code implementation CVPR 2017 Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, Xiaogang Wang

Searching persons in large-scale image databases with the query of natural language description has important applications in video surveillance.

Person Search Text based Person Retrieval

Crafting GBD-Net for Object Detection

1 code implementation 8 Oct 2016 Xingyu Zeng, Wanli Ouyang, Junjie Yan, Hongsheng Li, Tong Xiao, Kun Wang, Yu Liu, Yucong Zhou, Bin Yang, Zhe Wang, Hui Zhou, Xiaogang Wang

The effectiveness of GBD-Net is shown through experiments on three object detection datasets, ImageNet, Pascal VOC2007 and Microsoft COCO.

Object Detection

End-To-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

no code implementations CVPR 2016 Wei Yang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

In this paper, we propose a novel end-to-end framework for human pose estimation that combines DCNNs with the expressive deformable mixture of parts.

Pose Estimation

Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification

1 code implementation CVPR 2016 Tong Xiao, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Learning generic and robust feature representations with data from multiple domains for the same problem is of great value, especially for the problems that have multiple datasets but none of them are large enough to provide abundant data variations.

Person Re-Identification

Object Detection from Video Tubelets with Convolutional Neural Networks

1 code implementation CVPR 2016 Kai Kang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Deep Convolution Neural Networks (CNNs) have shown impressive performance in various vision tasks such as image classification, object detection and semantic segmentation.

Image Classification Object Detection +2

Structured Feature Learning for Pose Estimation

no code implementations CVPR 2016 Xiao Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

In this paper, we propose a structured feature learning framework to reason the correlations among body joints at the feature level in human pose estimation.

Pose Estimation

Pedestrian Travel Time Estimation in Crowded Scenes

no code implementations ICCV 2015 Shuai Yi, Hongsheng Li, Xiaogang Wang

In this paper, we target the problem of estimating the statistics of pedestrian travel time within a period from an entrance to a destination in a crowded scene.

Scene Understanding

Cross-Scene Crowd Counting via Deep Convolutional Neural Networks

no code implementations CVPR 2015 Cong Zhang, Hongsheng Li, Xiaogang Wang, Xiaokang Yang

To address this problem, we propose a deep convolutional neural network (CNN) for crowd counting, and it is trained alternatively with two related learning objectives, crowd density and crowd count.

Crowd Counting

Saliency Detection by Multi-Context Deep Learning

no code implementations CVPR 2015 Rui Zhao, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Low-level saliency cues or priors do not produce good enough saliency detection results especially when the salient object presents in a low-contrast background with confusing visual appearance.

Image Classification RGB Salient Object Detection +2

Understanding Pedestrian Behaviors From Stationary Crowd Groups

no code implementations CVPR 2015 Shuai Yi, Hongsheng Li, Xiaogang Wang

Pedestrian behavior modeling and analysis is important for crowd scene understanding and has various applications in video surveillance.

Event Detection Scene Understanding

Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification

no code implementations 15 Dec 2014 Hongsheng Li, Rui Zhao, Xiaogang Wang

The proposed algorithms eliminate all the redundant computation in convolution and pooling on images by introducing novel d-regularly sparse kernels.

Classification General Classification +3

Fast Iteratively Reweighted Least Squares Algorithms for Analysis-Based Sparsity Reconstruction

no code implementations 18 Nov 2014 Chen Chen, Junzhou Huang, Lei He, Hongsheng Li

The convergence rate of the proposed algorithm is almost the same as that of the traditional IRLS algorithms, that is, exponentially fast.

Compressive Sensing

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

no code implementations 11 Sep 2014 Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang

In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty.

Object Detection
