Search Results for author: Hongsheng Li

Found 262 papers, 162 papers with code

Preconditioning for Accelerated Iteratively Reweighted Least Squares in Structured Sparsity Reconstruction

no code implementations • CVPR 2014 • Chen Chen, Junzhou Huang, Lei He, Hongsheng Li

In this paper, we propose a novel algorithm for structured sparsity reconstruction.

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

no code implementations • 11 Sep 2014 • Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang

In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with a geometric constraint and penalty.

Object object-detection +1

Fast Iteratively Reweighted Least Squares Algorithms for Analysis-Based Sparsity Reconstruction

no code implementations • 18 Nov 2014 • Chen Chen, Junzhou Huang, Lei He, Hongsheng Li

The convergence rate of the proposed algorithm is almost the same as that of the traditional IRLS algorithms, that is, exponentially fast.

Compressive Sensing

Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification

no code implementations • 15 Dec 2014 • Hongsheng Li, Rui Zhao, Xiaogang Wang

The proposed algorithms eliminate all the redundant computation in convolution and pooling on images by introducing novel d-regularly sparse kernels.

Classification General Classification +5

DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection

no code implementations • CVPR 2015 • Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Chen-Change Loy, Xiaoou Tang

In this paper, we propose deformable deep convolutional neural networks for generic object detection.

Object object-detection +1

Saliency Detection by Multi-Context Deep Learning

no code implementations • CVPR 2015 • Rui Zhao, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Low-level saliency cues or priors do not produce good enough saliency detection results, especially when the salient object appears against a low-contrast background with a confusing visual appearance.

Image Classification object-detection +3

Understanding Pedestrian Behaviors From Stationary Crowd Groups

no code implementations • CVPR 2015 • Shuai Yi, Hongsheng Li, Xiaogang Wang

Pedestrian behavior modeling and analysis is important for crowd scene understanding and has various applications in video surveillance.

Event Detection Scene Understanding

Cross-Scene Crowd Counting via Deep Convolutional Neural Networks

no code implementations • CVPR 2015 • Cong Zhang, Hongsheng Li, Xiaogang Wang, Xiaokang Yang

To address this problem, we propose a deep convolutional neural network (CNN) for crowd counting, which is trained alternately on two related learning objectives: crowd density and crowd count.

Ranked #15 on Crowd Counting on WorldExpo’10

Crowd Counting

Pedestrian Travel Time Estimation in Crowded Scenes

no code implementations • ICCV 2015 • Shuai Yi, Hongsheng Li, Xiaogang Wang

In this paper, we target the problem of estimating the statistics of pedestrian travel time within a period from an entrance to a destination in a crowded scene.

Blocking Scene Understanding +1

Structured Feature Learning for Pose Estimation

no code implementations • CVPR 2016 • Xiao Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

In this paper, we propose a structured feature learning framework to reason about the correlations among body joints at the feature level in human pose estimation.

Pose Estimation

T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos

1 code implementation • 9 Apr 2016 • Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang

The temporal and contextual information in videos is not fully investigated or utilized.

Novel Object Detection Object +3

369

Object Detection from Video Tubelets with Convolutional Neural Networks

1 code implementation • CVPR 2016 • Kai Kang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Deep Convolutional Neural Networks (CNNs) have shown impressive performance in various vision tasks such as image classification, object detection, and semantic segmentation.

Image Classification Object +4

182

Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification

1 code implementation • CVPR 2016 • Tong Xiao, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Learning generic and robust feature representations with data from multiple domains for the same problem is of great value, especially for problems that have multiple datasets, none of which is large enough to provide abundant data variations.

Person Re-Identification

232

End-To-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

no code implementations • CVPR 2016 • Wei Yang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

In this paper, we propose a novel end-to-end framework for human pose estimation that combines DCNNs with the expressive deformable mixture of parts.

Pose Estimation

Crafting GBD-Net for Object Detection

1 code implementation • 8 Oct 2016 • Xingyu Zeng, Wanli Ouyang, Junjie Yan, Hongsheng Li, Tong Xiao, Kun Wang, Yu Liu, Yucong Zhou, Bin Yang, Zhe Wang, Hui Zhou, Xiaogang Wang

The effectiveness of GBD-Net is shown through experiments on three object detection datasets, ImageNet, Pascal VOC2007 and Microsoft COCO.

Object object-detection +1

182

CRF-CNN: Modeling Structured Information in Human Pose Estimation

no code implementations • NeurIPS 2016 • Xiao Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

In a classical neural network, there is no message passing between neurons in the same layer.

Pose Estimation

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

21 code implementations • ICCV 2017 • Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications.

Ranked #3 on Text-to-Image Generation on Oxford 102 Flowers (Inception score metric)

Text-to-Image Generation

1,849

Person Search with Natural Language Description

1 code implementation • CVPR 2017 • Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, Xiaogang Wang

Searching for persons in large-scale image databases using natural language descriptions as queries has important applications in video surveillance.

Attribute Person Search +1

142

Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification

2 code implementations • CVPR 2017 • Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, Xiaogang Wang

Analysis of the learned SRN model demonstrates that it can effectively capture both semantic and spatial relations of labels for improving classification performance.

Ranked #6 on Multi-Label Classification on NUS-WIDE

Classification General Classification +2

142

Object Detection in Videos with Tubelet Proposal Networks

1 code implementation • CVPR 2017 • Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang

Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset.

Object object-detection +2

Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision

no code implementations • 8 Jun 2017 • Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

The experiments show that our proposed method makes deep models learn more discriminative feature representations without increasing model size or complexity.

Scene Labeling

Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection

no code implementations • 14 Jun 2017 • Zhe Wang, Yanxin Yin, Jianping Shi, Wei Fang, Hongsheng Li, Xiaogang Wang

We propose a convolutional neural network-based algorithm for simultaneously diagnosing diabetic retinopathy and highlighting suspicious regions.

Clustering Diabetic Retinopathy Detection

Learning Feature Pyramids for Human Pose Estimation

4 code implementations • ICCV 2017 • Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

We investigate our method on two standard benchmarks for human pose estimation.

Ranked #6 on Pose Estimation on Leeds Sports Poses

Pose Estimation

221

Identity-Aware Textual-Visual Matching with Latent Co-attention

no code implementations • ICCV 2017 • Shuang Li, Tong Xiao, Hongsheng Li, Wei Yang, Xiaogang Wang

The stage-2 CNN-LSTM network refines the matching results with a latent co-attention mechanism.

Sentence Text based Person Retrieval

Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism

no code implementations • ICCV 2017 • Qi Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang, Bin Liu, Nenghai Yu

The visibility map of the target is learned and used for inferring the spatial attention map.

Computational Efficiency Multi-Object Tracking +2

Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-temporal Path Proposals

no code implementations • ICCV 2017 • Yantao Shen, Tong Xiao, Hongsheng Li, Shuai Yi, Xiaogang Wang

Vehicle re-identification is an important problem and has many applications in video surveillance and intelligent transportation.

Person Re-Identification Vehicle Re-Identification

Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-Identification

no code implementations • ICCV 2017 • Zhongdao Wang, Luming Tang, Xihui Liu, Zhuliang Yao, Shuai Yi, Jing Shao, Junjie Yan, Shengjin Wang, Hongsheng Li, Xiaogang Wang

In our vehicle ReID framework, an orientation invariant feature embedding module and a spatial-temporal regularization module are proposed.

Retrieval Vehicle Re-Identification

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

16 code implementations • 19 Oct 2017 • Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas

In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aiming at generating high-resolution photo-realistic images.

Ranked #5 on Text-to-Image Generation on Oxford 102 Flowers

Generative Adversarial Network Text-to-Image Generation

1,849

Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering

1 code implementation • 18 Nov 2017 • Pan Lu, Hongsheng Li, Wei Zhang, Jianyong Wang, Xiaogang Wang

Existing VQA methods mainly adopt the visual attention mechanism to associate the input question with corresponding image regions for effective question answering.

Ranked #2 on Visual Question Answering (VQA) on COCO Visual Question Answering (VQA) real images 1.0 open ended

Visual Question Answering

Single View Stereo Matching

1 code implementation • CVPR 2018 • Yue Luo, Jimmy Ren, Mude Lin, Jiahao Pang, Wenxiu Sun, Hongsheng Li, Liang Lin

The resulting model outperforms all previous monocular depth estimation methods, as well as the stereo block matching method, on the challenging KITTI dataset while using only a small amount of real training data.

Ranked #42 on Monocular Depth Estimation on KITTI Eigen split

Monocular Depth Estimation Stereo Matching +1

280

Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data

no code implementations • ECCV 2018 • Xihui Liu, Hongsheng Li, Jing Shao, Dapeng Chen, Xiaogang Wang

The aim of image captioning is to generate captions by machine to describe image contents.

Image Captioning Retrieval

3D Human Pose Estimation in the Wild by Adversarial Learning

no code implementations • CVPR 2018 • Wei Yang, Wanli Ouyang, Xiaolong Wang, Jimmy Ren, Hongsheng Li, Xiaogang Wang

Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground truth, which helps force the pose estimator to generate anthropometrically valid poses even for images in the wild.

Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (Use Video Sequence metric)

Monocular 3D Human Pose Estimation valid

Learnable Histogram: Statistical Context Features for Deep Neural Networks

no code implementations • 25 Apr 2018 • Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Statistical features, such as histograms, Bag-of-Words (BoW), and Fisher Vectors, were commonly used with hand-crafted features in conventional classification methods, but have attracted less attention since the rise of deep learning methods.

General Classification object-detection +2

Video Person Re-Identification With Competitive Snippet-Similarity Aggregation and Co-Attentive Snippet Embedding

no code implementations • CVPR 2018 • Dapeng Chen, Hongsheng Li, Tong Xiao, Shuai Yi, Xiaogang Wang

The attention weights are obtained based on a query feature, which is learned from the whole probe snippet by an LSTM network, making the resulting embeddings less affected by noisy frames.

Ranked #4 on Person Re-Identification on PRID2011

Video-Based Person Re-Identification

Eliminating Background-Bias for Robust Person Re-Identification

no code implementations • CVPR 2018 • Maoqing Tian, Shuai Yi, Hongsheng Li, Shihua Li, Xuesen Zhang, Jianping Shi, Junjie Yan, Xiaogang Wang

State-of-the-art methods mainly utilize deep learning based approaches for learning visual features for describing person appearances.

Human Parsing Person Re-Identification

Group Consistent Similarity Learning via Deep CRF for Person Re-Identification

no code implementations • CVPR 2018 • Dapeng Chen, Dan Xu, Hongsheng Li, Nicu Sebe, Xiaogang Wang

Extensive experiments demonstrate the effectiveness of our model that combines DNN and CRF for learning robust multi-scale local similarities.

Person Re-Identification

Deep Continuous Conditional Random Fields with Asymmetric Inter-object Constraints for Online Multi-object Tracking

no code implementations • 4 Jun 2018 • Hui Zhou, Wanli Ouyang, Jian Cheng, Xiaogang Wang, Hongsheng Li

In addition, inter-object relations are mostly modeled in a symmetric way, which we argue is not an optimal setting.

Autonomous Driving Multi-Object Tracking +3

Person Re-identification with Deep Similarity-Guided Graph Neural Network

no code implementations • ECCV 2018 • Yantao Shen, Hongsheng Li, Shuai Yi, Dapeng Chen, Xiaogang Wang

However, existing person re-identification models mostly estimate the similarities of different probe-gallery image pairs independently while ignoring the relationship information between different probe-gallery pairs.

Ranked #2 on Person Re-Identification on CUHK03

Person Re-Identification Relation

End-to-End Deep Kronecker-Product Matching for Person Re-identification

1 code implementation • CVPR 2018 • Yantao Shen, Tong Xiao, Hongsheng Li, Shuai Yi, Xiaogang Wang

Person re-identification aims to robustly measure similarities between person images.

Person Re-Identification

103

Deep Group-shuffling Random Walk for Person Re-identification

1 code implementation • CVPR 2018 • Yantao Shen, Hongsheng Li, Tong Xiao, Shuai Yi, Dapeng Chen, Xiaogang Wang

Person re-identification aims at finding a person of interest in an image gallery by comparing the probe image of this person with all the gallery images.

Person Re-Identification Retrieval

103

Generative Adversarial Frontal View to Bird View Synthesis

no code implementations • 1 Aug 2018 • Xinge Zhu, Zhichao Yin, Jianping Shi, Hongsheng Li, Dahua Lin

Due to the large gap and severe deformation between the frontal view and bird view, generating a bird view image from a single frontal view is challenging.

Bird View Synthesis Homography Estimation +1

Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association

no code implementations • ECCV 2018 • Dapeng Chen, Hongsheng Li, Xihui Liu, Yantao Shen, Zejian Yuan, Xiaogang Wang

Person re-identification is an important task that requires learning discriminative visual features for distinguishing different person identities.

Ranked #22 on Text based Person Retrieval on CUHK-PEDES

Person Re-Identification Text based Person Retrieval

Question-Guided Hybrid Convolution for Visual Question Answering

no code implementations • ECCV 2018 • Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang

Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolve with visual features to capture the textual and visual relationship at an early stage.

Ranked #14 on Visual Question Answering (VQA) on CLEVR

Question Answering Visual Question Answering

Learning Monocular Depth by Distilling Cross-domain Stereo Networks

1 code implementation • ECCV 2018 • Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, Xiaogang Wang

Monocular depth estimation aims at estimating a pixelwise depth map for a single image, which has wide applications in scene understanding and autonomous driving.

Autonomous Driving Monocular Depth Estimation +3

HMS-Net: Hierarchical Multi-scale Sparsity-invariant Network for Sparse Depth Completion

no code implementations • 27 Aug 2018 • Zixuan Huang, Junming Fan, Shenggan Cheng, Shuai Yi, Xiaogang Wang, Hongsheng Li

Dense depth cues are important and have wide applications in various computer vision tasks.

Ranked #10 on Depth Completion on KITTI Depth Completion

Autonomous Driving Depth Completion

FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification

2 code implementations • NeurIPS 2018 • Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li

Our proposed FD-GAN achieves state-of-the-art performance on three person reID datasets, which demonstrates the effectiveness and robust feature distilling capability of the proposed FD-GAN.

Ranked #3 on Person Re-Identification on CUHK03

Generative Adversarial Network Person Re-Identification

1,267

Efficient Attention: Attention with Linear Complexities

12 code implementations • 4 Dec 2018 • Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Hongsheng Li

Dot-product attention has wide applications in computer vision and natural language processing.

Ranked #2 on Extractive Text Summarization on GovReport

Extractive Text Summarization Image Classification +5

10,812

PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud

13 code implementations • CVPR 2019 • Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

In this paper, we propose PointRCNN for 3D object detection from raw point cloud.

Ranked #2 on Object Detection on KITTI Cars Moderate

object-detection Object Proposal Generation +1

1,670

Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering

no code implementations • 13 Dec 2018 • Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li

It can robustly capture the high-level interactions between the language and vision domains, thus significantly improving the performance of visual question answering.

Question Answering Visual Question Answering

A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes

no code implementations • 3 Jan 2019 • Kui Xu, Zhe Wang, Jiangping Shi, Hongsheng Li, Qiangfeng Cliff Zhang

Constructing molecular structural models from Cryo-Electron Microscopy (Cryo-EM) density volumes is the critical last step of structure determination by Cryo-EM technologies.

Pose Estimation Translation

Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing

no code implementations • CVPR 2019 • Xihui Liu, ZiHao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li

Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from visual and textual domain, such as visual attributes, location and interactions with surrounding regions.

Referring Expression

Unsupervised Cross-spectral Stereo Matching by Learning to Synthesize

1 code implementation • 4 Mar 2019 • Mingyang Liang, Xiaoyang Guo, Hongsheng Li, Xiaogang Wang, You Song

Unsupervised cross-spectral stereo matching aims at recovering disparity given cross-spectral image pairs without any supervision in the form of ground truth disparity or depth.

Image-to-Image Translation Stereo Matching +2

Group-wise Correlation Stereo Network

2 code implementations • CVPR 2019 • Xiaoyang Guo, Kai Yang, Wukui Yang, Xiaogang Wang, Hongsheng Li

Previous works built cost volumes with cross-correlation or concatenation of left and right features across all disparity levels, and then a 2D or 3D convolutional neural network is utilized to regress the disparity maps.

Autonomous Driving Stereo Matching +1

316

Conditional Adversarial Generative Flow for Controllable Image Synthesis

no code implementations • CVPR 2019 • Rui Liu, Yu Liu, Xinyu Gong, Xiaogang Wang, Hongsheng Li

Flow-based generative models show great potential in image synthesis due to their reversible pipeline and exact log-likelihood target, yet they suffer from a weak ability for conditional image synthesis, especially for multi-label or unaware conditions.

Image Generation

Generalizing Monocular 3D Human Pose Estimation in the Wild

1 code implementation • 11 Apr 2019 • Luyang Wang, Yan Chen, Zhenhua Guo, Keyuan Qian, Mude Lin, Hongsheng Li, Jimmy S. Ren

We observe that recent innovation in this area mainly focuses on new techniques that explicitly address the generalization issue when using this dataset, because this database is constructed in a highly controlled environment with limited human subjects and background variations.

Ranked #64 on 3D Human Pose Estimation on Human3.6M

3D Pose Estimation Monocular 3D Human Pose Estimation

145

AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations

3 code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, Hongsheng Li

Our results show that training deep neural networks with the AdaCos loss is stable and able to achieve high face recognition accuracy.

Ranked #6 on Face Verification on MegaFace

Face Recognition Face Verification

207

P2SGrad: Refined Gradients for Optimizing Deep Face Models

no code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Junjie Yan, Mengya Gao, Yu Qiao, Xiaogang Wang, Hongsheng Li

Cosine-based softmax losses significantly improve the performance of deep face recognition networks.

Face Recognition

From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network

6 code implementations • 8 Jul 2019 • Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li

3D object detection from LiDAR point cloud is a challenging problem in 3D scene understanding and has many practical applications.

3D Object Detection Object +2

4,778

Signet Ring Cell Detection With a Semi-supervised Learning Framework

1 code implementation • 9 Jul 2019 • Jiahui Li, Shuang Yang, Xiaodi Huang, Qian Da, Xiaoqun Yang, Zhiqiang Hu, Qi Duan, Chaofu Wang, Hongsheng Li

Our framework achieves accurate signet ring cell detection and can be readily applied in clinical trials.

Cell Detection

FocusNet: Imbalanced Large and Small Organ Segmentation with an End-to-End Deep Neural Network for Head and Neck CT Images

no code implementations • 28 Jul 2019 • Yunhe Gao, Rui Huang, Ming Chen, Zhe Wang, Jincheng Deng, YuanYuan Chen, Yiwei Yang, Jie Zhang, Chanjuan Tao, Hongsheng Li

In this paper, we propose an end-to-end deep neural network for solving the problem of imbalanced large and small organ segmentation in head and neck (HaN) CT images.

Organ Segmentation Segmentation

Multi-modality Latent Interaction Network for Visual Question Answering

no code implementations • ICCV 2019 • Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li

The proposed module learns the cross-modality relationships between latent visual and language summarizations, which summarize visual regions and question into a small number of latent representations to avoid modeling uninformative individual region-word relations.

Language Modelling Question Answering +1

Interpolated Convolutional Networks for 3D Point Cloud Understanding

no code implementations • ICCV 2019 • Jiageng Mao, Xiaogang Wang, Hongsheng Li

Our InterpConv is shown to be permutation and sparsity invariant, and can directly handle irregular inputs.

Ranked #27 on 3D Part Segmentation on ShapeNet-Part

3D Part Segmentation 3D Point Cloud Classification +1

CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval

1 code implementation • ICCV 2019 • Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao

Text-image cross-modal retrieval is a challenging task in the field of language and vision.

Ranked #9 on Image Retrieval on Flickr30K 1K test

Cross-Modal Retrieval Image Retrieval +1

123

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis

1 code implementation • NeurIPS 2019 • Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li

Semantic image synthesis aims at generating photorealistic images from semantic layouts.

Ranked #5 on Image-to-Image Translation on Cityscapes Labels-to-Photo

Image-to-Image Translation Semantic Segmentation

128

Depth Completion from Sparse LiDAR Data with Depth-Normal Constraints

no code implementations • ICCV 2019 • Yan Xu, Xinge Zhu, Jianping Shi, Guofeng Zhang, Hujun Bao, Hongsheng Li

Most existing methods directly train a network to learn a mapping from sparse depth inputs to dense depth maps, which makes it difficult to utilize 3D geometric constraints and to handle practical sensor noise.

Autonomous Driving Depth Completion

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

12 code implementations • CVPR 2020 • Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li

We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds.

Ranked #1 on Birds Eye View Object Detection on KITTI Cyclists Easy

Object object-detection +1

4,297

Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification

2 code implementations • ICLR 2020 • Yixiao Ge, Dapeng Chen, Hongsheng Li

In order to mitigate the effects of noisy pseudo labels, we propose an unsupervised framework, Mutual Mean-Teaching (MMT), that softly refines the pseudo labels in the target domain and learns better features from the target domain via off-line refined hard pseudo labels and on-line refined soft pseudo labels in an alternating training manner.

Ranked #1 on Unsupervised Person Re-Identification on Market-1501->MSMT17

Clustering Pseudo Label +2

3,126

Structure-Feature based Graph Self-adaptive Pooling

1 code implementation • 30 Jan 2020 • Liang Zhang, Xudong Wang, Hongsheng Li, Guangming Zhu, Peiyi Shen, Ping Li, Xiaoyuan Lu, Syed Afaq Ali Shah, Mohammed Bennamoun

To solve these problems mentioned above, we propose a novel graph self-adaptive pooling method with the following objectives: (1) to construct a reasonable pooled graph topology, structure and feature information of the graph are considered simultaneously, which provide additional veracity and objectivity in node selection; and (2) to make the pooled nodes contain sufficiently effective graph information, node feature information is aggregated before discarding the unimportant nodes; thus, the selected nodes contain information from neighbor nodes, which can enhance the use of features of the unselected nodes.

Graph Classification

Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation

no code implementations • 5 Feb 2020 • Yingjie Cai, Buyu Li, Zeyu Jiao, Hongsheng Li, Xingyu Zeng, Xiaogang Wang

The monocular 3D object detection task aims to predict the 3D bounding boxes of objects based on monocular RGB images.

Depth Estimation Monocular 3D Object Detection +2

MagnifierNet: Towards Semantic Adversary and Fusion for Person Re-identification

1 code implementation • 25 Feb 2020 • Yushi Lan, Yu-An Liu, Maoqing Tian, Xinchi Zhou, Xuesen Zhang, Shuai Yi, Hongsheng Li

Meanwhile, we introduce "Semantic Fusion Branch" to filter out irrelevant noises by selectively fusing semantic region information sequentially.

Person Re-Identification

Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID

4 code implementations • 14 Mar 2020 • Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li

To tackle the challenges, we propose an end-to-end structured domain adaptation framework with an online relation-consistency regularization term.

Ranked #4 on Unsupervised Domain Adaptation on Market to MSMT

Pseudo Label Relation +3

3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior

2 code implementations • CVPR 2020 • Xiaokang Chen, Kwan-Yee Lin, Chen Qian, Gang Zeng, Hongsheng Li

To this end, we first propose a novel 3D sketch-aware feature embedding to explicitly encode geometric information effectively and efficiently.

Ranked #3 on 3D Semantic Scene Completion from a single RGB image on NYUv2

3D Semantic Scene Completion from a single RGB image Hallucination

Learning to Predict Context-adaptive Convolution for Semantic Segmentation

no code implementations • ECCV 2020 • Jianbo Liu, Junjun He, Jimmy S. Ren, Yu Qiao, Hongsheng Li

Long-range contextual information is essential for achieving high-performance semantic segmentation.

Segmentation Semantic Segmentation

StereoGAN: Bridging Synthetic-to-Real Domain Gap by Joint Optimization of Domain Translation and Stereo Matching

no code implementations • CVPR 2020 • Rui Liu, Chengxi Yang, Wenxiu Sun, Xiaogang Wang, Hongsheng Li

Large-scale synthetic datasets are beneficial to stereo matching but usually introduce known domain bias.

Disparity Estimation Stereo Matching +3

Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID

3 code implementations • NeurIPS 2020 • Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Hongsheng Li

To solve these problems, we propose a novel self-paced contrastive learning framework with hybrid memory.

Ranked #3 on Unsupervised Domain Adaptation on Market to MSMT

Clustering Contrastive Learning +4

389

Paper
Code

Self-supervising Fine-grained Region Similarities for Large-scale Image Localization

3 code implementations • ECCV 2020 • Yixiao Ge, Haibo Wang, Feng Zhu, Rui Zhao, Hongsheng Li

The task of large-scale retrieval-based image localization is to estimate the geographical location of a query image by recognizing its nearest reference images from a city-scale dataset.

Image Retrieval Retrieval

264

Paper
Code

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

3 code implementations • CVPR 2021 • Junting Pan, Siyu Chen, Mike Zheng Shou, Yu Liu, Jing Shao, Hongsheng Li

We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.

Ranked #2 on Action Recognition on AVA v2.1

Action Detection Action Recognition +5

2,968

Paper
Code

1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020

2 code implementations • 16 Jun 2020 • Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu

This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.

Relation Network Spatio-Temporal Action Localization +1

197

Paper
Code

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers

no code implementations • 8 Jul 2020 • Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian

Given an input video, its associated audio, and a brief caption, the audio-visual scene aware dialog (AVSD) task requires an agent to engage in a question-answer dialog with a human about the audio-visual content.

Answer Generation Graph Representation Learning

Paper
Add Code

Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation

2 code implementations • ECCV 2020 • Xiaokang Chen, Kwan-Yee Lin, Jingbo Wang, Wayne Wu, Chen Qian, Hongsheng Li, Gang Zeng

Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images for providing a geometric counterpart to the RGB representation.

Ranked #3 on Semantic Segmentation on Event-based Segmentation Dataset

Segmentation Semantic Segmentation +2

272

Paper
Code

Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020

no code implementations • 20 Jul 2020 • Haisheng Su, Jinyuan Feng, Hao Shao, Zhenyu Jiang, Manyuan Zhang, Wei Wu, Yu Liu, Hongsheng Li, Junjie Yan

Specifically, in order to generate high-quality proposals, we consider several factors including the video feature encoder, the proposal generator, the proposal-proposal relations, the scale imbalance, and ensemble strategy.

Temporal Action Localization

Paper
Add Code

Balanced Meta-Softmax for Long-Tailed Visual Recognition

1 code implementation • NeurIPS 2020 • Jiawei Ren, Cunjun Yu, Shunan Sheng, Xiao Ma, Haiyu Zhao, Shuai Yi, Hongsheng Li

In our experiments, we demonstrate that Balanced Meta-Softmax outperforms state-of-the-art long-tailed classification solutions on both visual recognition and instance segmentation tasks.

Ranked #7 on Long-tail Learning on CIFAR-10-LT (ρ=10)

General Classification Instance Segmentation +2

Paper
Code

Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation

3 code implementations • 4 Aug 2020 • Hui Zhou, Xinge Zhu, Xiao Song, Yuexin Ma, Zhe Wang, Hongsheng Li, Dahua Lin

A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.

Ranked #11 on LIDAR Semantic Segmentation on nuScenes

3D Semantic Segmentation LIDAR Semantic Segmentation

806

Paper
Code

Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions

1 code implementation • ECCV 2020 • Xihui Liu, Zhe Lin, Jianming Zhang, Handong Zhao, Quan Tran, Xiaogang Wang, Hongsheng Li

We propose a novel algorithm, named Open-Edit, which is the first attempt on open-domain image manipulation with open-vocabulary instructions.

Image Manipulation

Paper
Code

Multi-organ Segmentation via Co-training Weight-averaged Models from Few-organ Datasets

no code implementations • 17 Aug 2020 • Rui Huang, Yuanjie Zheng, Zhiqiang Hu, Shaoting Zhang, Hongsheng Li

In most scenarios, one might obtain annotations of a single or a few organs from one training set, and obtain annotations of the other organs from another set of training images.

Organ Segmentation

Paper
Add Code

EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

no code implementations • ECCV 2020 • Jianbo Liu, Junjun He, Jiawei Zhang, Jimmy S. Ren, Hongsheng Li

State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN), which adopt dilated convolutions in the backbone networks to extract high-resolution feature maps for achieving high segmentation performance.

Segmentation Semantic Segmentation

Paper
Add Code

PV-RCNN: The Top-Performing LiDAR-only Solutions for 3D Detection / 3D Tracking / Domain Adaptation of Waymo Open Dataset Challenges

1 code implementation • 28 Aug 2020 • Shaoshuai Shi, Chaoxu Guo, Jihan Yang, Hongsheng Li

In this technical report, we present the top-performing LiDAR-only solutions for the three tracks of 3D detection, 3D tracking, and domain adaptation in the Waymo Open Dataset Challenges 2020.

3D Object Detection Domain Adaptation +1

4,297

Paper
Code

SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks

no code implementations • 19 Oct 2020 • Yan Xu, Zhaoyang Huang, Kwan-Yee Lin, Xinge Zhu, Jianping Shi, Hujun Bao, Guofeng Zhang, Hongsheng Li

To suit our network to self-supervised learning, we design several novel loss functions that utilize the inherent properties of LiDAR point clouds.

Self-Supervised Learning

Paper
Add Code

End-to-End Object Detection with Adaptive Clustering Transformer

1 code implementation • 18 Nov 2020 • Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong

In this paper, a novel variant of the transformer named Adaptive Clustering Transformer (ACT) is proposed to reduce the computation cost for high-resolution input.

Clustering Object +2

164

Paper
Code

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

2 code implementations • CVPR 2021 • Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Yuexin Ma, Wei Li, Hongsheng Li, Dahua Lin

However, we found that in the outdoor point cloud, the improvement obtained in this way is quite limited.

Ranked #2 on 3D Semantic Segmentation on ScribbleKITTI

LIDAR Semantic Segmentation Panoptic Segmentation +3

806

Paper
Code

LiDAR-based Panoptic Segmentation via Dynamic Shifting Network

1 code implementation • CVPR 2021 • Fangzhou Hong, Hui Zhou, Xinge Zhu, Hongsheng Li, Ziwei Liu

2) Dynamic Shifting for complex point distributions.

Ranked #2 on Panoptic Segmentation on SemanticKITTI

Autonomous Driving Clustering +1

230

Paper
Code

REFINE: Prediction Fusion Network for Panoptic Segmentation

no code implementations • 15 Dec 2020 • Jiawei Ren, Cunjun Yu, Zhongang Cai, Mingyuan Zhang, Chongsong Chen, Haiyu Zhao, Shuai Yi, Hongsheng Li

Panoptic segmentation aims at generating pixel-wise class and instance predictions for each pixel in the input image, which is a challenging task and far more complicated than naively fusing the semantic and instance segmentation results.

Ranked #11 on Panoptic Segmentation on COCO test-dev

Instance Segmentation Panoptic Segmentation +1

Paper
Add Code

A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection

no code implementations • 18 Dec 2020 • Jianbo Liu, Sijie Ren, Yuanjie Zheng, Xiaogang Wang, Hongsheng Li

With the proposed holistically-guided decoder, we implement the EfficientFCN architecture for semantic segmentation and HGD-FPN for object detection and instance segmentation.

Instance Segmentation object-detection +4

Paper
Add Code

Towards Overcoming False Positives in Visual Relationship Detection

no code implementations • 23 Dec 2020 • Daisheng Jin, Xiao Ma, Chongzhi Zhang, Yizhuo Zhou, Jiashu Tao, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Zhoujun Li, Xianglong Liu, Hongsheng Li

We observe that during training, the relationship proposal distribution is highly imbalanced: most of the negative relationship proposals are easy to identify, e.g., the inaccurate object detection, which leads to the under-fitting of low-frequency difficult proposals.

Graph Attention Human-Object Interaction Detection +4

Paper
Add Code

Self-supervised Temporal Learning

no code implementations • 1 Jan 2021 • Hao Shao, Yu Liu, Hongsheng Li

Inspired by spatial-based contrastive SSL, we show that significant improvement can be achieved by a proposed temporal-based contrastive learning approach, which includes three novel and efficient modules: temporal augmentations, temporal memory bank and SSTL loss.

Contrastive Learning Retrieval +3

Paper
Add Code

Progressive Correspondence Pruning by Consensus Learning

1 code implementation • ICCV 2021 • Chen Zhao, Yixiao Ge, Feng Zhu, Rui Zhao, Hongsheng Li, Mathieu Salzmann

Correspondence selection aims to correctly select the consistent matches (inliers) from an initial set of putative correspondences.

Denoising Pose Estimation +1

Paper
Code

Fast Convergence of DETR with Spatially Modulated Co-Attention

2 code implementations • 19 Jan 2021 • Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

The recently proposed Detection Transformer (DETR) model successfully applies Transformer to object detection and achieves comparable performance with two-stage object detection frameworks, such as Faster-RCNN.

object-detection Object Detection

164

Paper
Code

PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection

1 code implementation • 31 Jan 2021 • Shaoshuai Shi, Li Jiang, Jiajun Deng, Zhe Wang, Chaoxu Guo, Jianping Shi, Xiaogang Wang, Hongsheng Li

3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields.

Ranked #2 on 3D Object Detection on KITTI Cars Easy val

3D Object Detection Object +1

4,297

Paper
Code

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

4 code implementations • ICLR 2021 • Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network, which can maintain the advantages of both unstructured fine-grained sparsity and structured coarse-grained sparsity simultaneously on specifically designed GPUs.

196

Paper
Code

ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection

1 code implementation • CVPR 2021 • Jihan Yang, Shaoshuai Shi, Zhe Wang, Hongsheng Li, Xiaojuan Qi

Then, the detector is iteratively improved on the target domain by alternately conducting two steps: pseudo label updating with the developed quality-aware triplet memory bank, and model training with curriculum data augmentation.

3D Object Detection Data Augmentation +4

282

Paper
Code

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network

1 code implementation • CVPR 2021 • Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li

Conditional generative adversarial networks (cGANs) aim at synthesizing diverse images given the input conditions and latent codes, but unfortunately, they usually suffer from the issue of mode collapse.

Contrastive Learning Generative Adversarial Network +1

Paper
Code

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks

no code implementations • CVPR 2022 • Hao Li, Tianwen Fu, Jifeng Dai, Hongsheng Li, Gao Huang, Xizhou Zhu

However, the automatic design of loss functions for generic tasks with various evaluation metrics remains under-investigated.

Paper
Add Code

Fixing the Teacher-Student Knowledge Discrepancy in Distillation

no code implementations • 31 Mar 2021 • Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li, Xiaogang Wang

To solve this problem, in this paper, we propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student and provides the most suitable knowledge to different student networks for distillation.

Image Classification Knowledge Distillation +2

Paper
Add Code

FocusNetv2: Imbalanced Large and Small Organ Segmentation with Adversarial Shape Constraint for Head and Neck CT Images

1 code implementation • 5 Apr 2021 • Yunhe Gao, Rui Huang, Yiwei Yang, Jie Zhang, Kainan Shao, Changjuan Tao, YuanYuan Chen, Dimitris N. Metaxas, Hongsheng Li, Ming Chen

Radiotherapy is a treatment where radiation is used to eliminate cancer cells.

Organ Segmentation Segmentation

Paper
Code

LIFE: Lighting Invariant Flow Estimation

no code implementations • 7 Apr 2021 • Zhaoyang Huang, Xiaokun Pan, Runsen Xu, Yan Xu, Ka Chun Cheung, Guofeng Zhang, Hongsheng Li

However, local image contents are inevitably ambiguous and error-prone during the cross-image feature matching process, which hinders downstream tasks.

Paper
Add Code

Semantic Scene Completion via Integrating Instances and Scene in-the-Loop

1 code implementation • CVPR 2021 • Yingjie Cai, Xuesong Chen, Chao Zhang, Kwan-Yee Lin, Xiaogang Wang, Hongsheng Li

The key insight is that we decouple the instances from a coarsely completed semantic scene instead of a raw input image to guide the reconstruction of instances and the overall scene.

Ranked #1 on 3D Semantic Scene Completion on NYUv2

3D Semantic Scene Completion Scene Understanding

Paper
Code

Decoupled Spatial-Temporal Transformer for Video Inpainting

1 code implementation • 14 Apr 2021 • Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li

Seamless combination of these two novel designs forms a better spatial-temporal attention scheme, and our proposed model achieves better performance than state-of-the-art video inpainting approaches with significantly boosted efficiency.

Video Inpainting

Paper
Code

Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification

no code implementations • 27 Apr 2021 • Yixiao Ge, Xiao Zhang, Ching Lam Choi, Ka Chun Cheung, Peipei Zhao, Feng Zhu, Xiaogang Wang, Rui Zhao, Hongsheng Li

In this way, our BAKE framework achieves online knowledge ensembling across multiple samples with only a single network.

Classification General Classification +1

Paper
Add Code

Inverting Generative Adversarial Renderer for Face Reconstruction

no code implementations • CVPR 2021 • Jingtan Piao, Keqiang Sun, Kwan-Yee Lin, Quan Wang, Hongsheng Li

Since the GAR learns to model the complicated real-world image, instead of relying on the simplified graphics rules, it is capable of producing realistic images, which essentially inhibits the domain-shift noise in training and optimization.

Face Reconstruction

Paper
Add Code

VS-Net: Voting with Segmentation for Visual Localization

1 code implementation • CVPR 2021 • Zhaoyang Huang, Han Zhou, Yijin Li, Bangbang Yang, Yan Xu, Xiaowei Zhou, Hujun Bao, Guofeng Zhang, Hongsheng Li

To address this problem, we propose a novel visual localization framework that establishes 2D-to-3D correspondences between the query image and the 3D map with a series of learnable scene-specific landmarks.

Segmentation Semantic Segmentation +1

Paper
Code

FNAS: Uncertainty-Aware Fast Neural Architecture Search

no code implementations • 25 May 2021 • Jihao Liu, Ming Zhang, Yangting Sun, Boxiao Liu, Guanglu Song, Yu Liu, Hongsheng Li

Further, an architecture knowledge pool together with a block similarity function is proposed to utilize parameter knowledge and reduce the searching time by 2 times.

Fairness Neural Architecture Search +1

Paper
Add Code

Container: Context Aggregation Network

4 code implementations • 2 Jun 2021 • Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Ranked #462 on Image Classification on ImageNet

Image Classification Inductive Bias +5

Paper
Code

Scalable Transformers for Neural Machine Translation

no code implementations • 4 Jun 2021 • Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li

In this paper, we propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales and have shared parameters.

Machine Translation NMT +1

Paper
Add Code

Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification

1 code implementation • CVPR 2021 • Xiao Zhang, Yixiao Ge, Yu Qiao, Hongsheng Li

Unsupervised object re-identification aims at learning discriminative representations for object retrieval without any annotations.

Clustering Pseudo Label +1

Paper
Code

Hybrid Supervision Learning for Pathology Whole Slide Image Classification

1 code implementation • 2 Jul 2021 • Jiahui Li, Wen Chen, Xiaodi Huang, Zhiqiang Hu, Qi Duan, Hongsheng Li, Dimitris N. Metaxas, Shaoting Zhang

To handle this problem, we propose a hybrid supervision learning framework for this kind of high-resolution image with sufficient image-level coarse annotations and a few pixel-level fine labels.

Classification Image Classification +3

Paper
Code

Categorical Relation-Preserving Contrastive Knowledge Distillation for Medical Image Classification

1 code implementation • 7 Jul 2021 • Xiaohan Xing, Yuenan Hou, Hang Li, Yixuan Yuan, Hongsheng Li, Max Q.-H. Meng

With the contribution of the CCD and CRP, our CRCKD algorithm can distill the relational knowledge more comprehensively.

Image Classification Knowledge Distillation +2

Paper
Code

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency

1 code implementation • ICCV 2021 • Zhipeng Luo, Zhongang Cai, Changqing Zhou, Gongjie Zhang, Haiyu Zhao, Shuai Yi, Shijian Lu, Hongsheng Li, Shanghang Zhang, Ziwei Liu

In addition, existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world.

3D Object Detection Autonomous Driving +1

Paper
Code

Fast Convergence of DETR with Spatially Modulated Co-Attention

1 code implementation • ICCV 2021 • Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

However, DETR suffers from its slow convergence.

object-detection Object Detection

164

Paper
Code

Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization

1 code implementation • ICCV 2021 • Linjiang Huang, Liang Wang, Hongsheng Li

In this paper, we present a framework named FAC-Net based on the I3D backbone, on which three branches are appended, named class-wise foreground classification branch, class-agnostic attention branch and multiple instance learning branch.

Multiple Instance Learning Video Understanding +2

Paper
Code

ST3D++: Denoised Self-training for Unsupervised Domain Adaptation on 3D Object Detection

no code implementations • 15 Aug 2021 • Jihan Yang, Shaoshuai Shi, Zhe Wang, Hongsheng Li, Xiaojuan Qi

These specific designs enable the detector to be trained on meticulously refined pseudo labeled target data with denoised training signals, and thus effectively facilitate adapting an object detector to a target domain without requiring annotations.

3D Object Detection Data Augmentation +5

Paper
Add Code

LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation

no code implementations • 17 Aug 2021 • Lin Zhao, Hui Zhou, Xinge Zhu, Xiao Song, Hongsheng Li, Wenbing Tao

However, two major issues of the fusion between camera and LiDAR hinder its performance, i.e., how to effectively fuse these two modalities and how to precisely align them (suffering from the weak spatiotemporal synchronization problem).

Autonomous Driving LIDAR Semantic Segmentation +1

Paper
Add Code

LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector

1 code implementation • ICCV 2021 • Xiaoyang Guo, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

Compared with the state-of-the-art stereo detector, our method has improved the 3D detection performance of cars, pedestrians, and cyclists by 10.44%, 5.69%, and 5.97% mAP, respectively, on the official KITTI benchmark.

Ranked #2 on 3D Object Detection From Stereo Images on KITTI Cyclists Moderate

3D Object Detection From Stereo Images Stereo Matching

Paper
Code

Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition

no code implementations • 19 Aug 2021 • Ning Wang, Guangming Zhu, Liang Zhang, Peiyi Shen, Hongsheng Li, Cong Hua

With the effective spatio-temporal relationship modeling, it is possible not only to uncover contextual information in each frame but also to directly capture inter-time dependencies.

Human-Object Interaction Detection Object

Paper
Add Code

Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

1 code implementation • ICCV 2021 • Ziniu Wan, Zhengjia Li, Maoqing Tian, Jianbo Liu, Shuai Yi, Hongsheng Li

To this end, we propose Multi-level Attention Encoder-Decoder Network (MAED), including a Spatial-Temporal Encoder (STE) and a Kinematic Topology Decoder (KTD) to model multi-level attentions in a unified framework.

Ranked #40 on 3D Human Pose Estimation on MPI-INF-3DHP

3D Absolute Human Pose Estimation

201

Paper
Code

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

1 code implementation • ICCV 2021 • Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li

On the contrary, the soft composition operates by stitching different patches into a whole feature map where pixels in overlapping regions are summed up.

Ranked #3 on Video Inpainting on DAVIS

Seeing Beyond the Visible Video Inpainting

102

Paper
Code

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception

1 code implementation • 12 Sep 2021 • Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Wei Li, Yuexin Ma, Hongsheng Li, Ruigang Yang, Dahua Lin

In this paper, we benchmark our model on these three tasks.

Panoptic Segmentation Segmentation

806

Paper
Code

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning

3 code implementations • ICLR 2022 • Kunchang Li, Yali Wang, Gao Peng, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.8% and 71.4% top-1 accuracy, respectively.

Ranked #8 on Action Recognition on Something-Something V1

Action Classification Action Recognition +1

2,968

Paper
Code

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP

no code implementations • 8 Oct 2021 • Jihao Liu, Hongsheng Li, Guanglu Song, Xin Huang, Yu Liu

Recently, transformer and multi-layer perceptron (MLP) architectures have achieved impressive results on various vision tasks.

Ranked #236 on Image Classification on ImageNet

Image Classification object-detection +2

Paper
Add Code

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

2 code implementations • 9 Oct 2021 • Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao

Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning.

Prompt Engineering Representation Learning

384

Paper
Code

Rethinking Noise Synthesis and Modeling in Raw Denoising

1 code implementation • ICCV 2021 • Yi Zhang, Hongwei Qin, Xiaogang Wang, Hongsheng Li

However, the real raw image noise is contributed by many noise sources and varies greatly among different sensors.

Ranked #2 on Image Denoising on SID SonyA7S2 x100

Image Denoising

105

Paper
Code

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

1 code implementation • 6 Nov 2021 • Renrui Zhang, Rongyao Fang, Wei Zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

To further enhance CLIP's few-shot capability, CLIP-Adapter proposed to fine-tune a lightweight residual feature adapter and significantly improves the performance for few-shot classification.

Language Modelling Transfer Learning

464

Paper
Code

IDR: Self-Supervised Image Denoising via Iterative Data Refinement

1 code implementation • CVPR 2022 • Yi Zhang, Dasong Li, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li

To evaluate raw image denoising performance in real-world applications, we build a high-quality raw image dataset SenseNoise-500 that contains 500 real-life scenes.

Image Denoising

110

Paper
Code

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

1 code implementation • 29 Nov 2021 • Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao

Recent advances in large-scale contrastive visual-language pretraining shed light on a new pathway for visual recognition.

Ranked #4 on Long-tail Learning on Places-LT (using extra training data)

Contrastive Learning Language Modelling +3

Paper
Code

DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks

1 code implementation • NeurIPS 2021 • Wei Sun, Aojun Zhou, Sander Stuijk, Rob Wijnhoven, Andrew Oakleigh Nelson, Hongsheng Li, Henk Corporaal

However, the existing N:M algorithms only address the challenge of how to train N:M sparse neural networks in a uniform fashion (i.e., every layer has the same N:M sparsity) and suffer from a significant accuracy drop for high sparsity (i.e., when sparsity > 80%).

Network Pruning

Paper
Code

Container: Context Aggregation Networks

2 code implementations • NeurIPS 2021 • Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Inductive Bias Instance Segmentation +4

Paper
Code

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

1 code implementation • CVPR 2022 • Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai

The model is pre-trained on several uni-modal and multi-modal tasks, and evaluated on a variety of downstream tasks, including novel tasks that did not appear in the pre-training stage.

250

Paper
Code

PointCLIP: Point Cloud Understanding by CLIP

2 code implementations • CVPR 2022 • Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li

On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D.

Ranked #3 on 3D Open-Vocabulary Instance Segmentation on STPLS3D

3D Open-Vocabulary Instance Segmentation Few-Shot Learning +6

290

Paper
Code

Pyramid Fusion Transformer for Semantic Segmentation

no code implementations • 11 Jan 2022 • Zipeng Qin, Jianbo Liu, Xiaolin Zhang, Maoqing Tian, Aojun Zhou, Shuai Yi, Hongsheng Li

The recently proposed MaskFormer gives a refreshed perspective on the task of semantic segmentation: it shifts from the popular pixel-level classification paradigm to a mask-level classification method.

Segmentation Semantic Segmentation

Paper
Add Code

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning

2 code implementations • 12 Jan 2022 • Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.9% and 71.2% top-1 accuracy, respectively.

Representation Learning

2,968

Paper
Code

UniFormer: Unifying Convolution and Self-attention for Visual Recognition

7 code implementations • 24 Jan 2022 • Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

Different from the typical transformer blocks, the relation aggregators in our UniFormer block are equipped with local and global token affinity in shallow and deep layers, respectively, allowing it to tackle both redundancy and dependency for efficient and effective representation learning.

Ranked #151 on Image Classification on ImageNet

Image Classification object-detection +5

776

Paper
Code

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning

no code implementations • 9 Feb 2022 • Kexue Fu, Peng Gao, Renrui Zhang, Hongsheng Li, Yu Qiao, Manning Wang

In particular, we develop a variant of ViT for 3D point cloud feature extraction, which also achieves results comparable with existing backbones when combined with our framework. Visualization of the attention maps shows that our model does understand the point cloud by combining the global shape information and multiple local structural information, which is consistent with the inspiration of our representation learning method.

Contrastive Learning Knowledge Distillation +1

Paper
Add Code

Meta Knowledge Distillation

no code implementations • 16 Feb 2022 • Jihao Liu, Boxiao Liu, Hongsheng Li, Yu Liu

Recent studies pointed out that knowledge distillation (KD) suffers from two degradation problems, the teacher-student gap and the incompatibility with strong data augmentations, making it not applicable to training state-of-the-art models, which are trained with advanced augmentations.

Ranked #133 on Image Classification on ImageNet

Data Augmentation Image Classification +1

Paper
Add Code

Robust Self-Supervised LiDAR Odometry via Representative Structure Discovery and 3D Inherent Error Modeling

1 code implementation • 27 Feb 2022 • Yan Xu, Junyi Lin, Jianping Shi, Guofeng Zhang, Xiaogang Wang, Hongsheng Li

The correct ego-motion estimation basically relies on the understanding of correspondences between adjacent LiDAR scans.

Motion Estimation

Paper
Code

Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation

1 code implementation • CVPR 2022 • Linjiang Huang, Liang Wang, Hongsheng Li

Our method seeks to mine the representative snippets in each video for propagating information between video snippets to generate better pseudo labels.

Pseudo Label Weakly-supervised Temporal Action Localization +1

Paper
Code

LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network

1 code implementation • 14 Mar 2022 • Fangzhou Hong, Hui Zhou, Xinge Zhu, Hongsheng Li, Ziwei Liu

In this work, we address the task of LiDAR-based panoptic segmentation, which aims to parse both objects and scenes in a unified manner.

4D Panoptic Segmentation Autonomous Driving +3

230

Paper
Code

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection

1 code implementation • ICCV 2023 • Renrui Zhang, Han Qiu, Tai Wang, Ziyu Guo, Xuanzhuo Xu, Ziteng Cui, Yu Qiao, Peng Gao, Hongsheng Li

In this paper, we introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR.

Ranked #9 on 3D Object Detection From Monocular Images on KITTI-360

3D Object Detection From Monocular Images Autonomous Driving +3

308

Paper
Code

RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization

1 code implementation • CVPR 2022 • Yan Xu, Kwan-Yee Lin, Guofeng Zhang, Xiaogang Wang, Hongsheng Li

The correspondence field estimation and pose refinement are conducted alternately in each iteration to recover the object poses.

Ranked #1 on 6D Pose Estimation using RGB on LineMOD

6D Pose Estimation using RGB Object

135

Paper
Code

Learning a Structured Latent Space for Unsupervised Point Cloud Completion

no code implementations • CVPR 2022 • Yingjie Cai, Kwan-Yee Lin, Chao Zhang, Qiang Wang, Xiaogang Wang, Hongsheng Li

Specifically, we map a series of related partial point clouds into multiple complete shape and occlusion code pairs and fuse the codes to obtain their representations in the unified latent space.

Point Cloud Completion

Paper
Add Code

FlowFormer: A Transformer Architecture for Optical Flow

1 code implementation • 30 Mar 2022 • Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Ka Chun Cheung, Hongwei Qin, Jifeng Dai, Hongsheng Li

We introduce optical Flow transFormer, dubbed as FlowFormer, a transformer-based neural network architecture for learning optical flow.

Ranked #1 on Optical Flow Estimation on Sintel-final

Optical Flow Estimation

373

Paper
Code

RBGNet: Ray-based Grouping for 3D Object Detection

1 code implementation • CVPR 2022 • Haiyang Wang, Shaoshuai Shi, Ze Yang, Rongyao Fang, Qi Qian, Hongsheng Li, Bernt Schiele, LiWei Wang

In order to learn better representations of object shape to enhance cluster features for predicting 3D boxes, we propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays uniformly emitted from cluster centers.

Ranked #13 on 3D Object Detection on ScanNetV2

3D Object Detection Object +1

Paper
Code

Simulating Fluids in Real-World Still Images

1 code implementation • ICCV 2023 • Siming Fan, Jingtan Piao, Chen Qian, Kwan-Yee Lin, Hongsheng Li

In this work, we tackle the problem of real-world fluid animation from a still image.

Motion Estimation

102

Paper
Code

Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis

1 code implementation • 25 Apr 2022 • Wei Cheng, Su Xu, Jingtan Piao, Chen Qian, Wayne Wu, Kwan-Yee Lin, Hongsheng Li

Specifically, we compress the light fields for novel view human rendering as conditional implicit neural radiance fields from both geometry and appearance aspects.

Novel View Synthesis

174

Paper
Code

EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers

1 code implementation • 6 May 2022 • Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, Lukasz Dudziak, Hongsheng Li, Georgios Tzimiropoulos, Brais Martinez

In this work, pushing further along this under-studied direction we introduce EdgeViTs, a new family of light-weight ViTs that, for the first time, enable attention-based vision models to compete with the best light-weight CNNs in the tradeoff between accuracy and on-device efficiency.

Paper
Code

ConvMAE: Masked Convolution Meets Masked Autoencoders

4 code implementations • 8 May 2022 • Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, Yu Qiao

Masked auto-encoding for feature pretraining and multi-scale hybrid convolution-transformer architectures can further unleash the potential of ViT, leading to state-of-the-art performance on image classification, detection and semantic segmentation.

Computational Efficiency Image Classification +2

452

Paper
Code

Efficient Burst Raw Denoising with Variance Stabilization and Multi-frequency Denoising Network

no code implementations • 10 May 2022 • Dasong Li, Yi Zhang, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li

As for each sub-network, we propose an efficient multi-frequency denoising network to remove noise of different frequencies.

Denoising

Paper
Add Code

MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection

1 code implementation • 12 May 2022 • Xuesong Chen, Shaoshuai Shi, Benjin Zhu, Ka Chun Cheung, Hang Xu, Hongsheng Li

Accurate and reliable 3D detection is vital for many applications including autonomous driving vehicles and service robots.

Autonomous Driving object-detection +1

4,297

Paper
Code

MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers

1 code implementation • CVPR 2023 • Jihao Liu, Xin Huang, Jinliang Zheng, Yu Liu, Hongsheng Li

In this paper, we propose Mixed and Masked AutoEncoder (MixMAE), a simple but efficient pretraining method that is applicable to various hierarchical Vision Transformers.

Ranked #2 on Image Classification on Places205

Image Classification Object Detection +2

122

Paper
Code

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training

3 code implementations • 28 May 2022 • Renrui Zhang, Ziyu Guo, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li, Peng Gao

By fine-tuning on downstream tasks, Point-M2AE achieves 86.43% accuracy on ScanObjectNN, +3.36% to the second-best, and largely benefits the few-shot classification, part segmentation and 3D object detection with the hierarchical pre-training scheme.

Ranked #4 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)

3D Object Detection 3D Point Cloud Linear Classification +5

197

Paper
Code

Spatial Parsing and Dynamic Temporal Pooling networks for Human-Object Interaction detection

no code implementations • 7 Jun 2022 • Hongsheng Li, Guangming Zhu, Wu Zhen, Lan Ni, Peiyi Shen, Liang Zhang, Ning Wang, Cong Hua

However, there is still room for improvement in video HOI detection performance.

Human-Object Interaction Detection Object +1

Paper
Add Code

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

1 code implementation • 9 Jun 2022 • Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai

To mitigate such interference, we introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.

Image Captioning Image Classification +6

250

Paper
Code

Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields

no code implementations • 16 Jun 2022 • Keqiang Sun, Shangzhe Wu, Zhaoyang Huang, Ning Zhang, Quan Wang, Hongsheng Li

Capitalizing on the recent advances in image generation models, existing controllable face image synthesis methods are able to generate high-fidelity images with some levels of controllability, e.g., controlling the shapes, expressions, textures, and poses of the generated face images.

Face Generation

Paper
Add Code

3D Object Detection for Autonomous Driving: A Comprehensive Survey

1 code implementation • 19 Jun 2022 • Jiageng Mao, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

Autonomous driving, in recent years, has been receiving increasing attention for its potential to relieve drivers' burdens and improve the safety of driving.

3D Object Detection Autonomous Driving +1

489

Paper
Code

A Simple Baseline for Video Restoration with Grouped Spatial-temporal Shift

1 code implementation • CVPR 2023 • Dasong Li, Xiaoyu Shi, Yi Zhang, Ka Chun Cheung, Simon See, Xiaogang Wang, Hongwei Qin, Hongsheng Li

In this study, we propose a simple yet effective framework for video restoration.

Ranked #1 on Deblurring on GoPro (using extra training data)

Deblurring Denoising +3

Paper
Code

ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning

1 code implementation • 27 Jun 2022 • Junting Pan, Ziyi Lin, Xiatian Zhu, Jing Shao, Hongsheng Li

This has led to a new research direction in parameter-efficient transfer learning.

Ranked #23 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification Action Recognition +3

Paper
Code

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP

2 code implementations • 12 Jul 2022 • Jihao Liu, Xin Huang, Guanglu Song, Hongsheng Li, Yu Liu

Finally, we integrate configurable operators and DSMs into a unified search space and search with a Reinforcement Learning-based search algorithm to fully explore the optimal combination of the operators.

Ranked #12 on Neural Architecture Search on ImageNet

Image Classification Neural Architecture Search

Paper
Code

TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers

1 code implementation • 18 Jul 2022 • Jihao Liu, Boxiao Liu, Hang Zhou, Hongsheng Li, Yu Liu

In this paper, we propose a novel data augmentation technique TokenMix to improve the performance of vision transformers.

Data Augmentation

Paper
Code

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification

3 code implementations • 19 Jul 2022 • Renrui Zhang, Zhang Wei, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

On top of that, the performance of Tip-Adapter can be further boosted to be state-of-the-art on ImageNet by fine-tuning the cache model for 10× fewer epochs than existing methods, which is both effective and efficient.

Retrieval Transfer Learning

464

Paper
Code

Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer

1 code implementation • 28 Jul 2022 • Hao Shao, Letian Wang, RuoBing Chen, Hongsheng Li, Yu Liu

Large-scale deployment of autonomous vehicles has been continually delayed due to safety concerns.

Ranked #2 on Autonomous Driving on CARLA Leaderboard

Autonomous Driving CARLA longest6 +3

452

Paper
Code

Frozen CLIP Models are Efficient Video Learners

2 code implementations • 6 Aug 2022 • Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li

Video recognition has been dominated by the end-to-end learning paradigm -- first initializing a video recognition model with weights of a pretrained image model and then conducting end-to-end training on videos.

Ranked #26 on Action Classification on Kinetics-400 (using extra training data)

Action Classification Video Recognition

155

Paper
Code

Learning Degradation Representations for Image Deblurring

1 code implementation • 10 Aug 2022 • Dasong Li, Yi Zhang, Ka Chun Cheung, Xiaogang Wang, Hongwei Qin, Hongsheng Li

With the integration, MSDI-Net can handle various and complicated blurry patterns adaptively.

Ranked #13 on Image Deblurring on GoPro

Deblurring Image Deblurring +3

Paper
Code

Towards Robust Face Recognition with Comprehensive Search

no code implementations • 29 Aug 2022 • Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li

To eliminate the bias of single-aspect research and provide an overall understanding of face recognition model design, we first carefully design the search space for each aspect and then introduce a comprehensive search method to jointly search for the optimal data cleaning, architecture, and loss function design.

Face Recognition Robust Face Recognition

Paper
Add Code

Magnetic Resonance Fingerprinting with compressed sensing and distance metric learning

no code implementations • 19 Sep 2022 • Zhe Wang, Hongsheng Li, Qinwei Zhang, Jing Yuan, Xiaogang Wang

Adaptively learning a distance metric from the undersampled training data can significantly improve the matching accuracy of the query fingerprints.

Magnetic Resonance Fingerprinting Metric Learning

Paper
Add Code

NeuralMarker: A Framework for Learning General Marker Correspondence

no code implementations • 19 Sep 2022 • Zhaoyang Huang, Xiaokun Pan, Weihong Pan, Weikang Bian, Yan Xu, Ka Chun Cheung, Guofeng Zhang, Hongsheng Li

We tackle the problem of estimating correspondences from a general marker, such as a movie poster, to an image that captures such a marker.

Video Editing

Paper
Add Code

Collaboration of Pre-trained Models Makes Better Few-shot Learner

no code implementations • 25 Sep 2022 • Renrui Zhang, Bohao Li, Wei zhang, Hao Dong, Hongsheng Li, Peng Gao, Yu Qiao

In this paper, we propose CoMo, a Collaboration of pre-trained Models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning.

Few-Shot Learning Representation Learning

Paper
Add Code

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

2 code implementations • CVPR 2023 • Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state.

Ranked #1 on Instance Segmentation on COCO test-dev (AP50 metric, using extra training data)

Classification Image Classification +3

2,292

Paper
Code

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

2 code implementations • CVPR 2023 • Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai

In this paper, we propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks with competitive performance.

Language Modelling Multi-Task Learning

2,292

Paper
Code

Teach-DETR: Better Training DETR with Teachers

1 code implementation • 22 Nov 2022 • Linjiang Huang, Kaixin Lu, Guanglu Song, Liang Wang, Si Liu, Yu Liu, Hongsheng Li

In this paper, we present a novel training scheme, namely Teach-DETR, to learn better DETR-based detectors from versatile teacher detectors.

Paper
Code

CGOF++: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields

no code implementations • 23 Nov 2022 • Keqiang Sun, Shangzhe Wu, Ning Zhang, Zhaoyang Huang, Quan Wang, Hongsheng Li

Face Generation

Paper
Add Code

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

2 code implementations • CVPR 2023 • Renrui Zhang, Liuhui Wang, Yu Qiao, Peng Gao, Hongsheng Li

Pre-training on large amounts of image data has become the de facto approach for robust 2D representations.

Ranked #2 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)

3D Point Cloud Linear Classification Few-Shot 3D Point Cloud Classification

197

Paper
Code

ConQueR: Query Contrast Voxel-DETR for 3D Object Detection

1 code implementation • CVPR 2023 • Benjin Zhu, Zhe Wang, Shaoshuai Shi, Hang Xu, Lanqing Hong, Hongsheng Li

We thus propose a Query Contrast mechanism to explicitly enhance queries towards their best-matched GTs over all unmatched query predictions.

3D Object Detection Object +1

101

Paper
Code

Starting From Non-Parametric Networks for 3D Point Cloud Analysis

1 code implementation • CVPR 2023 • Renrui Zhang, Liuhui Wang, Yali Wang, Peng Gao, Hongsheng Li, Jianbo Shi

We present a Non-parametric Network for 3D point cloud analysis, Point-NN, which consists of purely non-learnable components: farthest point sampling (FPS), k-nearest neighbors (k-NN), and pooling operations, with trigonometric functions.

435

Paper
Code

Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation

1 code implementation • CVPR 2023 • Chen Gao, Xingyu Peng, Mi Yan, He Wang, Lirong Yang, Haibing Ren, Hongsheng Li, Si Liu

In this paper, we propose an Adaptive Zone-aware Hierarchical Planner (AZHP) to explicitly divide the navigation process into two heterogeneous phases, i.e., sub-goal setting via zone partition/selection (high-level action) and sub-goal executing (low-level action), for hierarchical planning.

Vision-Language Navigation

Paper
Code

SparseMAE: Sparse Training Meets Masked Autoencoders

no code implementations • ICCV 2023 • Aojun Zhou, Yang Li, Zipeng Qin, Jianbo Liu, Junting Pan, Renrui Zhang, Rui Zhao, Peng Gao, Hongsheng Li

In this paper, we aim to reduce the model complexity of large vision transformers pretrained by MAE with the assistance of sparse training.

Paper
Add Code

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation

1 code implementation • CVPR 2023 • Xiaoyu Shi, Zhaoyang Huang, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li

FlowFormer introduces a transformer architecture into optical flow estimation and achieves state-of-the-art performance.

Optical Flow Estimation

Paper
Code

FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation

1 code implementation • 2 Mar 2023 • Rongyao Fang, Peng Gao, Aojun Zhou, Yingjie Cai, Si Liu, Jifeng Dai, Hongsheng Li

The first method is One-to-many Matching via Data Augmentation (denoted as DataAug-DETR).

Data Augmentation object-detection +1

Paper
Code

Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

3 code implementations • CVPR 2023 • Renrui Zhang, Xiangfei Hu, Bohao Li, Siyuan Huang, Hanqiu Deng, Hongsheng Li, Yu Qiao, Peng Gao

Our CaFo incorporates CLIP's language-contrastive knowledge, DINO's vision-contrastive knowledge, DALL-E's vision-generative knowledge, and GPT-3's language-generative knowledge.

Few-Shot Learning Representation Learning

464

Paper
Code

KBNet: Kernel Basis Network for Image Restoration

1 code implementation • 6 Mar 2023 • Yi Zhang, Dasong Li, Xiaoyu Shi, Dailan He, Kangning Song, Xiaogang Wang, Hongwei Qin, Hongsheng Li

In this paper, we propose a kernel basis attention (KBA) module, which introduces learnable kernel bases to model representative image patterns for spatial information aggregation.

Ranked #1 on Color Image Denoising on McMaster sigma50

Color Image Denoising Deblurring +4

173

Paper
Code

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

1 code implementation • 9 Mar 2023 • Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu

To alleviate this, previous methods simply replace the pixel reconstruction targets of 75% masked tokens by encoded features from pre-trained image-image (DINO) or image-language (CLIP) contrastive learning.

Contrastive Learning

452

Paper
Code

PATS: Patch Area Transportation with Subdivision for Local Feature Matching

no code implementations • CVPR 2023 • Junjie Ni, Yijin Li, Zhaoyang Huang, Hongsheng Li, Hujun Bao, Zhaopeng Cui, Guofeng Zhang

However, estimating scale differences between these patches is non-trivial since the scale differences are determined by both relative camera poses and scene structures, and thus spatially varying over image pairs.

Graph Matching Optical Flow Estimation +2

Paper
Add Code

BlinkFlow: A Dataset to Push the Limits of Event-based Optical Flow Estimation

no code implementations • 14 Mar 2023 • Yijin Li, Zhaoyang Huang, Shuo Chen, Xiaoyu Shi, Hongsheng Li, Hujun Bao, Zhaopeng Cui, Guofeng Zhang

BlinkSim consists of a configurable rendering engine and a flexible engine for event data simulation.

Event-based Optical Flow Optical Flow Estimation

Paper
Add Code

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis

2 code implementations • 14 Mar 2023 • Renrui Zhang, Liuhui Wang, Ziyu Guo, Yali Wang, Peng Gao, Hongsheng Li, Jianbo Shi

Ranked #1 on Training-free 3D Part Segmentation on ShapeNet-Part

3D Point Cloud Classification Training-free 3D Part Segmentation +1

435

Paper
Code

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation

1 code implementation • ICCV 2023 • Xiaoyu Shi, Zhaoyang Huang, Weikang Bian, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li

We first propose a TRi-frame Optical Flow (TROF) module that estimates bi-directional optical flows for the center frame in a three-frame manner.

Optical Flow Estimation

207

Paper
Code

GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding

1 code implementation • ICCV 2023 • Jihao Liu, Tai Wang, Boxiao Liu, Qihang Zhang, Yu Liu, Hongsheng Li

In this paper, we propose Geometry Enhanced Masked Image Modeling (GeoMIM) to transfer the knowledge of the LiDAR model in a pretrain-finetune paradigm for improving the multi-view camera-based 3D detection.

3D Object Detection object-detection +1

Paper
Code

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

1 code implementation • CVPR 2023 • Xiaoshi Wu, Feng Zhu, Rui Zhao, Hongsheng Li

To overcome these obstacles, we propose CORA, a DETR-style framework that adapts CLIP for Open-vocabulary detection by Region prompting and Anchor pre-matching.

Ranked #6 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Described Object Detection object-detection +2

152

Paper
Code

Human Preference Score: Better Aligning Text-to-Image Models with Human Preference

1 code implementation • ICCV 2023 • Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li

To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel.

240

Paper
Code

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

7 code implementations • 28 Mar 2023 • Renrui Zhang, Jiaming Han, Chris Liu, Peng Gao, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Yu Qiao

We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model.

Ranked #2 on Music Question Answering on MusicQA

Instruction Following Language Modelling +3

5,776

Paper
Code

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

1 code implementation • ICCV 2023 • Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu

The HoP approach is straightforward: given the current timestamp t, we generate a pseudo Bird's-Eye View (BEV) feature of timestamp t-k from its adjacent frames and utilize this feature to predict the object set at timestamp t-k. Our approach is motivated by the observation that enforcing the detector to capture both the spatial location and temporal motion of objects occurring at historical timestamps can lead to more accurate BEV feature learning.

Ranked #3 on 3D Object Detection on nuScenes Camera Only

3D Object Detection Object

171

Paper
Code

Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels

1 code implementation • CVPR 2023 • Jingqiu Zhou, Linjiang Huang, Liang Wang, Si Liu, Hongsheng Li

Besides, the generated pseudo-labels can be fluctuating and inaccurate at the early stage of training.

Pseudo Label Weakly-supervised Temporal Action Localization +1

Paper
Code

Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles

no code implementations • 19 Apr 2023 • Xiaoliang Ju, Yiyang Sun, Yiming Hao, Yikang Li, Yu Qiao, Hongsheng Li

We propose a perception imitation method to simulate the results of a given perception model, and discuss a new heuristic route toward an autonomous driving simulator without data synthesis.

Autonomous Driving

Paper
Add Code

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

3 code implementations • 28 Apr 2023 • Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao

This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset.

Ranked #6 on Visual Question Answering (VQA) on InfiMM-Eval

Instruction Following Optical Character Recognition (OCR) +7

5,485

Paper
Code
