2 code implementations • 1 Dec 2022 • Yanghao Li, Haoqi Fan, Ronghang Hu, Christoph Feichtenhofer, Kaiming He
We present Fast Language-Image Pre-training (FLIP), a simple and more efficient method for training CLIP.
2 code implementations • 18 May 2022 • Christoph Feichtenhofer, Haoqi Fan, Yanghao Li, Kaiming He
We randomly mask out spacetime patches in videos and learn an autoencoder to reconstruct them in pixels.
4 code implementations • 30 Mar 2022 • Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He
This design enables the original ViT architecture to be fine-tuned for object detection without needing to redesign a hierarchical backbone for pre-training.
Ranked #2 on Object Detection on LVIS v1.0 val
1 code implementation • 22 Nov 2021 • Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollár, Kaiming He, Ross Girshick
The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
33 code implementations • CVPR 2022 • Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
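The random patch masking described in the MAE entry above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `random_patch_mask`, the 75% mask ratio, and the 14 x 14 patch grid are assumptions chosen for the example, and the reconstruction step is omitted.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (kept, masked) lists using a random shuffle.

    Hypothetical helper for illustration; the encoder would see only the
    kept patches and the decoder would reconstruct the masked ones.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    kept = sorted(order[num_masked:])
    return kept, masked


# Assumed sizes: a 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
kept, masked = random_patch_mask(num_patches=196, mask_ratio=0.75, seed=0)
print(len(kept), len(masked))
```

Every patch lands in exactly one of the two lists, so the reconstruction targets are the complement of the patches the encoder sees.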
2 code implementations • CVPR 2021 • Christoph Feichtenhofer, Haoqi Fan, Bo Xiong, Ross Girshick, Kaiming He
We present a large-scale study on unsupervised spatiotemporal representation learning from videos.
Ranked #2 on Self-Supervised Action Recognition on HMDB51
Representation Learning
Self-Supervised Action Recognition
6 code implementations • ICCV 2021 • Xinlei Chen, Saining Xie, Kaiming He
In this work, we go back to basics and investigate the effects of several fundamental components for training self-supervised ViT.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
Out-of-Distribution Generalization
Self-Supervised Image Classification
25 code implementations • CVPR 2021 • Xinlei Chen, Kaiming He
Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing.
Ranked #69 on Self-Supervised Image Classification on ImageNet
Representation Learning
Self-Supervised Image Classification
3 code implementations • ICML 2020 • Jiaxuan You, Jure Leskovec, Kaiming He, Saining Xie
Neural networks are often represented as graphs of connections between neurons.
21 code implementations • CVPR 2020 • Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár
In this work, we present a new network design paradigm.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
2 code implementations • ECCV 2020 • Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie
Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels.
34 code implementations • 9 Mar 2020 • Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He
Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR.
Ranked #3 on Contrastive Learning on imagenet-1k
13 code implementations • CVPR 2020 • Alexander Kirillov, Yuxin Wu, Kaiming He, Ross Girshick
We present a new method for efficient high-quality image segmentation of objects and scenes.
Ranked #3 on Instance Segmentation on COCO 2017 val
3 code implementations • CVPR 2020 • Chao-yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer, Philipp Krähenbühl
We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).
Ranked #1 on Video Classification on Charades
44 code implementations • CVPR 2020 • Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick
This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning.
Ranked #11 on Contrastive Learning on imagenet-1k
12 code implementations • ICCV 2019 • Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas
Current 3D object detection methods are heavily influenced by 2D detectors.
Ranked #11 on 3D Object Detection on SUN-RGBD val
10 code implementations • ICCV 2019 • Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He
In this paper, we explore a more diverse set of connectivity patterns through the lens of randomly wired neural networks.
Ranked #114 on Neural Architecture Search on ImageNet
2 code implementations • ICCV 2019 • Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollár
To formalize this, we treat dense instance segmentation as a prediction task over 4D tensors and present a general framework called TensorMask that explicitly captures this geometry and enables novel operators on 4D tensors.
Ranked #76 on Instance Segmentation on COCO test-dev
10 code implementations • CVPR 2019 • Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár
In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks.
Ranked #4 on Panoptic Segmentation on KITTI Panoptic Segmentation
4 code implementations • CVPR 2019 • Chao-yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
Ranked #4 on Action Recognition on AVA v2.1
12 code implementations • ICCV 2019 • Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He
We present SlowFast networks for video recognition.
Ranked #3 on Action Recognition on AVA v2.1
2 code implementations • CVPR 2019 • Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Kaiming He
This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks.
no code implementations • NeurIPS 2018 • Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan R. Salakhutdinov, Yann LeCun
We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden units), or embedding-free units such as image pixels.
1 code implementation • ICCV 2019 • Kaiming He, Ross Girshick, Piotr Dollár
We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization.
Ranked #64 on Object Detection on COCO minival
1 code implementation • 14 Jun 2018 • Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun
We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden units), or embedding-free units such as image pixels.
4 code implementations • ECCV 2018 • Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten
ImageNet classification is the de facto pretraining task for these models.
Ranked #185 on Image Classification on ImageNet
18 code implementations • ECCV 2018 • Yuxin Wu, Kaiming He
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Ranked #124 on Object Detection on COCO minival
6 code implementations • CVPR 2019 • Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár
We propose and study a task we name panoptic segmentation (PS).
Ranked #21 on Panoptic Segmentation on Cityscapes val (using extra training data)
4 code implementations • CVPR 2018 • Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He
We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data.
3 code implementations • CVPR 2018 • Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick
Most methods for object instance segmentation require all training examples to be labeled with segmentation masks.
24 code implementations • CVPR 2018 • Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.
Ranked #8 on Action Classification on Toyota Smarthome dataset (using extra training data)
no code implementations • ICCV 2017 • Xiaolong Wang, Kaiming He, Abhinav Gupta
The objects are connected by two types of edges which correspond to two types of invariance: "different instances but a similar viewpoint and category" and "different viewpoints of the same instance".
219 code implementations • ICCV 2017 • Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár
Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
Ranked #3 on Long-tail Learning on EGTEA
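To make the Focal Loss entry above concrete, here is a minimal per-example sketch of the commonly stated binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the defaults `gamma=2.0` and `alpha=0.25` are assumptions for illustration; the listing itself does not give them.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one example; p is the predicted P(y = 1)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, 1e-12), 1.0))


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example (illustrative defaults, see lead-in)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_negative = focal_loss(p=0.01, y=0)  # confidently correct background
hard_negative = focal_loss(p=0.90, y=0)  # confidently wrong
print(easy_negative, hard_negative)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, which shows that the modulating factor alone is responsible for down-weighting easy negatives.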
61 code implementations • 8 Jun 2017 • Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training.
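A minimal sketch of a linear scaling rule with a gradual warmup, as described in the entry above. The function names, the reference setting (`base_lr=0.1` at a minibatch size of 256), and the linear ramp from the base rate to the scaled rate are assumptions for illustration, not values taken from this listing.

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch_size=256):
    """Linear scaling: multiply the learning rate by the same factor as the batch."""
    return base_lr * batch_size / base_batch_size


def warmup_lr(step, warmup_steps, batch_size, base_lr=0.1, base_batch_size=256):
    """Ramp linearly from base_lr to the scaled rate over warmup_steps steps."""
    target = scaled_lr(batch_size, base_lr, base_batch_size)
    if warmup_steps <= 0 or step >= warmup_steps:
        return target
    return base_lr + (target - base_lr) * step / warmup_steps


# Hypothetical run: minibatch of 8192 with a 500-step warmup.
for step in (0, 250, 500, 1000):
    print(step, warmup_lr(step, warmup_steps=500, batch_size=8192))
```

The rule introduces no extra tuning knob: once the reference pair of learning rate and batch size is fixed, the rate for any other batch size follows directly.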
2 code implementations • CVPR 2018 • Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He
Our hypothesis is that the appearance of a person -- their pose, clothing, action -- is a powerful cue for localizing the objects they are interacting with.
Ranked #38 on Human-Object Interaction Detection on HICO-DET
162 code implementations • ICCV 2017 • Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.
Ranked #1 on Keypoint Estimation on GRIT
83 code implementations • CVPR 2017 • Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie
Feature pyramids are a basic component in recognition systems for detecting objects at different scales.
Ranked #3 on Pedestrian Detection on TJU-Ped-campus
48 code implementations • CVPR 2017 • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set.
Ranked #3 on Image Classification on GasHisSDB
no code implementations • 24 Jul 2016 • Liliang Zhang, Liang Lin, Xiaodan Liang, Kaiming He
Pedestrian detection has arguably been addressed as a special topic beyond general object detection.
Ranked #17 on Pedestrian Detection on Caltech
44 code implementations • NeurIPS 2016 • Jifeng Dai, Yi Li, Kaiming He, Jian Sun
In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image.
Ranked #4 on Real-Time Object Detection on PASCAL VOC 2007
no code implementations • CVPR 2016 • Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, Jian Sun
Large-scale data is of crucial importance for learning semantic segmentation models, but annotating per-pixel masks is a tedious and inefficient procedure.
no code implementations • 29 Mar 2016 • Jifeng Dai, Kaiming He, Yi Li, Shaoqing Ren, Jian Sun
In contrast to the previous FCN that generates one score map, our FCN is designed to compute a small set of instance-sensitive score maps, each of which is the outcome of a pixel-wise classifier of a relative position to instances.
52 code implementations • 16 Mar 2016 • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors.
Ranked #16 on Image Classification on Kuzushiji-MNIST
2 code implementations • CVPR 2016 • Jifeng Dai, Kaiming He, Jian Sun
We develop an algorithm for the nontrivial end-to-end training of this causal, cascaded structure.
Ranked #3 on Multi-Human Parsing on PASCAL-Part
413 code implementations • CVPR 2016 • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Deep residual nets are the foundation of our submissions to the ILSVRC & COCO 2015 competitions, where we also won 1st place on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
173 code implementations • NeurIPS 2015 • Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Ranked #5 on Real-Time Object Detection on PASCAL VOC 2007
no code implementations • CVPR 2015 • Yan Xia, Kaiming He, Pushmeet Kohli, Jian Sun
This paper addresses the problem of learning long binary codes from high-dimensional data.
no code implementations • CVPR 2015 • Dongping Li, Kaiming He, Jian Sun, Kun Zhou
The image projections will turn the straight lines into curved "geodesic lines", and it is fundamentally impossible to keep all these lines straight.
no code implementations • 26 May 2015 • Xiangyu Zhang, Jianhua Zou, Kaiming He, Jian Sun
This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs that have substantially impacted the computer vision community.
6 code implementations • 5 May 2015 • Kaiming He, Jian Sun
The guided filter is a technique for edge-aware image filtering.
no code implementations • 23 Apr 2015 • Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun
We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas the latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without using such a per-region classifier.
no code implementations • ICCV 2015 • Jifeng Dai, Kaiming He, Jian Sun
Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks.
Ranked #46 on Semantic Segmentation on PASCAL VOC 2012 test
15 code implementations • ICCV 2015 • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
In this work, we study rectifier neural networks for image classification from two aspects.
56 code implementations • 31 Dec 2014 • Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang
We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network.
Ranked #2 on Video Super-Resolution on Xiph HD - 4x upscaling
no code implementations • CVPR 2015 • Kaiming He, Jian Sun
Though recent advanced convolutional neural networks (CNNs) have been improving image recognition accuracy, the models are getting more complex and time-consuming.
1 code implementation • CVPR 2015 • Jifeng Dai, Kaiming He, Jian Sun
The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions.
Ranked #59 on Semantic Segmentation on PASCAL Context
no code implementations • CVPR 2015 • Xiangyu Zhang, Jianhua Zou, Xiang Ming, Kaiming He, Jian Sun
This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs).
13 code implementations • 18 Jun 2014 • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale.
Ranked #24 on Object Detection on PASCAL VOC 2007
no code implementations • CVPR 2014 • Tiezheng Ge, Kaiming He, Jian Sun
In this paper, we study a special case of sparse coding in which the codebook is a Cartesian product of two subcodebooks.
no code implementations • CVPR 2013 • Kaiming He, Fang Wen, Jian Sun
We propose a novel Affinity-Preserving K-means algorithm which simultaneously performs k-means clustering and learns the binary indices of the quantized cells.
no code implementations • CVPR 2013 • Tiezheng Ge, Kaiming He, Qifa Ke, Jian Sun
Product quantization is an effective vector quantization approach to compactly encode high-dimensional vectors for fast approximate nearest neighbor (ANN) search.
1 code implementation • IEEE Transactions on Pattern Analysis and Machine Intelligence 2010 • Kaiming He, Jian Sun, Xiaoou Tang
The dark channel prior is a statistical property of outdoor haze-free images.
Ranked #1 on Single Image Haze Removal on RESIDE