18 code implementations • ICCV 2023 • Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation.
Ranked #2 on Zero-Shot Instance Segmentation on LVIS v1.0 val
1 code implementation • ICCV 2023 • Mannat Singh, Quentin Duval, Kalyan Vasudev Alwala, Haoqi Fan, Vaibhav Aggarwal, Aaron Adcock, Armand Joulin, Piotr Dollár, Christoph Feichtenhofer, Ross Girshick, Rohit Girdhar, Ishan Misra
While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well.
Ranked #1 on Few-Shot Image Classification on ImageNet - 10-shot (using extra training data)
6 code implementations • 30 Mar 2022 • Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He
This design enables the original ViT architecture to be fine-tuned for object detection without needing to redesign a hierarchical backbone for pre-training.
Ranked #5 on Instance Segmentation on LVIS v1.0 val
2 code implementations • CVPR 2022 • Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis, Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr Dollár, Laurens van der Maaten
Model pre-training is a cornerstone of modern visual recognition systems.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W (using extra training data)
Tasks: Fine-Grained Image Classification, Out-of-Distribution Generalization, +3 more
2 code implementations • 22 Nov 2021 • Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollár, Kaiming He, Ross Girshick
The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
1 code implementation • 18 Nov 2021 • Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer
We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.
49 code implementations • CVPR 2022 • Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
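The masking step named in the MAE entry above (mask random patches, then reconstruct the missing pixels) can be illustrated with a minimal sketch. It covers only the choice of which patches to hide, not the encoder or decoder. The function name `random_patch_mask`, the 75% mask ratio, and the 16x16 patch size are assumptions chosen for illustration; none of them is stated in this listing.

```python
import random

def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) by uniform random sampling.

    mask_ratio is an illustrative default, not a value taken from this listing.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)                       # random permutation of patch ids
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])      # patches hidden from the encoder
    visible = sorted(indices[num_masked:])     # patches the encoder would see
    return visible, masked

# Assumed setup: a 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_patch_mask(num_patches=196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

In a full pipeline only the visible patches would be encoded, and the reconstruction loss would be computed on the masked ones.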
1 code implementation • NeurIPS 2021 • Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick
To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3×3 convolutions.
2 code implementations • CVPR 2021 • Christoph Feichtenhofer, Haoqi Fan, Bo Xiong, Ross Girshick, Kaiming He
We present a large-scale study on unsupervised spatiotemporal representation learning from videos.
Ranked #3 on Self-Supervised Action Recognition on HMDB51
Tasks: Representation Learning, Self-Supervised Action Recognition, +1 more
2 code implementations • CVPR 2021 • Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov
We perform an extensive analysis across different error types and object sizes and show that Boundary IoU is significantly more sensitive than the standard Mask IoU measure to boundary errors for large objects and does not over-penalize errors on smaller objects.
4 code implementations • CVPR 2021 • Piotr Dollár, Mannat Singh, Ross Girshick
This leads us to propose a simple fast compound scaling strategy that encourages primarily scaling model width, while scaling depth and resolution to a lesser extent.
2 code implementations • 1 Feb 2021 • Achal Dave, Piotr Dollár, Deva Ramanan, Alexander Kirillov, Ross Girshick
On one hand, this is desirable as it treats all classes equally.
no code implementations • 16 May 2020 • Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdel-rahman Mohamed
Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high quality speech recognition systems.
24 code implementations • CVPR 2020 • Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár
In this work, we present a new network design paradigm.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
2 code implementations • ECCV 2020 • Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie
Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels.
36 code implementations • 9 Mar 2020 • Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He
Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR.
Ranked #3 on Contrastive Learning on imagenet-1k
14 code implementations • CVPR 2020 • Alexander Kirillov, Yuxin Wu, Kaiming He, Ross Girshick
We present a new method for efficient high-quality image segmentation of objects and scenes.
Ranked #3 on Instance Segmentation on COCO 2017 val
3 code implementations • CVPR 2020 • Chao-yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer, Philipp Krähenbühl
We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).
Ranked #1 on Video Classification on Kinetics
45 code implementations • CVPR 2020 • Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick
This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning.
Ranked #11 on Contrastive Learning on imagenet-1k
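The entry above does not say what "this" refers to. In the MoCo paper it is a queue of encoded keys combined with a slowly updated (momentum) key encoder. A minimal sketch of those two pieces follows, using plain Python lists in place of tensors. The names `momentum_update` and `KeyQueue`, the momentum value 0.999, and the tiny queue size are illustrative assumptions, not details from this listing.

```python
from collections import deque

def momentum_update(key_params, query_params, m=0.999):
    """Move each key-encoder parameter slowly toward the query encoder: k = m*k + (1-m)*q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]

class KeyQueue:
    """Fixed-size FIFO dictionary of encoded keys; the oldest keys fall out first."""

    def __init__(self, max_keys):
        self.keys = deque(maxlen=max_keys)

    def enqueue(self, batch_of_keys):
        # A deque with maxlen silently drops the oldest entries when it is full.
        self.keys.extend(batch_of_keys)

# Hypothetical keys, named only so the eviction order is visible.
queue = KeyQueue(max_keys=4)
queue.enqueue(["k0", "k1", "k2"])
queue.enqueue(["k3", "k4"])
print(list(queue.keys))  # ['k1', 'k2', 'k3', 'k4']
```

The queue decouples dictionary size from minibatch size, and the slow key-encoder update keeps keys from different steps comparable.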
no code implementations • 27 Oct 2019 • Kritika Singh, Dmytro Okhonko, Jun Liu, Yongqiang Wang, Frank Zhang, Ross Girshick, Sergey Edunov, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
Supervised ASR models have reached unprecedented levels of accuracy, thanks in part to ever-increasing amounts of labelled training data.
2 code implementations • NeurIPS 2019 • Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, Ross Girshick
The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.
Ranked #3 on Visual Reasoning on PHYRE-1B-Within
3 code implementations • CVPR 2019 • Agrim Gupta, Piotr Dollár, Ross Girshick
We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images.
9 code implementations • ICCV 2019 • Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He
In this paper, we explore a more diverse set of connectivity patterns through the lens of randomly wired neural networks.
Ranked #118 on Neural Architecture Search
2 code implementations • ICCV 2019 • Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollár
To formalize this, we treat dense instance segmentation as a prediction task over 4D tensors and present a general framework called TensorMask that explicitly captures this geometry and enables novel operators on 4D tensors.
Ranked #90 on Instance Segmentation on COCO test-dev
12 code implementations • CVPR 2019 • Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár
In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks.
Ranked #4 on Panoptic Segmentation on Indian Driving Dataset
4 code implementations • CVPR 2019 • Chao-yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
Ranked #4 on Egocentric Activity Recognition on EPIC-KITCHENS-55
1 code implementation • ICCV 2019 • Kaiming He, Ross Girshick, Piotr Dollár
We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization.
Ranked #81 on Object Detection on COCO minival
4 code implementations • ECCV 2018 • Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten
ImageNet classification is the de facto pretraining task for these models.
Ranked #222 on Image Classification (using extra training data)
1 code implementation • CVPR 2018 • Yu-Xiong Wang, Ross Girshick, Martial Hebert, Bharath Hariharan
Humans can quickly learn new visual concepts, perhaps because they can easily visualize or imagine what novel objects look like from different views.
9 code implementations • CVPR 2019 • Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár
We propose and study a task we name panoptic segmentation (PS).
Ranked #23 on Panoptic Segmentation on Cityscapes val (using extra training data)
4 code implementations • CVPR 2018 • Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He
We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data.
no code implementations • CVPR 2018 • Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten
We also show that our model asks questions that generalize to state-of-the-art VQA models and to novel test time distributions.
3 code implementations • CVPR 2018 • Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick
Most methods for object instance segmentation require all training examples to be labeled with segmentation masks.
32 code implementations • CVPR 2018 • Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.
Ranked #8 on Action Classification on Toyota Smarthome dataset (using extra training data)
231 code implementations • ICCV 2017 • Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár
Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
Ranked #3 on Region Proposal on COCO test-dev
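The Focal Loss entry above says the loss concentrates training on hard examples. A minimal sketch of the binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), shows how the (1 - p_t)^γ factor shrinks the contribution of easy examples. The function name and the defaults γ = 2 and α = 0.25 are illustrative assumptions, not values stated in this listing.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class, strictly between 0 and 1.
    y: ground-truth label, 1 (positive) or 0 (negative).
    gamma, alpha: illustrative defaults (assumptions).
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy_negative = focal_loss(p=0.01, y=0)  # background predicted confidently and correctly
hard_negative = focal_loss(p=0.90, y=0)  # background predicted confidently but wrongly
print(easy_negative, hard_negative)
```

With `gamma=0` the modulating factor disappears and the function reduces to α-weighted cross-entropy, which is a quick way to sanity-check an implementation.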
71 code implementations • 8 Jun 2017 • Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training.
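The two ingredients named in the entry above, a linear scaling rule for the learning rate and a warmup phase, can be sketched in a few lines. This is a minimal illustration, not the authors' training code: the function names, the reference setting (learning rate 0.1 at minibatch size 256), and the 5-step linear ramp are assumptions made for the example.

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch_size=256):
    """Linear scaling rule: when the minibatch grows by a factor k, multiply the learning rate by k."""
    return base_lr * batch_size / base_batch_size

def warmup_lr(step, warmup_steps, start_lr, target_lr):
    """Gradual warmup: ramp linearly from start_lr to target_lr, then hold target_lr."""
    if step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps

# Assumed reference point: lr 0.1 at batch size 256, scaled up to batch size 8192.
target = scaled_lr(batch_size=8192)  # 0.1 * 8192 / 256 = 3.2
schedule = [warmup_lr(s, warmup_steps=5, start_lr=0.1, target_lr=target) for s in range(7)]
print(target, schedule)
```

Starting the ramp at the small-batch learning rate avoids the instability a large learning rate can cause while the weights are still changing rapidly early in training.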
5 code implementations • ICCV 2017 • Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes.
Ranked #5 on Visual Question Answering (VQA) on CLEVR-Humans
2 code implementations • CVPR 2018 • Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He
Our hypothesis is that the appearance of a person -- their pose, clothing, action -- is a powerful cue for localizing the objects they are interacting with.
Ranked #53 on Human-Object Interaction Detection on HICO-DET
172 code implementations • ICCV 2017 • Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.
Ranked #1 on Keypoint Estimation on GRIT
5 code implementations • CVPR 2017 • Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings.
1 code implementation • CVPR 2017 • Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan
Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature.
85 code implementations • CVPR 2017 • Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie
Feature pyramids are a basic component in recognition systems for detecting objects at different scales.
Ranked #3 on Pedestrian Detection on TJU-Ped-campus
58 code implementations • CVPR 2017 • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set.
Ranked #3 on Image Classification on GasHisSDB
4 code implementations • ICCV 2017 • Bharath Hariharan, Ross Girshick
Low-shot visual learning, the ability to recognize novel object categories from very few examples, is a hallmark of human visual intelligence.
1 code implementation • NAACL 2016 • Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell
We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling.
4 code implementations • 13 Apr 2016 • Junyuan Xie, Ross Girshick, Ali Farhadi
As 3D movie viewing becomes mainstream and the Virtual Reality (VR) market emerges, the demand for 3D content is growing rapidly.
5 code implementations • CVPR 2016 • Abhinav Shrivastava, Abhinav Gupta, Ross Girshick
Our motivation is the same as it has always been -- detection datasets contain an overwhelming number of easy examples and a small number of hard examples.
Ranked #6 on Face Verification on Trillion Pairs Dataset
no code implementations • CVPR 2016 • Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, Ross Girshick
When human annotators are given a choice about what to label in an image, they apply their own subjective judgments on what to ignore and what to mention.
no code implementations • CVPR 2016 • Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick
In this paper we present the Inside-Outside Net (ION), an object detector that exploits information both inside and outside the region of interest.
Ranked #224 on Object Detection on COCO test-dev
no code implementations • 19 Nov 2015 • Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra
One major challenge in training Deep Neural Networks is preventing overfitting.
19 code implementations • 19 Nov 2015 • Junyuan Xie, Ross Girshick, Ali Farhadi
Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms.
Ranked #4 on Unsupervised Image Classification on SVHN (using extra training data)
145 code implementations • CVPR 2016 • Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.
Ranked #1 on Real-Time Object Detection on PASCAL VOC 2007
194 code implementations • NeurIPS 2015 • Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Ranked #2 on Vessel Detection on Vessel detection Dateset
no code implementations • CVPR 2015 • Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik
The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library.
1 code implementation • 17 May 2015 • Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick
We explore a variety of nearest neighbor baseline approaches for image captioning.
2 code implementations • ICCV 2015 • Georgia Gkioxari, Ross Girshick, Jitendra Malik
In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system.
Ranked #4 on Weakly Supervised Object Detection on HICO-DET
29 code implementations • ICCV 2015 • Ross Girshick
Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks.
Ranked #23 on Object Detection on PASCAL VOC 2007 (using extra training data)
no code implementations • 23 Apr 2015 • Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun
We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas the latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without using such a per-region classifier.
no code implementations • 16 Feb 2015 • Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik
The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library.
no code implementations • ICCV 2015 • Georgia Gkioxari, Ross Girshick, Jitendra Malik
We investigate the importance of parts for the tasks of action and attribute classification.
6 code implementations • CVPR 2015 • Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik
Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation.
1 code implementation • CVPR 2015 • Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik
Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition.
Ranked #28 on Object Detection on PASCAL VOC 2007
1 code implementation • 22 Jul 2014 • Saurabh Gupta, Ross Girshick, Pablo Arbeláez, Jitendra Malik
In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features.
Ranked #6 on Object Detection In Indoor Scenes on SUN RGB-D
1 code implementation • NeurIPS 2014 • Judy Hoffman, Sergio Guadarrama, Eric Tzeng, Ronghang Hu, Jeff Donahue, Ross Girshick, Trevor Darrell, Kate Saenko
A major challenge in scaling object detection is the difficulty of obtaining labeled images for large numbers of categories.
no code implementations • 15 Jul 2014 • Ning Zhang, Jeff Donahue, Ross Girshick, Trevor Darrell
Semantic part localization can facilitate fine-grained categorization by explicitly isolating subtle appearance differences associated with specific object parts.
Ranked #64 on Fine-Grained Image Classification on CUB-200-2011
no code implementations • 7 Jul 2014 • Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik
Unlike classical semantic segmentation, we require individual object instances.
Ranked #5 on Object Detection on PASCAL VOC 2012
no code implementations • 7 Jul 2014 • Pulkit Agrawal, Ross Girshick, Jitendra Malik
In the last two years, convolutional neural networks (CNNs) have achieved an impressive suite of results on standard recognition datasets and tasks.
2 code implementations • 20 Jun 2014 • Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
no code implementations • 19 Jun 2014 • Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik
We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images.
no code implementations • CVPR 2014 • Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik
A k-poselet is a deformable part model (DPM) with k parts, where each of the parts is a poselet, aligned to a specific configuration of keypoints based on ground-truth annotations.
no code implementations • CVPR 2014 • Andrea Vedaldi, Siddharth Mahendran, Stavros Tsogkas, Subhransu Maji, Ross Girshick, Juho Kannala, Esa Rahtu, Iasonas Kokkinos, Matthew B. Blaschko, David Weiss, Ben Taskar, Karen Simonyan, Naomi Saphra, Sammy Mohamed
We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object.
35 code implementations • 1 May 2014 • Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
2 code implementations • 7 Apr 2014 • Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, Kurt Keutzer
Convolutional Neural Networks (CNNs) can provide accurate object classification.
no code implementations • 5 Mar 2014 • Hyun Oh Song, Ross Girshick, Stefanie Jegelka, Julien Mairal, Zaid Harchaoui, Trevor Darrell
Learning to localize objects with minimal supervision is an important problem in computer vision, since large fully annotated datasets are extremely costly to obtain.
Ranked #35 on Weakly Supervised Object Detection on PASCAL VOC 2007
29 code implementations • CVPR 2014 • Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset.
Ranked #27 on Object Detection on PASCAL VOC 2007 (using extra training data)