Search Results for author: Ross Girshick

Found 75 papers, 58 papers with code

Exploring Plain Vision Transformer Backbones for Object Detection

6 code implementations 30 Mar 2022 Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He

This design enables the original ViT architecture to be fine-tuned for object detection without needing to redesign a hierarchical backbone for pre-training.

Instance Segmentation Object +2

Benchmarking Detection Transfer Learning with Vision Transformers

2 code implementations 22 Nov 2021 Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollár, Kaiming He, Ross Girshick

The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.

Benchmarking object-detection +3

PyTorchVideo: A Deep Learning Library for Video Understanding

1 code implementation 18 Nov 2021 Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.

Self-Supervised Learning Video Understanding

Early Convolutions Help Transformers See Better

1 code implementation NeurIPS 2021 Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick

To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3×3 convolutions.

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

2 code implementations CVPR 2021 Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov

We perform an extensive analysis across different error types and object sizes and show that Boundary IoU is significantly more sensitive than the standard Mask IoU measure to boundary errors for large objects and does not over-penalize errors on smaller objects.

Image Segmentation Object +2

Fast and Accurate Model Scaling

4 code implementations CVPR 2021 Piotr Dollár, Mannat Singh, Ross Girshick

This leads us to propose a simple fast compound scaling strategy that encourages primarily scaling model width, while scaling depth and resolution to a lesser extent.

Are Labels Necessary for Neural Architecture Search?

2 code implementations ECCV 2020 Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie

Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels.

Neural Architecture Search

Improved Baselines with Momentum Contrastive Learning

36 code implementations 9 Mar 2020 Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He

Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR.

Contrastive Learning Data Augmentation +3

A Multigrid Method for Efficiently Training Video Models

3 code implementations CVPR 2020 Chao-yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer, Philipp Krähenbühl

We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).

Action Detection Action Recognition +2

PHYRE: A New Benchmark for Physical Reasoning

2 code implementations NeurIPS 2019 Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, Ross Girshick

The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.

Visual Reasoning

LVIS: A Dataset for Large Vocabulary Instance Segmentation

3 code implementations CVPR 2019 Agrim Gupta, Piotr Dollár, Ross Girshick

We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images.

Instance Segmentation Object +4

TensorMask: A Foundation for Dense Object Segmentation

2 code implementations ICCV 2019 Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollár

To formalize this, we treat dense instance segmentation as a prediction task over 4D tensors and present a general framework called TensorMask that explicitly captures this geometry and enables novel operators on 4D tensors.

Instance Segmentation Object +4

Panoptic Feature Pyramid Networks

12 code implementations CVPR 2019 Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár

In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks.

Instance Segmentation Panoptic Segmentation +2

Rethinking ImageNet Pre-training

1 code implementation ICCV 2019 Kaiming He, Ross Girshick, Piotr Dollár

We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization.

Instance Segmentation object-detection +2

Low-Shot Learning from Imaginary Data

1 code implementation CVPR 2018 Yu-Xiong Wang, Ross Girshick, Martial Hebert, Bharath Hariharan

Humans can quickly learn new visual concepts, perhaps because they can easily visualize or imagine what novel objects look like from different views.

General Classification

Data Distillation: Towards Omni-Supervised Learning

4 code implementations CVPR 2018 Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He

We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data.

Keypoint Detection object-detection +1

Learning by Asking Questions

no code implementations CVPR 2018 Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten

We also show that our model asks questions that generalize to state-of-the-art VQA models and to novel test time distributions.

Question Answering Visual Question Answering

Learning to Segment Every Thing

3 code implementations CVPR 2018 Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick

Most methods for object instance segmentation require all training examples to be labeled with segmentation masks.

Instance Segmentation Segmentation +1

Non-local Neural Networks

31 code implementations CVPR 2018 Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.

Ranked #8 on Action Classification on Toyota Smarthome dataset (using extra training data)

Action Classification Action Recognition +5

Focal Loss for Dense Object Detection

231 code implementations ICCV 2017 Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár

Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

Dense Object Detection Knowledge Distillation +5

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

70 code implementations 8 Jun 2017 Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He

To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training.

Stochastic Optimization

Inferring and Executing Programs for Visual Reasoning

5 code implementations ICCV 2017 Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes.

Visual Question Answering (VQA) Visual Reasoning

Detecting and Recognizing Human-Object Interactions

2 code implementations CVPR 2018 Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He

Our hypothesis is that the appearance of a person -- their pose, clothing, action -- is a powerful cue for localizing the objects they are interacting with.

Human-Object Interaction Detection Object

Mask R-CNN

172 code implementations ICCV 2017 Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick

Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.

3D Instance Segmentation Human Part Segmentation +12

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

5 code implementations CVPR 2017 Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings.

Question Answering Visual Question Answering +1

Learning Features by Watching Objects Move

1 code implementation CVPR 2017 Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan

Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature.

object-detection Object Detection +1

Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks

4 code implementations 13 Apr 2016 Junyuan Xie, Ross Girshick, Ali Farhadi

As 3D movie viewing becomes mainstream and the Virtual Reality (VR) market emerges, the demand for 3D content is growing rapidly.

Depth Estimation

Training Region-based Object Detectors with Online Hard Example Mining

5 code implementations CVPR 2016 Abhinav Shrivastava, Abhinav Gupta, Ross Girshick

Our motivation is the same as it has always been -- detection datasets contain an overwhelming number of easy examples and a small number of hard examples.

object-detection Object Detection

Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels

no code implementations CVPR 2016 Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, Ross Girshick

When human annotators are given a choice about what to label in an image, they apply their own subjective judgments on what to ignore and what to mention.

Image Captioning Image Classification

Unsupervised Deep Embedding for Clustering Analysis

19 code implementations 19 Nov 2015 Junyuan Xie, Ross Girshick, Ali Farhadi

Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms.

Ranked #4 on Unsupervised Image Classification on SVHN (using extra training data)

Clustering Image Clustering +1

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

195 code implementations NeurIPS 2015 Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.

Object Real-Time Object Detection +3

Aligning 3D Models to RGB-D Images of Cluttered Scenes

no code implementations CVPR 2015 Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik

The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library.

Contextual Action Recognition with R*CNN

2 code implementations ICCV 2015 Georgia Gkioxari, Ross Girshick, Jitendra Malik

In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system.

Action Recognition Attribute +3

Fast R-CNN

29 code implementations ICCV 2015 Ross Girshick

Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks.

Ranked #23 on Object Detection on PASCAL VOC 2007 (using extra training data)

Object Object Detection

Object Detection Networks on Convolutional Feature Maps

no code implementations 23 Apr 2015 Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun

We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without using such a per-region classifier.

General Classification Image Classification +3

Inferring 3D Object Pose in RGB-D Images

no code implementations 16 Feb 2015 Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik

The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library.


Hypercolumns for Object Segmentation and Fine-grained Localization

6 code implementations CVPR 2015 Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation.

Object Semantic Segmentation

Part-based R-CNNs for Fine-grained Category Detection

no code implementations 15 Jul 2014 Ning Zhang, Jeff Donahue, Ross Girshick, Trevor Darrell

Semantic part localization can facilitate fine-grained categorization by explicitly isolating subtle appearance differences associated with specific object parts.

Fine-Grained Image Classification Object +2

Analyzing the Performance of Multilayer Neural Networks for Object Recognition

no code implementations 7 Jul 2014 Pulkit Agrawal, Ross Girshick, Jitendra Malik

In the last two years, convolutional neural networks (CNNs) have achieved an impressive suite of results on standard recognition datasets and tasks.

Object Recognition

Caffe: Convolutional Architecture for Fast Feature Embedding

2 code implementations 20 Jun 2014 Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell

The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

Clustering Dimensionality Reduction +1

R-CNNs for Pose Estimation and Action Detection

no code implementations 19 Jun 2014 Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images.

Action Classification Action Detection +3

Understanding Objects in Detail with Fine-Grained Attributes

no code implementations CVPR 2014 Andrea Vedaldi, Siddharth Mahendran, Stavros Tsogkas, Subhransu Maji, Ross Girshick, Juho Kannala, Esa Rahtu, Iasonas Kokkinos, Matthew B. Blaschko, David Weiss, Ben Taskar, Karen Simonyan, Naomi Saphra, Sammy Mohamed

We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object.

Attribute Object +2

Using k-Poselets for Detecting People and Localizing Their Keypoints

no code implementations CVPR 2014 Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

A k-poselet is a deformable part model (DPM) with k parts, where each of the parts is a poselet, aligned to a specific configuration of keypoints based on ground-truth annotations.

Human Detection

Microsoft COCO: Common Objects in Context

36 code implementations 1 May 2014 Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár

We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.

Instance Segmentation Object +5

On learning to localize objects with minimal supervision

no code implementations 5 Mar 2014 Hyun Oh Song, Ross Girshick, Stefanie Jegelka, Julien Mairal, Zaid Harchaoui, Trevor Darrell

Learning to localize objects with minimal supervision is an important problem in computer vision, since large fully annotated datasets are extremely costly to obtain.

Weakly Supervised Object Detection
