Search Results for author: Ross Girshick

Found 68 papers, 48 papers with code

Early Convolutions Help Transformers See Better

1 code implementation 28 Jun 2021 Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick

To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3x3 convolutions.

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

1 code implementation CVPR 2021 Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov

We perform an extensive analysis across different error types and object sizes and show that Boundary IoU is significantly more sensitive than the standard Mask IoU measure to boundary errors for large objects and does not over-penalize errors on smaller objects.

Panoptic Segmentation

Fast and Accurate Model Scaling

2 code implementations CVPR 2021 Piotr Dollár, Mannat Singh, Ross Girshick

This leads us to propose a simple fast compound scaling strategy that encourages primarily scaling model width, while scaling depth and resolution to a lesser extent.

Large scale weakly and semi-supervised learning for low-resource video ASR

no code implementations 16 May 2020 Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdel-rahman Mohamed

Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high quality speech recognition systems.

Speech Recognition

Are Labels Necessary for Neural Architecture Search?

2 code implementations ECCV 2020 Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie

Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels.

Neural Architecture Search

Improved Baselines with Momentum Contrastive Learning

21 code implementations 9 Mar 2020 Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He

Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR.

Contrastive Learning Data Augmentation +3

A Multigrid Method for Efficiently Training Video Models

3 code implementations CVPR 2020 Chao-yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer, Philipp Krähenbühl

We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).

Action Detection Action Recognition +1

PHYRE: A New Benchmark for Physical Reasoning

1 code implementation NeurIPS 2019 Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, Ross Girshick

The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.

Visual Reasoning

LVIS: A Dataset for Large Vocabulary Instance Segmentation

3 code implementations CVPR 2019 Agrim Gupta, Piotr Dollár, Ross Girshick

We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images.

Instance Segmentation Object Detection +1

TensorMask: A Foundation for Dense Object Segmentation

2 code implementations ICCV 2019 Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollár

To formalize this, we treat dense instance segmentation as a prediction task over 4D tensors and present a general framework called TensorMask that explicitly captures this geometry and enables novel operators on 4D tensors.

Instance Segmentation Object Detection +1

Panoptic Feature Pyramid Networks

9 code implementations CVPR 2019 Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár

In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks.

Instance Segmentation Panoptic Segmentation

Rethinking ImageNet Pre-training

1 code implementation ICCV 2019 Kaiming He, Ross Girshick, Piotr Dollár

We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization.

Instance Segmentation Object Detection +1

Low-Shot Learning from Imaginary Data

no code implementations CVPR 2018 Yu-Xiong Wang, Ross Girshick, Martial Hebert, Bharath Hariharan

Humans can quickly learn new visual concepts, perhaps because they can easily visualize or imagine what novel objects look like from different views.

General Classification Meta-Learning

Data Distillation: Towards Omni-Supervised Learning

4 code implementations CVPR 2018 Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He

We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data.

Keypoint Detection Object Detection

Learning by Asking Questions

no code implementations CVPR 2018 Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten

We also show that our model asks questions that generalize to state-of-the-art VQA models and to novel test time distributions.

Question Answering Visual Question Answering

Learning to Segment Every Thing

3 code implementations CVPR 2018 Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick

Most methods for object instance segmentation require all training examples to be labeled with segmentation masks.

Instance Segmentation Semantic Segmentation

Non-local Neural Networks

22 code implementations CVPR 2018 Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.

Ranked #7 on Action Classification on Toyota Smarthome dataset (using extra training data)

Action Classification Action Recognition +3

Focal Loss for Dense Object Detection

207 code implementations ICCV 2017 Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár

Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

Dense Object Detection Long-tail Learning +2
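
As a rough illustration of the idea summarized in this entry, the sketch below assumes the commonly cited binary form of the loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function name, the clamping constant, and the defaults gamma=2.0 and alpha=0.25 are assumptions for illustration and are not stated in this listing.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for a single prediction (illustrative sketch).

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 1 (positive) or 0 (negative)
    gamma, alpha, eps: assumed defaults, not taken from the listing
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights positives by alpha and negatives by (1 - alpha).
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0) for fully wrong predictions.
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_negative = focal_loss(0.05, 0)  # confidently correct negative
hard_positive = focal_loss(0.05, 1)  # confidently wrong positive
print(easy_negative < hard_positive)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to an alpha-weighted cross-entropy, which is a convenient sanity check.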

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

50 code implementations 8 Jun 2017 Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He

To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training.

Stochastic Optimization
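
The linear scaling rule and warmup mentioned in this entry can be sketched as follows. This is a minimal sketch under stated assumptions: the learning rate is multiplied by the same factor as the minibatch size, and warmup is modeled as a linear ramp from the reference rate to the scaled rate. The function names, the reference values (0.1 at batch size 256), and the linear ramp shape are assumptions for illustration; the listing itself only says that a warmup scheme is used.

```python
def scaled_lr(base_lr, base_batch, batch):
    """Linear scaling: grow the learning rate in proportion to batch size."""
    return base_lr * batch / base_batch


def warmup_lr(step, warmup_steps, start_lr, target_lr):
    """Assumed linear warmup from start_lr to target_lr over warmup_steps."""
    if step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps


# Hypothetical reference setting: lr 0.1 at batch size 256, scaled to 8192.
target = scaled_lr(0.1, 256, 8192)
schedule = [warmup_lr(s, 5, 0.1, target) for s in range(7)]
print(target, schedule)
```

The ramp starts at the reference rate, so the first updates behave like small-batch training before the larger rate takes over.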

Inferring and Executing Programs for Visual Reasoning

5 code implementations ICCV 2017 Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes.

Visual Question Answering Visual Reasoning

Detecting and Recognizing Human-Object Interactions

2 code implementations CVPR 2018 Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He

Our hypothesis is that the appearance of a person -- their pose, clothing, action -- is a powerful cue for localizing the objects they are interacting with.

Human-Object Interaction Detection

Mask R-CNN

136 code implementations ICCV 2017 Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick

Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.

3D Instance Segmentation Human Part Segmentation +7

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

4 code implementations CVPR 2017 Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings.

Question Answering Visual Question Answering +1

Learning Features by Watching Objects Move

1 code implementation CVPR 2017 Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan

Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature.

Object Detection Transfer Learning

Low-shot Visual Recognition by Shrinking and Hallucinating Features

4 code implementations ICCV 2017 Bharath Hariharan, Ross Girshick

Low-shot visual learning---the ability to recognize novel object categories from very few examples---is a hallmark of human visual intelligence.

Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks

3 code implementations 13 Apr 2016 Junyuan Xie, Ross Girshick, Ali Farhadi

As 3D movie viewing becomes mainstream and the Virtual Reality (VR) market emerges, the demand for 3D content is growing rapidly.

Virtual Reality

Training Region-based Object Detectors with Online Hard Example Mining

5 code implementations CVPR 2016 Abhinav Shrivastava, Abhinav Gupta, Ross Girshick

Our motivation is the same as it has always been -- detection datasets contain an overwhelming number of easy examples and a small number of hard examples.

Object Detection

Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels

no code implementations CVPR 2016 Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, Ross Girshick

When human annotators are given a choice about what to label in an image, they apply their own subjective judgments on what to ignore and what to mention.

Image Captioning Image Classification

Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks

no code implementations CVPR 2016 Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick

In this paper we present the Inside-Outside Net (ION), an object detector that exploits information both inside and outside the region of interest.

Small Object Detection

Unsupervised Deep Embedding for Clustering Analysis

15 code implementations 19 Nov 2015 Junyuan Xie, Ross Girshick, Ali Farhadi

Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms.

Image Clustering Unsupervised Image Classification

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

155 code implementations NeurIPS 2015 Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.

Real-Time Object Detection Region Proposal

Aligning 3D Models to RGB-D Images of Cluttered Scenes

no code implementations CVPR 2015 Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik

The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library.

Contextual Action Recognition with R*CNN

2 code implementations ICCV 2015 Georgia Gkioxari, Ross Girshick, Jitendra Malik

In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system.

Action Recognition General Classification +1

Fast R-CNN

27 code implementations ICCV 2015 Ross Girshick

Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks.

Object Detection

Object Detection Networks on Convolutional Feature Maps

no code implementations 23 Apr 2015 Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun

We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without using such a per-region classifier.

General Classification Image Classification +2

Inferring 3D Object Pose in RGB-D Images

no code implementations 16 Feb 2015 Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik

The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library.

Hypercolumns for Object Segmentation and Fine-grained Localization

6 code implementations CVPR 2015 Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation.

Semantic Segmentation

Deformable Part Models are Convolutional Neural Networks

1 code implementation CVPR 2015 Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik

Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition.

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

no code implementations 22 Jul 2014 Saurabh Gupta, Ross Girshick, Pablo Arbeláez, Jitendra Malik

In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features.

Instance Segmentation Object Detection +1

Part-based R-CNNs for Fine-grained Category Detection

no code implementations 15 Jul 2014 Ning Zhang, Jeff Donahue, Ross Girshick, Trevor Darrell

Semantic part localization can facilitate fine-grained categorization by explicitly isolating subtle appearance differences associated with specific object parts.

Fine-Grained Image Classification Object Detection

Analyzing the Performance of Multilayer Neural Networks for Object Recognition

1 code implementation 7 Jul 2014 Pulkit Agrawal, Ross Girshick, Jitendra Malik

In the last two years, convolutional neural networks (CNNs) have achieved an impressive suite of results on standard recognition datasets and tasks.

Object Recognition

Caffe: Convolutional Architecture for Fast Feature Embedding

2 code implementations 20 Jun 2014 Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell

The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

Dimensionality Reduction General Classification

R-CNNs for Pose Estimation and Action Detection

no code implementations 19 Jun 2014 Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images.

Action Classification Action Detection +3

Understanding Objects in Detail with Fine-Grained Attributes

no code implementations CVPR 2014 Andrea Vedaldi, Siddharth Mahendran, Stavros Tsogkas, Subhransu Maji, Ross Girshick, Juho Kannala, Esa Rahtu, Iasonas Kokkinos, Matthew B. Blaschko, David Weiss, Ben Taskar, Karen Simonyan, Naomi Saphra, Sammy Mohamed

We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object.

Object Detection

Using k-Poselets for Detecting People and Localizing Their Keypoints

no code implementations CVPR 2014 Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

A k-poselet is a deformable part model (DPM) with k parts, where each of the parts is a poselet, aligned to a specific configuration of keypoints based on ground-truth annotations.

Human Detection

Microsoft COCO: Common Objects in Context

21 code implementations 1 May 2014 Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár

We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.

Instance Segmentation Object Localization +3

On learning to localize objects with minimal supervision

no code implementations 5 Mar 2014 Hyun Oh Song, Ross Girshick, Stefanie Jegelka, Julien Mairal, Zaid Harchaoui, Trevor Darrell

Learning to localize objects with minimal supervision is an important problem in computer vision, since large fully annotated datasets are extremely costly to obtain.

Weakly Supervised Object Detection
