Search Results for author: Ross Girshick

Found 75 papers, 58 papers with code

Segment Anything

18 code implementations • ICCV 2023 • Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick

We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation.

Ranked #2 on Zero-Shot Instance Segmentation on LVIS v1.0 val

Event-based Object Segmentation Image Segmentation +3

126,923

Paper
Code

The effectiveness of MAE pre-pretraining for billion-scale pretraining

1 code implementation • ICCV 2023 • Mannat Singh, Quentin Duval, Kalyan Vasudev Alwala, Haoqi Fan, Vaibhav Aggarwal, Aaron Adcock, Armand Joulin, Piotr Dollár, Christoph Feichtenhofer, Ross Girshick, Rohit Girdhar, Ishan Misra

While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well.

Ranked #1 on Few-Shot Image Classification on ImageNet - 10-shot (using extra training data)

Action Classification Action Recognition +6

Paper
Code

Exploring Plain Vision Transformer Backbones for Object Detection

6 code implementations • 30 Mar 2022 • Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He

This design enables the original ViT architecture to be fine-tuned for object detection without needing to redesign a hierarchical backbone for pre-training.

Ranked #5 on Instance Segmentation on LVIS v1.0 val

Instance Segmentation Object +2

29,035

Paper
Code

Revisiting Weakly Supervised Pre-Training of Visual Perception Models

2 code implementations • CVPR 2022 • Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis, Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr Dollár, Laurens van der Maaten

Model pre-training is a cornerstone of modern visual recognition systems.

Ranked #1 on Out-of-Distribution Generalization on ImageNet-W (using extra training data)

Fine-Grained Image Classification Out-of-Distribution Generalization +3

169

Paper
Code

Benchmarking Detection Transfer Learning with Vision Transformers

2 code implementations • 22 Nov 2021 • Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollár, Kaiming He, Ross Girshick

The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.

Benchmarking object-detection +3

329

Paper
Code

PyTorchVideo: A Deep Learning Library for Video Understanding

1 code implementation • 18 Nov 2021 • Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.

Self-Supervised Learning Video Understanding

3,205

Paper
Code

Masked Autoencoders Are Scalable Vision Learners

49 code implementations • CVPR 2022 • Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick

Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.

Ranked #1 on Out-of-Distribution Generalization on ImageNet-W

Decoder Domain Generalization +5

6,862

Paper
Code

Early Convolutions Help Transformers See Better

1 code implementation • NeurIPS 2021 • Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick

To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3×3 convolutions.

Paper
Code

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

2 code implementations • CVPR 2021 • Christoph Feichtenhofer, Haoqi Fan, Bo Xiong, Ross Girshick, Kaiming He

We present a large-scale study on unsupervised spatiotemporal representation learning from videos.

Ranked #3 on Self-Supervised Action Recognition on HMDB51

Representation Learning Self-Supervised Action Recognition +1

6,338

Paper
Code

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

2 code implementations • CVPR 2021 • Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov

We perform an extensive analysis across different error types and object sizes and show that Boundary IoU is significantly more sensitive than the standard Mask IoU measure to boundary errors for large objects and does not over-penalize errors on smaller objects.

Image Segmentation Object +2

208

Paper
Code

Fast and Accurate Model Scaling

4 code implementations • CVPR 2021 • Piotr Dollár, Mannat Singh, Ross Girshick

This leads us to propose a simple fast compound scaling strategy that encourages primarily scaling model width, while scaling depth and resolution to a lesser extent.

30,231

Paper
Code

Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details

2 code implementations • 1 Feb 2021 • Achal Dave, Piotr Dollár, Deva Ramanan, Alexander Kirillov, Ross Girshick

On one hand, this is desirable as it treats all classes equally.

Benchmarking object-detection +2

2,011

Paper
Code

Large scale weakly and semi-supervised learning for low-resource video ASR

no code implementations • 16 May 2020 • Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdel-rahman Mohamed

Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high quality speech recognition systems.

Decoder speech-recognition +1

Paper
Add Code

Designing Network Design Spaces

24 code implementations • CVPR 2020 • Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár

In this work, we present a new network design paradigm.

Ranked #1 on Out-of-Distribution Generalization on ImageNet-W

Image Classification Out-of-Distribution Generalization

30,231

Paper
Code

Are Labels Necessary for Neural Architecture Search?

2 code implementations • ECCV 2020 • Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie

Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels.

Neural Architecture Search

2,116

Paper
Code

Improved Baselines with Momentum Contrastive Learning

36 code implementations • 9 Mar 2020 • Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He

Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR.

Ranked #3 on Contrastive Learning on imagenet-1k

Contrastive Learning Data Augmentation +3

28,164

Paper
Code

PointRend: Image Segmentation as Rendering

14 code implementations • CVPR 2020 • Alexander Kirillov, Yuxin Wu, Kaiming He, Ross Girshick

We present a new method for efficient high-quality image segmentation of objects and scenes.

Ranked #3 on Instance Segmentation on COCO 2017 val

Image Segmentation Instance Segmentation +2

29,028

Paper
Code

A Multigrid Method for Efficiently Training Video Models

3 code implementations • CVPR 2020 • Chao-yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer, Philipp Krähenbühl

We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).

Ranked #1 on Video Classification on Kinetics

Action Detection Action Recognition +2

6,338

Paper
Code

Momentum Contrast for Unsupervised Visual Representation Learning

45 code implementations • CVPR 2020 • Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick

This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning.

Ranked #11 on Contrastive Learning on imagenet-1k

Contrastive Learning Representation Learning +1

28,164

Paper
Code

Training ASR models by Generation of Contextual Information

no code implementations • 27 Oct 2019 • Kritika Singh, Dmytro Okhonko, Jun Liu, Yongqiang Wang, Frank Zhang, Ross Girshick, Sergey Edunov, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

Supervised ASR models have reached unprecedented levels of accuracy, thanks in part to ever-increasing amounts of labelled training data.

Decoder speech-recognition +3

Paper
Add Code

PHYRE: A New Benchmark for Physical Reasoning

2 code implementations • NeurIPS 2019 • Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, Ross Girshick

The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.

Ranked #3 on Visual Reasoning on PHYRE-1B-Within

Visual Reasoning

425

Paper
Code

LVIS: A Dataset for Large Vocabulary Instance Segmentation

3 code implementations • CVPR 2019 • Agrim Gupta, Piotr Dollár, Ross Girshick

We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images.

Instance Segmentation Object +4

399

Paper
Code

Exploring Randomly Wired Neural Networks for Image Recognition

9 code implementations • ICCV 2019 • Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He

In this paper, we explore a more diverse set of connectivity patterns through the lens of randomly wired neural networks.

Ranked #118 on Neural Architecture Search

Image Classification Neural Architecture Search

2,116

Paper
Code

TensorMask: A Foundation for Dense Object Segmentation

2 code implementations • ICCV 2019 • Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollár

To formalize this, we treat dense instance segmentation as a prediction task over 4D tensors and present a general framework called TensorMask that explicitly captures this geometry and enables novel operators on 4D tensors.

Ranked #90 on Instance Segmentation on COCO test-dev

Instance Segmentation Object +4

29,028

Paper
Code

Panoptic Feature Pyramid Networks

12 code implementations • CVPR 2019 • Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár

In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks.

Ranked #4 on Panoptic Segmentation on Indian Driving Dataset

Instance Segmentation Panoptic Segmentation +2

29,028

Paper
Code

Long-Term Feature Banks for Detailed Video Understanding

4 code implementations • CVPR 2019 • Chao-yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick

To understand the world, we humans constantly need to relate the present to the past, and put events in context.

Ranked #4 on Egocentric Activity Recognition on EPIC-KITCHENS-55

Action Classification Action Recognition +2

3,977

Paper
Code

Rethinking ImageNet Pre-training

1 code implementation • ICCV 2019 • Kaiming He, Ross Girshick, Piotr Dollár

We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization.

Ranked #81 on Object Detection on COCO minival

Instance Segmentation object-detection +2

6,299

Paper
Code

Exploring the Limits of Weakly Supervised Pretraining

4 code implementations • ECCV 2018 • Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten

ImageNet classification is the de facto pretraining task for these models.

Ranked #222 on Image Classification (using extra training data)

General Classification Image Classification +3

5,295

Paper
Code

Low-Shot Learning from Imaginary Data

1 code implementation • CVPR 2018 • Yu-Xiong Wang, Ross Girshick, Martial Hebert, Bharath Hariharan

Humans can quickly learn new visual concepts, perhaps because they can easily visualize or imagine what novel objects look like from different views.

General Classification

Paper
Code

Panoptic Segmentation

9 code implementations • CVPR 2019 • Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár

We propose and study a task we name panoptic segmentation (PS).

Ranked #23 on Panoptic Segmentation on Cityscapes val (using extra training data)

Image Segmentation Instance Segmentation +4

411

Paper
Code

Data Distillation: Towards Omni-Supervised Learning

4 code implementations • CVPR 2018 • Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He

We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data.

Keypoint Detection object-detection +1

26,165

Paper
Code

Learning by Asking Questions

no code implementations • CVPR 2018 • Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten

We also show that our model asks questions that generalize to state-of-the-art VQA models and to novel test time distributions.

Question Answering Visual Question Answering

Paper
Add Code

Learning to Segment Every Thing

3 code implementations • CVPR 2018 • Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick

Most methods for object instance segmentation require all training examples to be labeled with segmentation masks.

Instance Segmentation Segmentation +1

26,165

Paper
Code

Non-local Neural Networks

32 code implementations • CVPR 2018 • Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.

Ranked #8 on Action Classification on Toyota Smarthome dataset (using extra training data)

Action Classification Action Recognition +5

26,165

Paper
Code

Focal Loss for Dense Object Detection

231 code implementations • ICCV 2017 • Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár

Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

Ranked #3 on Region Proposal on COCO test-dev

Dense Object Detection Knowledge Distillation +5

76,675

Paper
Code

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

71 code implementations • 8 Jun 2017 • Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He

To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training.

Stochastic Optimization

4,378

Paper
Code

Inferring and Executing Programs for Visual Reasoning

5 code implementations • ICCV 2017 • Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes.

Ranked #5 on Visual Question Answering (VQA) on CLEVR-Humans

Visual Question Answering (VQA) Visual Reasoning

795

Paper
Code

Detecting and Recognizing Human-Object Interactions

2 code implementations • CVPR 2018 • Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He

Our hypothesis is that the appearance of a person -- their pose, clothing, action -- is a powerful cue for localizing the objects they are interacting with.

Ranked #53 on Human-Object Interaction Detection on HICO-DET

Human-Object Interaction Detection Object

26,165

Paper
Code

Mask R-CNN

172 code implementations • ICCV 2017 • Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick

Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.

Ranked #1 on Keypoint Estimation on GRIT

3D Instance Segmentation Human Part Segmentation +12

76,678

Paper
Code

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

5 code implementations • CVPR 2017 • Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings.

Question Answering Visual Question Answering +1

305

Paper
Code

Learning Features by Watching Objects Move

1 code implementation • CVPR 2017 • Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan

Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature.

object-detection Object Detection +1

260

Paper
Code

Feature Pyramid Networks for Object Detection

85 code implementations • CVPR 2017 • Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie

Feature pyramids are a basic component in recognition systems for detecting objects at different scales.

Ranked #3 on Pedestrian Detection on TJU-Ped-campus

Object Object Detection +1

39,309

Paper
Code

Aggregated Residual Transformations for Deep Neural Networks

58 code implementations • CVPR 2017 • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He

Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set.

Ranked #3 on Image Classification on GasHisSDB

Domain Generalization General Classification +1

15,583

Paper
Code

Low-shot Visual Recognition by Shrinking and Hallucinating Features

4 code implementations • ICCV 2017 • Bharath Hariharan, Ross Girshick

Low-shot visual learning---the ability to recognize novel object categories from very few examples---is a hallmark of human visual intelligence.

Ranked #5 on Few-Shot Image Classification on ImageNet-FS (5-shot, all)

BIG-bench Machine Learning Few-Shot Image Classification

307

Paper
Code

Visual Storytelling

1 code implementation • NAACL 2016 • Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell

We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling.

Descriptive Visual Storytelling

Paper
Code

Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks

4 code implementations • 13 Apr 2016 • Junyuan Xie, Ross Girshick, Ali Farhadi

As 3D movie viewing becomes mainstream and the Virtual Reality (VR) market emerges, the demand for 3D content is growing rapidly.

Depth Estimation

1,226

Paper
Code

Training Region-based Object Detectors with Online Hard Example Mining

5 code implementations • CVPR 2016 • Abhinav Shrivastava, Abhinav Gupta, Ross Girshick

Our motivation is the same as it has always been -- detection datasets contain an overwhelming number of easy examples and a small number of hard examples.

Ranked #6 on Face Verification on Trillion Pairs Dataset

object-detection Object Detection

422

Paper
Code

Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels

no code implementations • CVPR 2016 • Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, Ross Girshick

When human annotators are given a choice about what to label in an image, they apply their own subjective judgments on what to ignore and what to mention.

Image Captioning Image Classification

Paper
Add Code

Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks

no code implementations • CVPR 2016 • Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick

In this paper we present the Inside-Outside Net (ION), an object detector that exploits information both inside and outside the region of interest.

Ranked #224 on Object Detection on COCO test-dev

Object object-detection +1

Paper
Add Code

Reducing Overfitting in Deep Networks by Decorrelating Representations

no code implementations • 19 Nov 2015 • Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra

One major challenge in training Deep Neural Networks is preventing overfitting.

Data Augmentation

Paper
Add Code

Unsupervised Deep Embedding for Clustering Analysis

19 code implementations • 19 Nov 2015 • Junyuan Xie, Ross Girshick, Ali Farhadi

Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms.

Ranked #4 on Unsupervised Image Classification on SVHN (using extra training data)

Clustering Image Clustering +1

449

Paper
Code

You Only Look Once: Unified, Real-Time Object Detection

145 code implementations • CVPR 2016 • Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.

Ranked #1 on Real-Time Object Detection on PASCAL VOC 2007

Object Object Counting +1

21,498

Paper
Code

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

194 code implementations • NeurIPS 2015 • Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.

Ranked #2 on Vessel Detection on Vessel detection Dateset

Object Real-Time Object Detection +3

29,028

Paper
Code

Aligning 3D Models to RGB-D Images of Cluttered Scenes

no code implementations • CVPR 2015 • Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik

The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library.

Paper
Add Code

Exploring Nearest Neighbor Approaches for Image Captioning

1 code implementation • 17 May 2015 • Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick

We explore a variety of nearest neighbor baseline approaches for image captioning.

Image Captioning

Paper
Code

Contextual Action Recognition with R*CNN

2 code implementations • ICCV 2015 • Georgia Gkioxari, Ross Girshick, Jitendra Malik

In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system.

Ranked #4 on Weakly Supervised Object Detection on HICO-DET

Action Recognition Attribute +3

132

Paper
Code

Fast R-CNN

29 code implementations • ICCV 2015 • Ross Girshick

Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks.

Ranked #23 on Object Detection on PASCAL VOC 2007 (using extra training data)

Object Object Detection

26,165

Paper
Code

Object Detection Networks on Convolutional Feature Maps

no code implementations • 23 Apr 2015 • Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun

We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without using such a per-region classifier.

General Classification Image Classification +3

Paper
Add Code

Inferring 3D Object Pose in RGB-D Images

no code implementations • 16 Feb 2015 • Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik

The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library.

Object

Paper
Add Code

Actions and Attributes from Wholes and Parts

no code implementations • ICCV 2015 • Georgia Gkioxari, Ross Girshick, Jitendra Malik

We investigate the importance of parts for the tasks of action and attribute classification.

Attribute Classification +2

Paper
Add Code

Hypercolumns for Object Segmentation and Fine-grained Localization

6 code implementations • CVPR 2015 • Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation.

Object Semantic Segmentation

158

Paper
Code

Deformable Part Models are Convolutional Neural Networks

1 code implementation • CVPR 2015 • Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik

Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition.

Ranked #28 on Object Detection on PASCAL VOC 2007

Object Detection Rolling Shutter Correction

128

Paper
Code

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

1 code implementation • 22 Jul 2014 • Saurabh Gupta, Ross Girshick, Pablo Arbeláez, Jitendra Malik

In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features.

Ranked #6 on Object Detection In Indoor Scenes on SUN RGB-D

Instance Segmentation Object +3

169

Paper
Code

LSDA: Large Scale Detection Through Adaptation

1 code implementation • NeurIPS 2014 • Judy Hoffman, Sergio Guadarrama, Eric Tzeng, Ronghang Hu, Jeff Donahue, Ross Girshick, Trevor Darrell, Kate Saenko

A major challenge in scaling object detection is the difficulty of obtaining labeled images for large numbers of categories.

Classification General Classification +2

Paper
Code

Part-based R-CNNs for Fine-grained Category Detection

no code implementations • 15 Jul 2014 • Ning Zhang, Jeff Donahue, Ross Girshick, Trevor Darrell

Semantic part localization can facilitate fine-grained categorization by explicitly isolating subtle appearance differences associated with specific object parts.

Ranked #64 on Fine-Grained Image Classification on CUB-200-2011

Fine-Grained Image Classification Object +2

Paper
Add Code

Simultaneous Detection and Segmentation

no code implementations • 7 Jul 2014 • Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

Unlike classical semantic segmentation, we require individual object instances.

Ranked #5 on Object Detection on PASCAL VOC 2012

object-detection Object Detection +2

Paper
Add Code

Analyzing the Performance of Multilayer Neural Networks for Object Recognition

no code implementations • 7 Jul 2014 • Pulkit Agrawal, Ross Girshick, Jitendra Malik

In the last two years, convolutional neural networks (CNNs) have achieved an impressive suite of results on standard recognition datasets and tasks.

Object Recognition

Paper
Add Code

Caffe: Convolutional Architecture for Fast Feature Embedding

2 code implementations • 20 Jun 2014 • Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell

The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

Clustering Dimensionality Reduction +1

33,906

Paper
Code

R-CNNs for Pose Estimation and Action Detection

no code implementations • 19 Jun 2014 • Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images.

Action Classification Action Detection +3

Paper
Add Code

Using k-Poselets for Detecting People and Localizing Their Keypoints

no code implementations • CVPR 2014 • Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

A k-poselet is a deformable part model (DPM) with k parts, where each of the parts is a poselet, aligned to a specific configuration of keypoints based on ground-truth annotations.

Human Detection

Paper
Add Code

Understanding Objects in Detail with Fine-Grained Attributes

no code implementations • CVPR 2014 • Andrea Vedaldi, Siddharth Mahendran, Stavros Tsogkas, Subhransu Maji, Ross Girshick, Juho Kannala, Esa Rahtu, Iasonas Kokkinos, Matthew B. Blaschko, David Weiss, Ben Taskar, Karen Simonyan, Naomi Saphra, Sammy Mohamed

We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object.

Attribute Object +2

Paper
Add Code

Microsoft COCO: Common Objects in Context

35 code implementations • 1 May 2014 • Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár

We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.

Instance Segmentation Object +5

12,218

Paper
Code

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

2 code implementations • 7 Apr 2014 • Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, Kurt Keutzer

Convolutional Neural Networks (CNNs) can provide accurate object classification.

General Classification Object +2

Paper
Code

On learning to localize objects with minimal supervision

no code implementations • 5 Mar 2014 • Hyun Oh Song, Ross Girshick, Stefanie Jegelka, Julien Mairal, Zaid Harchaoui, Trevor Darrell

Learning to localize objects with minimal supervision is an important problem in computer vision, since large fully annotated datasets are extremely costly to obtain.

Ranked #35 on Weakly Supervised Object Detection on PASCAL VOC 2007

Weakly Supervised Object Detection

Paper
Add Code

Rich feature hierarchies for accurate object detection and semantic segmentation

29 code implementations • CVPR 2014 • Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset.

Ranked #27 on Object Detection on PASCAL VOC 2007 (using extra training data)

Object Detection Semantic Segmentation

2,347

Paper
Code
