Search Results for author: Deva Ramanan

Found 148 papers, 72 papers with code

Predicting Long-horizon Futures by Conditioning on Geometry and Time

no code implementations • 17 Apr 2024 • Tarasha Khurana, Deva Ramanan

To address both challenges, our key insight is to leverage the large-scale pretraining of image diffusion models which can handle multi-modality.

Evaluating Text-to-Visual Generation with Image-to-Text Generation

2 code implementations • 1 Apr 2024 • Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan

For instance, the widely-used CLIPScore measures the alignment between a (generated) image and text prompt, but it fails to produce reliable scores for complex prompts involving compositions of objects, attributes, and relations.

Question Answering Text Generation

Better Call SAL: Towards Learning to Segment Anything in Lidar

no code implementations • 19 Mar 2024 • Aljoša Ošep, Tim Meinhardt, Francesco Ferroni, Neehar Peri, Deva Ramanan, Laura Leal-Taixé

We propose the SAL (Segment Anything in Lidar) method, consisting of a text-promptable zero-shot model for segmenting and classifying any object in Lidar, and a pseudo-labeling engine that facilitates model training without manual supervision.

Panoptic Segmentation

I Can't Believe It's Not Scene Flow!

1 code implementation • 7 Mar 2024 • Ishan Khatri, Kyle Vedder, Neehar Peri, Deva Ramanan, James Hays

Current scene flow methods broadly fail to describe motion on small objects, and current scene flow evaluation protocols hide this failure by averaging over many points, with most drawn from larger objects.
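A small numeric illustration of the averaging effect described above, using hypothetical per-point errors (the classes, point counts, and error values are invented for illustration and are not results from the paper). Averaging endpoint error over all points lets the many car points dominate, while averaging per class first exposes the poor pedestrian estimate.

```python
from __future__ import annotations

import numpy as np


def point_averaged_epe(errors: dict[str, np.ndarray]) -> float:
    """Mean endpoint error over every point, regardless of class."""
    return float(np.concatenate(list(errors.values())).mean())


def class_averaged_epe(errors: dict[str, np.ndarray]) -> float:
    """Mean of the per-class mean endpoint errors, so every class counts equally."""
    return float(np.mean([e.mean() for e in errors.values()]))


# Hypothetical per-point endpoint errors in meters: many well-estimated car
# points and a few badly estimated pedestrian points.
errors = {
    "car": np.full(950, 0.05),
    "pedestrian": np.full(50, 0.50),
}

print(f"point-averaged EPE: {point_averaged_epe(errors):.4f} m")
print(f"class-averaged EPE: {class_averaged_epe(errors):.4f} m")
```

Here the point-averaged error is 0.0725 m while the class-averaged error is 0.275 m, so a method can look accurate overall while failing on the small class.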

Cameras as Rays: Pose Estimation via Ray Diffusion

no code implementations • 22 Feb 2024 • Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani

Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views (<10).

3D Reconstruction Denoising

FlashTex: Fast Relightable Mesh Texturing with LightControlNet

no code implementations • 20 Feb 2024 • Kangle Deng, Timothy Omernick, Alexander Weiss, Deva Ramanan, Jun-Yan Zhu, Tinghui Zhou, Maneesh Agrawala

We introduce LightControlNet, a new text-to-image model based on the ControlNet architecture, which allows the specification of the desired lighting as a conditioning image to the model.

The Neglected Tails of Vision-Language Models

no code implementations • 23 Jan 2024 • Shubham Parashar, Zhiqiu Lin, Tian Liu, Xiangjue Dong, Yanan Li, Deva Ramanan, James Caverlee, Shu Kong

We address this by using large language models (LLMs) to count the number of pretraining texts that contain synonyms of these concepts.
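A minimal sketch of the counting step, assuming the synonym lists have already been produced (the paper uses an LLM for this; here they are hard-coded and hypothetical) and that a caption mentions a concept if any synonym appears in it as a whole word. The captions are made up.

```python
from __future__ import annotations

import re


def count_concept_frequency(captions: list[str], synonyms: dict[str, list[str]]) -> dict[str, int]:
    """Count how many captions mention each concept through any of its synonyms.

    A caption is counted at most once per concept, even if it contains
    several synonyms.
    """
    counts: dict[str, int] = {}
    for concept, words in synonyms.items():
        # Case-insensitive, whole-word match against any synonym.
        alternatives = "|".join(re.escape(w) for w in words)
        pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
        counts[concept] = sum(1 for caption in captions if pattern.search(caption))
    return counts


# Hypothetical pretraining captions and synonym lists.
captions = [
    "A tabby cat sleeping on a sofa",
    "Kitten playing with yarn",
    "A night heron standing in shallow water",
    "Two cats and a kitty on the porch",
]
synonyms = {
    "cat": ["cat", "cats", "kitten", "kitty"],
    "night heron": ["night heron", "black-crowned night heron"],
}
print(count_concept_frequency(captions, synonyms))
```

Plain string matching is only a stand-in for the counting step; real pretraining corpora would need tokenization and scale considerations this sketch ignores.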

Retrieval Zero-Shot Learning

Revisiting Few-Shot Object Detection with Vision-Language Models

no code implementations • 22 Dec 2023 • Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan

Existing benchmarks repurpose well-established datasets like COCO by partitioning categories into base and novel classes for pre-training and fine-tuning respectively.

Autonomous Vehicles Few-Shot Object Detection

TAO-Amodal: A Benchmark for Tracking Any Object Amodally

1 code implementation • 19 Dec 2023 • Cheng-Yen Hsieh, Kaihua Chen, Achal Dave, Tarasha Khurana, Deva Ramanan

Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants.

Amodal Tracking Autonomous Driving

Long-Tailed 3D Detection via 2D Late Fusion

no code implementations • 18 Dec 2023 • Yechi Ma, Neehar Peri, Shuoquan Wei, Wei Hua, Deva Ramanan, Yanan Li, Shu Kong

Autonomous vehicles (AVs) must accurately detect objects from both common and rare classes for safe navigation, motivating the problem of Long-Tailed 3D Object Detection (LT3D).

3D Object Detection Autonomous Vehicles

HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces

no code implementations • 5 Dec 2023 • Haithem Turki, Vasu Agrawal, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder, Deva Ramanan, Michael Zollhöfer, Christian Richardt

Neural radiance fields provide state-of-the-art view synthesis quality but tend to be slow to render.

Neural Rendering

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM

1 code implementation • 4 Dec 2023 • Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten

Dense simultaneous localization and mapping (SLAM) is crucial for robotics and augmented reality applications.

Novel View Synthesis Pose Estimation

PyNeRF: Pyramidal Neural Radiance Fields

1 code implementation • NeurIPS 2023 • Haithem Turki, Michael Zollhöfer, Christian Richardt, Deva Ramanan

Compared to Mip-NeRF, we reduce error rates by 20% while training over 60x faster.

Lidar Panoptic Segmentation and Tracking without Bells and Whistles

1 code implementation • 19 Oct 2023 • Abhinav Agarwalla, Xuhua Huang, Jason Ziglar, Francesco Ferroni, Laura Leal-Taixé, James Hays, Aljoša Ošep, Deva Ramanan

Our network is modular by design and optimized for all aspects of both the panoptic segmentation and tracking tasks.

Object Panoptic Segmentation

Streaming Motion Forecasting for Autonomous Driving

1 code implementation • 2 Oct 2023 • Ziqi Pang, Deva Ramanan, Mengtian Li, Yu-Xiong Wang

Our benchmark inherently captures the disappearance and re-appearance of agents, presenting the emergent challenge of forecasting for occluded agents, which is a safety-critical problem yet overlooked by snapshot-based benchmarks.

Autonomous Navigation Motion Forecasting

Language Models as Black-Box Optimizers for Vision-Language Models

1 code implementation • 12 Sep 2023 • Shihong Liu, Zhiqiu Lin, Samuel Yu, Ryan Lee, Tiffany Ling, Deepak Pathak, Deva Ramanan

We highlight the advantage of conversational feedback that incorporates both positive and negative prompts, suggesting that LLMs can utilize the implicit gradient direction in textual feedback for a more efficient search.

Few-Shot Image Classification

Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis

no code implementations • 18 Aug 2023 • Jonathon Luiten, Georgios Kopanas, Bastian Leibe, Deva Ramanan

We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements.

Dynamic Reconstruction Novel View Synthesis

Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation

no code implementations • 17 Aug 2023 • Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yu-Xiong Wang, Liang-Yan Gui

To distill knowledge from a highly accurate but complex teacher model, we construct a sequence of teachers to help the student gradually adapt.

Edge-computing Instance Segmentation

An Empirical Analysis of Range for 3D Object Detection

no code implementations • 8 Aug 2023 • Neehar Peri, Mengtian Li, Benjamin Wilson, Yu-Xiong Wang, James Hays, Deva Ramanan

LiDAR-based 3D detection plays a vital role in autonomous navigation.

3D Object Detection Autonomous Navigation

Thinking Like an Annotator: Generation of Dataset Labeling Instructions

no code implementations • 24 Jun 2023 • Nadine Chang, Francesco Ferroni, Michael J. Tarr, Martial Hebert, Deva Ramanan

In Labeling Instruction Generation, we take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; 2) provide a text label that corresponds to each of the examples.

Language Modelling Retrieval

Revisiting the Role of Language Priors in Vision-Language Models

1 code implementation • 2 Jun 2023 • Zhiqiu Lin, Xinyue Chen, Deepak Pathak, Pengchuan Zhang, Deva Ramanan

Our first observation is that they can be repurposed for discriminative tasks (such as image-text retrieval) by simply computing the match score of generating a particular text string given an image.

Ranked #45 on Visual Reasoning on Winoground

Image-text matching Language Modelling

ZeroFlow: Scalable Scene Flow via Distillation

1 code implementation • 17 May 2023 • Kyle Vedder, Neehar Peri, Nathaniel Chodosh, Ishan Khatri, Eric Eaton, Dinesh Jayaraman, Yang Liu, Deva Ramanan, James Hays

Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds.

Ranked #3 on Self-supervised Scene Flow Estimation on Argoverse 2

Self-supervised Scene Flow Estimation

WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models

1 code implementation • 12 May 2023 • Aboli Marathe, Deva Ramanan, Rahee Walambe, Ketan Kotecha

WEDGE consists of 3360 images in 16 extreme weather conditions manually annotated with 16513 bounding boxes, supporting research in the tasks of weather classification and 2D object detection.

Adversarial Robustness Autonomous Driving

Joint Metrics Matter: A Better Standard for Trajectory Forecasting

1 code implementation • ICCV 2023 • Erica Weng, Hana Hoshino, Deva Ramanan, Kris Kitani

In response to the limitations of marginal metrics, we present the first comprehensive evaluation of state-of-the-art (SOTA) trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate.
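The difference between marginal and joint metrics can be sketched as follows, assuming K sampled joint futures for A agents over T timesteps. Marginal minADE/minFDE let each agent pick its own best sample, while the joint versions (JADE/JFDE) must choose a single sample index for the whole scene. This is an illustrative reimplementation of the definitions, not the authors' evaluation code; collision rate is omitted.

```python
from __future__ import annotations

import numpy as np


def marginal_and_joint_metrics(pred: np.ndarray, gt: np.ndarray) -> dict[str, float]:
    """Compare marginal (per-agent best) and joint (per-scene best) errors.

    pred: (K, A, T, 2) array of K sampled futures for A agents over T steps,
          where sample k of every agent belongs to the same joint prediction.
    gt:   (A, T, 2) array of ground-truth trajectories.
    """
    # Euclidean displacement for every sample, agent and timestep: (K, A, T)
    dist = np.linalg.norm(pred - gt[None], axis=-1)
    ade = dist.mean(axis=-1)  # (K, A) average displacement error
    fde = dist[..., -1]       # (K, A) final displacement error
    return {
        # Marginal: each agent independently picks its best sample, then average.
        "minADE": float(ade.min(axis=0).mean()),
        "minFDE": float(fde.min(axis=0).mean()),
        # Joint: average over agents first, then pick the single best sample.
        "JADE": float(ade.mean(axis=1).min()),
        "JFDE": float(fde.mean(axis=1).min()),
    }


# Toy scene: 2 samples, 2 agents, 2 timesteps, ground truth at the origin.
gt = np.zeros((2, 2, 2))
pred = np.zeros((2, 2, 2, 2))
pred[0, 1, :, 0] = 2.0  # sample 0: agent 0 perfect, agent 1 off by 2 m
pred[1, 0, :, 0] = 2.0  # sample 1: agent 1 perfect, agent 0 off by 2 m
print(marginal_and_joint_metrics(pred, gt))
```

In the toy scene every agent is predicted perfectly by some sample, so the marginal metrics are 0, but no single sample is good for both agents at once, so JADE and JFDE are 1.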

Trajectory Forecasting

Reconstructing Animatable Categories from Videos

2 code implementations • CVPR 2023 • Gengshan Yang, Chaoyang Wang, N Dinesh Reddy, Deva Ramanan

Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging, which are difficult to scale to arbitrary categories.

3D Shape Reconstruction from Videos Dynamic Reconstruction

RelPose++: Recovering 6D Poses from Sparse-view Observations

1 code implementation • 8 May 2023 • Amy Lin, Jason Y. Zhang, Deva Ramanan, Shubham Tulsiani

We address the task of estimating 6D camera poses from sparse-view image sets (2-8 images).

3D Reconstruction Pose Estimation

Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis

no code implementations • ICCV 2023 • Chonghyuk Song, Gengshan Yang, Kangle Deng, Jun-Yan Zhu, Deva Ramanan

Given a minute-long RGBD video of people interacting with their pets, we render the scene from novel camera trajectories derived from the in-scene motion of actors: (1) egocentric cameras that simulate the point of view of a target actor and (2) 3rd-person cameras that follow the actor.

Re-Evaluating LiDAR Scene Flow for Autonomous Driving

no code implementations • 4 Apr 2023 • Nathaniel Chodosh, Deva Ramanan, Simon Lucey

Popular benchmarks for self-supervised LiDAR scene flow (stereoKITTI and FlyingThings3D) have unrealistic rates of dynamic motion, unrealistic correspondences, and unrealistic sampling patterns.

Autonomous Driving Motion Compensation

Learning to Zoom and Unzoom

no code implementations • CVPR 2023 • Chittesh Thavamani, Mengtian Li, Francesco Ferroni, Deva Ramanan

In this work (LZU), we "learn to zoom" in on the input image, compute spatial features, and then "unzoom" to revert any deformations.

Autonomous Navigation Monocular 3D Object Detection

SUDS: Scalable Urban Dynamic Scenes

no code implementations • CVPR 2023 • Haithem Turki, Jason Y. Zhang, Francesco Ferroni, Deva Ramanan

We extend neural radiance fields (NeRFs) to dynamic large-scale urban scenes.

3D Instance Segmentation Novel View Synthesis

Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting

1 code implementation • CVPR 2023 • Tarasha Khurana, Peiyun Hu, David Held, Deva Ramanan

One promising self-supervised task is 3D point cloud forecasting from unannotated LiDAR sequences.

Motion Planning

3D-aware Conditional Image Synthesis

2 code implementations • CVPR 2023 • Kangle Deng, Gengshan Yang, Deva Ramanan, Jun-Yan Zhu

We propose pix2pix3D, a 3D-aware conditional generative model for controllable photorealistic image synthesis.

Image Generation

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

1 code implementation • CVPR 2023 • Zhiqiu Lin, Samuel Yu, Zhiyi Kuang, Deepak Pathak, Deva Ramanan

By repurposing class names as additional one-shot training samples, we achieve SOTA results with an embarrassingly simple linear classifier for vision-language adaptation.
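A rough sketch of treating class names as extra training samples: text embeddings of the class names are appended to the few-shot image embeddings, and a single softmax linear classifier is fit on the union. Random vectors stand in for embeddings from a shared image-text encoder such as CLIP (an assumption), and the training loop is plain gradient descent on cross-entropy, not the authors' recipe.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def train_cross_modal_linear(img_feats, img_labels, text_feats, num_classes, lr=1.0, steps=500):
    """Fit a softmax linear classifier on few-shot image features plus one
    text feature per class (the embedded class name used as an extra sample)."""
    x = l2_normalize(np.concatenate([img_feats, text_feats], axis=0))
    y = np.concatenate([img_labels, np.arange(num_classes)])
    n, d = x.shape
    onehot = np.eye(num_classes)[y]
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    for _ in range(steps):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # d(mean cross-entropy) / d(logits)
        w -= lr * x.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(w, b, feats):
    return np.argmax(l2_normalize(feats) @ w + b, axis=1)


# Stand-in embeddings: one random direction per class acts as the class-name
# text embedding, and each one-shot image embedding is a noisy copy of it.
rng = np.random.default_rng(0)
num_classes, dim = 3, 16
text_feats = l2_normalize(rng.normal(size=(num_classes, dim)))
img_feats = text_feats + 0.05 * rng.normal(size=(num_classes, dim))
w, b = train_cross_modal_linear(img_feats, np.arange(num_classes), text_feats, num_classes)
print(predict(w, b, text_feats))
```

Because both modalities live in one embedding space, a class-name embedding can be used exactly like an extra labeled image; the classifier itself stays an ordinary linear layer.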

Audio Classification Few-Shot Learning

Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images

no code implementations • CVPR 2023 • Xindi Wu, KwunFung Lau, Francesco Ferroni, Aljoša Ošep, Deva Ramanan

Moreover, we show that our retrieved maps can be used to update or expand existing maps and even show proof-of-concept results for visual localization and image retrieval from spatial graphs.

Autonomous Navigation Cross-Modal Retrieval

TarViS: A Unified Approach for Target-based Video Segmentation

1 code implementation • CVPR 2023 • Ali Athar, Alexander Hermans, Jonathon Luiten, Deva Ramanan, Bastian Leibe

A single TarViS model can be trained jointly on a collection of datasets spanning different tasks, and can hot-swap between tasks during inference without any task-specific retraining.

Ranked #2 on Video Panoptic Segmentation on KITTI-STEP (using extra training data)

Instance Segmentation Segmentation

Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

1 code implementation • Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 2021 • Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, James Hays

Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category.

3D Object Detection Motion Forecasting

PPR: Physically Plausible Reconstruction from Monocular Videos

no code implementations • ICCV 2023 • Gengshan Yang, Shuo Yang, John Z. Zhang, Zachary Manchester, Deva Ramanan

Given monocular videos, we build 3D models of articulated objects and environments whose 3D configurations satisfy dynamics and contact constraints.

Distilling Neural Fields for Real-Time Articulated Shape Reconstruction

no code implementations • CVPR 2023 • Jeff Tan, Gengshan Yang, Deva Ramanan

We present a method for reconstructing articulated 3D models from videos in real-time, without test-time optimization or manual 3D supervision at training time.

motion prediction

Far3Det: Towards Far-Field 3D Detection

no code implementations • 25 Nov 2022 • Shubham Gupta, Jeet Kanjani, Mengtian Li, Francesco Ferroni, James Hays, Deva Ramanan, Shu Kong

We focus on the task of far-field 3D detection (Far3Det) of objects beyond a certain distance from an observer, e.g., >50 m.

Autonomous Vehicles Philosophy

Towards Long-Tailed 3D Detection

1 code implementation • 16 Nov 2022 • Neehar Peri, Achal Dave, Deva Ramanan, Shu Kong

Moreover, semantic classes are often organized within a hierarchy, e.g., tail classes such as child and construction-worker are arguably subclasses of pedestrian.

Soft Augmentation for Image Classification

1 code implementation • CVPR 2023 • Yang Liu, Shen Yan, Laura Leal-Taixé, James Hays, Deva Ramanan

We draw inspiration from human visual classification studies and propose generalizing augmentation with invariant transforms to soft augmentation, where the learning target softens non-linearly as a function of the degree of the transform applied to the sample: e.g., more aggressive image crop augmentations produce less confident learning targets.
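One way to realize such a soft target is sketched below. The particular schedule, p = 1 - (1 - p_min)(1 - v)^k with v the fraction of the image that remains visible after cropping, and the default values p_min = 0.1 and k = 2 are assumptions chosen for illustration, not necessarily the paper's exact formula. The remaining probability mass is spread uniformly over the other classes.

```python
import numpy as np


def soft_target_confidence(visibility: float, p_min: float = 0.1, k: float = 2.0) -> float:
    """Target probability for the true class after an augmentation that
    leaves `visibility` (fraction of the original image still visible) in view.

    Assumed schedule: p = 1 - (1 - p_min) * (1 - visibility) ** k, so an
    untouched image keeps a hard target (p = 1) and confidence drops
    non-linearly toward p_min as the crop becomes more aggressive.
    """
    v = float(np.clip(visibility, 0.0, 1.0))
    return 1.0 - (1.0 - p_min) * (1.0 - v) ** k


def soft_label(true_class: int, num_classes: int, visibility: float, **kwargs) -> np.ndarray:
    """Soft one-hot target: the leftover mass is spread evenly over the other classes."""
    p = soft_target_confidence(visibility, **kwargs)
    target = np.full(num_classes, (1.0 - p) / (num_classes - 1))
    target[true_class] = p
    return target


for v in (1.0, 0.75, 0.5, 0.25):
    print(f"visibility={v:.2f} -> target confidence {soft_target_confidence(v):.3f}")
```

With k > 1 mild crops barely change the target, while heavy crops push it quickly toward p_min, which matches the "softens non-linearly" behavior described above.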

Classification Data Augmentation

Learning to Discover and Detect Objects

1 code implementation • 19 Oct 2022 • Vladimir Fomenko, Ismail Elezi, Deva Ramanan, Laura Leal-Taixé, Aljoša Ošep

We then train our network to learn to classify each RoI, either as one of the known classes, seen in the source dataset, or one of the novel classes, with a long-tail distribution constraint on the class assignments, reflecting the natural frequency of classes in the real world.

Ranked #2 on Novel Object Detection on LVIS v1.0 val

Novel Class Discovery Novel Object Detection

Continual Learning with Evolving Class Ontologies

no code implementations • 10 Oct 2022 • Zhiqiu Lin, Deepak Pathak, Yu-Xiong Wang, Deva Ramanan, Shu Kong

LECO requires learning classifiers in distinct time periods (TPs); each TP introduces a new ontology of "fine" labels that refines old ontologies of "coarse" labels (e.g., dog breeds that refine the previous "dog").

Class Incremental Learning Image Classification

Differentiable Raycasting for Self-supervised Occupancy Forecasting

1 code implementation • 4 Oct 2022 • Tarasha Khurana, Peiyun Hu, Achal Dave, Jason Ziglar, David Held, Deva Ramanan

Self-supervised representations proposed for large-scale planning, such as ego-centric freespace, confound these two motions, making the representation difficult to use for downstream motion planners.

Autonomous Driving Motion Planning

BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video

1 code implementation • 25 Sep 2022 • Ali Athar, Jonathon Luiten, Paul Voigtlaender, Tarasha Khurana, Achal Dave, Bastian Leibe, Deva Ramanan

Multiple existing benchmarks involve tracking and segmenting objects in video, e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and Segmentation (MOTS), but there is little interaction between them due to the use of disparate benchmark datasets and metrics (e.g., J&F, mAP, sMOTSA).

Ranked #4 on Long-tail Video Object Segmentation on BURST-val (using extra training data)

Long-tail Video Object Segmentation Multi-Object Tracking

RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild

1 code implementation • 11 Aug 2022 • Jason Y. Zhang, Deva Ramanan, Shubham Tulsiani

We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.

Object Object Reconstruction

Differentiable Soft-Masked Attention

1 code implementation • 1 Jun 2022 • Ali Athar, Jonathon Luiten, Alexander Hermans, Deva Ramanan, Bastian Leibe

Recently, "Masked Attention" was proposed in which a given object representation only attends to those image pixel features for which the segmentation mask of that object is active.
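The masked attention described above can be sketched in NumPy as single-head cross-attention in which the logits for pixels outside an object's mask are set to negative infinity before the softmax. Shapes and names are illustrative. The fallback to unmasked attention for an empty mask is an assumption added to avoid a softmax over only -inf values, and the paper's differentiable soft-masked variant is not shown.

```python
import numpy as np


def masked_attention(queries, keys, values, masks):
    """Single-head masked cross-attention.

    queries: (Q, d) object representations
    keys, values: (P, d) per-pixel image features
    masks: (Q, P) boolean, True where the object's segmentation mask is active
    Returns the attended features (Q, d) and the attention weights (Q, P).
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)  # (Q, P)
    # Assumption: a query whose mask is empty falls back to attending everywhere,
    # since a softmax over only -inf logits is undefined.
    empty = ~masks.any(axis=1, keepdims=True)
    allowed = masks | empty
    # Pixels outside the mask get -inf, so their attention weight is exactly 0.
    logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 4))
keys = rng.normal(size=(5, 4))
values = rng.normal(size=(5, 4))
masks = np.array([
    [True, True, False, False, False],    # object 0 covers the first two pixels
    [False, False, False, False, False],  # object 1 has an empty mask
])
out, weights = masked_attention(queries, keys, values, masks)
print(weights.round(3))
```

The binary mask makes the weights outside the mask exactly zero, which is also why gradients cannot flow to the mask itself; that limitation is what a soft, differentiable mask addresses.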

Object Segmentation

Forecasting from LiDAR via Future Object Detection

1 code implementation • CVPR 2022 • Neehar Peri, Jonathon Luiten, Mengtian Li, Aljoša Ošep, Laura Leal-Taixé, Deva Ramanan

Object detection and forecasting are fundamental components of embodied perception.

Motion Forecasting Object

Long-Tailed Recognition via Weight Balancing

1 code implementation • CVPR 2022 • Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, Shu Kong

In contrast, weight decay penalizes larger weights more heavily and so learns small balanced weights; the MaxNorm constraint encourages growing small weights within a norm ball but caps all the weights by the radius.
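The two regularizers contrasted above can be sketched as follows: weight decay is folded into the SGD update, and MaxNorm is a projection that rescales any per-class weight vector whose L2 norm exceeds the radius while leaving smaller vectors untouched. Applying the constraint row-wise to a classifier's weight matrix, the helper names, and the hyperparameter values are assumptions for illustration.

```python
import numpy as np


def max_norm_(weights: np.ndarray, radius: float) -> np.ndarray:
    """In-place MaxNorm projection: rescale each row (one per-class weight
    vector) whose L2 norm exceeds `radius` back onto the ball's surface."""
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    weights *= scale
    return weights


def sgd_step(weights, grad, lr=0.1, weight_decay=5e-4, radius=None):
    """One SGD step with L2 weight decay, optionally followed by MaxNorm."""
    # Weight decay shrinks each weight in proportion to its size,
    # so large weights are pulled toward zero the hardest.
    weights -= lr * (grad + weight_decay * weights)
    if radius is not None:
        max_norm_(weights, radius)
    return weights


# A head class with a large weight norm and a tail class with a small one.
W = np.array([[3.0, 4.0],    # norm 5.0
              [0.3, 0.4]])   # norm 0.5
print(max_norm_(W.copy(), radius=1.0))
```

The projection caps the head-class vector at norm 1 but leaves the tail-class vector free to grow until it reaches the same radius, which is the balancing effect the snippet describes.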

Ranked #9 on Long-tail Learning on CIFAR-100-LT (ρ=10)

Classification Long-tail Learning

The CLEAR Benchmark: Continual LEArning on Real-World Imagery

1 code implementation • 17 Jan 2022 • Zhiqiu Lin, Jia Shi, Deepak Pathak, Deva Ramanan

The major strength of CLEAR over prior CL benchmarks is the smooth temporal evolution of visual concepts with real-world imagery, including both high-quality labeled data along with abundant unlabeled samples per time period for continual semi-supervised learning.

Continual Learning Image Classification

Opening Up Open World Tracking

no code implementations • CVPR 2022 • Yang Liu, Idil Esen Zulfikar, Jonathon Luiten, Achal Dave, Deva Ramanan, Bastian Leibe, Aljoša Ošep, Laura Leal-Taixé

A benchmark that would allow us to perform an apples-to-apples comparison of existing efforts is a crucial first step towards advancing this important research field.

Ranked #3 on Open-World Video Segmentation on BURST-val (using extra training data)

Multi-Object Tracking Object

BANMo: Building Animatable 3D Neural Models from Many Casual Videos

1 code implementation • CVPR 2022 • Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo

Our key insight is to merge three schools of thought: (1) classic deformable shape models that make use of articulated bones and blend skinning, (2) volumetric neural radiance fields (NeRFs) that are amenable to gradient-based optimization, and (3) canonical embeddings that generate correspondences between pixels and an articulated model.
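The first ingredient, classic blend skinning, can be illustrated with linear blend skinning: each rest-pose point is moved by every bone's rigid transform, and the results are mixed with per-point skinning weights. This is the textbook formulation, not BANMo's neural variant; the bone transforms and weights below are made up.

```python
import numpy as np


def linear_blend_skinning(vertices, skin_weights, rotations, translations):
    """Deform rest-pose points with per-bone rigid transforms.

    vertices: (V, 3) rest-pose points
    skin_weights: (V, B) non-negative weights, each row summing to 1
    rotations: (B, 3, 3) and translations: (B, 3), one rigid transform per bone
    Returns (V, 3) points: sum_b w[v, b] * (R_b @ x_v + t_b).
    """
    # Apply every bone's transform to every point: (B, V, 3)
    per_bone = np.einsum("bij,vj->bvi", rotations, vertices) + translations[:, None, :]
    # Blend the per-bone positions with the skinning weights: (V, 3)
    return np.einsum("vb,bvi->vi", skin_weights, per_bone)


# Two bones: bone 0 stays put, bone 1 shifts by +1 along x.
rotations = np.stack([np.eye(3), np.eye(3)])
translations = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
skin_weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print(linear_blend_skinning(vertices, skin_weights, rotations, translations))
```

The middle point, weighted equally between the two bones, moves halfway (by 0.5 along x), which is how skinning produces smooth deformation near joints.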

3D Shape Reconstruction from Videos Dynamic Reconstruction

Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs

1 code implementation • CVPR 2022 • Haithem Turki, Deva Ramanan, Mahadev Satyanarayanan

We use neural radiance fields (NeRFs) to build interactive 3D environments from large-scale visual captures spanning buildings or even multiple city blocks collected primarily from drones.

HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images

1 code implementation • CVPR 2022 • Ali Athar, Jonathon Luiten, Alexander Hermans, Deva Ramanan, Bastian Leibe

Existing state-of-the-art methods for Video Object Segmentation (VOS) learn low-level pixel-to-pixel correspondences between frames to propagate object masks across video.

Object Semantic Segmentation

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction

1 code implementation • NeurIPS 2021 • Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Ce Liu, Deva Ramanan

The surface embeddings are implemented as coordinate-based MLPs that are fit to each video via consistency and contrastive reconstruction losses. Experimental results show that ViSER compares favorably against prior work on challenging videos of humans with loose clothing and unusual poses as well as animal videos from DAVIS and YTVOS.

3D Shape Reconstruction from Videos

NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild

1 code implementation • NeurIPS 2021 • Jason Y. Zhang, Gengshan Yang, Shubham Tulsiani, Deva Ramanan

NeRS learns a neural shape representation of a closed surface that is diffeomorphic to a sphere, guaranteeing water-tight reconstructions.

3D Reconstruction Neural Rendering

Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories

no code implementations • ICCV 2021 • Fait Poms, Vishnu Sarukkai, Ravi Teja Mullapudi, Nimit S. Sohoni, William R. Mark, Deva Ramanan, Kayvon Fatahalian

For machine learning models trained with limited labeled training data, validation stands to become the main bottleneck to reducing overall annotation costs.

FOVEA: Foveated Image Magnification for Autonomous Navigation

1 code implementation • ICCV 2021 • Chittesh Thavamani, Mengtian Li, Nicolas Cebron, Deva Ramanan

Efficient processing of high-res video streams is safety-critical for many robotics applications such as autonomous driving.

Autonomous Driving Autonomous Navigation

Depth-supervised NeRF: Fewer Views and Faster Training for Free

1 code implementation • CVPR 2022 • Kangle Deng, Andrew Liu, Jun-Yan Zhu, Deva Ramanan

Crucially, SfM also produces sparse 3D points that can be used as "free" depth supervision during training: we add a loss to encourage the distribution of a ray's terminating depth to match a given 3D keypoint, incorporating depth uncertainty.
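A simplified sketch of depth supervision along a single ray: standard volume-rendering weights give the ray's termination distribution, and a Gaussian centered at the SfM keypoint depth, whose width plays the role of the depth uncertainty, decides where termination mass is rewarded. The exact loss form below (a Gaussian-weighted negative log-likelihood of the weights) is an assumption written in the spirit of the description above, not the paper's code; the density fields are synthetic.

```python
import numpy as np


def ray_termination_weights(sigmas, t_vals):
    """Volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)) along one ray.

    sigmas: (N,) densities at the sorted sample depths t_vals: (N,)
    """
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)  # last interval treated as unbounded
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance T_i = prod_{j < i} (1 - alpha_j)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    return trans * alphas


def depth_supervision_loss(sigmas, t_vals, keypoint_depth, depth_std, eps=1e-10):
    """Assumed form: Gaussian-weighted negative log-likelihood of termination.

    Samples near the keypoint depth (within roughly depth_std) are rewarded for
    carrying termination mass; a larger depth_std tolerates noisier keypoints.
    """
    w = ray_termination_weights(sigmas, t_vals)
    deltas = np.diff(t_vals, append=t_vals[-1])
    gauss = np.exp(-((t_vals - keypoint_depth) ** 2) / (2.0 * depth_std**2))
    return float(-np.sum(np.log(w + eps) * gauss * deltas))


t_vals = np.linspace(0.0, 4.0, 41)
keypoint_depth, depth_std = 2.0, 0.2
# Two candidate density fields: a thin surface at the keypoint depth or one unit behind it.
sigma_at_keypoint = 50.0 * np.exp(-((t_vals - 2.0) ** 2) / (2 * 0.05**2))
sigma_behind = 50.0 * np.exp(-((t_vals - 3.0) ** 2) / (2 * 0.05**2))
for name, sigmas in [("at keypoint", sigma_at_keypoint), ("behind", sigma_behind)]:
    print(name, depth_supervision_loss(sigmas, t_vals, keypoint_depth, depth_std))
```

The density spike at the keypoint depth yields the lower loss, so minimizing this term pulls surfaces toward the sparse SfM points while the Gaussian width keeps the constraint soft.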

RGB-D Reconstruction

Safe Local Motion Planning With Self-Supervised Freespace Forecasting

1 code implementation • CVPR 2021 • Peiyun Hu, Aaron Huang, John Dolan, David Held, Deva Ramanan

Finally, we propose future freespace as an additional source of annotation-free supervision.

Autonomous Driving Motion Planning

LASR: Learning Articulated Shape Reconstruction from a Monocular Video

1 code implementation • CVPR 2021 • Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Huiwen Chang, Deva Ramanan, William T. Freeman, Ce Liu

Remarkable progress has been made in 3D reconstruction of rigid structures from a video or a collection of images.

3D Shape Reconstruction from Videos Object

Opening up Open-World Tracking

no code implementations • 22 Apr 2021 • Yang Liu, Idil Esen Zulfikar, Jonathon Luiten, Achal Dave, Deva Ramanan, Bastian Leibe, Aljoša Ošep, Laura Leal-Taixé

We hope to open a new front in multi-object tracking research that will hopefully bring us a step closer to intelligent systems that can operate safely in the real world.

Multi-Object Tracking Object

OpenGAN: Open-Set Recognition via Open Data Generation

1 code implementation • ICCV 2021 • Shu Kong, Deva Ramanan

However, the former generalizes poorly to diverse open test data due to overfitting to the training outliers, which are unlikely to exhaustively span the open-world.

Open Set Learning

Multimodal Object Detection via Probabilistic Ensembling

2 code implementations • 7 Apr 2021 • Yi-Ting Chen, Jinghao Shi, Zelin Ye, Christoph Mertz, Deva Ramanan, Shu Kong

Object detection with multimodal inputs can improve many safety-critical systems such as autonomous vehicles (AVs).

Autonomous Vehicles Object

Streaming Self-Training via Domain-Agnostic Unlabeled Images

no code implementations • 7 Apr 2021 • Zhiqiu Lin, Deva Ramanan, Aayush Bansal

We present streaming self-training (SST) that aims to democratize the process of learning visual recognition models such that a non-expert user can define a new task depending on their needs via a few labeled examples and minimal domain knowledge.

Fine-Grained Image Classification Semantic Segmentation

Video-Specific Autoencoders for Exploring, Editing and Transmitting Videos

no code implementations • 31 Mar 2021 • Kevin Wang, Deva Ramanan, Aayush Bansal

Associating latent codes of a video and manifold projection enables users to make desired edits.

Denoising Super-Resolution

Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details

2 code implementations • 1 Feb 2021 • Achal Dave, Piotr Dollár, Deva Ramanan, Alexander Kirillov, Ross Girshick

On one hand, this is desirable as it treats all classes equally.

Benchmarking object-detection

Learning to Segment Rigid Motions from Two Frames

1 code implementation • CVPR 2021 • Gengshan Yang, Deva Ramanan

Geometric motion segmentation algorithms, however, generalize to novel scenes, but have yet to achieve comparable performance to appearance-based ones, due to noisy motion estimations and degenerate motion configurations.

Motion Segmentation Scene Flow Estimation

Learning Rare Category Classifiers on a Tight Labeling Budget

no code implementations • ICCV 2021 • Ravi Teja Mullapudi, Fait Poms, William R. Mark, Deva Ramanan, Kayvon Fatahalian

In this paper, we consider the scenario where we start with as few as five labeled positives of a rare category and a large amount of unlabeled data, 99.9% of which are negatives.

Active Learning Representation Learning

An Empirical Exploration of Open-Set Recognition via Lightweight Statistical Pipelines

no code implementations • 1 Jan 2021 • Shu Kong, Deva Ramanan

Machine-learned safety-critical systems need to be self-aware and reliably know their unknowns in the open-world.

open-set classification Open Set Learning

Detecting Invisible People

1 code implementation • ICCV 2021 • Tarasha Khurana, Achal Dave, Deva Ramanan

We demonstrate that current detection and tracking systems perform dramatically worse on this task.

Monocular Depth Estimation Object

Background Splitting: Finding Rare Classes in a Sea of Background

1 code implementation • CVPR 2021 • Ravi Teja Mullapudi, Fait Poms, William R. Mark, Deva Ramanan, Kayvon Fatahalian

We focus on the real-world problem of training accurate deep models for image classification of a small number of rare categories.

Image Classification

What-If Motion Prediction for Autonomous Driving

1 code implementation • 24 Aug 2020 • Siddhesh Khandelwal, William Qi, Jagjeet Singh, Andrew Hartnett, Deva Ramanan

Forecasting the long-term future motion of road actors is a core challenge to the deployment of safe autonomous vehicles (AVs).

Autonomous Driving counterfactual

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

1 code implementation • ECCV 2020 • Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, Angjoo Kanazawa

We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment.

Ranked #3 on 3D Object Reconstruction on BEHAVE

3D Human Pose Estimation 3D Human Reconstruction

4D Visualization of Dynamic Events from Unconstrained Multi-View Videos

no code implementations • CVPR 2020 • Aayush Bansal, Minh Vo, Yaser Sheikh, Deva Ramanan, Srinivasa Narasimhan

We present a data-driven approach for 4D space-time visualization of dynamic events from videos captured by hand-held multiple cameras.

Towards Streaming Perception

1 code implementation • ECCV 2020 • Mengtian Li, Yu-Xiong Wang, Deva Ramanan

While past work has studied the algorithmic trade-off between latency and accuracy, there has not been a clear metric to compare different methods along the Pareto optimal latency-accuracy curve.

Ranked #2 on Real-Time Object Detection on Argoverse-HD (Detection-Only, Val) (using extra training data)

Instance Segmentation Motion Forecasting

TAO: A Large-Scale Benchmark for Tracking Any Object

no code implementations • ECCV 2020 • Achal Dave, Tarasha Khurana, Pavel Tokmakov, Cordelia Schmid, Deva Ramanan

To this end, we ask annotators to label objects that move at any point in the video, and give names to them post factum.

Multi-Object Tracking Object

CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning

no code implementations • ICLR 2020 • Rohit Girdhar, Deva Ramanan

In this work, we build a video dataset with fully observable and controllable object and scene bias, and which truly requires spatiotemporal understanding in order to be solved.

Object Video Understanding

Learning Generative Models of Tissue Organization with Supervised GANs

1 code implementation • 31 Mar 2020 • Ligong Han, Robert F. Murphy, Deva Ramanan

A key step in understanding the spatial organization of cells and tissues is the ability to construct generative models that accurately reflect that organization.

Image Generation

Unsupervised Audiovisual Synthesis via Exemplar Autoencoders

1 code implementation • ICLR 2021 • Kangle Deng, Aayush Bansal, Deva Ramanan

We present an unsupervised approach that converts the input speech of any individual into audiovisual streams of potentially infinitely many output speakers.

Learning to Move with Affordance Maps

1 code implementation • ICLR 2020 • William Qi, Ravi Teja Mullapudi, Saurabh Gupta, Deva Ramanan

In this paper, we combine the best of both worlds with a modular approach that learns a spatial representation of a scene that is trained to be effective when coupled with traditional geometric planners.

Autonomous Navigation Navigate +1

Hierarchical Deep Stereo Matching on High-resolution Images

2 code implementations • CVPR 2019 • Gengshan Yang, Joshua Manela, Michael Happold, Deva Ramanan

We explore the problem of real-time stereo matching on high-resolution imagery.

Autonomous Driving Stereo Matching +1

Inferring Distributions Over Depth from a Single Image

1 code implementation • 12 Dec 2019 • Gengshan Yang, Peiyun Hu, Deva Ramanan

Such approaches cannot diagnose when failures might occur.

Autonomous Vehicles Binary Classification +2

Learning to Optimally Segment Point Clouds

no code implementations • 10 Dec 2019 • Peiyun Hu, David Held, Deva Ramanan

We prove that if we score a segmentation by the worst objectness among its individual segments, there is an efficient algorithm that finds the optimal worst-case segmentation among an exponentially large number of candidate segmentations.

Instance Segmentation Segmentation +1

What You See is What You Get: Exploiting Visibility for 3D Object Detection

1 code implementation • CVPR 2020 • Peiyun Hu, Jason Ziglar, David Held, Deva Ramanan

On the NuScenes 3D detection benchmark, we show that, by adding an additional stream for visibility input, we can significantly improve the overall detection accuracy of a state-of-the-art 3D detector.

3D Object Detection Data Augmentation +1

Volumetric Correspondence Networks for Optical Flow

2 code implementations • NeurIPS 2019 • Gengshan Yang, Deva Ramanan

As a result, SOTA networks also employ various heuristics designed to limit volumetric processing, leading to limited accuracy and overfitting.

Ranked #14 on Optical Flow Estimation on KITTI 2015 (train)

Optical Flow Estimation

Are we asking the right questions in MovieQA?

no code implementations • 8 Nov 2019 • Bhavan Jasani, Rohit Girdhar, Deva Ramanan

Joint vision and language tasks like visual question answering are fascinating because they explore high-level understanding, but at the same time, can be more prone to language biases.

Question Answering Visual Question Answering

Argoverse: 3D Tracking and Forecasting with Rich Maps

3 code implementations • CVPR 2019 • Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, James Hays

In our baseline experiments, we illustrate how detailed map information such as lane direction, driveable area, and ground height improves the accuracy of 3D object tracking and motion forecasting.

3D Object Tracking Autonomous Vehicles +3

Learning to Track Any Object

no code implementations • 25 Oct 2019 • Achal Dave, Pavel Tokmakov, Cordelia Schmid, Deva Ramanan

Moreover, at test time the same network can be applied to detection and tracking, resulting in a unified approach for the two tasks.

Instance Segmentation Object +5

MetaPix: Few-Shot Video Retargeting

no code implementations • ICLR 2020 • Jessica Lee, Deva Ramanan, Rohit Girdhar

We address the task of unsupervised retargeting of human actions from one video to another.

Meta-Learning

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

2 code implementations • 10 Oct 2019 • Rohit Girdhar, Deva Ramanan

In this work, we build a video dataset with fully observable and controllable object and scene bias, which truly requires spatiotemporal understanding to be solved.

Object Video Object Tracking +1

Weakly-supervised Action Localization with Background Modeling

no code implementations • ICCV 2019 • Phuc Xuan Nguyen, Deva Ramanan, Charless C. Fowlkes

Our approach makes use of two innovations to attention-modeling in weakly-supervised learning.

Action Localization Weakly Supervised Action Localization +1

Growing a Brain: Fine-Tuning by Increasing Model Capacity

no code implementations • CVPR 2017 • Yu-Xiong Wang, Deva Ramanan, Martial Hebert

One of their remarkable properties is the ability to transfer knowledge from a large source dataset to a (typically smaller) target dataset.

Developmental Learning

Shapes and Context: In-the-Wild Image Synthesis & Manipulation

no code implementations • CVPR 2019 • Aayush Bansal, Yaser Sheikh, Deva Ramanan

We introduce a data-driven approach for interactively synthesizing in-the-wild images from semantic label maps.

Image Generation

Do Image Classifiers Generalize Across Time?

1 code implementation • ICCV 2021 • Vaishaal Shankar, Achal Dave, Rebecca Roelofs, Deva Ramanan, Benjamin Recht, Ludwig Schmidt

Additionally, we evaluate three detection models and show that natural perturbations induce both classification as well as localization errors, leading to a median drop in detection mAP of 14 points.

General Classification Video Object Detection

A Systematic Framework for Natural Perturbations from Videos

no code implementations • ICML Workshop Deep_Phenomen 2019 • Vaishaal Shankar, Achal Dave, Rebecca Roelofs, Deva Ramanan, Benjamin Recht, Ludwig Schmidt

We introduce a systematic framework for quantifying the robustness of classifiers to naturally occurring perturbations of images found in videos.

object-detection Video Object Detection

Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints

1 code implementation • ICLR 2020 • Mengtian Li, Ersin Yumer, Deva Ramanan

We also revisit existing approaches for fast convergence and show that budget-aware learning schedules readily outperform such approaches under (the practical but under-explored) budgeted training setting.

General Classification Image Classification +6

Towards Segmenting Anything That Moves

1 code implementation • 11 Feb 2019 • Achal Dave, Pavel Tokmakov, Deva Ramanan

To address this concern, we propose two new benchmarks for generic, moving object detection, and show that our model matches top-down methods on common categories, while significantly outperforming both top-down and bottom-up methods on never-before-seen categories.

Action Detection Instance Segmentation +7

DistInit: Learning Video Representations Without a Single Labeled Video

no code implementations • ICCV 2019 • Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan

In this work, we propose an alternative approach to learning video representations that requires no semantically labeled videos and instead leverages years of effort in collecting and labeling large and clean still-image datasets.

Ranked #72 on Action Recognition on HMDB-51 (using extra training data)

Action Recognition Temporal Action Localization +1

Photo-Sketching: Inferring Contour Drawings from Images

3 code implementations • 2 Jan 2019 • Mengtian Li, Zhe Lin, Radomir Mech, Ersin Yumer, Deva Ramanan

Edges, boundaries and contours are important subjects of study in both computer graphics and computer vision.

Boundary Detection

Online Model Distillation for Efficient Video Inference

1 code implementation • ICCV 2019 • Ravi Teja Mullapudi, Steven Chen, Keyi Zhang, Deva Ramanan, Kayvon Fatahalian

Rather than learn a specialized student model on offline data from the video stream, we train the student in an online fashion on the live video, intermittently running the teacher to provide a target for learning.

Segmentation Semantic Segmentation +2

Few-Shot Human Motion Prediction via Meta-Learning

no code implementations • ECCV 2018 • Liang-Yan Gui, Yu-Xiong Wang, Deva Ramanan, Jose M. F. Moura

This paper addresses the problem of few-shot human motion prediction, in the spirit of the recent progress on few-shot learning and meta-learning.

Few-Shot Learning Human motion prediction +1

Recycle-GAN: Unsupervised Video Retargeting

1 code implementation • ECCV 2018 • Aayush Bansal, Shugao Ma, Deva Ramanan, Yaser Sheikh

We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to a domain, i.e., if contents of John Oliver's speech were to be transferred to Stephen Colbert, then the generated content/speech should be in Stephen Colbert's style.

Face to Face Translation Translation +1

Active Testing: An Efficient and Robust Framework for Estimating Accuracy

no code implementations • ICML 2018 • Phuc Nguyen, Deva Ramanan, Charless Fowlkes

Much recent work on visual recognition aims to scale up learning to massive, noisily-annotated datasets.

Instance Segmentation Multi-Label Classification +1

Cross-Domain Image Matching with Deep Feature Maps

1 code implementation • 6 Apr 2018 • Bailey Kong, James Supancic, Deva Ramanan, Charless C. Fowlkes

We investigate the problem of automatically determining what type of shoe left an impression found at a crime scene.

Image Retrieval Retrieval

Active Learning with Partial Feedback

1 code implementation • ICLR 2019 • Peiyun Hu, Zachary C. Lipton, Anima Anandkumar, Deva Ramanan

While many active learning papers assume that the learner can simply ask for a label and receive it, real annotation often presents a mismatch between the form of a label (say, one among many classes), and the form of an annotation (typically yes/no binary feedback).

Active Learning

Brute-Force Facial Landmark Analysis With A 140,000-Way Classifier

no code implementations • 6 Feb 2018 • Mengtian Li, Laszlo Jeni, Deva Ramanan

While most prior work treats this as a regression problem, we instead formulate it as a discrete $K$-way classification task, where a classifier is trained to return one of $K$ discrete alignments.

General Classification regression

Learning to Model the Tail

no code implementations • NeurIPS 2017 • Yu-Xiong Wang, Deva Ramanan, Martial Hebert

We cast this problem as transfer learning, where knowledge from the data-rich classes in the head of the distribution is transferred to the data-poor classes in the tail.

Image Classification Transfer Learning

Patch Correspondences for Interpreting Pixel-level CNNs

no code implementations • 29 Nov 2017 • Victor Fragoso, Chunhui Liu, Aayush Bansal, Deva Ramanan

We present compositional nearest neighbors (CompNN), a simple approach to visually interpreting distributed representations learned by a convolutional neural network (CNN) for pixel-level tasks (e.g., image synthesis and segmentation).

Image-to-Image Translation Segmentation +2

Attentional Pooling for Action Recognition

1 code implementation • NeurIPS 2017 • Rohit Girdhar, Deva Ramanan

We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human object interaction tasks.

Ranked #7 on Human-Object Interaction Detection on HICO

Action Recognition Human-Object Interaction Detection +1

PixelNN: Example-based Image Synthesis

1 code implementation • ICLR 2018 • Aayush Bansal, Yaser Sheikh, Deva Ramanan

We present a simple nearest-neighbor (NN) approach that synthesizes high-frequency photorealistic images from an "incomplete" signal such as a low-resolution image, a surface normal map, or edges.

Image Generation

Learning Policies for Adaptive Tracking with Deep Feature Cascades

no code implementations • ICCV 2017 • Chen Huang, Simon Lucey, Deva Ramanan

Our fundamental insight is to take an adaptive approach, where easy frames are processed with cheap features (such as pixel values), while challenging frames are processed with invariant but expensive deep features.

Decision Making Visual Object Tracking

Unconstrained Face Detection and Open-Set Face Recognition Challenge

no code implementations • 8 Aug 2017 • Manuel Günther, Peiyun Hu, Christian Herrmann, Chi Ho Chan, Min Jiang, Shufan Yang, Akshay Raj Dhamija, Deva Ramanan, Jürgen Beyerer, Josef Kittler, Mohamad Al Jazaery, Mohammad Iqbal Nouyed, Guodong Guo, Cezary Stankiewicz, Terrance E. Boult

Face detection and recognition benchmarks have shifted toward more difficult environments.

Face Detection Face Identification +3

Comparing Apples and Oranges: Off-Road Pedestrian Detection on the NREC Agricultural Person-Detection Dataset

no code implementations • 22 Jul 2017 • Zachary Pezzementi, Trenton Tabor, Peiyun Hu, Jonathan K. Chang, Deva Ramanan, Carl Wellington, Benzun P. Wisely Babu, Herman Herman

Person detection from vehicles has made rapid progress recently with the advent of multiple high-quality datasets of urban and highway driving, yet no large-scale benchmark is available for the same problem in off-road or agricultural environments.

Human Detection Pedestrian Detection

Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning

no code implementations • ICCV 2017 • James Steven Supancic III, Deva Ramanan

We formulate tracking as an online decision-making process, where a tracking agent must follow an object despite ambiguous image frames and a limited computational budget.

Decision Making Reinforcement Learning (RL)

Predictive-Corrective Networks for Action Detection

no code implementations • CVPR 2017 • Achal Dave, Olga Russakovsky, Deva Ramanan

While deep feature learning has revolutionized techniques for static-image understanding, the same does not quite hold for video processing.

Action Detection Optical Flow Estimation +2

ActionVLAD: Learning spatio-temporal aggregation for action classification

no code implementations • CVPR 2017 • Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell

In this work, we introduce a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video.

Ranked #8 on Long-video Activity Recognition on Breakfast

Action Classification Classification +3

Expecting the Unexpected: Training Detectors for Unusual Pedestrians with Adversarial Imposters

1 code implementation • CVPR 2017 • Shiyu Huang, Deva Ramanan

Such "in-the-tail" data is notoriously hard to observe, making both training and testing difficult.

Pedestrian Detection

Need for Speed: A Benchmark for Higher Frame Rate Object Tracking

1 code implementation • ICCV 2017 • Hamed Kiani Galoogahi, Ashton Fagg, Chen Huang, Deva Ramanan, Simon Lucey

In this paper, we propose the first higher frame rate video dataset (called Need for Speed - NfS) and benchmark for visual object tracking.

Visual Object Tracking

PixelNet: Representation of the pixels, by the pixels, and for the pixels

1 code implementation • 21 Feb 2017 • Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.

Edge Detection Segmentation +2

3D Human Pose Estimation = 2D Pose Estimation + Matching

no code implementations • CVPR 2017 • Ching-Hang Chen, Deva Ramanan

While many approaches try to directly predict 3D pose from image measurements, we explore a simple architecture that reasons through intermediate 2D pose predictions.

Ranked #294 on 3D Human Pose Estimation on Human3.6M

2D Pose Estimation 3D Human Pose Estimation +2

Tinkering Under the Hood: Interactive Zero-Shot Learning with Net Surgery

no code implementations • 15 Dec 2016 • Vivek Krishnan, Deva Ramanan

We consider the task of visual net surgery, in which a CNN can be reconfigured without extra data to recognize novel concepts that may be omitted from the training set.

Novel Concepts Zero-Shot Learning

Finding Tiny Faces

20 code implementations • CVPR 2017 • Peiyun Hu, Deva Ramanan

We explore three aspects of the problem in the context of finding small faces: the role of scale invariance, image resolution, and contextual reasoning.

Ranked #25 on Face Detection on WIDER Face (Medium)

Face Detection Object Recognition

PixelNet: Towards a General Pixel-level Architecture

no code implementations • 21 Sep 2016 • Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

We explore architectures for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.

Edge Detection Semantic Segmentation +1

The Open World of Micro-Videos

no code implementations • 31 Mar 2016 • Phuc Xuan Nguyen, Gregory Rogez, Charless Fowlkes, Deva Ramanan

Micro-videos are six-second videos popular on social media networks with several unique properties.

TAG Video Understanding

Understanding Everyday Hands in Action From RGB-D Images

no code implementations • ICCV 2015 • Gregory Rogez, James S. Supancic III, Deva Ramanan

We analyze functional manipulations of handheld objects, formalizing the problem as one of fine-grained grasp classification.

Depth-Based Hand Pose Estimation: Data, Methods, and Challenges

no code implementations • ICCV 2015 • James S. Supancic III, Gregory Rogez, Yi Yang, Jamie Shotton, Deva Ramanan

To spur further progress we introduce a challenging new dataset with diverse, cluttered scenes.

Hand Pose Estimation

Look and Think Twice: Capturing Top-Down Visual Attention With Feedback Convolutional Neural Networks

no code implementations • ICCV 2015 • Chunshui Cao, Xian-Ming Liu, Yi Yang, Yinan Yu, Jiang Wang, Zilei Wang, Yongzhen Huang, Liang Wang, Chang Huang, Wei Xu, Deva Ramanan, Thomas S. Huang

While feedforward deep convolutional neural networks (CNNs) have been a great success in computer vision, it is important to remember that the human visual cortex generally contains more feedback connections than forward connections.

Bottom-Up and Top-Down Reasoning with Hierarchical Rectified Gaussians

1 code implementation • CVPR 2016 • Peiyun Hu, Deva Ramanan

We show that rectified Gaussians (RGs) can be optimized with a quadratic program (QP), which can in turn be optimized with a recurrent neural network (with rectified linear units).

Ranked #40 on Pose Estimation on MPII Human Pose

Pose Estimation

First-Person Pose Recognition Using Egocentric Workspaces

no code implementations • CVPR 2015 • Gregory Rogez, James S. Supancic III, Deva Ramanan

In egocentric views, hands and arms are observable within a well-defined volume in front of the camera.

Pose Estimation

Multi-scale recognition with DAG-CNNs

no code implementations • ICCV 2015 • Songfan Yang, Deva Ramanan

We explore multi-scale convolutional neural nets (CNNs) for image classification.

Classification General Classification +1

Depth-based hand pose estimation: methods, data, and challenges

no code implementations • 24 Apr 2015 • James Steven Supancic III, Gregory Rogez, Yi Yang, Jamie Shotton, Deva Ramanan

To spur further progress we introduce a challenging new dataset with diverse, cluttered scenes.

Hand Pose Estimation

Do We Need More Training Data?

no code implementations • 5 Mar 2015 • Xiangxin Zhu, Carl Vondrick, Charless Fowlkes, Deva Ramanan

Datasets for training object recognition systems are steadily increasing in size.

Object Recognition

3D Hand Pose Detection in Egocentric RGB-D Images

no code implementations • 29 Nov 2014 • Gregory Rogez, James S. Supancic III, Maryam Khademi, Jose Maria Martinez Montiel, Deva Ramanan

We focus on the task of everyday hand pose estimation from egocentric viewpoints.

Hand Detection Hand Pose Estimation

Egocentric Pose Recognition in Four Lines of Code

no code implementations • 29 Nov 2014 • Gregory Rogez, James S. Supancic III, Deva Ramanan

We tackle the problem of estimating the 3D pose of an individual's upper limbs (arms+hands) from a chest-mounted depth camera.

Pose Estimation

Parsing Videos of Actions with Segmental Grammars

no code implementations • CVPR 2014 • Hamed Pirsiavash, Deva Ramanan

Real-world videos of human activities exhibit temporal structure at various scales; long videos are typically composed of multiple action instances, where each instance is itself composed of sub-actions with variable durations and orderings.

Analysis by Synthesis: 3D Object Recognition by Object Reconstruction

no code implementations • CVPR 2014 • Mohsen Hejrati, Deva Ramanan

We introduce an efficient "brute-force" approach to inference that searches through a large number of candidate reconstructions, returning the optimal one.

3D Object Recognition Object +1

Capturing Long-tail Distributions of Object Subcategories

no code implementations • CVPR 2014 • Xiangxin Zhu, Dragomir Anguelov, Deva Ramanan

We argue that object subcategories follow a long-tail distribution: a few subcategories are common, while many are rare.

Clustering Object

Parsing Occluded People

no code implementations • CVPR 2014 • Golnaz Ghiasi, Yi Yang, Deva Ramanan, Charless C. Fowlkes

Occlusion poses a significant difficulty for object recognition due to the combinatorial diversity of possible occlusion patterns.

Object Recognition Pose Estimation

Microsoft COCO: Common Objects in Context

35 code implementations • 1 May 2014 • Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár

We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.

Instance Segmentation Object +5

Dual coordinate solvers for large-scale structural SVMs

no code implementations • 6 Dec 2013 • Deva Ramanan

This manuscript describes a method for training linear SVMs (including binary SVMs, SVM regression, and structural SVMs) from large, out-of-core training datasets.

3D Object Recognition Action Classification +2

Exploring Weak Stabilization for Motion Feature Extraction

no code implementations • CVPR 2013 • Dennis Park, C. L. Zitnick, Deva Ramanan, Piotr Dollar

We describe novel but simple motion features for the problem of detecting objects in video sequences.

Optical Flow Estimation Pedestrian Detection +1

Histograms of Sparse Codes for Object Detection

no code implementations • CVPR 2013 • Xiaofeng Ren, Deva Ramanan

Object detection has seen huge progress in recent years, thanks in large part to the heavily engineered Histograms of Oriented Gradients (HOG) features.

Dimensionality Reduction Object +2

Self-Paced Learning for Long-Term Tracking

no code implementations • CVPR 2013 • James S. Supancic III, Deva Ramanan

We address the problem of long-term object tracking, where the object may become occluded or leave the view.

Object object-detection +2

Analyzing 3D Objects in Cluttered Images

no code implementations • NeurIPS 2012 • Mohsen Hejrati, Deva Ramanan

We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint.

3D Shape Reconstruction Viewpoint Estimation

Video Annotation and Tracking with Active Learning

no code implementations • NeurIPS 2011 • Carl Vondrick, Deva Ramanan

We introduce a novel active learning framework for video annotation.

Active Learning

Statistical Tests for Optimization Efficiency

no code implementations • NeurIPS 2011 • Levi Boyles, Anoop Korattikara, Deva Ramanan, Max Welling

Learning problems such as logistic regression are typically formulated as pure optimization problems defined on some loss function.

regression

Bilinear classifiers for visual recognition

no code implementations • NeurIPS 2009 • Hamed Pirsiavash, Deva Ramanan, Charless C. Fowlkes

Bilinear classifiers are a discriminative variant of bilinear models, which capture the dependence of data on multiple factors.

Action Classification General Classification +1
