Search Results for author: Sudheendra Vijayanarasimhan

Found 16 papers, 4 papers with code

Distribution Aware Metrics for Conditional Natural Language Generation

no code implementations • 15 Sep 2022 • David M Chan, Yiming Ni, Austin Myers, Sudheendra Vijayanarasimhan, David A Ross, John Canny

In this work we argue that existing metrics are not appropriate for domains such as visual description or summarization where ground truths are semantically diverse, and where the diversity in those captions captures useful additional information about the context.

Speech Recognition +1

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

1 code implementation • 12 May 2022 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny

While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.

Video Description

Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

no code implementations • 27 Jul 2020 • David M. Chan, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

Automatic video captioning aims to train models to generate text descriptions for all segments in a video; however, the most effective approaches require large amounts of manual annotation, which is slow and expensive.

Active Learning Video Captioning +1

End-to-End Learning of Semantic Grasping

no code implementations • 6 Jul 2017 • Eric Jang, Sudheendra Vijayanarasimhan, Peter Pastor, Julian Ibarz, Sergey Levine

We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images.

Object Detection +2

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

4 code implementations • CVPR 2018 • Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

Action Recognition Video Understanding

Motion Prediction Under Multimodality with Conditional Stochastic Networks

no code implementations • 5 May 2017 • Katerina Fragkiadaki, Jonathan Huang, Alex Alemi, Sudheendra Vijayanarasimhan, Susanna Ricco, Rahul Sukthankar

In this work, we present stochastic neural network architectures that handle such multimodality through stochasticity: future trajectories of objects, body joints or frames are represented as deep, non-linear transformations of random (as opposed to deterministic) variables.

motion prediction Optical Flow Estimation +2

SfM-Net: Learning of Structure and Motion from Video

no code implementations • 25 Apr 2017 • Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, Katerina Fragkiadaki

We propose SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations.

Motion Estimation Optical Flow Estimation

Efficient Large Scale Video Classification

no code implementations • 22 May 2015 • Balakrishnan Varadarajan, George Toderici, Sudheendra Vijayanarasimhan, Apostol Natsev

We present two methods that build on this work, and scale it up to work with millions of videos and hundreds of thousands of classes while maintaining a low computational cost.

Classification General Classification +2

Beyond Short Snippets: Deep Networks for Video Classification

no code implementations • CVPR 2015 • Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici

Convolutional neural networks (CNNs) have been extensively applied for image recognition problems giving state-of-the-art results on recognition, detection, segmentation and retrieval.

Action Recognition Classification +2

Deep Networks With Large Output Spaces

no code implementations • 23 Dec 2014 • Sudheendra Vijayanarasimhan, Jonathon Shlens, Rajat Monga, Jay Yagnik

Deep neural networks have been extremely successful at various image, speech, video recognition tasks because of their ability to model deep structures within the data.

Video Recognition

Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

no code implementations • CVPR 2013 • Thomas Dean, Mark A. Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik

Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object's appearance, such as the presence of component parts.

Object Detection

Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

no code implementations • NeurIPS 2010 • Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman

Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point.

Active Learning

Multi-Level Active Prediction of Useful Image Annotations for Recognition

no code implementations • NeurIPS 2008 • Sudheendra Vijayanarasimhan, Kristen Grauman

We introduce a framework for actively learning visual categories from a mixture of weakly and strongly labeled image examples.
