1 code implementation • 2 Feb 2023 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, John Canny
If you ask a human to describe an image, they might do so in a thousand different ways.
no code implementations • 20 Dec 2022 • Vivek Rathod, Bryan Seybold, Sudheendra Vijayanarasimhan, Austin Myers, Xiuye Gu, Vighnesh Birodkar, David A. Ross
Detecting actions in untrimmed videos should not be limited to a small, closed set of classes.
no code implementations • 15 Sep 2022 • David M Chan, Yiming Ni, David A Ross, Sudheendra Vijayanarasimhan, Austin Myers, John Canny
In this work we argue that existing metrics are not appropriate for domains such as visual description or summarization, where ground truths are semantically diverse and where that diversity itself captures useful additional information about the context.
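A toy illustration of the issue the abstract raises (the similarity function and captions below are hypothetical, not the metric the paper proposes): collapsing a candidate's scores against several diverse references into a single max or mean discards exactly the diversity information the authors argue is useful.

```python
# Toy illustration (not the paper's metric): scoring one candidate caption
# against semantically diverse references. Collapsing per-reference scores
# with max() or mean() hides how much the references themselves disagree.

def token_f1(candidate: str, reference: str) -> float:
    """Simple token-overlap F1 between two captions (toy similarity)."""
    c, r = set(candidate.lower().split()), set(reference.lower().split())
    overlap = len(c & r)
    if not overlap:
        return 0.0
    precision, recall = overlap / len(c), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

references = [
    "a dog runs across a grassy field",
    "a brown puppy plays outside",
    "an animal sprinting through a park",
]
candidate = "a dog sprinting through a grassy park"

scores = [token_f1(candidate, ref) for ref in references]
print("per-reference:", [round(s, 2) for s in scores])
print("max:", round(max(scores), 2), "mean:", round(sum(scores) / len(scores), 2))
# The full score vector (and its spread) carries information about reference
# diversity that a single max/mean summary throws away.
```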
1 code implementation • 12 May 2022 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny
While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.
no code implementations • 27 Jul 2020 • David M. Chan, Sudheendra Vijayanarasimhan, David A. Ross, John Canny
Automatic video captioning aims to train models to generate text descriptions for all segments in a video; however, the most effective approaches require large amounts of manual annotation, which is slow and expensive.
no code implementations • CVPR 2018 • Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Rahul Sukthankar
We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework.
Ranked #29 on Temporal Action Localization on THUMOS’14
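A minimal sketch of the Faster R-CNN-style two-stage idea transplanted to 1D, as the abstract above describes: a proposal stage slides over temporal features to score candidate segments, and a second stage classifies them. This is illustrative only; the layer sizes, anchor counts, and pooling are made up and do not reproduce TAL-Net's actual architecture.

```python
# Illustrative two-stage temporal action localization in the spirit of
# Faster R-CNN adapted to 1D (not TAL-Net itself; all sizes are made up).
import torch
import torch.nn as nn

class TemporalProposalNet(nn.Module):
    def __init__(self, feat_dim=512, num_anchors=5, num_classes=20):
        super().__init__()
        # Stage 1: slide a 1D conv over the temporal feature sequence and
        # predict, per position and anchor scale, an actionness score plus
        # a (center, length) offset.
        self.rpn = nn.Conv1d(feat_dim, num_anchors * 3, kernel_size=3, padding=1)
        # Stage 2: classify pooled proposal features into action classes.
        self.classifier = nn.Linear(feat_dim, num_classes + 1)  # +1 background

    def forward(self, feats):            # feats: (batch, feat_dim, time)
        rpn_out = self.rpn(feats)        # (batch, num_anchors*3, time)
        b, _, t = rpn_out.shape
        rpn_out = rpn_out.view(b, -1, 3, t)
        actionness = torch.sigmoid(rpn_out[:, :, 0])   # (b, anchors, t)
        offsets = rpn_out[:, :, 1:]                    # (b, anchors, 2, t)
        # A real system would select top proposals, pool features inside each
        # segment (RoI pooling in time), and classify those; here we simply
        # classify the mean-pooled clip feature as a placeholder.
        logits = self.classifier(feats.mean(dim=2))
        return actionness, offsets, logits

net = TemporalProposalNet()
feats = torch.randn(2, 512, 100)  # e.g. 100 time steps of clip features
actionness, offsets, logits = net(feats)
print(actionness.shape, offsets.shape, logits.shape)
```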
no code implementations • 6 Jul 2017 • Eric Jang, Sudheendra Vijayanarasimhan, Peter Pastor, Julian Ibarz, Sergey Levine
We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images.
8 code implementations • CVPR 2018 • Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
Ranked #6 on Action Detection on UCF101-24
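A short sketch of loading dense annotations in the layout the AVA release uses, to make the "multiple labels per person" structure concrete. The exact CSV schema assumed here (video id, middle-frame timestamp, normalized box, action id, person id) should be verified against the dataset documentation.

```python
# Sketch of reading AVA-style dense annotations. The released CSVs use rows of
# (video_id, middle_frame_timestamp, x1, y1, x2, y2, action_id, person_id)
# with box coordinates normalized to [0, 1]; verify against the dataset docs.
import csv
from collections import defaultdict

def load_ava_annotations(path):
    """Group action labels per (video, timestamp, person) key."""
    labels = defaultdict(list)  # multiple labels per person are common
    with open(path, newline="") as f:
        for row in csv.reader(f):
            video_id, ts, x1, y1, x2, y2, action_id, person_id = row
            key = (video_id, float(ts), int(person_id))
            box = tuple(float(v) for v in (x1, y1, x2, y2))
            labels[key].append((box, int(action_id)))
    return labels
```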
12 code implementations • 19 May 2017 • Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, Andrew Zisserman
We describe the DeepMind Kinetics human action video dataset.
no code implementations • 5 May 2017 • Katerina Fragkiadaki, Jonathan Huang, Alex Alemi, Sudheendra Vijayanarasimhan, Susanna Ricco, Rahul Sukthankar
In this work, we present stochastic neural network architectures that handle such multimodality through stochasticity: future trajectories of objects, body joints or frames are represented as deep, non-linear transformations of random (as opposed to deterministic) variables.
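A minimal sketch of the core idea in that abstract: a future trajectory is produced by a deterministic network applied to a random noise vector, so sampling different noise yields different plausible futures. All dimensions and the decoder shape here are invented for illustration, not taken from the paper.

```python
# Minimal sketch: represent a future trajectory as a deterministic network
# applied to random noise, so sampling different noise vectors yields
# different plausible futures (all sizes are made up).
import torch
import torch.nn as nn

class StochasticFuturePredictor(nn.Module):
    def __init__(self, ctx_dim=128, noise_dim=16, horizon=10, state_dim=2):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(ctx_dim + noise_dim, 256),
            nn.ReLU(),
            nn.Linear(256, horizon * state_dim),
        )
        self.horizon, self.state_dim, self.noise_dim = horizon, state_dim, noise_dim

    def forward(self, context, num_samples=5):
        # context: (batch, ctx_dim) summary of the observed past
        b = context.size(0)
        z = torch.randn(b, num_samples, self.noise_dim)       # stochastic input
        ctx = context.unsqueeze(1).expand(-1, num_samples, -1)
        out = self.decoder(torch.cat([ctx, z], dim=-1))
        return out.view(b, num_samples, self.horizon, self.state_dim)

model = StochasticFuturePredictor()
futures = model(torch.randn(4, 128))
print(futures.shape)  # (4, 5, 10, 2): five distinct sampled trajectories
```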
no code implementations • 25 Apr 2017 • Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, Katerina Fragkiadaki
We propose SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations.
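The geometric core of such a decomposition can be sketched directly: given per-pixel depth and a camera motion (R, t), backproject pixels to 3D, apply the motion, and reproject to obtain the induced optical flow. This simplified sketch ignores independently moving objects and uses a toy intrinsics matrix; it illustrates the geometry, not SfM-Net's learned components.

```python
# Sketch of flow induced by depth + camera egomotion under a pinhole model
# (simplified: no object motion, toy intrinsics).
import numpy as np

def flow_from_depth_and_egomotion(depth, K, R, t):
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # 3xN
    rays = np.linalg.inv(K) @ pix                 # backproject to unit-depth rays
    pts = rays * depth.reshape(1, -1)             # 3D points in camera frame
    pts2 = R @ pts + t[:, None]                   # apply camera motion
    proj = K @ pts2
    proj = proj[:2] / proj[2:3]                   # perspective divide
    return (proj - pix[:2]).T.reshape(h, w, 2)    # per-pixel displacement

K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])  # toy intrinsics
depth = np.full((64, 64), 5.0)                    # flat scene 5 m away
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])       # small sideways translation
print(flow_from_depth_and_egomotion(depth, K, R, t)[32, 32])  # ~[2, 0] pixels
```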
7 code implementations • 27 Sep 2016 • Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan
Despite the size of the dataset, some of our models train to convergence in less than a day on a single machine using TensorFlow.
Ranked #1 on Action Recognition In Videos on ActivityNet
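A baseline in the spirit of the starter models released with the dataset, which helps explain why training converges so quickly: average the precomputed frame-level features over time and train a one-vs-all logistic layer. The feature dimension and label count below are approximate placeholders, not the dataset's exact configuration.

```python
# Minimal mean-pooling baseline over precomputed frame features
# (sizes are illustrative; check the dataset release for exact values).
import tensorflow as tf

NUM_CLASSES = 4800   # roughly the entity vocabulary reported for the release
FEAT_DIM = 1024      # e.g. per-frame visual feature size

model = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling1D(),                  # mean-pool frames
    tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid"),  # multi-label
])
model.compile(optimizer="adam", loss="binary_crossentropy")

frames = tf.random.normal([8, 300, FEAT_DIM])        # batch of 300-frame videos
labels = tf.cast(tf.random.uniform([8, NUM_CLASSES]) < 0.001, tf.float32)
model.fit(frames, labels, epochs=1, verbose=0)
```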
no code implementations • 22 May 2015 • Balakrishnan Varadarajan, George Toderici, Sudheendra Vijayanarasimhan, Apostol Natsev
We present two methods that build on this work, and scale it up to work with millions of videos and hundreds of thousands of classes while maintaining a low computational cost.
1 code implementation • CVPR 2015 • Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici
Convolutional neural networks (CNNs) have been extensively applied to image recognition problems, giving state-of-the-art results on recognition, detection, segmentation, and retrieval.
Ranked #5 on Action Recognition on Sports-1M
no code implementations • 23 Dec 2014 • Sudheendra Vijayanarasimhan, Jonathon Shlens, Rajat Monga, Jay Yagnik
Deep neural networks have been extremely successful at various image, speech, and video recognition tasks because of their ability to model deep structures within the data.
no code implementations • CVPR 2013 • Thomas Dean, Mark A. Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik
Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object's appearance, such as the presence of component parts.
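To see why the filter bank is the bottleneck, note that naive detection cost grows linearly with the number of filters, which is intractable at the scale of tens of thousands of parts. The sketch below shows only this naive baseline; the paper's contribution is replacing the convolution with a hashing-based approximation, which is not reproduced here.

```python
# Naive filter-bank evaluation: cost is O(num_filters * |image| * |filter|),
# which motivates approximations when the bank has 100,000+ filters.
import numpy as np
from scipy.signal import correlate2d

def filter_bank_responses(image, filters):
    """Cross-correlate one image with every filter in the bank."""
    return np.stack([correlate2d(image, f, mode="valid") for f in filters])

image = np.random.rand(64, 64)
filters = [np.random.rand(8, 8) for _ in range(100)]  # even 100 is slow at scale
responses = filter_bank_responses(image, filters)
print(responses.shape)  # (100, 57, 57)
```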
no code implementations • NeurIPS 2010 • Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman
Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point.
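A sketch of the two-bit random-projection idea described in that abstract: hash database points with the signs of two random projections, and hash a hyperplane query through its normal with one sign flipped, so collisions are most likely when the point is nearly perpendicular to the normal, i.e., near the hyperplane. The sign conventions below follow my reading of the construction; see the paper for the exact hash family and its collision-probability analysis.

```python
# Empirical check of the two-bit hyperplane hash: a point lying on the query
# hyperplane collides with the hyperplane key at ~1/4 rate, a point parallel
# to the normal almost never does.
import numpy as np

rng = np.random.default_rng(0)
d = 32
w = rng.standard_normal(d)                 # query hyperplane normal
x = rng.standard_normal(d)
x_perp = x - (x @ w) / (w @ w) * w         # point lying on the hyperplane
x_par = 2.0 * w                            # point far from the hyperplane

def collision_rate(point, normal, trials=10000):
    hits = 0
    for _ in range(trials):
        u, v = rng.standard_normal(d), rng.standard_normal(d)
        key_pt = (np.sign(u @ point), np.sign(v @ point))    # database key
        key_hp = (np.sign(u @ normal), -np.sign(v @ normal)) # query key
        hits += key_pt == key_hp
    return hits / trials

print("on-hyperplane point:", collision_rate(x_perp, w))  # ~0.25
print("parallel point:    ", collision_rate(x_par, w))    # ~0.0
```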
no code implementations • NeurIPS 2008 • Sudheendra Vijayanarasimhan, Kristen Grauman
We introduce a framework for actively learning visual categories from a mixture of weakly and strongly labeled image examples.
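A generic sketch of the outer active-learning loop, using plain uncertainty sampling rather than the paper's criterion over mixed weak and strong annotations: repeatedly fit on the labeled pool, then request a label for the unlabeled example closest to the decision boundary. The data, model, and selection rule here are illustrative stand-ins.

```python
# Generic active-learning loop (uncertainty sampling, not the paper's
# cost-sensitive criterion over weak vs. strong labels).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
y = (X[:, 0] + 0.3 * rng.standard_normal(500) > 0).astype(int)

# Seed with a few strong labels from each class; the rest start unlabeled.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
unlabeled = [i for i in range(500) if i not in set(labeled)]

for round_ in range(5):
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[unlabeled])[:, 1]
    pick = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]  # most uncertain
    labeled.append(pick)            # simulate requesting a strong label
    unlabeled.remove(pick)
    print(f"round {round_}: accuracy on all data = {clf.score(X, y):.3f}")
```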