On semi-supervised learning benchmarks we improve performance significantly when only 1% of ImageNet labels are available, from 53.8% to 56.5%.
Ranked #1 on Image Classification on PASCAL VOC 2007
This task space can be quite general and abstract; its only requirements are that it be sampleable and that it cover the space of useful tasks well.
We demonstrate the use of this representation to imitate surgical suturing motions from publicly available videos of the JIGSAWS dataset.
Prior work in imitation learning typically requires that each task be specified with a task id or goal image, something that is often impractical in open-world environments.
We propose a self-supervised approach for learning representations of objects from monocular videos and demonstrate it is particularly useful in situated settings such as robotics.
We introduce a self-supervised representation learning method based on the task of temporal alignment between videos.
Ranked #1 on Video Alignment on UPenn Action
Mutual information maximization has emerged as a powerful learning objective for unsupervised representation learning, obtaining state-of-the-art performance in applications such as object recognition, speech recognition, and reinforcement learning.
In this work we explore a new approach for robots to teach themselves about the world simply by observing it.
While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human.
Ranked #3 on Video Alignment on UPenn Action
We present a method that identifies key intermediate steps of a task from only a handful of demonstration sequences and automatically selects the most discriminative features for recognizing these steps.
This paper presents experiments extending the work of Ba et al. (2014) on recurrent neural models for attention into less constrained visual environments, specifically fine-grained categorization on the Stanford Dogs data set.
We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014).
This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classification tasks.
Ranked #462 on Image Classification on ImageNet
We classify digits of real-world house numbers using convolutional neural networks (ConvNets).
We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features.