Data efficiency and robustness to task-irrelevant perturbations are long-standing challenges for deep reinforcement learning algorithms.
To autonomously navigate and plan interactions in real-world environments, robots require the ability to robustly perceive and map complex, unstructured surrounding scenes.
Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images.
To tackle this problem, we propose a simple but effective pattern mining-based method, called Object Location Mining (OLM), which exploits the advantages of data mining and feature representation of pre-trained convolutional neural networks (CNNs).
Given an egocentric video/images sequence acquired by the camera, our algorithm uses both the appearance extracted by means of a convolutional neural network and an object refill methodology that allows to discover objects even in case of small amount of object appearance in the collection of images.
Unsupervised object discovery in images involves uncovering recurring patterns that define objects and discriminates them against the background.
This paper is concerned with the training of recurrent neural networks as goal-oriented dialog agents using reinforcement learning.
We contribute to this research thread with two findings: (1) a study correlating a given level of noisily labels to the expected drop in accuracy, for two deep architectures, on two different types of noise, that clearly identifies GoogLeNet as a suitable architecture for learning from Web data; (2) a recipe for the creation of Web datasets with minimal noise and maximum visual variability, based on a visual and natural language processing concept expansion strategy.