We introduce the task of local relighting, which changes a photograph of a scene by switching on and off the light sources that are visible within the image.
In this paper, we combine historical accident data and Street View images to automatically predict the impact (increase or decrease) of urban interventions on accident incidence.
In this work, we present the Incidents1M Dataset, a large-scale multi-label dataset which contains 977,088 images, with 43 incident and 49 place categories.
At the moment, urban mobility research and governmental initiatives are mostly focused on motor-related issues, e.g., the problems of congestion and pollution.
Several studies have shown the relevance of biosignals in driver stress recognition.
In this paper, we model the emotions evoked by videos in a different manner: instead of modeling only the aggregated value, we jointly model the emotions experienced by each viewer and the aggregated value using a multi-task learning approach.
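A minimal sketch of the joint-modeling idea, assuming a shared video encoder with two heads, one for per-viewer emotions and one for the aggregated value; the architecture, dimensions, and targets below are illustrative placeholders, not the authors' exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskEmotionModel(nn.Module):
    def __init__(self, in_dim=2048, feat_dim=512, n_viewers=10, n_emotions=8):
        super().__init__()
        # stand-in for a real video encoder producing a shared representation
        self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        # one head per task: per-viewer emotions and the aggregated value
        self.viewer_head = nn.Linear(feat_dim, n_viewers * n_emotions)
        self.aggregate_head = nn.Linear(feat_dim, n_emotions)

    def forward(self, video_feats):
        h = self.encoder(video_feats)
        return self.viewer_head(h), self.aggregate_head(h)

model = MultiTaskEmotionModel()
feats = torch.randn(4, 2048)                     # batch of video features
viewer_pred, agg_pred = model(feats)
# joint objective: sum of the two task losses (dummy regression targets here)
loss = F.mse_loss(agg_pred, torch.randn_like(agg_pred)) \
     + F.mse_loss(viewer_pred, torch.randn_like(viewer_pred))
loss.backward()
```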
This work revisits the ChaLearn First Impressions database, annotated for personality perception using pairwise comparisons via crowdsourcing.
This work explores facial expression bias as a security vulnerability of face recognition systems.
We begin by hosting models online and gathering human feedback from real-time, open-ended conversations, which we then use to train and improve the models with offline reinforcement learning (RL).
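One simple offline-RL recipe for this setting is reward-weighted likelihood training on logged conversations; the sketch below illustrates that idea with a toy next-token policy and placeholder data, and is not necessarily the algorithm used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim = 5000, 128
# toy next-token policy: embed a context token, predict the next token
model = nn.Sequential(nn.Embedding(vocab_size, dim),
                      nn.Linear(dim, vocab_size))

# logged data: context token, response token the model produced, human reward
contexts = torch.randint(0, vocab_size, (32,))
responses = torch.randint(0, vocab_size, (32,))
rewards = torch.rand(32)              # e.g. derived from user feedback signals

logits = model(contexts)
# upweight utterances that received positive feedback, downweight the rest
nll = F.cross_entropy(logits, responses, reduction="none")
loss = (rewards * nll).mean()
loss.backward()
```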
We propose two face representations that are blind to the facial expressions associated with emotional responses.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
While most studies on social media are limited to text, images offer more information for understanding disaster and incident scenes.
In this paper we present EMOTIC, a dataset of images of people in a diverse set of natural situations, annotated with their apparent emotion.
Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment.
This is a critical shortcoming for applying RL to real-world problems where collecting data is expensive and models must be tested offline before being deployed to interact with the environment, e.g., systems that learn from human interaction.
To compare this novel metric and interactive evaluation against state-of-the-art metrics and human evaluation of static conversations, we perform extended experiments with a set of models, including several that improve on recent hierarchical dialog generation architectures through utterance-level sentiment and semantic knowledge distillation.
In this paper we present the Emotions in Context Database (EMCO), a dataset of images of people in context in unconstrained environments.
The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition.
In this work, we revisit the global average pooling layer proposed in prior work and shed light on how it explicitly enables a convolutional neural network to have remarkable localization ability despite being trained on image-level labels.
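The localization ability can be made explicit by computing a class activation map: the class-specific weights of the final linear layer are used to combine the last convolutional feature maps. A minimal sketch, assuming a network where global average pooling feeds a single linear classifier (all shapes are illustrative):

```python
import numpy as np

def class_activation_map(conv_feats, fc_weights, class_idx):
    """conv_feats: (C, H, W) last-conv feature maps for one image.
    fc_weights: (num_classes, C) weights of the final linear layer.
    Returns an (H, W) localization map for class_idx."""
    # weight each feature map by its contribution to the class score
    return np.tensordot(fc_weights[class_idx], conv_feats, axes=1)

conv_feats = np.random.rand(512, 7, 7)   # e.g. last conv block of a CNN
fc_weights = np.random.rand(1000, 512)   # GAP -> linear classifier weights
cam = class_activation_map(conv_feats, fc_weights, class_idx=282)
print(cam.shape)  # (7, 7); upsample to image size to localize the class
```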
In this paper we propose to use the Winner-Takes-All (WTA) hashing technique to speed up forward and backward propagation in the fully connected layers of convolutional neural networks.
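A minimal sketch of the WTA-hashing idea, assuming hash buckets are used to select only the fully connected units likely to produce large dot products with the input; the permutation count, window size, and single exact-match table below are illustrative simplifications rather than the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def wta_hash(x, perms, window=4):
    """For each shared random permutation, record the argmax among the
    first `window` permuted entries; the tuple of argmaxes is the code.
    Vectors with similar rank orderings tend to collide."""
    return tuple(int(np.argmax(x[p[:window]])) for p in perms)

dim, n_units = 256, 1024
perms = [rng.permutation(dim) for _ in range(8)]   # shared permutations
W = rng.standard_normal((n_units, dim))            # FC layer weight rows

# index weight rows into buckets by their WTA code (done once, offline)
buckets = {}
for i, w in enumerate(W):
    buckets.setdefault(wta_hash(w, perms), []).append(i)

x = rng.standard_normal(dim)
candidates = buckets.get(wta_hash(x, perms), [])
# compute dot products only for hash-matched units instead of all n_units
outputs = {i: float(W[i] @ x) for i in candidates}
```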
With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly.
Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success.
When learning a new concept, not all training examples may prove equally useful: some may have higher or lower training value than others.