ICCV 2017

Mask R-CNN

ICCV 2017 tensorflow/models

Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.

HUMAN PART SEGMENTATION, INSTANCE SEGMENTATION, KEYPOINT DETECTION, MULTI-HUMAN PARSING, NUCLEAR SEGMENTATION, OBJECT DETECTION, SEMANTIC SEGMENTATION

Focal Loss for Dense Object Detection

ICCV 2017 tensorflow/models

Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

DENSE OBJECT DETECTION
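
The down-weighting of easy examples described in the Focal Loss excerpt can be illustrated with a minimal sketch. The listing itself does not give the formula; the form used here, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and the defaults gamma = 2 and alpha = 0.25 are taken from the paper, and the function names are hypothetical. This is a scalar illustration, not the code in tensorflow/models.

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy for one prediction (p = predicted P(y == 1))."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, 1e-12))

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    gamma and alpha defaults are assumptions based on the paper.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t balances positives (alpha) against negatives (1 - alpha).
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = max(p_t, 1e-12)  # guard against log(0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy negative (confidently rejected) versus a hard negative
# (confidently, and wrongly, accepted).
easy_negative = focal_loss(0.01, 0)
hard_negative = focal_loss(0.90, 0)
print(easy_negative, hard_negative)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which makes the effect of gamma easy to isolate when experimenting.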

Learning to Reason: End-to-End Module Networks for Visual Question Answering

ICCV 2017 tensorflow/models

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.

VISUAL QUESTION ANSWERING

Large-Scale Image Retrieval with Attentive Deep Local Features

ICCV 2017 tensorflow/models

We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature).

IMAGE RETRIEVAL

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

ICCV 2017 tensorflow/models

Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs.

MULTIMODAL UNSUPERVISED IMAGE-TO-IMAGE TRANSLATION, STYLE TRANSFER, UNSUPERVISED IMAGE-TO-IMAGE TRANSLATION

DualGAN: Unsupervised Dual Learning for Image-to-Image Translation

ICCV 2017 eriklindernoren/Keras-GAN

Depending on the task complexity, thousands to millions of labeled image pairs are needed to train a conditional GAN.

IMAGE-TO-IMAGE TRANSLATION

Least Squares Generative Adversarial Networks

ICCV 2017 eriklindernoren/Keras-GAN

To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss function for the discriminator.
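
As a rough sketch of what a least squares loss for the discriminator looks like, the snippet below computes LSGAN-style objectives on lists of discriminator outputs. The target coding a = 0 (fake), b = 1 (real), c = 1 (value the generator wants for fakes) is one common choice and is an assumption here, as are the function names; the listing does not specify them, and this is not the code in eriklindernoren/Keras-GAN.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)

def lsgan_discriminator_loss(d_real, d_fake, a=0.0, b=1.0):
    """Least squares discriminator loss.

    d_real: discriminator outputs on real samples
    d_fake: discriminator outputs on generated samples
    a, b:   assumed target labels for fake and real data
    """
    real_term = mean([(d - b) ** 2 for d in d_real])
    fake_term = mean([(d - a) ** 2 for d in d_fake])
    return 0.5 * real_term + 0.5 * fake_term

def lsgan_generator_loss(d_fake, c=1.0):
    """Least squares generator loss; c is the assumed target for fakes."""
    return 0.5 * mean([(d - c) ** 2 for d in d_fake])

# A discriminator that scores real samples near 1 and fakes near 0
# has a small loss, while the generator's loss on those fakes is large.
print(lsgan_discriminator_loss([0.9, 1.1], [0.1, -0.1]))
print(lsgan_generator_loss([0.1, -0.1]))
```

Unlike a sigmoid cross-entropy objective, the squared error keeps growing with the distance between an output and its target, so samples far from the target still contribute a gradient.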

Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression

ICCV 2017 AaronJackson/vrn

Our CNN works with just a single 2D facial image, does not require accurate alignment nor establishes dense correspondence between images, works for arbitrary facial poses and expressions, and can be used to reconstruct the whole 3D facial geometry (including the non-visible parts of the face) bypassing the construction (during training) and fitting (during testing) of a 3D Morphable Model.

3D FACE RECONSTRUCTION, FACE ALIGNMENT

RMPE: Regional Multi-person Pose Estimation

ICCV 2017 MVIG-SJTU/AlphaPose

In this paper, we propose a novel regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes.

HUMAN DETECTION, MULTI-PERSON POSE ESTIMATION

How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)

ICCV 2017 1adrianb/face-alignment

To this end, we make the following 5 contributions: (a) we construct, for the first time, a very strong baseline by combining a state-of-the-art architecture for landmark localization with a state-of-the-art residual block, train it on a very large yet synthetically expanded 2D facial landmark dataset and finally evaluate it on all other 2D facial landmark datasets.

SOTA for Face Alignment on 300W (AUC0.07 metric)

FACE ALIGNMENT