ICCV 2017

Focal Loss for Dense Object Detection

ICCV 2017 tensorflow/models

Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

DENSE OBJECT DETECTION
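
As a rough illustration of the Focal Loss entry above, the sketch below uses the commonly cited form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) for a single binary prediction. The function names (`binary_focal_loss`, `cross_entropy`) and the default values of `gamma` and `alpha` are assumptions chosen for illustration, not code from the tensorflow/models repository.

```python
import math


def cross_entropy(p, y):
    """Plain binary cross-entropy for one prediction (baseline for comparison)."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction (illustrative sketch).

    p: predicted probability of the positive class, strictly between 0 and 1
    y: ground-truth label, 1 (positive) or 0 (negative)
    gamma: focusing parameter; gamma = 0 removes the modulating factor
    alpha: class-balancing weight applied to positives (1 - alpha to negatives)
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (confidently predicted as background) versus a hard negative.
easy = binary_focal_loss(p=0.01, y=0)
hard = binary_focal_loss(p=0.90, y=0)
print(easy, hard)
```

With these assumed settings the easy negative contributes a loss several orders of magnitude smaller than the hard one, which is the sense in which easy negatives no longer dominate training.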

Learning to Reason: End-to-End Module Networks for Visual Question Answering

ICCV 2017 tensorflow/models

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.

VISUAL QUESTION ANSWERING

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

ICCV 2017 tensorflow/models

Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs.

MULTIMODAL UNSUPERVISED IMAGE-TO-IMAGE TRANSLATION STYLE TRANSFER UNSUPERVISED IMAGE-TO-IMAGE TRANSLATION

Large-Scale Image Retrieval with Attentive Deep Local Features

ICCV 2017 tensorflow/models

We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature).

IMAGE RETRIEVAL

Mask R-CNN

ICCV 2017 tensorflow/models

Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.

SOTA for Instance Segmentation on Cityscapes test (using extra training data)

HUMAN PART SEGMENTATION INSTANCE SEGMENTATION KEYPOINT DETECTION MULTI-HUMAN PARSING NUCLEAR SEGMENTATION OBJECT DETECTION SEMANTIC SEGMENTATION

Least Squares Generative Adversarial Networks

ICCV 2017 eriklindernoren/Keras-GAN

To overcome the vanishing-gradient problem that the sigmoid cross-entropy loss can cause, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs), which adopt the least squares loss function for the discriminator.
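
The least squares objective named in the LSGAN entry above can be sketched as follows. This is a minimal version assuming the common label choice of 1 for real samples and 0 for generated ones; the function names and the use of plain Python lists of discriminator outputs are hypothetical and are not taken from eriklindernoren/Keras-GAN.

```python
def lsgan_discriminator_loss(d_real, d_fake, real_label=1.0, fake_label=0.0):
    """Least squares discriminator loss (illustrative sketch).

    d_real: discriminator outputs on real samples
    d_fake: discriminator outputs on generated samples
    """
    real_term = sum((d - real_label) ** 2 for d in d_real) / len(d_real)
    fake_term = sum((d - fake_label) ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term


def lsgan_generator_loss(d_fake, target_label=1.0):
    """Least squares generator loss: push D's output on fakes toward the target."""
    return 0.5 * sum((d - target_label) ** 2 for d in d_fake) / len(d_fake)


# Discriminator outputs are raw scores here, not sigmoid probabilities.
print(lsgan_discriminator_loss([0.9, 1.1], [0.2, -0.1]))
print(lsgan_generator_loss([0.2, -0.1]))
```

Because the penalty grows quadratically with the distance from the target label, generated samples that are far from the target still receive a gradient even when the discriminator already classifies them correctly.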

DualGAN: Unsupervised Dual Learning for Image-to-Image Translation

ICCV 2017 eriklindernoren/Keras-GAN

Depending on the task complexity, thousands to millions of labeled image pairs are needed to train a conditional GAN.

IMAGE-TO-IMAGE TRANSLATION

RMPE: Regional Multi-person Pose Estimation

ICCV 2017 MVIG-SJTU/AlphaPose

In this paper, we propose a novel regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes.

HUMAN DETECTION MULTI-PERSON POSE ESTIMATION

Spatial Memory for Context Reasoning in Object Detection

ICCV 2017 endernewton/tf-faster-rcnn

On the other hand, modeling object-object relationships requires spatial reasoning: not only do we need a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns.

OBJECT DETECTION

Deformable Convolutional Networks

ICCV 2017 msracver/Deformable-ConvNets

Convolutional neural networks (CNNs) are inherently limited in modeling geometric transformations due to the fixed geometric structures in their building modules.

OBJECT DETECTION SEMANTIC SEGMENTATION