We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014).
On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%.
#3 best model for Face Verification on Labeled Faces in the Wild
Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data.
We propose a novel paradigm for evaluating image descriptions that uses human consensus.
The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions.
In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA).
#30 best model for Person Re-Identification on DukeMTMC-reID
Correlation clustering, or multicut partitioning, is widely used in image segmentation for partitioning an undirected graph or image with positive and negative edge weights such that the sum of cut edge weights is minimized.
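As a rough illustration of the objective described in this excerpt, the sketch below scores a partition by summing the weights of the edges it cuts, then exhaustively searches a tiny graph for the cheapest partition. The function names, the edge-list format, and the example graph are assumptions made for illustration and do not come from the paper. Brute force is used only to keep the example short and is not the paper's method.

```python
from itertools import product


def cut_cost(edges, labels):
    """Sum the weights of edges whose endpoints fall in different clusters."""
    return sum(w for (u, v, w) in edges if labels[u] != labels[v])


def brute_force_multicut(num_nodes, edges):
    """Try every labeling of a tiny graph and keep the cheapest one.

    Exponential in num_nodes, so this is only usable for toy graphs.
    """
    best_labels, best_cost = None, float("inf")
    for labels in product(range(num_nodes), repeat=num_nodes):
        cost = cut_cost(edges, labels)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


# Hypothetical 4-node graph: (node_u, node_v, weight).
# Positive weights favor keeping the endpoints together,
# negative weights favor separating them.
edges = [(0, 1, 2.0), (1, 2, -3.0), (0, 2, -1.0), (2, 3, 1.5)]
labels, cost = brute_force_multicut(4, edges)
print(labels, cost)
```

In this toy graph the cheapest partition cuts both negative edges and neither positive edge, grouping nodes 0 and 1 together and nodes 2 and 3 together.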
Here we show a related result: it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g., labeling with certainty that white noise static is a lion).
Convolutional networks are powerful visual models that yield hierarchies of features.
#8 best model for Semantic Segmentation on COCO-Stuff test