We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014).
On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99. 63%.
#3 best model for Face Verification on Labeled Faces in the Wild
Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data.
We propose a novel paradigm for evaluating image descriptions that uses human consensus.
The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions.
Correlation clustering, or multicut partitioning, is widely used in image segmentation for partitioning an undirected graph or image with positive and negative edge weights such that the sum of cut edge weights is minimized.
Here we show a related result: it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99. 99% confidence (e. g. labeling with certainty that white noise static is a lion).
Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition.