We argue that using a single embedding vector to represent an image, as commonly practiced, is not sufficient to rank both relevant seen and unseen labels accurately.
Ranked #2 on Multi-label zero-shot learning on Open Images V4
ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks.
Ranked #1 on Fine-Grained Image Classification on Stanford Cars (using extra training data)
Methods that reach state-of-the-art (SotA) accuracy usually use 3D convolution layers to abstract the temporal information from video frames.
Ranked #17 on Action Recognition on UCF101 (using extra training data)
Realistic use of neural networks often requires adhering to multiple constraints, including latency, energy, and memory.
Ranked #6 on Neural Architecture Search on ImageNet
We show that convergence to a global minimum is guaranteed for networks with widths quadratic in the sample size and linear in their depth at a time logarithmic in both.
In this paper, we introduce a novel asymmetric loss ("ASL"), which operates differently on positive and negative samples.
Ranked #3 on Multi-Label Classification on NUS-WIDE
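As a rough illustration of a loss that treats positive and negative samples differently, the sketch below uses separate focusing exponents for the two cases and shifts the probability of negatives by a margin. The exact functional form, the function name `asymmetric_loss`, and the default values are assumptions made for this example; the summary above does not specify them.

```python
import math


def asymmetric_loss(p, y, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Per-label loss for a predicted probability p and a binary target y.

    Assumed form (not taken from the summary above): positives and
    negatives get different focusing exponents, and the probability of
    a negative is shifted down by `margin` before the loss is computed.
    """
    if y == 1:
        # Positive sample: focal-style term with its own exponent.
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    # Negative sample: shift the probability, then down-weight easy cases.
    p_m = max(p - margin, 0.0)
    return -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
```

With these illustrative defaults, a negative whose predicted probability falls below the margin contributes no loss at all, while positives keep a plain cross-entropy term.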
Furthermore, we show the representation power of our ReID network via SotA results on a different task of multi-object tracking.
Ranked #9 on Person Re-Identification on Market-1501 (Rank-1 metric)
This paper introduces a novel optimization method for differential neural architecture search, based on the theory of prediction with expert advice.
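For readers unfamiliar with prediction with expert advice, a minimal sketch of the classic multiplicative-weights (Hedge) update from that theory is shown here. It is a generic textbook update, not a reproduction of the paper's optimizer; treating each candidate operation as an "expert", and the names `hedge_update` and `lr`, are assumptions made for illustration.

```python
import math


def hedge_update(weights, losses, lr):
    """One multiplicative-weights step over a set of experts.

    Each expert's weight is scaled by exp(-lr * loss) and the result is
    renormalized, so experts with lower loss gain relative weight.
    """
    scaled = [w * math.exp(-lr * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Two experts start with equal weight; the second one incurs a loss.
updated = hedge_update([0.5, 0.5], [0.0, 1.0], lr=1.0)
```

After the step, the weights still sum to one and the first expert holds the larger share.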
In this paper, we propose a differentiable search space that allows the annealing of architecture weights, while gradually pruning inferior operations.
Our approach considers an "objective-space" as the space of all linear combinations of two objectives, and the Dynamic-Net emulates traversal of this objective-space at test time, without any further training.
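The idea of a space spanned by two objectives can be sketched as a single blending weight. The example below assumes a convex combination controlled by `alpha` in [0, 1]; the parameterization and the name `blended_objective` are hypothetical and only show what one point of such a space looks like, not how the Dynamic-Net itself is built.

```python
def blended_objective(loss_a, loss_b, alpha):
    """Return one point in the space spanned by two objective values.

    alpha = 0 recovers the first objective, alpha = 1 the second, and
    values in between interpolate linearly (an assumed parameterization).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * loss_a + alpha * loss_b


# Sweeping alpha traverses the combinations between the two objectives.
sweep = [blended_objective(2.0, 4.0, a) for a in (0.0, 0.25, 1.0)]
```

Sweeping `alpha` moves between the two objectives, which is the kind of traversal the summary says is emulated at test time.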
In this paper, we propose a novel method that makes explicit use of the discriminator at test time, in a feedback manner, to improve the generator's results.
This paper reports on the 2018 PIRM challenge on perceptual super-resolution (SR), held in conjunction with the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018.
Maintaining natural image statistics is a crucial factor in restoration and generation of realistic looking images.
Feed-forward CNNs trained for image transformation problems rely on loss functions that measure the similarity between the generated image and a target image.
We propose a novel measure for template matching, named Deformable Diversity Similarity, based on the diversity of feature matches between a target image window and the template.
We introduce RIANN (Ring Intersection Approximate Nearest Neighbor search), an algorithm for matching patches of a video to a set of reference patches in real-time.
In this paper, we show that the most commonly-used measures for evaluating both non-binary maps and binary maps do not always provide a reliable evaluation.
For example, the time each video frame is observed is a fraction of a second, while a still image can be viewed leisurely.
In this work we propose a crowdsourced method for acquisition of gaze direction data from a virtually unlimited number of participants, using a robust self-reporting mechanism (see Figure 1).
Social and Information Networks, Human-Computer Interaction