Optimal transport theory provides a useful tool to measure the differences between two distributions.
Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the flexibility of unconditional generative models, we propose a semantic bottleneck GAN model for unconditional synthesis of complex scenes.
The Vision Transformer was the first major attempt to apply a pure transformer model directly to images as input, demonstrating that as compared to convolutional networks, transformer-based architectures can achieve competitive results on benchmark classification tasks.
In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.
For the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts.
Ranked #1 on Image Generation on Cityscapes-5K 256x512
This paper addresses unsupervised domain adaptation, the setting where labeled training data is available on a source domain, but the goal is to have good performance on a target domain with only unlabeled data.
The solution we present not only allows us to train for multiple application objectives in a single deep neural network architecture, but takes advantage of correlated information in the combination of all training data from each application to generate a unified embedding that outperforms all specialized embeddings previously deployed for each product.
Without dense labels, as is the case when only detection labels are available in the source, transformations are learned using CycleGAN alignment.
Domain adaptation is critical for success in new, unseen environments.
Adversarial learning methods are a promising approach to training robust deep networks, and can generate complex samples across diverse domains.
Over the past three years Pinterest has experimented with several visual search and recommendation services, including Related Pins (2014), Similar Looks (2015), Flashlight (2016) and Lens (2017).
We propose a novel, more powerful combination of both distribution and pairwise image alignment, and remove the requirement for expensive annotation by using weakly aligned pairs of images in the source and target domains.
This paper presents Pinterest Related Pins, an item-to-item recommendation system that combines collaborative filtering with content-based ranking.
Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias.
Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias on a standard benchmark.
Ranked #6 on Domain Adaptation on Office-Caltech
A major challenge in scaling object detection is the difficulty of obtaining labeled images for large numbers of categories.
In other words, are deep CNNs trained on large amounts of labeled data as susceptible to dataset bias as previous methods have been shown to be?
We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks.