Large-scale pretraining of visual representations has led to state-of-the-art performance on a range of benchmark computer vision tasks, yet the benefits of these techniques at extreme scale in complex production systems has been relatively unexplored.
Ranked #21 on Image Classification on ObjectNet (using extra training data)
The Vision Transformer was the first major attempt to apply a pure transformer model directly to images as input, demonstrating that as compared to convolutional networks, transformer-based architectures can achieve competitive results on benchmark classification tasks.
Related Pins is the Web-scale recommender system that powers over 40% of user engagement on Pinterest.
Over the past three years Pinterest has experimented with several visual search and recommendation services, including Related Pins (2014), Similar Looks (2015), Flashlight (2016) and Lens (2017).
This paper presents Pinterest Related Pins, an item-to-item recommendation system that combines collaborative filtering with content-based ranking.
We demonstrate that, with the availability of distributed computation platforms such as Amazon Web Services and open-source tools, it is possible for a small engineering team to build, launch and maintain a cost-effective, large-scale visual search system with widely available tools.