MultiBiSage can capture the graph structure of multiple bipartite graphs to learn high-quality pin embeddings.
Sequential models have become increasingly popular in powering personalized recommendation systems over the past several years.
Large-scale pretraining of visual representations has led to state-of-the-art performance on a range of benchmark computer vision tasks, yet the benefits of these techniques at extreme scale in complex production systems have been relatively unexplored.
The Vision Transformer was the first major attempt to apply a pure transformer model directly to images as input, demonstrating that transformer-based architectures can achieve results competitive with convolutional networks on benchmark classification tasks.
As online content becomes ever more visual, the demand for searching by visual queries grows correspondingly stronger.
The solution we present not only allows us to train for multiple application objectives in a single deep neural network architecture, but also exploits correlated information across the combined training data of all applications to produce a unified embedding that outperforms every specialized embedding previously deployed for each product.
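The idea of a single network serving several application objectives can be sketched as a shared trunk feeding per-task heads, so that gradients from every objective shape one unified embedding. The sketch below is a minimal illustration with NumPy; the layer sizes, task names, and single linear trunk are assumptions for demonstration, not the actual architecture described in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Shared trunk: one linear map from raw features to a unified embedding.
# (Hypothetical sizes; a production model would be a deep network.)
W_trunk = rng.normal(size=(16, 8))

# One lightweight head per application objective (illustrative task names).
heads = {task: rng.normal(size=(8, 2)) for task in ("shopping", "browse", "search")}

def unified_embedding(x):
    """All task heads consume this same embedding, so during training
    gradients from every objective would flow back into W_trunk."""
    v = x @ W_trunk
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

x = rng.normal(size=(4, 16))            # a batch of 4 items
emb = unified_embedding(x)              # shape (4, 8), L2-normalized
logits = {task: emb @ W for task, W in heads.items()}
```

Because every head reads the same normalized embedding, the trunk is pushed to encode information useful to all objectives at once, which is the intuition behind a unified embedding outperforming per-product specialized ones.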
Deep metric learning aims to learn a function mapping image pixels to embedding feature vectors that model the similarity between images.
Over the past three years Pinterest has experimented with several visual search and recommendation services, including Related Pins (2014), Similar Looks (2015), Flashlight (2016) and Lens (2017).
We demonstrate that, with the availability of distributed computation platforms such as Amazon Web Services and open-source tools, it is possible for a small engineering team to build, launch and maintain a cost-effective, large-scale visual search system.