Field-aware Variational Autoencoders for Billion-scale User Representation Learning

User representation learning plays an essential role in Internet applications such as recommender systems. Although a universal user embedding is in high demand, few previous works learn one in an unsupervised manner. Unsupervised learning is nevertheless important, as most user data is collected without explicit labels. In this paper, we harness the unsupervised nature of Variational Autoencoders (VAEs) to learn user representations from large-scale, high-dimensional, and multi-field data. We extend the traditional VAE to a Field-aware VAE (FVAE) that models each feature field with an independent multinomial distribution. To reduce training complexity, we employ dynamic hash tables, a batched softmax function, and a feature sampling strategy to improve the efficiency of our method. Experiments on multiple datasets show that the proposed FVAE significantly outperforms baselines on data reconstruction and tag prediction tasks. Moreover, we deploy the proposed method in real-world applications and conduct online A/B tests in a look-alike system; the results demonstrate that our method effectively improves recommendation quality. To the best of our knowledge, this is the first time a VAE-based user representation learning model has been applied to real-world recommender systems.
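The core idea, a VAE decoder with one multinomial likelihood per feature field, can be illustrated with a minimal sketch. The code below assumes PyTorch; the class and function names (`FieldAwareVAE`, `fvae_loss`), field sizes, and layer widths are hypothetical and not taken from the paper, and the sketch omits the efficiency techniques the abstract mentions (dynamic hash tables, batched softmax, feature sampling).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FieldAwareVAE(nn.Module):
    """Sketch of a VAE whose decoder reconstructs each feature field
    with its own multinomial (softmax) head. Hyperparameters are illustrative."""

    def __init__(self, field_sizes, latent_dim=64, hidden_dim=256):
        super().__init__()
        self.field_sizes = field_sizes
        input_dim = sum(field_sizes)          # concatenated multi-hot fields
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Tanh())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # One decoder head per field, each producing logits over that field's vocabulary.
        self.decoders = nn.ModuleList([nn.Linear(latent_dim, n) for n in field_sizes])

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        logits = [dec(z) for dec in self.decoders]                # per-field logits
        return logits, mu, logvar


def fvae_loss(logits, x_fields, mu, logvar, beta=1.0):
    """Negative per-field multinomial log-likelihood plus KL divergence."""
    recon = 0.0
    for field_logits, field_x in zip(logits, x_fields):
        recon = recon - (F.log_softmax(field_logits, dim=-1) * field_x).sum(-1).mean()
    kld = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return recon + beta * kld
```

In this sketch, each input field is a multi-hot (or count) vector over that field's vocabulary; the matching decoder head reconstructs it with a softmax over the same vocabulary, so every field gets an independent multinomial likelihood rather than one shared output distribution.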
