Field-aware Variational Autoencoders for Billion-scale User Representation Learning

User representation learning plays an essential role in Internet applications such as recommender systems. Although a universal user embedding is in high demand, few previous works learn one in an unsupervised manner. Unsupervised learning is nevertheless important, as most user data is collected without explicit labels. In this paper, we harness the unsupervised nature of Variational Autoencoders (VAEs) to learn user representations from large-scale, high-dimensional, and multi-field data. We extend the traditional VAE to a Field-aware VAE (FVAE) that models each feature field with an independent multinomial distribution. To reduce training complexity, we employ dynamic hash tables, a batched softmax function, and a feature sampling strategy to improve the efficiency of our method. Experiments on multiple datasets show that the proposed FVAE significantly outperforms baselines on data reconstruction and tag prediction tasks. Moreover, we deploy the proposed method in real-world applications and conduct online A/B tests in a look-alike system; the results demonstrate that our method effectively improves recommendation quality. To the best of our knowledge, this is the first time a VAE-based user representation learning model has been applied to real-world recommender systems.
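The core idea, a VAE decoder with one multinomial likelihood per feature field, can be illustrated with a minimal sketch. The code below assumes PyTorch; the class and function names (`FieldAwareVAE`, `fvae_loss`), field sizes, and layer widths are hypothetical and not taken from the paper, and the sketch omits the efficiency techniques the abstract mentions (dynamic hash tables, batched softmax, feature sampling).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FieldAwareVAE(nn.Module):
    """Sketch of a VAE whose decoder reconstructs each feature field
    with its own multinomial (softmax) head. Hyperparameters are illustrative."""

    def __init__(self, field_sizes, latent_dim=64, hidden_dim=256):
        super().__init__()
        self.field_sizes = field_sizes
        input_dim = sum(field_sizes)          # concatenated multi-hot fields
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Tanh())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # One decoder head per field, each producing logits over that field's vocabulary.
        self.decoders = nn.ModuleList([nn.Linear(latent_dim, n) for n in field_sizes])

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        logits = [dec(z) for dec in self.decoders]                # per-field logits
        return logits, mu, logvar


def fvae_loss(logits, x_fields, mu, logvar, beta=1.0):
    """Negative per-field multinomial log-likelihood plus KL divergence."""
    recon = 0.0
    for field_logits, field_x in zip(logits, x_fields):
        recon = recon - (F.log_softmax(field_logits, dim=-1) * field_x).sum(-1).mean()
    kld = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return recon + beta * kld
```

In this sketch, each input field is a multi-hot (or count) vector over that field's vocabulary; the matching decoder head reconstructs it with a softmax over the same vocabulary, so every field gets an independent multinomial likelihood rather than one shared output distribution.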
