First, the risk of encoding non-causal knowledge is higher, as the shared MTL model must encode knowledge from all tasks, and knowledge that is causal for one task can be spurious for another.
State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example.
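A trainable sparse gate of this kind can be sketched with plain NumPy: score all experts, keep only the top-k, and renormalize the gate values over the selected subset. This is a minimal illustration with assumed shapes, not any particular paper's implementation.

```python
import numpy as np

def sparse_gate(x, w_gate, k=2):
    """Select the top-k experts for input x (minimal sketch).

    x: (d,) input features; w_gate: (d, n_experts) gating weights.
    Returns (indices, weights): the k selected experts and their
    softmax-normalized gate values over that subset.
    """
    logits = x @ w_gate                       # one gating logit per expert
    top_k = np.argsort(logits)[-k:]           # indices of the k largest logits
    gate = np.exp(logits[top_k] - logits[top_k].max())
    gate /= gate.sum()                        # renormalize over selected experts
    return top_k, gate

rng = np.random.default_rng(0)
idx, w = sparse_gate(rng.normal(size=8), rng.normal(size=(8, 4)), k=2)
```

Only the k selected experts need to be evaluated for this input, which is what makes sparse MoE models cheap relative to their total parameter count.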
It is also very encouraging that our framework further improves head items and overall performance on top of the gains on tail items.
Embedding learning of categorical features (e.g., user/item IDs) is at the core of various recommendation models, including matrix factorization and neural collaborative filtering.
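In the matrix factorization case, that embedding learning boils down to one learned vector per ID, scored by a dot product. A tiny sketch (table sizes and names are ours, for illustration only):

```python
import numpy as np

# Hypothetical tiny embedding tables: one learned row per user/item ID.
rng = np.random.default_rng(0)
user_emb = rng.normal(size=(100, 16))   # 100 users, 16-dim embeddings
item_emb = rng.normal(size=(500, 16))   # 500 items, 16-dim embeddings

def mf_score(user_id, item_id):
    """Matrix factorization scores a (user, item) pair by the dot
    product of the two looked-up ID embeddings."""
    return float(user_emb[user_id] @ item_emb[item_id])

s = mf_score(3, 7)
```

Neural collaborative filtering replaces the dot product with a learned network over the same looked-up embeddings.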
Learning effective feature crosses is key to building effective recommender systems.
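One way to learn such crosses explicitly is a cross layer in the style of Deep & Cross Networks, which re-interacts the original input with the current layer at every step. A sketch under that assumption (variable names are ours):

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """One DCN-style cross layer (sketch): x_{l+1} = x0 * (w . xl) + b + xl.
    Each layer adds one more degree of explicit feature interaction
    with the original input x0, plus a residual connection."""
    return x0 * float(w @ xl) + b + xl

x0 = np.array([1.0, 2.0])
x1 = cross_layer(x0, x0, w=np.array([0.5, 0.5]), b=np.zeros(2))  # -> [2.5, 5.0]
```

Stacking L such layers yields bounded-degree polynomial interactions of the input features with only O(d) parameters per layer.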
Despite the impressive prediction performance of deep neural networks (DNNs) in various domains, it is now well known that a set of DNN models trained with the same model specification and the same data can produce very different prediction results.
A delicate balance between multi-task generalization and multi-objective optimization is therefore needed to achieve a better trade-off between efficiency and generalization.
Our online results also verify our hypothesis that our framework indeed improves model performance even more on slices that lack supervision.
We investigate the Plackett-Luce (PL) model based listwise learning-to-rank (LTR) on data with partitioned preference, where a set of items are sliced into ordered and disjoint partitions, but the ranking of items within a partition is unknown.
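For reference, the Plackett-Luce likelihood of a fully ordered list is a product of softmax terms over the remaining items at each position. The sketch below handles only the full-ranking case; the partitioned-preference setting additionally requires marginalizing over the unknown orderings within each partition, which is the hard part the line above refers to.

```python
import numpy as np

def pl_log_likelihood(scores, ranking):
    """Plackett-Luce log-likelihood of a full ranking (sketch).

    scores: per-item scores s_i; ranking: item indices, best first.
    P(ranking) = prod_k exp(s_{r_k}) / sum_{j >= k} exp(s_{r_j}).
    """
    s = np.asarray(scores, dtype=float)[list(ranking)]
    ll = 0.0
    for k in range(len(s)):
        ll += s[k] - np.log(np.exp(s[k:]).sum())  # softmax over remaining items
    return ll

p_best_first = np.exp(pl_log_likelihood([1.0, 0.0], [0, 1]))  # e / (e + 1)
```

For two items this reduces to an ordinary softmax, which makes the sketch easy to check by hand.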
In this paper, we seek to learn highly compact embeddings for large-vocab sparse features in recommender systems (recsys).
In this paper, we introduce a large scale multi-objective ranking system for recommending what video to watch next on an industrial video sharing platform.
Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information.
In this work, we propose a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data.
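The core of MMoE is a set of shared experts plus one softmax gate per task, so each task learns its own mixture over the shared expert outputs. A minimal forward-pass sketch with assumed shapes and activations (not the paper's exact architecture):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mmoe_forward(x, expert_ws, gate_ws):
    """Minimal MMoE forward pass (sketch).

    expert_ws: one (d, h) weight matrix per shared expert.
    gate_ws:   one (d, n_experts) gating matrix per task.
    Returns one (h,) representation per task: a task-specific
    softmax-gated combination of the shared expert outputs.
    """
    experts = np.stack([np.tanh(x @ w) for w in expert_ws])   # (n_experts, h)
    return [softmax(x @ wg) @ experts for wg in gate_ws]      # one (h,) per task

rng = np.random.default_rng(0)
d, h, n_experts, n_tasks = 8, 4, 3, 2
towers = mmoe_forward(
    rng.normal(size=d),
    expert_ws=[rng.normal(size=(d, h)) for _ in range(n_experts)],
    gate_ws=[rng.normal(size=(d, n_experts)) for _ in range(n_tasks)],
)
```

Because the gates are per-task, loosely related tasks can learn to rely on different experts, while closely related tasks can share them — which is how the model learns task relationships from data.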
We study the problem of learning similarity functions over very large corpora using neural network embedding models.
1 code implementation • 8 Aug 2017 • Heng-Tze Cheng, Zakaria Haque, Lichan Hong, Mustafa Ispir, Clemens Mewald, Illia Polosukhin, Georgios Roumpos, D. Sculley, Jamie Smith, David Soergel, Yuan Tang, Philipp Tucker, Martin Wicke, Cassandra Xia, Jianwei Xie
Our focus is on simplifying cutting edge machine learning for practitioners in order to bring such technologies into production.
31 code implementations • 24 Jun 2016 • Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, Hemal Shah
Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort.
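A cross-product transformation of this kind emits one binary indicator per co-occurring combination of categorical values, so the wide model can memorize that exact combination. A hedged sketch (function and key format are ours, for illustration):

```python
def cross_product_transform(example, crosses):
    """Wide-component cross-product features (sketch).

    example: dict mapping feature names to categorical values.
    crosses: list of feature-name tuples to cross.
    Emits a single binary indicator per crossed value combination.
    """
    out = {}
    for cols in crosses:
        key = " AND ".join(f"{c}={example[c]}" for c in cols)
        out[key] = 1  # fires only when this exact combination co-occurs
    return out

feats = cross_product_transform(
    {"gender": "female", "language": "en"},
    crosses=[("gender", "language")],
)
```

In production systems these crossed keys are typically hashed into a fixed-size bucket space rather than stored as strings.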