In this paper, We propose a novel CTR Framework named ContextNet that implicitly models high-order feature interactions by dynamically refining each feature's embedding according to the input context.
For example, the continuous features are usually transformed to the power forms by adding a new feature to allow it to easily form non-linear functions of the feature.
We also turn the feed-forward layer in DNN model into a mixture of addictive and multiplicative feature interactions by proposing MaskBlock in this paper.
Ranked #1 on Click-Through Rate Prediction on Criteo
Our proposed model uses the pre-trained Transformer as the base classifier to choose harder training sets to fine-tune and gains the benefits of both the pre-training language knowledge and boosting ensemble in NLP tasks.
Inspired by these observations, we propose a novel model named GateNet which introduces either the feature embedding gate or the hidden gate to the embedding layer or hidden layers of DNN CTR models, respectively.
Normalization has become one of the most fundamental components in many deep neural networks for machine learning tasks while deep neural network has also been widely used in CTR estimation field.
In this paper, a new model named FiBiNET as an abbreviation for Feature Importance and Bilinear feature Interaction NETwork is proposed to dynamically learn the feature importance and fine-grained feature interactions.
Ranked #6 on Click-Through Rate Prediction on Criteo
Although some CTR model such as Attentional Factorization Machine (AFM) has been proposed to model the weight of second order interaction features, we posit the evaluation of feature importance before explicit feature interaction procedure is also important for CTR prediction tasks because the model can learn to selectively highlight the informative features and suppress less useful ones if the task has many input features.
Ranked #5 on Click-Through Rate Prediction on Criteo
Recurrent Neural Networks have achieved state-of-the-art results for many problems in NLP and two most popular RNN architectures are Tail Model and Pooling Model.
To solve this problem, we propose to learn a new classifier by adding an adaptation function to the base classifier, and update the adaptation function parameter according to the streaming data samples.