In Federated Learning (FL) for click-through rate (CTR) prediction, users' data remains on their devices and is never shared, in order to protect privacy.
The choice of data augmentations is crucial to the quality of the learned feature representations.
We also define a new forgetting measure for class-incremental learning and observe that forgetting is not the principal cause of low performance.
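The new measure itself is not given in this excerpt; as context, here is a minimal Python sketch of the forgetting measure commonly used in the continual-learning literature (the gap between the best accuracy ever reached on a task and its accuracy after the final task). The accuracy matrix acc and the function name are illustrative assumptions, not taken from the text.

    import numpy as np

    def average_forgetting(acc: np.ndarray) -> float:
        """Average forgetting after the final task.

        acc[i, j] = accuracy on task j after training on task i,
        for a sequence of T tasks (acc has shape [T, T]).
        Forgetting on task j is the gap between the best accuracy
        ever reached on j and the accuracy on j after the last task.
        """
        T = acc.shape[0]
        # Only tasks seen before the final one can have been forgotten.
        gaps = [acc[:T - 1, j].max() - acc[T - 1, j] for j in range(T - 1)]
        return float(np.mean(gaps))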
Session-based recommenders, used for making predictions out of users' uninterrupted sequences of actions, are attractive for many applications.
We call our method Recurrent Attention to Transient Tasks (RATT), and we also show how to adapt continual learning approaches based on weight regularization and knowledge distillation to recurrent continual learning problems.
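The paper's exact adaptations are not spelled out in this excerpt; as a hedged illustration, here is a minimal sketch of the weight-regularization idea (an EWC-style quadratic penalty that discourages parameters from drifting away from values learned on earlier tasks). The helper name and the precomputed old_params and importance dictionaries are assumptions for illustration, not the authors' implementation.

    import torch

    def regularization_penalty(model, old_params, importance, strength=1.0):
        """EWC-style penalty: a quadratic cost for drifting away from the
        parameters learned on previous tasks, weighted by per-parameter
        importance.

        old_params / importance: dicts mapping parameter name -> tensor,
        recorded after training on the previous task (assumed precomputed).
        """
        penalty = 0.0
        for name, p in model.named_parameters():
            if name in old_params:
                penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
        return strength * penalty

    # During training on a new task, add the penalty to the task loss:
    # loss = task_loss + regularization_penalty(model, old_params, importance)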
The vast majority of prior work studies this scenario with classification networks, where for each new task the classification layer must be augmented with additional weights to accommodate the newly added classes.
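To make that augmentation step concrete, here is a minimal PyTorch sketch of widening a linear classification head for newly added classes while copying over the weights of previously seen ones; the helper expand_classifier is hypothetical, not from the original text.

    import torch
    import torch.nn as nn

    def expand_classifier(old_head: nn.Linear, num_new_classes: int) -> nn.Linear:
        """Return a new classification layer with extra output units for the
        newly added classes; weights and biases for previously seen classes
        are copied over (assumes the head has a bias term)."""
        in_features = old_head.in_features
        old_out = old_head.out_features
        new_head = nn.Linear(in_features, old_out + num_new_classes)
        with torch.no_grad():
            new_head.weight[:old_out] = old_head.weight
            new_head.bias[:old_out] = old_head.bias
        return new_head

    # Usage: after a new task introduces 10 classes
    # model.classifier = expand_classifier(model.classifier, num_new_classes=10)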