Unsupervised Representation Learning of Player Behavioral Data with Confidence Guided Masking

Players of online games generate rich behavioral data during gaming. Based on these data, game developers can build a range of data science applications, such as bot detection and social recommendation, to improve the gaming experience. However, the development of such applications requires data cleansing, training sample labeling, feature engineering, and model development, which makes the use of such applications in small and medium-sized game studios still uncommon. While acquiring supervised learning data is costly, unlabeled behavioral logs are often continuously and automatically generated in games. Thus we resort to unsupervised representation learning of player behavioral data to optimize intelligent services in games. Behavioral data has many unique properties, including semantic complexity, excessive length, etc. A worth noting property within raw player behavioral data is that a lot of it is task-irrelevant. For these data characteristics, we introduce a BPE-enhanced compression method and propose a novel adaptive masking strategy called Masking by Token Confidence (MTC) for the Masked Language Modeling (MLM) pre-training task. MTC is designed to increase the masking probabilities of task-relevant tokens. Experiments on four downstream tasks and successful deployment in a world-renowned Massively Multiplayer Online Role-Playing Game (MMORPG) prove the effectiveness of the MTC strategy.

PDF Abstract


  Add Datasets introduced or used in this paper

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.