Search Results for author: Jiaxing Song

Found 2 papers, 0 papers with code

Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies

no code implementations • 29 Nov 2020 • Jinlin Lai, Lixin Zou, Jiaxing Song

Off-policy evaluation is a key component of reinforcement learning which evaluates a target policy with offline data collected from behavior policies.

Off-policy evaluation, Recommendation Systems, +2

Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems

no code implementations • 13 Feb 2019 • Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, Dawei Yin

Though reinforcement learning (RL) naturally fits the problem of maximizing long-term rewards, applying RL to optimize long-term user engagement still faces challenges: user behaviors are versatile and difficult to model, typically consisting of both instant feedback (e.g., clicks, ordering) and delayed feedback (e.g., dwell time, revisit); in addition, effective off-policy learning is still immature, especially when combining bootstrapping and function approximation.

Recommendation Systems, reinforcement-learning, +1
