no code implementations • 17 Feb 2023 • Vasilis Syrgkanis, Ruohan Zhan
Our goal is to be able to evaluate counterfactual adaptive policies after data collection and to estimate structural parameters such as dynamic treatment effects, which can be used for credit assignment (e. g. what was the effect of the first period action on the final outcome).
1 code implementation • 3 Feb 2023 • Qingpeng Cai, Zhenghai Xue, Chi Zhang, Wanqi Xue, Shuchang Liu, Ruohan Zhan, Xueliang Wang, Tianyou Zuo, Wentao Xie, Dong Zheng, Peng Jiang, Kun Gai
One the one hand, the platforms aims at optimizing the users' cumulative watch time (main goal) in long term, which can be effectively optimized by Reinforcement Learning.
no code implementations • 13 Jun 2022 • Ruohan Zhan, Changhua Pei, Qiang Su, Jianfeng Wen, Xueliang Wang, Guanyu Mu, Dong Zheng, Peng Jiang
We employ a causal graph illuminating that duration is a confounding factor that concurrently affects video exposure and watch-time prediction -- the first effect on video causes the bias issue and should be eliminated, while the second effect on watch time originates from video intrinsic characteristics and should be preserved.
1 code implementation • 1 Jun 2022 • Wanqi Xue, Qingpeng Cai, Ruohan Zhan, Dong Zheng, Peng Jiang, Kun Gai, Bo An
Meanwhile, reinforcement learning (RL) is widely regarded as a promising framework for optimizing long-term engagement in sequential recommendation.
no code implementations • 26 May 2022 • Qingpeng Cai, Ruohan Zhan, Chi Zhang, Jie Zheng, Guangwei Ding, Pinghua Gong, Dong Zheng, Peng Jiang
In this paper, we formulate the problem of short video recommendation as a constrained Markov Decision Process (MDP), where platforms want to optimize the main goal of user watch time in long term, with the constraint of accommodating the auxiliary responses of user interactions such as sharing/downloading videos.
1 code implementation • 3 Jun 2021 • Ruohan Zhan, Vitor Hadad, David A. Hirshberg, Susan Athey
In particular, when the pattern of treatment assignment in the collected data looks little like the pattern generated by the policy to be evaluated, the importance weights used in DR estimators explode, leading to excessive variance.
no code implementations • 6 May 2021 • Ruohan Zhan, Konstantina Christakopoulou, Ya Le, Jayden Ooi, Martin Mladenov, Alex Beutel, Craig Boutilier, Ed H. Chi, Minmin Chen
We then build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective of user utility and the counterfactual utility lift of the provider associated with the recommended content, which we show to be equivalent to maximizing overall user utility and the utilities of all providers on the platform under some mild assumptions.
1 code implementation • 5 May 2021 • Ruohan Zhan, Zhimei Ren, Susan Athey, Zhengyuan Zhou
Learning optimal policies from historical data enables personalization in a wide variety of applications including healthcare, digital recommendations, and online education.
no code implementations • CVPR 2020 • Xiyang Luo, Ruohan Zhan, Huiwen Chang, Feng Yang, Peyman Milanfar
Watermarking is the process of embedding information into an image that can survive under distortions, while requiring the encoded image to have little or no perceptual difference from the original image.
1 code implementation • 7 Nov 2019 • Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, Susan Athey
In this context, typical estimators that use inverse propensity weighting to eliminate sampling bias can be problematic: their distributions become skewed and heavy-tailed as the propensity scores decay to zero.