Search Results for author: Ruohan Zhan

Found 10 papers, 5 papers with code

Post-Episodic Reinforcement Learning Inference

no code implementations • 17 Feb 2023 • Vasilis Syrgkanis, Ruohan Zhan

Our goal is to evaluate counterfactual adaptive policies after data collection and to estimate structural parameters such as dynamic treatment effects, which can be used for credit assignment (e.g., what was the effect of the first-period action on the final outcome?).

counterfactual Off-policy evaluation +1

Two-Stage Constrained Actor-Critic for Short Video Recommendation

1 code implementation • 3 Feb 2023 • Qingpeng Cai, Zhenghai Xue, Chi Zhang, Wanqi Xue, Shuchang Liu, Ruohan Zhan, Xueliang Wang, Tianyou Zuo, Wentao Xie, Dong Zheng, Peng Jiang, Kun Gai

On the one hand, the platforms aim to optimize users' cumulative watch time (the main goal) over the long term, which can be done effectively with reinforcement learning.

Recommendation Systems reinforcement-learning +2

Deconfounding Duration Bias in Watch-time Prediction for Video Recommendation

no code implementations • 13 Jun 2022 • Ruohan Zhan, Changhua Pei, Qiang Su, Jianfeng Wen, Xueliang Wang, Guanyu Mu, Dong Zheng, Peng Jiang

We employ a causal graph showing that duration is a confounding factor that concurrently affects video exposure and watch-time prediction: the first effect, on video exposure, causes the bias issue and should be eliminated, while the second effect, on watch time, originates from the video's intrinsic characteristics and should be preserved.

ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor

1 code implementation • 1 Jun 2022 • Wanqi Xue, Qingpeng Cai, Ruohan Zhan, Dong Zheng, Peng Jiang, Kun Gai, Bo An

Meanwhile, reinforcement learning (RL) is widely regarded as a promising framework for optimizing long-term engagement in sequential recommendation.

Reinforcement Learning (RL) Sequential Recommendation

Constrained Reinforcement Learning for Short Video Recommendation

no code implementations • 26 May 2022 • Qingpeng Cai, Ruohan Zhan, Chi Zhang, Jie Zheng, Guangwei Ding, Pinghua Gong, Dong Zheng, Peng Jiang

In this paper, we formulate the problem of short video recommendation as a constrained Markov Decision Process (MDP), where platforms want to optimize the main goal of user watch time in the long term, subject to the constraint of accommodating auxiliary user-interaction responses such as sharing or downloading videos.

Recommendation Systems reinforcement-learning +1

Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits

1 code implementation • 3 Jun 2021 • Ruohan Zhan, Vitor Hadad, David A. Hirshberg, Susan Athey

In particular, when the pattern of treatment assignment in the collected data looks little like the pattern generated by the policy to be evaluated, the importance weights used in DR estimators explode, leading to excessive variance.

Multi-Armed Bandits Off-policy evaluation

Towards Content Provider Aware Recommender Systems: A Simulation Study on the Interplay between User and Provider Utilities

no code implementations • 6 May 2021 • Ruohan Zhan, Konstantina Christakopoulou, Ya Le, Jayden Ooi, Martin Mladenov, Alex Beutel, Craig Boutilier, Ed H. Chi, Minmin Chen

We then build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective of user utility and the counterfactual utility lift of the provider associated with the recommended content, which we show to be equivalent to maximizing overall user utility and the utilities of all providers on the platform under some mild assumptions.

counterfactual Recommendation Systems

Policy Learning with Adaptively Collected Data

1 code implementation • 5 May 2021 • Ruohan Zhan, Zhimei Ren, Susan Athey, Zhengyuan Zhou

Learning optimal policies from historical data enables personalization in a wide variety of applications including healthcare, digital recommendations, and online education.

Multi-Armed Bandits

Distortion Agnostic Deep Watermarking

no code implementations • CVPR 2020 • Xiyang Luo, Ruohan Zhan, Huiwen Chang, Feng Yang, Peyman Milanfar

Watermarking is the process of embedding information into an image such that the information survives distortions, while requiring the encoded image to have little or no perceptual difference from the original image.

Confidence Intervals for Policy Evaluation in Adaptive Experiments

1 code implementation • 7 Nov 2019 • Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, Susan Athey

In this context, typical estimators that use inverse propensity weighting to eliminate sampling bias can be problematic: their distributions become skewed and heavy-tailed as the propensity scores decay to zero.

Experimental Design Multi-Armed Bandits
