no code implementations • 12 Apr 2024 • Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun
In the weighted replay buffer, the contribution of data from older policies is properly discounted, with the weights computed according to the boosting framework.
no code implementations • 18 Mar 2022 • Shachi Deshpande, Kaiwen Wang, Dhruv Sreenivas, Zheng Li, Volodymyr Kuleshov
Oftentimes, the confounders are unobserved, but we have access to large amounts of additional unstructured data (images, text) that contain valuable proxy signal about the missing confounders.
1 code implementation • NeurIPS 2021 • Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun
Instead, the learner is presented with a static offline dataset of (state, action, next state) transition triples from a potentially less proficient behavior policy.
1 code implementation • NeurIPS 2021 • Jonathan Daniel Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun
Instead, the learner is presented with a static offline dataset of (state, action, next state) triples from a potentially less proficient behavior policy.