no code implementations • 7 Jun 2023 • Gangyi Zhang, Chongming Gao, Wenqiang Lei, Xiaojie Guo, Shijun Li, Hongshen Chen, Zhuozhi Ding, Sulong Xu, Lingfei Wu
In the Vague Preference Multi-round Conversational Recommendation (VPMCR) setting, we propose Adaptive Vague Preference Policy Learning (AVPPL), a solution with two components: Ambiguity-aware Soft Estimation (ASE) and Dynamism-aware Policy Learning (DPL).