no code implementations • 31 Oct 2022 • Dongcui Diao, Hengshuai Yao, Bei Jiang
Recognizing and telling similar objects apart is even hard for human beings.
1 code implementation • 20 May 2022 • Xing Chen, Dongcui Diao, Hechang Chen, Hengshuai Yao, Haiyin Piao, Zhixiao Sun, Zhiwei Yang, Randy Goebel, Bei Jiang, Yi Chang
The popular Proximal Policy Optimization (PPO) algorithm approximates the solution in a clipped policy space.
no code implementations • NeurIPS 2009 • Hengshuai Yao, Shalabh Bhatnagar, Dongcui Diao, Richard S. Sutton, Csaba Szepesvári
We extend Dyna planning architecture for policy evaluation and control in two significant aspects.