20 Dec 2023 • Guozhong Zheng, Weiran Cai, Guanxiao Qi, Jiqiang Zhang, Li Chen
We show that the population can reach the optimal allocation when individuals value both past experience and future rewards, and when they balance exploiting their Q-tables against exploring through random actions.
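
For readers unfamiliar with Q-tables, the following is a minimal sketch of generic tabular Q-learning with epsilon-greedy action selection, not the authors' implementation. The mapping is an assumption: `alpha` (learning rate) stands in for how past experience is weighted, `gamma` (discount factor) for how future rewards are weighted, and `epsilon` for the probability of acting at random. The function names and the list-of-lists table layout are hypothetical.

```python
import random


def choose_action(q_row, epsilon, rng=random):
    """Epsilon-greedy choice over one row of a Q-table.

    With probability epsilon, explore by picking a uniformly random action;
    otherwise exploit by picking an action with the highest Q-value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties uniformly at random among the greedy actions.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Standard tabular Q-learning update (assumed form, not from the source).

    alpha weights the new estimate against the stored value (past experience);
    gamma weights the best value reachable from next_state (future rewards).
    """
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] = (1 - alpha) * q_table[state][action] + alpha * target
    return q_table[state][action]
```

In this sketch, setting `epsilon` to 0 gives pure exploitation and setting it to 1 gives purely random behavior; the balance the abstract refers to lies between those extremes.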