1 code implementation • 4 Jan 2022 • Yun Li, Yixiu Wang, Yifu Chen, Kaixun Hua, Jiayang Ren, Ghazaleh Mozafari, Qiugang Lu, Yankai Cao
The design procedure of the proposed scheme consists of two sequential processes: (1) the supervised learning (SL) process, in which we first simulate a model predictive controller (MPC) that embeds a low-fidelity battery model to generate a training data set, and then fit a deep neural network (DNN) policy to that data set using SL algorithms; and (2) the reinforcement learning (RL) process, in which we use RL algorithms to improve the DNN-approximated policy by balancing short-term economic incentives against long-term battery degradation.
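The two-stage structure can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: a hand-written price rule (`expert_action`) stands in for the MPC, a linear policy fitted by least squares stands in for the DNN trained with SL, and simple random-search hill climbing stands in for the RL algorithms. The battery dynamics, the reward, and the weight `deg_weight` are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_action(soc, price):
    """Hypothetical stand-in for the MPC: discharge (+) at high price, charge (-) at low."""
    a = 2.0 * (price - 0.5)
    # Cannot discharge more than the stored energy or charge beyond full.
    return float(np.clip(a, -(1.0 - soc), soc))

def policy_action(theta, soc, price):
    """Linear policy (stand-in for the DNN), clipped to the same feasible range."""
    a = float(np.array([1.0, soc, price]) @ theta)
    return float(np.clip(a, -(1.0 - soc), soc))

def rollout(theta, prices, deg_weight=0.3):
    """Total reward over one price trajectory: revenue minus an assumed
    quadratic degradation penalty on the charge/discharge amount."""
    soc, total = 0.5, 0.0
    for p in prices:
        a = policy_action(theta, soc, p)
        total += p * a - deg_weight * a ** 2
        soc -= a  # toy state-of-charge dynamics, stays in [0, 1]
    return total

# Stage 1 (SL): generate data with the expert, then fit the policy to it.
n = 500
socs = rng.uniform(0.0, 1.0, n)
prices = rng.uniform(0.0, 1.0, n)
X = np.stack([np.ones(n), socs, prices], axis=1)
y = np.array([expert_action(s, p) for s, p in zip(socs, prices)])
theta_sl, *_ = np.linalg.lstsq(X, y, rcond=None)

# Stage 2 (policy improvement): start from the SL policy and keep random
# perturbations that raise the mean return on a fixed set of price scenarios.
scenarios = rng.uniform(0.0, 1.0, size=(20, 24))

def mean_return(theta):
    return float(np.mean([rollout(theta, sc) for sc in scenarios]))

sl_return = mean_return(theta_sl)
theta_rl, rl_return = theta_sl.copy(), sl_return
for _ in range(200):
    candidate = theta_rl + 0.05 * rng.standard_normal(3)
    value = mean_return(candidate)
    if value > rl_return:
        theta_rl, rl_return = candidate, value

print(f"mean return after SL stage: {sl_return:.3f}")
print(f"mean return after improvement stage: {rl_return:.3f}")
```

Because the second stage starts from the SL solution and only accepts improving candidates, its return on the training scenarios cannot fall below the SL baseline. The reward combines a revenue term with a degradation penalty, which is the trade-off the RL process is described as balancing.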