Deep reinforcement learning for time series: playing idealized trading games

11 Mar 2018 · Xiang Gao ·

Deep Q-learning is investigated as an end-to-end solution to estimate the optimal strategies for acting on time series input. Experiments are conducted on two idealized trading games. 1) Univariate: the only input is a wave-like price time series, and 2) Bivariate: the input includes a random stepwise price time series and a noisy signal time series, which is positively correlated with future price changes. The Univariate game tests whether the agent can capture the underlying dynamics, and the Bivariate game tests whether the agent can utilize the hidden relation among the inputs. Stacked Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM) units, Convolutional Neural Network (CNN), and multi-layer perceptron (MLP) are used to model Q values. For both games, all agents successfully find a profitable strategy. The GRU-based agents show best overall performance in the Univariate game, while the MLP-based agents outperform others in the Bivariate game.

PDF Abstract