We introduce an online popularity prediction and tracking task as a benchmark
for reinforcement learning with a combinatorial, natural-language action
space. A specified number of discussion threads predicted to be popular are
recommended for tracking, chosen from a fixed window of recent comments. Novel
deep reinforcement learning architectures are studied for effectively modeling
the value function of actions composed of interdependent sub-actions.
The proposed model, which captures dependencies between sub-actions with a
bidirectional LSTM, achieves the best performance across different experimental
configurations and domains, and it also generalizes well to varying numbers
of recommendation requests.
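Choosing K threads out of N candidates yields C(N, K) combined actions (for example, C(10, 3) = 120), so scoring each combination with an independent output unit does not scale. The sketch below illustrates how a bidirectional LSTM can score one combined action by reading its sub-actions as a sequence, so that each sub-action's score is conditioned on the others. It rests on several assumptions. First, the thread embeddings are random placeholders for the output of some text encoder. Second, the state encoding is omitted. Third, the sizes `EMB_DIM`/`HID_DIM` and the helpers `init_lstm`, `lstm_pass` and `q_value` are hypothetical. Finally, summing per-sub-action scores into Q is an illustrative readout, not necessarily the authors' exact formulation.

```python
import numpy as np

# Hypothetical sizes: every dimension here is an illustrative assumption.
EMB_DIM = 8   # size of each sub-action (thread) embedding
HID_DIM = 4   # LSTM hidden size per direction

rng = np.random.default_rng(0)


def init_lstm(in_dim, hid_dim):
    """Random weights for one LSTM direction; gates stacked as [i, f, o, g]."""
    W = rng.normal(scale=0.1, size=(4 * hid_dim, in_dim))
    U = rng.normal(scale=0.1, size=(4 * hid_dim, hid_dim))
    b = np.zeros(4 * hid_dim)
    return W, U, b


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(xs, params):
    """Run a single-direction LSTM over a sequence and return every hidden state."""
    W, U, b = params
    hid = U.shape[1]
    h = np.zeros(hid)
    c = np.zeros(hid)
    outputs = []
    for x in xs:
        z = W @ x + U @ h + b
        i = sigmoid(z[:hid])               # input gate
        f = sigmoid(z[hid:2 * hid])        # forget gate
        o = sigmoid(z[2 * hid:3 * hid])    # output gate
        g = np.tanh(z[3 * hid:])           # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return outputs


# One set of weights is shared across all K, which lets the model handle a
# varying number of recommendation requests.
FWD = init_lstm(EMB_DIM, HID_DIM)
BWD = init_lstm(EMB_DIM, HID_DIM)
W_OUT = rng.normal(scale=0.1, size=2 * HID_DIM)


def q_value(sub_action_embs):
    """Score a combined action made of K sub-actions (hypothetical readout).

    Each position sees left context (forward pass) and right context
    (backward pass), so its score depends on the other chosen threads.
    The per-position scores are summed into a single Q-value.
    """
    xs = list(sub_action_embs)
    fwd = lstm_pass(xs, FWD)
    bwd = lstm_pass(xs[::-1], BWD)[::-1]   # re-align backward states with positions
    scores = [W_OUT @ np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]
    return float(sum(scores))


# Usage: the same weights score actions with different numbers of sub-actions.
for k in (2, 3, 5):
    threads = rng.normal(size=(k, EMB_DIM))
    print(k, q_value(threads))
```

Because the recurrence runs over however many sub-actions are supplied, no architectural change is needed when the number of requested recommendations changes. This is one plausible reason a sequence model suits this setting.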