We present a method for Temporal Difference (TD) learning that addresses
several challenges faced by robots learning to navigate in a marine
environment. For improved data efficiency, our method reduces TD updates to
Gaussian Process regression...
To make predictions amenable to online settings,
we introduce a sparse approximation that improves on the quality of current
rejection-based sparse methods. We derive the predictive value function
posterior and use the moments to obtain a new algorithm for model-free policy
evaluation, SPGP-SARSA. We show that, with simple changes, SPGP-SARSA can be
reduced to a model-based equivalent, SPGP-TD. We perform comprehensive simulation
studies and conduct physical learning trials with an underwater robot. Our
results show that SPGP-SARSA can outperform the state-of-the-art sparse method,
replicate the prediction quality of its exact counterpart, and be applied to
solve underwater navigation tasks.
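
To make the idea of reducing TD updates to Gaussian Process regression concrete, the following is a minimal sketch of the classic, non-sparse Gaussian Process TD formulation. It is not SPGP-SARSA or SPGP-TD. It assumes a zero-mean GP prior on the value function V, scalar states, a squared-exponential kernel, and i.i.d. Gaussian noise on the model r_t = V(x_t) - gamma * V(x_{t+1}) + noise. All names (rbf, solve, gptd_posterior_mean) and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def rbf(x, y, length=1.0):
    """Squared-exponential kernel on scalar inputs (an assumed choice)."""
    return math.exp(-0.5 * ((x - y) / length) ** 2)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def gptd_posterior_mean(states, rewards, gamma=0.9, noise=0.1, length=1.0):
    """Return a function x -> posterior mean of V(x).

    states  : visited states x_0 .. x_T (length T + 1)
    rewards : observed rewards r_0 .. r_{T-1} (length T)
    """
    T = len(rewards)
    if len(states) != T + 1:
        raise ValueError("need exactly one more state than rewards")

    def cov_v_d(x, t):
        # Cov(V(x), d_t) with the TD difference d_t = V(x_t) - gamma * V(x_{t+1})
        return rbf(x, states[t], length) - gamma * rbf(x, states[t + 1], length)

    # G[s][t] = Cov(d_s, d_t), plus the noise variance on the diagonal.
    G = [[cov_v_d(states[s], t) - gamma * cov_v_d(states[s + 1], t)
          for t in range(T)] for s in range(T)]
    for s in range(T):
        G[s][s] += noise ** 2

    # Standard GP regression: the rewards are noisy observations of d_t.
    alpha = solve(G, list(rewards))
    return lambda x: sum(a * cov_v_d(x, t) for t, a in enumerate(alpha))
```

As a sanity check, consider a trajectory that stays in one state and receives reward 1 at every step, so the true discounted value is 1 / (1 - gamma) = 10 for gamma = 0.9. Calling gptd_posterior_mean([0.0] * 51, [1.0] * 50) and evaluating the result at 0.0 gives an estimate slightly below 10, because the zero-mean prior shrinks the posterior toward zero. The exact solve costs cubic time in the number of transitions, which is the kind of cost a sparse approximation is meant to avoid in online settings.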