Dental disease is one of the most common chronic diseases despite being largely preventable.
Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education.
In the absence of randomized controlled and natural experiments, it is necessary to balance the distributions of (observable) covariates of the treated and control groups in order to obtain an unbiased estimate of a causal effect of interest; otherwise, a different effect size may be estimated, and incorrect recommendations may be given.
In batch reinforcement learning, there can be poorly explored state-action pairs resulting in poorly learned, inaccurate models and poorly performing associated policies.
Yet there is a lack of general methods for conducting statistical inference using more complex models on data collected with (contextual) bandit algorithms; for example, current methods cannot be used for valid inference on parameters in a logistic regression model for a binary reward.
For example, a mobile health (mHealth) application designed to increase physical activity must make contextually relevant suggestions to motivate users.
As bandit algorithms are increasingly utilized in scientific studies and industrial applications, there is an associated increasing need for reliable inference methods based on the resulting adaptively-collected data.
We present a parameter learning method for GLM emissions and survival model fitting, and present promising results on both synthetic data and an mHealth drug use dataset.
Increasing technological sophistication and widespread use of smartphones and wearable devices provide opportunities for innovative and highly personalized health interventions.