To make RMABs more useful in settings with uncertain dynamics: (i) We introduce the Robust RMAB problem and develop solutions for a minimax regret objective when transitions are given by interval uncertainties; (ii) We develop a double oracle algorithm for solving Robust RMABs and demonstrate its effectiveness on three experimental domains; (iii) To enable our double oracle approach, we introduce RMABPPO, a novel deep reinforcement learning algorithm for solving RMABs.
Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed.
We argue that AI for social good ought to be assessed by the communities that the AI system will impact, using as a guide the capabilities approach, a framework to measure the ability of different policies to improve human welfare equity.
(ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed-form.
Digital Adherence Technologies (DATs) are an increasingly popular method for verifying patient adherence to many medications.