With increasing world population and expanded use of forests as cohabited regions, interactions and conflicts with wildlife are increasing, leading to large-scale loss of lives (animal and human) and livelihoods (economic).
Our second major contribution is an evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study.
We introduce a novel decomposed GP regression to incorporate the subgroup decomposed feedback.
To make RMABs more useful in settings with uncertain dynamics: (i) We introduce the Robust RMAB problem and develop solutions for a minimax regret objective when transitions are given by interval uncertainties; (ii) We develop a double oracle algorithm for solving Robust RMABs and demonstrate its effectiveness on three experimental domains; (iii) To enable our double oracle approach, we introduce RMABPPO, a novel deep reinforcement learning algorithm for solving RMABs.
Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed.
We formulate the problem as a game between the defender and nature, which controls the parameter values of the adversarial behavior, and design an algorithm, MIRROR, to find a robust policy.
Empirical results show that our method achieves influence as high as the state-of-the-art methods for contingency-aware IM, while having negligible runtime at test phase.
In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved.
In many public health settings, it is important for patients to adhere to health programs, such as taking medications and periodic health checks.
Restless Multi-Armed Bandits (RMABs) have been popularly used to model limited resource allocation problems.
no code implementations • 7 Mar 2021 • Siddharth Nishtala, Lovish Madaan, Aditya Mate, Harshavardhan Kamarthi, Anirudh Grama, Divy Thakkar, Dhyanesh Narayanan, Suresh Chaudhary, Neha Madhiwalla, Ramesh Padmanabhan, Aparna Hegde, Pradeep Varakantham, Balaraman Ravindran, Milind Tambe
India has a maternal mortality ratio of 113 and a child mortality ratio of 2830 per 100,000 live births.
However, given the limited number of health workers, only a small subset of the population can be visited in any given time period.
We therefore first propose a novel GSG model that combines defender allocation, patrolling, real-time drone notification to human patrollers, and drones sending warning signals to attackers.
Decision Making • Multiagent Systems
Dams impact downstream river dynamics through flow regulation and disruption of upstream-downstream linkages.
Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable.
To ensure under-resourced parks have access to meaningful poaching predictions, we introduce the use of publicly available remote sensing data to extract features for parks.
In this work, we define and test a data collection diligence score.
Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers).
For example, case counts may be sparse when only a small fraction of infections are caught by a testing program.
(ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed-form expression.
Solving optimization problems with unknown parameters often requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values.
Under this framework, the trade-off between fairness and efficiency can be controlled by a single inequality aversion design parameter.
no code implementations • 13 Jun 2020 • Siddharth Nishtala, Harshavardhan Kamarthi, Divy Thakkar, Dhyanesh Narayanan, Anirudh Grama, Aparna Hegde, Ramesh Padmanabhan, Neha Madhiwalla, Suresh Chaudhary, Balaraman Ravindran, Milind Tambe
India accounts for 11% of maternal deaths globally; a woman there dies in childbirth every fifteen minutes.
With the maturing of AI and multiagent systems research, we have a tremendous opportunity to direct these advances towards addressing complex societal problems.
To solve the online problem with a hard bound on risk, we formulate it as a Reinforcement Learning (RL) problem with constraints on the action space.
It has been successfully applied to several limited combinatorial problem classes, such as those that can be expressed as linear programs (LP), and submodular optimization.
A serious challenge when finding influential actors in real-world social networks is the lack of knowledge about the structure of the underlying network.
However, graphs or related attributes are often only partially observed, introducing learning problems such as link prediction which must be solved prior to optimization.
1 code implementation • 8 Mar 2019 • Lily Xu, Shahrzad Gholami, Sara Mc Carthy, Bistra Dilkina, Andrew Plumptre, Milind Tambe, Rohit Singh, Mustapha Nsubuga, Joshua Mabonga, Margaret Driciru, Fred Wanyama, Aggrey Rwetsiba, Tom Okello, Eric Enyel
We evaluate our approach on real-world historical poaching data from Murchison Falls and Queen Elizabeth National Parks in Uganda and, for the first time, Srepok Wildlife Sanctuary in Cambodia.
Stackelberg security games are a critical tool for maximizing the utility of limited defense resources to protect important targets from an intelligent adversary.
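As a rough illustration of the defender-versus-adversary setup described above, the sketch below evaluates a single defender coverage vector: the attacker observes the coverage, attacks the target with the highest expected payoff, and the defender receives the corresponding expected utility. The function name, the payoff lists, and the tie-breaking rule (ties broken in the defender's favor, as in the strong Stackelberg convention) are illustrative assumptions rather than details taken from any of the listed papers.

```python
from typing import List, Tuple


def attacker_best_response(
    coverage: List[float],
    def_covered: List[float],
    def_uncovered: List[float],
    att_covered: List[float],
    att_uncovered: List[float],
) -> Tuple[int, float]:
    """Return (attacked target, defender expected utility) for one coverage vector.

    coverage[t] is the probability that target t is protected.
    Hypothetical helper: payoffs are given per target for the covered and
    uncovered cases.
    """
    best_key = None
    best_target = -1
    best_def_utility = 0.0
    for t, c in enumerate(coverage):
        att_u = c * att_covered[t] + (1 - c) * att_uncovered[t]
        def_u = c * def_covered[t] + (1 - c) * def_uncovered[t]
        # Attacker maximizes its own utility; ties go to the defender.
        key = (att_u, def_u)
        if best_key is None or key > best_key:
            best_key, best_target, best_def_utility = key, t, def_u
    return best_target, best_def_utility


# Two targets; being caught yields 0 for both sides in this toy instance.
payoffs = dict(
    def_covered=[0.0, 0.0],
    def_uncovered=[-10.0, -4.0],
    att_covered=[0.0, 0.0],
    att_uncovered=[10.0, 4.0],
)
even = attacker_best_response([0.5, 0.5], **payoffs)
skewed = attacker_best_response([0.75, 0.25], **payoffs)
print(even, skewed)
```

In this toy instance, shifting coverage toward the more valuable target moves the attack to the other target and improves the defender's expected utility, which is the kind of trade-off a full solver would optimize over.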
Influence maximization is a widely used model for information dissemination in social networks.
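For readers unfamiliar with the model, the following is a minimal sketch of the classic greedy heuristic for influence maximization under the independent cascade model, with spread estimated by Monte Carlo simulation. It assumes a fully known graph and a single propagation probability `p`; these assumptions, and all function names, are illustrative and do not reflect the specific methods of the papers listed here.

```python
import random
from typing import Dict, List


def simulate_cascade(graph: Dict[str, List[str]], seeds: List[str],
                     p: float, rng: random.Random) -> int:
    """Run one independent-cascade simulation and return the number of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, p, trials, rng) -> float:
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(simulate_cascade(graph, seeds, p, rng) for _ in range(trials)) / trials


def greedy_seeds(graph: Dict[str, List[str]], k: int, p: float = 0.1,
                 trials: int = 200, seed: int = 0) -> List[str]:
    """Pick k seeds, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = sorted(set(graph) | {v for vs in graph.values() for v in vs})
    chosen: List[str] = []
    for _ in range(k):
        best_node, best_spread = None, -1.0
        for n in nodes:
            if n in chosen:
                continue
            spread = expected_spread(graph, chosen + [n], p, trials, rng)
            if spread > best_spread:
                best_node, best_spread = n, spread
        chosen.append(best_node)
    return chosen


toy_graph = {"a": ["b", "c"], "d": ["e"]}
# With p = 1.0 every edge fires, so the outcome is deterministic.
picked = greedy_seeds(toy_graph, k=2, p=1.0, trials=1)
print(picked)
```

The greedy loop re-estimates spread for every candidate at every step, which is simple but expensive; it is shown only to make the objective concrete.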
Computer Science and Game Theory • Social and Information Networks
Digital Adherence Technologies (DATs) are an increasingly popular method for verifying patient adherence to many medications.
These components are typically approached separately: a machine learning model is first trained via a measure of predictive accuracy, and then its predictions are used as input into an optimization algorithm which produces a decision.
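The two-stage pipeline described above can be sketched as follows: a model is first fit purely for predictive accuracy (here, one-dimensional least squares), and its predictions are then passed to a separate optimization step (here, choosing the `k` items with the highest predicted value). The toy data, the choice of top-`k` selection as the optimization problem, and all function names are assumptions made for illustration only.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Stage 1: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def top_k(scores: List[float], k: int) -> List[int]:
    """Stage 2: pick the indices of the k largest predicted values."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def decision_quality(chosen: List[int], true_values: List[float]) -> float:
    """Evaluate the decision against the true (initially unknown) values."""
    return sum(true_values[i] for i in chosen)


# Toy training data that lies exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])

# Predict values for new items, then optimize using the predictions.
features = [0.5, 2.5, 1.5]
predictions = [slope * x + intercept for x in features]
chosen = top_k(predictions, k=2)
quality = decision_quality(chosen, true_values=[2.0, 5.5, 4.5])
print(chosen, quality)
```

Note that the model is trained without any reference to `decision_quality`; that separation is what distinguishes this two-stage approach from training the model directly for decision quality.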
To mitigate this issue, we propose to design entropy-maximizing defending strategies for spatio-temporal security games, which frequently suffer from CoC.
This study pilot tests an artificial intelligence (AI) algorithm that selects peer change agents (PCAs) to disseminate HIV testing messaging in a population of homeless youth.
This paper presents HEALER, a software agent that recommends sequential intervention plans for use by homeless shelters, which organize these interventions to raise awareness about HIV among homeless youth.
We provide four main contributions: (1) a PAC model of learning adversary response functions in SSGs; (2) PAC-model analysis of the learning of key, existing bounded rationality models in SSGs; (3) an entirely new approach to adversary modeling based on a non-parametric class of response functions with PAC-model analysis; and (4) identification of conditions under which computing the best defender strategy against the learned adversary behavior is indeed the optimal strategy.
Our experiments confirm the necessity of handling information leakage and the advantage of our algorithms.