We show that unlike prior work on multi-defender security games, the introduction of schedules can cause non-existence of equilibrium even under rather restricted environments.
Traffic simulators are used to generate data for learning in intelligent transportation systems (ITSs).
Landmines remain a threat to war-affected communities for years after conflicts have ended, partly due to the laborious nature of demining tasks.
Agents built with large language models (LLMs) have recently achieved great advancements.
In this work, we present a novel subgame curriculum learning framework for zero-sum games.
To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation.
1 code implementation • 30 Apr 2023 • Sedrick Scott Keh, Zheyuan Ryan Shi, David J. Patterson, Nirmal Bhagabati, Karun Dewan, Areendran Gopala, Pablo Izquierdo, Debojyoti Mallick, Ambika Sharma, Pooja Shrestha, Fei Fang
We introduce NewsPanda, a toolkit which automatically detects and analyzes online articles related to environmental conservation and infrastructure construction.
At the bottom level, it learns latent representations for each agent, given the global latent representations from the top level.
We aim to understand how people assess human likeness in navigation produced by people and artificially intelligent (AI) agents in a video game.
In this paper, we propose a cross-domain recommendation method: Self-supervised Interest Transfer Network (SITN), which can effectively transfer invariant knowledge between domains via prototypical contrastive learning.
In contrast to previous work, the proposed procedure aims to approximate the relevant network interference patterns.
We believe our dataset, benchmark model, and evaluation metric will boost the development of video background music generation.
This allows the student to self-reflect on what it has learned, enabling advice generalization and leading to improved sample efficiency and learning performance - even in environments where the teacher is sub-optimal.
Curriculum Reinforcement Learning (CRL) aims to create a sequence of tasks, starting from easy ones and gradually learning towards difficult tasks.
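Illustratively, the core loop behind such a curriculum can be sketched as below; this is a minimal sketch of the general idea, not any specific paper's method, and `difficulty` and `train_on` are hypothetical stand-ins for a task-difficulty measure and one training round on a task.

```python
def curriculum_train(tasks, difficulty, train_on, threshold=0.9):
    """Train on tasks in increasing order of difficulty, advancing to the
    next task once the success rate on the current one passes `threshold`."""
    schedule = sorted(tasks, key=difficulty)
    for task in schedule:
        score = 0.0
        while score < threshold:
            score = train_on(task)  # one training round; returns success rate
    return schedule
```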
We summarize three practical challenges that remain unsolved in multi-scenario modeling: (1) lack of fine-grained and decoupled information-transfer controls among multiple scenarios.
Therefore, it is crucial to perform cross-domain CTR prediction to transfer knowledge from large domains to small domains to alleviate the data sparsity issue.
The performance of these detection algorithms can be taken as a baseline for future research on detecting malicious bidding.
Traffic signal control (TSC) is a high-stakes domain that is growing in importance as traffic volume grows globally.
The first algorithm, IVIPER, extends VIPER, a recent method for single-agent interpretable RL, to the multi-agent setting.
It seems likely that these patterns are shaped by the environment a speaker is exposed to in complex ways.
Preventing poaching through ranger patrols protects endangered wildlife, directly contributing to the UN Sustainable Development Goal 15 of life on land.
To this end, we characterize card and game features for DouDizhu to represent the perfect and imperfect information.
In this paper, we introduce a novel hierarchical formulation of robust RL - a general-sum Stackelberg game model called RRL-Stack - to formalize the sequential nature and provide extra flexibility for robust training.
In this survey, we propose a novel taxonomy for organizing the XRL literature that prioritizes the RL setting.
Unlike commercial ridesharing, non-commercial peer-to-peer (P2P) ridesharing has been subject to limited research -- although it can promote viable solutions in non-urban communities.
Many scientific conferences employ a two-phase paper review process, where some papers are assigned additional reviewers after the initial reviews are submitted.
We formulate the problem as a game between the defender and nature who controls the parameter values of the adversarial behavior and design an algorithm MIRROR to find a robust policy.
Current deep learning models often achieve excellent results on benchmark image-to-text datasets but fail to generate texts that are useful in practice.
We propose a simple, general and effective technique, Reward Randomization for discovering diverse strategic policies in complex multi-agent games.
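The general idea of perturbing the reward to surface diverse strategies can be sketched as follows; the additive per-outcome bonus scheme here is illustrative only, not the paper's actual Reward Randomization algorithm.

```python
import random

def randomized_rewards(base_reward, n_variants, scale=1.0, seed=0):
    """Generate perturbed reward functions by adding a random, cached bonus
    per (state, action) outcome; training one policy per variant can surface
    distinct strategies that the unperturbed reward alone would not."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        bonus = {}  # each variant keeps its own perturbation table
        def reward(state, action, b=bonus):
            key = (state, action)
            if key not in b:
                b[key] = rng.uniform(-scale, scale)
            return base_reward(state, action) + b[key]
        variants.append(reward)
    return variants
```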
Because of this decision tree equivalence, any function approximator can be used during training, including a neural network, while yielding a decision tree policy for the base MDP.
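A related, more generic way to get a tree policy from an arbitrary approximator is distillation: query the trained policy on visited states and fit a tree to the resulting action labels. The depth-1 stump below is a minimal sketch of that generic idea, not the equivalence construction described above.

```python
def fit_stump(states, actions):
    """Fit a depth-1 decision tree (stump) to (state, action) pairs, where
    each state is a vector of floats; returns a predict function."""
    def mode(xs):
        return max(set(xs), key=xs.count) if xs else None
    best = None
    for f in range(len(states[0])):       # candidate split feature
        for s in states:                  # candidate threshold values
            t = s[f]
            lm = mode([a for x, a in zip(states, actions) if x[f] <= t])
            rm = mode([a for x, a in zip(states, actions) if x[f] > t])
            acc = sum(1 for x, a in zip(states, actions)
                      if (lm if x[f] <= t else rm) == a) / len(states)
            if best is None or acc > best[0]:
                best = (acc, f, t, lm, rm)
    _, f, t, lm, rm = best
    return lambda x: lm if x[f] <= t else rm
```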
Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers).
In this paper, we introduce bandit data-driven optimization, the first iterative prediction-prescription framework to address these pain points.
Critical as it is to improving the online shopping experience for customers and merchants, finding a proper approach to user intent prediction has received great attention in both industry and academia.
We further consider the problem of restricting the joint probability that certain suspect pairs of reviewers are assigned to certain papers, and show that this problem is NP-hard for arbitrary constraints on these joint probabilities but efficiently solvable for a practical special case.
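Concretely, when the assignment is a lottery over deterministic assignments, the constrained quantity is easy to state (computing an assignment that satisfies such constraints is the hard part); the helper below is an illustrative check, with hypothetical names, not the paper's algorithm.

```python
def joint_pair_probability(assignments, probs, pair, paper):
    """Under a lottery over deterministic assignments (each mapping a paper
    to its set of reviewers), the probability that both reviewers in `pair`
    are assigned to `paper` together."""
    r1, r2 = pair
    return sum(p for a, p in zip(assignments, probs)
               if r1 in a[paper] and r2 in a[paper])
```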
In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large.
Artificial intelligence for social good (AI4SG) is a research theme that aims to use and advance artificial intelligence to address societal issues and improve the well-being of the world.
With the maturing of AI and multiagent systems research, we have a tremendous opportunity to direct these advances towards addressing complex societal problems.
We show that a regret minimizer can be designed for a scaled extension of any two convex sets, and that from the decomposition we then obtain a global regret minimizer.
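Regret matching is the standard local building block in such constructions; below is a minimal sketch of one regret-matching step over a probability simplex, not the scaled-extension decomposition itself.

```python
def regret_matching(cum_regrets):
    """Map cumulative per-action regrets to a strategy: probabilities
    proportional to positive regrets, uniform when none are positive."""
    pos = [max(r, 0.0) for r in cum_regrets]
    total = sum(pos)
    n = len(cum_regrets)
    if total == 0.0:
        return [1.0 / n] * n
    return [p / total for p in pos]
```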
Although existing works formulate this problem within a centralized-training-with-decentralized-execution framework, which avoids the non-stationarity problem in training, their decentralized execution paradigm limits the agents' capability to coordinate.
Transportation service providers that dispatch drivers and vehicles to riders have started to support both on-demand ride requests posted in real time and rides scheduled in advance, leading to new challenges that, to the best of our knowledge, have not been addressed by existing work.
In order to formally reason about deception, we introduce the feature deception problem (FDP), a domain-independent model, and present a learning and planning framework for finding the optimal deception strategy, taking into account the adversary's preferences, which are initially unknown to the defender.
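With the adversary's preferences in hand (learned or assumed known), the planning step reduces to a constrained search over feature configurations; the brute-force sketch below, with hypothetical names, illustrates that reduction and is not the framework's actual planner.

```python
from itertools import product

def best_deception(n_features, cost, attacker_value, budget):
    """Brute-force over binary feature configurations: among those within
    the defender's budget, return the one minimizing the attacker's
    valuation (here passed in directly, as a stand-in for a learned model)."""
    feasible = [c for c in product([0, 1], repeat=n_features)
                if cost(c) <= budget]
    return min(feasible, key=attacker_value)
```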
With the recent advances in solving large, zero-sum extensive form games, there is a growing interest in the inverse problem of inferring underlying game parameters given only access to agent actions.
Cyber adversaries have increasingly leveraged social engineering attacks to breach large organizations and threaten the well-being of today's online users.
Green Security Games (GSGs) have been proposed and applied to optimize patrols conducted by law enforcement agencies in green security domains such as combating poaching, illegal logging and overfishing.
In this pilot study, we investigate (1) in what way a robot can express a certain mood to influence a human's decision making behavioral model; (2) how and to what extent the human will be influenced in a game theoretic setting.
Although recent work in AI has made great progress in solving large, zero-sum, extensive-form games, the underlying assumption in most past work is that the parameters of the game itself are known to the agents.
We study Stackelberg Security Games where the defender, in addition to allocating defensive resources to protect targets from the attacker, can strategically manipulate the attacker's payoff, subject to budget constraints expressed as a weighted L^p norm of the amount of change.