In this work, we develop linear bandit algorithms that automatically adapt to different environments.
We study infinite-horizon discounted two-player zero-sum Markov games, and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play.
Specifically, for OMWU in bilinear games over the simplex, we show that when the equilibrium is unique, linear last-iterate convergence is achieved with a constant learning rate, improving the result of Daskalakis & Panageas (2019b) under the same assumption.
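For context, OMWU is multiplicative weights run with an optimistic (predicted) gradient, using twice the current loss vector minus the previous one. A minimal sketch for the bilinear game min_x max_y x^T A y over probability simplices, with an illustrative constant learning rate (`eta` and `T` below are placeholders, not values from the paper):

```python
import numpy as np

def omwu(A, eta=0.1, T=5000):
    """Optimistic Multiplicative Weights Update for min_x max_y x^T A y
    over probability simplices (a sketch with a constant learning rate)."""
    n, m = A.shape
    x, y = np.ones(n) / n, np.ones(m) / m
    g_x_prev, g_y_prev = A @ y, A.T @ x  # previous loss/payoff vectors
    for _ in range(T):
        g_x, g_y = A @ y, A.T @ x
        # optimistic step: use 2 * current - previous gradient as the prediction
        x = x * np.exp(-eta * (2 * g_x - g_x_prev))
        x /= x.sum()
        y = y * np.exp(+eta * (2 * g_y - g_y_prev))
        y /= y.sum()
        g_x_prev, g_y_prev = g_x, g_y
    return x, y

# e.g. matching pennies, whose unique equilibrium is the uniform strategy pair
x_star, y_star = omwu(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```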
We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary.
We study small-loss bounds for adversarial multi-armed bandits with graph feedback, that is, adaptive regret bounds that depend on the loss of the best arm or related quantities, rather than on the total number of rounds.
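For concreteness, a small-loss (first-order) bound replaces the usual dependence on the horizon $T$ with a dependence on the cumulative loss of the best arm. A hedged sketch of the two forms, where $\kappa(G)$ stands for some graph-dependent quantity of the feedback graph $G$ (e.g., its independence or clique-partition number; this is a generic placeholder, not the paper's exact rate):

```latex
% Worst-case (zeroth-order) bound over K arms:
R_T = O\!\left(\sqrt{\kappa(G)\, T \log K}\right)
% Small-loss (first-order) bound, with L^\star = \min_i \sum_{t=1}^{T} \ell_{t,i}:
R_T = O\!\left(\sqrt{\kappa(G)\, L^\star \log K}\right)
```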
Robustness of convolutional neural networks (CNNs) has gained importance on account of adversarial examples, i.e., inputs with carefully designed perturbations that are imperceptible to humans but cause the model to predict incorrectly.
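As an illustration of how such perturbations are constructed, the fast gradient sign method (FGSM; Goodfellow et al., 2015) is a standard one-step attack, not specific to the work above. A minimal PyTorch sketch, where `model`, `loss_fn`, and `eps` are assumed placeholders and inputs are taken to lie in [0, 1]:

```python
import torch

def fgsm_attack(model, x, y, loss_fn, eps=8 / 255):
    """One-step FGSM: perturb each pixel by eps in the direction
    (sign of the input gradient) that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    # keep the adversarial input in the valid pixel range
    return x_adv.clamp(0.0, 1.0).detach()
```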
Our results show that the dimensions of different categories are close to each other and decline rapidly across the convolutional and fully connected layers.
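One simple way to estimate such a per-layer dimension (a hedged sketch; the work above may use a different estimator) is PCA on a layer's activations, counting the components needed to explain most of the variance:

```python
import numpy as np

def pca_dimension(activations, var_threshold=0.9):
    """Estimate a layer's dimension as the number of principal components
    explaining `var_threshold` of the variance.
    `activations`: array of shape (n_samples, n_features)."""
    X = activations - activations.mean(axis=0)
    # singular values of the centered data give per-component variance
    s = np.linalg.svd(X, compute_uv=False)
    var_ratio = s ** 2 / np.sum(s ** 2)
    return int(np.searchsorted(np.cumsum(var_ratio), var_threshold) + 1)
```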
Toward a deeper understanding of the inner workings of deep neural networks, we investigate CNNs (convolutional neural networks) using DCNs (deconvolutional networks) and randomization techniques, and gain new insights into the intrinsic properties of this network architecture.
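The core deconvnet idea (in the style of Zeiler & Fergus, 2014) is to project a feature map back to pixel space through the same filters, transposed. A minimal sketch with illustrative layer sizes, not the paper's architecture:

```python
import torch
import torch.nn.functional as F

# a conv layer and its deconvolutional "inverse" sharing the same weights
conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False)
x = torch.randn(1, 3, 32, 32)
feat = F.relu(conv(x))
# project the feature maps back to pixel space via the transposed filters
proj = F.conv_transpose2d(feat, conv.weight, padding=1)
```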