no code implementations • 18 Feb 2025 • Mengxiao Zhang, Yingfei Wang, Haipeng Luo
A recent work by Schlisselberg et al. (2024) studies a delay-as-payoff model for stochastic multi-armed bandits, where the payoff (either loss or reward) is delayed for a period that is proportional to the payoff itself.
no code implementations • 12 Feb 2025 • Yiping Liu, Mengxiao Zhang, Jiamou Liu, Song Yang
These marketplaces leverage model trading mechanisms to properly incentivize data owners to contribute their data, and return a well-performing ML model to the model consumers.
no code implementations • 31 May 2024 • Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro
Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions.
no code implementations • 31 May 2024 • Mengxiao Zhang, Ramiro Deo-Campo Vuong, Haipeng Luo
Specifically, in stochastic $N$-agent $K$-armed bandits, we develop an algorithm with $\widetilde{\mathcal{O}}\left(K^{\frac{2}{N}}T^{\frac{N-1}{N}}\right)$ regret and prove that the dependence on $T$ is tight, in sharp contrast to the $\sqrt{T}$-regret bounds of Hossain et al. [2021] and Jones et al. [2023].
no code implementations • 12 Feb 2024 • Mengxiao Zhang, Haipeng Luo
Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing/advertising.
no code implementations • 12 Feb 2024 • Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro
Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications.
no code implementations • 8 Oct 2023 • Mengxiao Zhang, Haipeng Luo
We study online learning in contextual pay-per-click auctions where, at each of the $T$ rounds, the learner receives some context along with a set of ads and needs to estimate their click-through rates (CTRs) in order to run a second-price pay-per-click auction.
no code implementations • 7 Mar 2023 • Mengxiao Zhang, Fernando Beltran, Jiamou Liu
Data pricing, as a key function of a data marketplace, demands quantifying the monetary value of data.
no code implementations • 30 Jan 2023 • Brendan Lucier, Sarath Pattathil, Aleksandrs Slivkins, Mengxiao Zhang
We study a game between autobidding algorithms that compete in an online advertising platform.
no code implementations • 23 Oct 2022 • Mengxiao Zhang, Shi Chen, Haipeng Luo, Yingfei Wang
Supply chain management (SCM) has been recognized as an important discipline with applications to many industries, where the two-echelon stochastic inventory model, involving one downstream retailer and one upstream supplier, plays a fundamental role in developing firms' SCM strategies.
no code implementations • 4 Oct 2022 • Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang
For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t)$ with high probability, where $\alpha_t$ is the independence number of the feedback graph at round $t$.
1 code implementation • 26 Jul 2022 • Chaofei Hong, Mengwen Yuan, Mengxiao Zhang, Xiao Wang, Chegnjun Zhang, Jiaxin Wang, Gang Pan, Zhaohui Wu, Huajin Tang
In this work, we present a Python-based spiking neural network (SNN) simulation and training framework, SPAIC, that aims to support brain-inspired model and algorithm research, integrating features from both deep learning and neuroscience.
no code implementations • 12 Feb 2022 • Haipeng Luo, Mengxiao Zhang, Peng Zhao, Zhi-Hua Zhou
The CORRAL algorithm of Agarwal et al. (2017) and its variants (Foster et al., 2020a) achieve this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$ where $M$ is the number of base algorithms and $T$ is the time horizon.
no code implementations • 12 Feb 2022 • Haipeng Luo, Mengxiao Zhang, Peng Zhao
We consider the problem of adversarial bandit convex optimization, that is, online learning over a sequence of arbitrary convex loss functions with only one function evaluation for each of them.
no code implementations • 30 Jan 2022 • Mengxiao Zhang, Peng Zhao, Haipeng Luo, Zhi-Hua Zhou
Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning.
no code implementations • 11 Feb 2021 • Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang, Xiaojin Zhang
In this work, we develop linear bandit algorithms that automatically adapt to different environments.
no code implementations • 8 Feb 2021 • Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo
We study infinite-horizon discounted two-player zero-sum Markov games, and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play.
1 code implementation • ICLR 2019 • Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Li-Wei Wang
We next investigate the adversarial examples which 'fool' a CNN with Random Mask.
1 code implementation • ICLR 2021 • Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo
Specifically, for OMWU in bilinear games over the simplex, we show that when the equilibrium is unique, linear last-iterate convergence is achieved with a learning rate whose value is set to a universal constant, improving the result of (Daskalakis & Panageas, 2019b) under the same assumption.
no code implementations • NeurIPS 2020 • Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang
We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary.
no code implementations • 2 Feb 2020 • Chung-Wei Lee, Haipeng Luo, Mengxiao Zhang
We study small-loss bounds for adversarial multi-armed bandits with graph feedback, that is, adaptive regret bounds that depend on the loss of the best arm or related quantities, instead of the total number of rounds.
1 code implementation • 19 Nov 2019 • Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Di He, Li-Wei Wang
Robustness of convolutional neural networks (CNNs) has gained in importance on account of adversarial examples, i.e., inputs with added well-designed perturbations that are imperceptible to humans but can cause the model to predict incorrectly.
no code implementations • 5 Nov 2017 • Mengxiao Zhang, Wangquan Wu, Yanren Zhang, Kun He, Tao Yu, Huan Long, John E. Hopcroft
Our results show that the dimensions of different categories are close to each other and decline quickly along the convolutional layers and fully connected layers.
no code implementations • 2 Apr 2017 • Kun He, Jingbo Wang, Haochuan Li, Yao Shu, Mengxiao Zhang, Man Zhu, Li-Wei Wang, John E. Hopcroft
Toward a deeper understanding of the inner workings of deep neural networks, we investigate CNNs (convolutional neural networks) using a DCN (deconvolutional network) and a randomization technique, and gain new insights into the intrinsic properties of this network architecture.