no code implementations • 13 Oct 2023 • Viraj Nadkarni, Jiachen Hu, Ranvir Rana, Chi Jin, Sanjeev Kulkarni, Pramod Viswanath
This ensures that the market maker balances losses to informed traders with profits from noise traders.
no code implementations • 21 Feb 2023 • Han Zhong, Jiachen Hu, Yecheng Xue, Tongyang Li, LiWei Wang
While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited.
no code implementations • 27 Oct 2022 • Jiachen Hu, Han Zhong, Chi Jin, LiWei Wang
Sim-to-real transfer trains RL agents in simulated environments and then deploys them in the real world.
no code implementations • ICLR 2022 • Xiaoyu Chen, Jiachen Hu, Lin F. Yang, LiWei Wang
In particular, we take a plug-in solver approach, where we focus on learning a model in the exploration phase and demand that \emph{any planning algorithm} on the learned model can give a near-optimal policy.
Model-based Reinforcement Learning • Reinforcement Learning (RL)
no code implementations • ICLR 2022 • Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, LiWei Wang
Reinforcement learning encounters many challenges when applied directly in the real world.
no code implementations • 8 Feb 2021 • Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, LiWei Wang
This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation.
no code implementations • ICLR 2021 • Xiaoyu Chen, Jiachen Hu, Lihong Li, Li-Wei Wang
The regret of FMDP-BF is shown to be exponentially smaller than that of optimal algorithms designed for non-factored MDPs, and improves on the best previous result for FMDPs~\citep{osband2014near} by a factor of $\sqrt{H|\mathcal{S}_i|}$, where $|\mathcal{S}_i|$ is the cardinality of the factored state subspace and $H$ is the planning horizon.
no code implementations • ICLR 2020 • Yuanhao Wang, Jiachen Hu, Xiaoyu Chen, Li-Wei Wang
We study the problem of regret minimization for distributed bandit learning, in which $M$ agents work collaboratively to minimize their total regret under the coordination of a central server.
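For orientation, the total regret mentioned above can be written in the standard multi-armed bandit form. This is a sketch under assumed notation, not the paper's own statement: the horizon $T$, the arm means $\mu_k$, the best mean $\mu^{*} = \max_k \mu_k$, and the arm $a_{i,t}$ pulled by agent $i$ at round $t$ are all symbols introduced here for illustration.

$$
\mathrm{Reg}(T) \;=\; \sum_{t=1}^{T} \sum_{i=1}^{M} \left( \mu^{*} - \mu_{a_{i,t}} \right)
$$

Under this definition, $M$ agents that never communicate would each pay the single-agent regret separately, which is why coordination through the central server can reduce the total.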