no code implementations • 26 Oct 2023 • Sohrab Andaz, Carson Eisenach, Dhruv Madeka, Kari Torkkola, Randy Jia, Dean Foster, Sham Kakade
In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term a quantity-over-time (QOT) arrivals model.
no code implementations • 24 Oct 2023 • Dean Foster, Randy Jia, Dhruv Madeka
Solutions to the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times typically make strong assumptions on the dynamics, for either approximation or simulation, and apply methods such as optimization, dynamic programming, or reinforcement learning.
no code implementations • 14 Nov 2022 • Zeyu Jia, Randy Jia, Dhruv Madeka, Dean P. Foster
We study the problem of Reinforcement Learning (RL) with linear function approximation, i.e., assuming the optimal action-value function is linear in a known $d$-dimensional feature mapping.
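The linear function approximation assumption above can be sketched in a few lines: the optimal action-value function takes the form $Q^*(s,a) = \theta^\top \phi(s,a)$ for a known feature map $\phi$ and an unknown parameter vector $\theta$. The feature map and dimensions below are illustrative toys, not from the paper.

```python
import numpy as np

d = 4                             # feature dimension (illustrative)
rng = np.random.default_rng(0)
theta = rng.standard_normal(d)    # in practice theta is unknown and must be learned

def phi(state, action):
    """Known d-dimensional feature map; a toy polynomial featurization."""
    s, a = float(state), float(action)
    return np.array([1.0, s, a, s * a])

def q_value(state, action):
    """Action-value under the linearity assumption: Q(s, a) = theta^T phi(s, a)."""
    return theta @ phi(state, action)

def greedy_action(state, actions):
    """Act greedily with respect to the linear Q estimate."""
    return max(actions, key=lambda a: q_value(state, a))
```

Under this assumption, learning $Q^*$ reduces to estimating the $d$-dimensional vector $\theta$ rather than a table over all state-action pairs.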
no code implementations • 10 May 2019 • Shipra Agrawal, Randy Jia
We consider the relatively less studied problem of designing a learning algorithm for this setting when the underlying demand distribution is unknown.
no code implementations • NeurIPS 2017 • Shipra Agrawal, Randy Jia
Our main result is a high probability regret upper bound of $\tilde{O}(D\sqrt{SAT})$ for any communicating MDP with $S$ states, $A$ actions and diameter $D$, when $T\ge S^5A$.
no code implementations • 19 May 2017 • Shipra Agrawal, Randy Jia
We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating with a finite, though unknown, diameter.
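The posterior sampling idea can be illustrated with a minimal sketch: maintain a posterior over the unknown MDP, sample one MDP from it, solve the sample, and act according to the resulting policy. The Dirichlet prior, known rewards, and problem sizes below are simplifying assumptions for illustration, not the paper's algorithm in full.

```python
import numpy as np

S, A, H = 3, 2, 10                 # states, actions, planning horizon (illustrative)
rng = np.random.default_rng(1)
R = rng.uniform(size=(S, A))       # rewards assumed known here for simplicity
alpha = np.ones((S, A, S))         # Dirichlet counts over next-state transitions

def sample_mdp():
    """Draw a transition kernel P ~ Dirichlet(alpha), independently per (s, a)."""
    P = np.empty((S, A, S))
    for s in range(S):
        for a in range(A):
            P[s, a] = rng.dirichlet(alpha[s, a])
    return P

def solve(P):
    """Finite-horizon value iteration on the sampled MDP; greedy policy per step."""
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V              # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] * V[s']
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

# One round of posterior sampling: sample, solve, then follow `pi` for an
# episode and update `alpha` with the observed transition counts.
P_sampled = sample_mdp()
pi = solve(P_sampled)
```

Randomizing over plausible MDPs in this way drives exploration: state-action pairs with few observations have diffuse posteriors, so occasionally a sampled MDP makes them look attractive and the agent visits them.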