no code implementations • 3 Dec 2024 • Yuda Song, HANLIN ZHANG, Carson Eisenach, Sham Kakade, Dean Foster, Udaya Ghai
Self-improvement is a mechanism used in Large Language Model (LLM) pre-training, post-training, and test-time inference.
no code implementations • 24 Sep 2024 • Carson Eisenach, Udaya Ghai, Dhruv Madeka, Kari Torkkola, Dean Foster, Sham Kakade
This paper addresses the capacitated periodic review inventory control problem, focusing on a retailer managing multiple products with limited shared resources, such as storage or inbound labor at a facility.
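To make the shared-resource constraint concrete, here is a minimal sketch, not the paper's method; the quantities and the proportional rationing rule are invented, of one period of order allocation under a shared inbound capacity:

```python
import numpy as np

def allocate_orders(proposed, capacity):
    """Scale proposed order quantities to respect a shared capacity.

    proposed: per-product order quantities suggested by some policy.
    capacity: total units the facility can receive this period.
    """
    total = proposed.sum()
    if total <= capacity:
        return proposed
    # Proportional rationing: every product keeps the same fraction of
    # its request. Real policies may prioritize products differently.
    return proposed * (capacity / total)

orders = np.array([120.0, 80.0, 50.0])
print(allocate_orders(orders, capacity=200.0))  # [96. 64. 40.]
```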
no code implementations • 26 Oct 2023 • Sohrab Andaz, Carson Eisenach, Dhruv Madeka, Kari Torkkola, Randy Jia, Dean Foster, Sham Kakade
In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term a quantity-over-time (QOT) arrivals model.
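As a rough illustration of the quantity-over-time idea (the arrival profile below is a made-up example, not one fitted from data), an order is described by quantities landing across several future periods rather than by a single lead time:

```python
import numpy as np

# Hypothetical QOT representation: an order placed now arrives in pieces,
# e.g. 30% after one period, 50% after two, 20% after three.
arrival_profile = np.array([0.0, 0.3, 0.5, 0.2])

def simulate_arrivals(order_qty, horizon=6):
    """Spread an order over future periods according to its QOT profile."""
    arrivals = np.zeros(horizon)
    qty = order_qty * arrival_profile
    arrivals[: len(qty)] += qty
    return arrivals

print(simulate_arrivals(100.0))  # [ 0. 30. 50. 20.  0.  0.]
```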
no code implementations • 6 Oct 2022 • Dhruv Madeka, Kari Torkkola, Carson Eisenach, Anna Luo, Dean P. Foster, Sham M. Kakade
This work provides a Deep Reinforcement Learning approach to solving a periodic review inventory control system with stochastic vendor lead times, lost sales, correlated demand, and price matching.
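A hedged sketch of the kind of environment such an agent would interact with; the lead-time and demand distributions, cost coefficients, and class names here are placeholders, not the paper's simulator:

```python
import random

class LostSalesInventoryEnv:
    """Toy periodic-review environment: place an order, it arrives after a
    stochastic vendor lead time, and unmet demand is lost (not backordered)."""

    def __init__(self):
        self.on_hand = 0.0
        self.pipeline = {}  # arrival period -> arriving quantity

    def step(self, t, order_qty):
        # Placeholder stochastic vendor lead time of 1-3 periods.
        lead_time = random.randint(1, 3)
        due = t + lead_time
        self.pipeline[due] = self.pipeline.get(due, 0.0) + order_qty
        # Receive anything arriving this period.
        self.on_hand += self.pipeline.pop(t, 0.0)
        # Placeholder demand; in the paper demand is correlated over time.
        demand = random.uniform(0, 20)
        sales = min(self.on_hand, demand)
        self.on_hand -= sales
        return sales - 0.1 * self.on_hand  # revenue minus holding cost

env = LostSalesInventoryEnv()
print(sum(env.step(t, order_qty=15.0) for t in range(10)))
```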
no code implementations • 21 Jul 2022 • Sitan Yang, Carson Eisenach, Dhruv Madeka
For example, MQTransformer, an improvement over MQCNN, has shown state-of-the-art performance in probabilistic demand forecasting.
no code implementations • 30 Sep 2020 • Carson Eisenach, Yagna Patel, Dhruv Madeka
In this work, we propose novel improvements to the current state of the art by incorporating changes inspired by recent advances in Transformer architectures for Natural Language Processing.
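The MQ family of forecasters emits several quantiles per future horizon and is trained with the quantile (pinball) loss; a minimal sketch of that loss follows, where the framework choice and tensor shapes are ours, not the paper's:

```python
import torch

def pinball_loss(y_true, y_pred, quantiles):
    """Quantile (pinball) loss averaged over horizons and quantiles.

    y_true: (batch, horizon) realized demand.
    y_pred: (batch, horizon, num_quantiles) forecasted quantiles.
    quantiles: quantile levels, e.g. (0.5, 0.9) for P50 and P90.
    """
    q = torch.tensor(quantiles).view(1, 1, -1)
    diff = y_true.unsqueeze(-1) - y_pred
    # Under-forecasts are penalized by q, over-forecasts by (1 - q).
    return torch.maximum(q * diff, (q - 1) * diff).mean()

y_true = torch.randn(32, 12)        # 12-step-ahead demand
y_pred = torch.randn(32, 12, 2)     # P50 and P90 forecasts
print(pinball_loss(y_true, y_pred, (0.5, 0.9)))
```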
1 code implementation • ICLR 2019 • Carson Eisenach, Haichuan Yang, Ji Liu, Han Liu
In the former, an agent learns a policy over $\mathbb{R}^d$; in the latter, over a discrete set of actions, each of which is parametrized by a continuous parameter.
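To illustrate the second setting, here is a small sketch of sampling from a parametrized action space, drawing a discrete action and then its continuous parameter; the Gaussian parameterization is a placeholder, not the paper's estimator:

```python
import torch
from torch.distributions import Categorical, Normal

def sample_parametrized_action(discrete_logits, param_means, param_stds):
    """Draw (action id, continuous parameter) from a parametrized action space.

    discrete_logits: (num_actions,) scores over discrete actions.
    param_means/param_stds: (num_actions,) per-action parameter distributions.
    """
    k = Categorical(logits=discrete_logits).sample()
    theta = Normal(param_means[k], param_stds[k]).sample()
    return k.item(), theta.item()

logits = torch.tensor([0.2, 1.5, -0.3])
means = torch.tensor([0.0, 1.0, -1.0])
stds = torch.ones(3)
print(sample_parametrized_action(logits, means, stds))
```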
no code implementations • 13 Jun 2018 • Carson Eisenach, Florentina Bunea, Yang Ning, Claudiu Dinicu
We employ model-assisted clustering, in which each cluster contains features associated with the same unobserved latent variable.
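As a concrete, invented instance of that model: each observed feature is a noisy copy of the latent variable assigned to its cluster, so features within a cluster are strongly correlated with one another:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 500, 3
assignments = np.repeat(np.arange(K), 4)   # 4 features per cluster
Z = rng.normal(size=(n, K))                # unobserved latent variables
# Each feature = its cluster's latent variable plus independent noise.
X = Z[:, assignments] + 0.3 * rng.normal(size=(n, len(assignments)))

# Within-cluster correlations are high; across-cluster, near zero.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr[:4, :4], 2))  # block of features sharing latent variable 0
```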
2 code implementations • 1 Jun 2018 • Carson Eisenach, Han Liu
Compared to the naive interior point method, our method reduces the computational complexity of solving the SDP from $\tilde{O}(d^7\log\epsilon^{-1})$ to $\tilde{O}(d^{6}K^{-2}\epsilon^{-1})$ arithmetic operations for an $\epsilon$-optimal solution.
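To get a feel for the two bounds, here is a back-of-the-envelope comparison with illustrative numbers only, ignoring constants and the factors hidden by $\tilde{O}(\cdot)$:

```python
import math

def interior_point_ops(d, eps):
    # Naive interior point: ~ d^7 * log(1/eps)
    return d**7 * math.log(1 / eps)

def first_order_ops(d, K, eps):
    # The paper's bound: ~ d^6 / (K^2 * eps)
    return d**6 / (K**2 * eps)

d, K, eps = 100, 5, 1e-2
print(f"interior point ~ {interior_point_ops(d, eps):.2e} ops")
print(f"paper's method ~ {first_order_ops(d, K, eps):.2e} ops")
```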
1 code implementation • ICLR 2018 • Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Yang Zheng, Lei Han, Haobo Fu, Xiangru Lian, Carson Eisenach, Haichuan Yang, Emmanuel Ekwedike, Bei Peng, Haoyue Gao, Tong Zhang, Ji Liu, Han Liu
Most existing deep reinforcement learning (DRL) frameworks consider action spaces that are either discrete or continuous.