no code implementations • 11 Feb 2025 • Raj Pabari, Udaya Ghai, Dominique Perrault-Joncas, Kari Torkkola, Orit Ronen, Dhruv Madeka, Dean Foster, Omer Gottesman
We introduce and analyze a variation of the Bertrand game in which the revenue is shared between two players.
no code implementations • 15 Oct 2024 • Wenda Xu, Rujun Han, Zifeng Wang, Long T. Le, Dhruv Madeka, Lei Li, William Yang Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister
To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student's inference-time distribution.
no code implementations • 24 Sep 2024 • Carson Eisenach, Udaya Ghai, Dhruv Madeka, Kari Torkkola, Dean Foster, Sham Kakade
This paper addresses the capacitated periodic review inventory control problem, focusing on a retailer managing multiple products with limited shared resources, such as storage or inbound labor at a facility.
1 code implementation • 7 Dec 2023 • Hanlin Zhang, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade
Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs).
no code implementations • 26 Oct 2023 • Sohrab Andaz, Carson Eisenach, Dhruv Madeka, Kari Torkkola, Randy Jia, Dean Foster, Sham Kakade
In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics, which we term a quantity-over-time (QOT) arrivals model.
no code implementations • 24 Oct 2023 • Dean Foster, Randy Jia, Dhruv Madeka
Solutions to address the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times typically involve making strong assumptions on the dynamics for either approximation or simulation, and applying methods such as optimization, dynamic programming, or reinforcement learning.
1 code implementation • 18 Jul 2023 • Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade
Inspired by recent work in Natural Language Processing (NLP) where "scaling up" has resulted in increasingly capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games.
no code implementations • 14 Nov 2022 • Zeyu Jia, Randy Jia, Dhruv Madeka, Dean P. Foster
We study the problem of Reinforcement Learning (RL) with linear function approximation, i.e., assuming the optimal action-value function is linear in a known $d$-dimensional feature mapping.
no code implementations • 6 Oct 2022 • Dhruv Madeka, Kari Torkkola, Carson Eisenach, Anna Luo, Dean P. Foster, Sham M. Kakade
This work provides a Deep Reinforcement Learning approach to solving a periodic review inventory control system with stochastic vendor lead times, lost sales, correlated demand, and price matching.
no code implementations • 21 Jul 2022 • Sitan Yang, Carson Eisenach, Dhruv Madeka
For example, MQTransformer - an improvement of MQCNN - has shown state-of-the-art performance in probabilistic demand forecasting.
no code implementations • 18 Jul 2022 • Philip Amortila, Nan Jiang, Dhruv Madeka, Dean P. Foster
Towards establishing the minimal amount of expert queries needed, we show that, in the same setting, any learner whose exploration budget is polynomially-bounded (in terms of $d, H,$ and $|\mathcal{A}|$) will require at least $\tilde\Omega(\sqrt{d})$ oracle calls to recover a policy competing with the expert's value function.
no code implementations • 14 Dec 2021 • Nilesh Tripuraneni, Dhruv Madeka, Dean Foster, Dominique Perrault-Joncas, Michael I. Jordan
The key insight of our procedure is that the noisy (but unbiased) difference-of-means estimate can be used as a ground truth "label" on a portion of the RCT, to test the performance of an estimator trained on the other portion.
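The idea in this snippet can be illustrated with a minimal sketch. It is not the paper's procedure: the simulated data, the regression-adjusted candidate estimator, the 50/50 split, and all function names below are assumptions made for illustration only.

```python
import numpy as np


def difference_of_means(y, t):
    """Unbiased ATE estimate in a randomized trial: mean(treated) - mean(control)."""
    return y[t == 1].mean() - y[t == 0].mean()


def regression_adjusted_ate(x, y, t):
    """Hypothetical candidate estimator: OLS of y on [1, t, x]; returns the coefficient on t."""
    design = np.column_stack([np.ones_like(y), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def validate_on_holdout(x, y, t, estimator, rng):
    """Fit the estimator on one half of the RCT and score it against the
    difference-of-means "label" computed on the other half."""
    idx = rng.permutation(len(y))
    half = len(y) // 2
    train, test = idx[:half], idx[half:]
    estimate = estimator(x[train], y[train], t[train])
    label = difference_of_means(y[test], t[test])
    return estimate, label, (estimate - label) ** 2


# Simulated RCT (assumed data-generating process, true effect = 2.0).
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)  # randomized treatment assignment
true_ate = 2.0
y = true_ate * t + 1.5 * x + rng.normal(size=n)

estimate, label, sq_err = validate_on_holdout(x, y, t, regression_adjusted_ate, rng)
print(f"estimate={estimate:.3f} holdout label={label:.3f} squared error={sq_err:.5f}")
```

Because the holdout label is itself noisy, the squared error overstates the estimator's true error by roughly the label's variance; comparing several estimators against the same label is still meaningful, since that extra term is shared.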
no code implementations • 30 Sep 2020 • Carson Eisenach, Yagna Patel, Dhruv Madeka
In this work, we propose novel improvements to the current state of the art by incorporating changes inspired by recent advances in Transformer architectures for Natural Language Processing.
5 code implementations • 29 Nov 2017 • Ruofeng Wen, Kari Torkkola, Balakrishnan Narayanaswamy, Dhruv Madeka
We propose a framework for general probabilistic multi-step time series regression.
no code implementations • 21 Apr 2017 • Mathieu Cliche, David Rosenberg, Dhruv Madeka, Connie Yee
Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points.
Optical Character Recognition (OCR)