no code implementations • 19 Mar 2024 • Yuchao Li, Dimitri Bertsekas
We consider methods for computing word sequences that are highly likely, based on these probabilities.
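As a minimal illustration of the one-step lookahead (rollout) idea for sequence generation (the toy bigram probabilities below are illustrative, not from the paper), each candidate next word can be scored by the log-probability of completing the sequence with a greedy base heuristic:

```python
import math

# Toy bigram transition probabilities (illustrative, not from the paper).
P = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.2, "ran": 0.8},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
    "down": {}, "away": {},
}

def greedy_completion(word, steps):
    """Base heuristic: repeatedly pick the most probable next word."""
    logp, seq = 0.0, [word]
    for _ in range(steps):
        nxt = P.get(seq[-1], {})
        if not nxt:
            break
        w, p = max(nxt.items(), key=lambda kv: kv[1])
        logp += math.log(p)
        seq.append(w)
    return seq, logp

def rollout_step(word, steps):
    """One-step lookahead: score each candidate by its greedy completion."""
    best = None
    for w, p in P.get(word, {}).items():
        _, tail_logp = greedy_completion(w, steps - 1)
        score = math.log(p) + tail_logp
        if best is None or score > best[1]:
            best = (w, score)
    return best

print(rollout_step("the", 3))
```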
no code implementations • 2 Nov 2023 • Daniel Garces, Sushmita Bhattacharya, Dimitri Bertsekas, Stephanie Gil
We provide two main theoretical results: 1) we characterize the number of taxis $m$ that is sufficient for IA to be stable; and 2) we derive a necessary condition on $m$ for IA to remain stable as time goes to infinity.
no code implementations • 15 Dec 2022 • Dimitri Bertsekas
We provide a unifying approximate dynamic programming framework that applies to a broad variety of problems involving sequential estimation.
no code implementations • 28 Nov 2022 • Daniel Garces, Sushmita Bhattacharya, Stephanie Gil, Dimitri Bertsekas
We propose a mechanism for switching away from the originally trained offline approximation when the current demand falls outside its original validity region.
no code implementations • 15 Nov 2022 • Siddhant Bhambri, Amrita Bhattacharjee, Dimitri Bertsekas
In this paper we address the solution of the popular Wordle puzzle using new reinforcement learning methods that apply more generally to adaptive control of dynamic systems and to classes of Partially Observable Markov Decision Process (POMDP) problems.
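For a flavor of the belief-state viewpoint, here is a generic one-step lookahead heuristic over a toy word list (an illustrative sketch, not the paper's method): each guess is scored by how much it is expected to shrink the set of answers consistent with the feedback received.

```python
from collections import Counter

# Tiny illustrative word list (not the real Wordle dictionary).
WORDS = ["crane", "slate", "grape", "plate", "brace"]

def feedback(guess, answer):
    """Wordle-style feedback: 2 = right letter/right spot, 1 = right letter
    elsewhere, 0 = absent (simplified handling of repeated letters)."""
    return tuple(2 if g == a else (1 if g in answer else 0)
                 for g, a in zip(guess, answer))

def expected_remaining(guess, candidates):
    """Expected number of candidates still consistent after observing the feedback."""
    buckets = Counter(feedback(guess, ans) for ans in candidates)
    n = len(candidates)
    return sum(c * c for c in buckets.values()) / n

# One-step lookahead over guesses: pick the guess that, in expectation,
# shrinks the set of words consistent with the observations the most.
candidates = list(WORDS)
best_guess = min(WORDS, key=lambda g: expected_remaining(g, candidates))
print(best_guess)
```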
no code implementations • 19 Jul 2022 • Dimitri Bertsekas
We consider some classical optimization problems in path planning and network transport, and we introduce new auction-based algorithms for their optimal and suboptimal solution.
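For background, here is a minimal sketch of the classical auction mechanism for the assignment problem, on which such auction-based algorithms build (the benefit matrix and the bidding increment eps are illustrative; the paper's path planning and network transport variants are not reproduced here):

```python
def auction_assignment(values, eps=0.01):
    """Naive auction algorithm for the assignment problem (maximization).

    values[i][j] is the benefit of assigning person i to object j.
    Returns a person -> object assignment.  Sketch only: no epsilon
    scaling, dense benefit matrix, all assignments assumed feasible.
    """
    n = len(values)
    prices = [0.0] * n                 # one price per object
    owner = [None] * n                 # object -> person
    assigned = [None] * n              # person -> object
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Net value of each object at current prices.
        net = [values[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=lambda j: net[j])
        second = max(net[j] for j in range(n) if j != best) if n > 1 else net[best]
        # Bid: raise the price by the bidding increment plus eps.
        prices[best] += net[best] - second + eps
        if owner[best] is not None:    # evict the previous owner
            assigned[owner[best]] = None
            unassigned.append(owner[best])
        owner[best] = i
        assigned[i] = best
    return assigned

print(auction_assignment([[10, 5], [3, 9]]))   # -> [0, 1]
```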
no code implementations • 20 Aug 2021 • Dimitri Bertsekas
In this paper we aim to provide analysis and insights (often based on visualization) that explain the beneficial effects of on-line decision making on top of off-line training.
no code implementations • 1 Jun 2021 • Dimitri Bertsekas
This allows the current policy to be continuously updated and improved, resulting in a form of on-line PI that incorporates the improved controls into the current policy as new states and controls are generated.
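A minimal sketch of this on-line policy improvement loop, under simplifying assumptions (a small deterministic discounted problem with known costs; the model and the numbers are illustrative, not from the paper):

```python
# States 0..2, controls 0 or 1; stage cost g[x][u] and deterministic successor f[x][u].
g = [[1.0, 4.0], [2.0, 0.5], [0.0, 3.0]]
f = [[1, 2], [2, 0], [0, 1]]
alpha = 0.9

def policy_cost(policy, x, horizon=200):
    """Approximate J_mu(x) by rolling the policy forward for a long horizon."""
    total, disc = 0.0, 1.0
    for _ in range(horizon):
        u = policy[x]
        total += disc * g[x][u]
        disc *= alpha
        x = f[x][u]
    return total

policy = [0, 0, 0]                      # initial policy
x = 0
for _ in range(10):                     # on-line operation along a trajectory
    # One-step lookahead (rollout) using the cost of the *current* policy.
    u = min((0, 1), key=lambda u: g[x][u] + alpha * policy_cost(policy, f[x][u]))
    policy[x] = u                       # incorporate the improved control immediately
    x = f[x][u]
print(policy)
```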
no code implementations • 9 Nov 2020 • Sushmita Bhattacharya, Siva Kailas, Sahil Badyal, Stephanie Gil, Dimitri Bertsekas
Our methods specifically address the computational challenges of partially observable multiagent problems.
no code implementations • 4 May 2020 • Dimitri Bertsekas
In this paper, this result is extended to value iteration and to optimistic versions of policy iteration, as well as to more general DP problems in which the Bellman operator is a contraction mapping, such as stochastic shortest path problems where all policies are proper.
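For context, the standard sup-norm contraction argument that extensions of this kind rely on (generic notation, not specific to the paper):

```latex
% Generic background: if the Bellman operator T is a sup-norm contraction
% with modulus alpha in (0,1),
\| T J - T J' \|_\infty \le \alpha \, \| J - J' \|_\infty ,
% then value iteration J_{k+1} = T J_k converges to the unique fixed point
% J^* = T J^* from any starting J_0, since
\| J_k - J^* \|_\infty = \| T J_{k-1} - T J^* \|_\infty
  \le \alpha \, \| J_{k-1} - J^* \|_\infty
  \le \cdots \le \alpha^k \, \| J_0 - J^* \|_\infty .
```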
no code implementations • 18 Feb 2020 • Dimitri Bertsekas
Under suitable assumptions, we show that if the base heuristic produces a feasible solution, the rollout algorithm has a cost improvement property: it produces a feasible solution whose cost is no worse than the base heuristic's cost.
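A minimal sketch of the cost improvement property on a toy deterministic problem (the stage costs are illustrative, and the base heuristic is a simple fixed rule assumed to satisfy the required conditions):

```python
# Toy deterministic problem (illustrative): a sequence of N binary decisions;
# the cost of decision u at stage k is cost[k][u].
cost = [[3.0, 1.0], [0.0, 5.0], [2.0, 0.1]]

def base_heuristic(k):
    """Base heuristic: from stage k onward, always pick control 0 (feasible by construction)."""
    return sum(cost[j][0] for j in range(k, len(cost)))

def rollout():
    """One-step lookahead with the base heuristic used to complete the solution."""
    total = 0.0
    for k in range(len(cost)):
        # Q-factor of each control: its stage cost plus the base heuristic's completion cost.
        u = min((0, 1), key=lambda u: cost[k][u] + base_heuristic(k + 1))
        total += cost[k][u]
    return total

print("base heuristic cost:", base_heuristic(0))   # 5.0
print("rollout cost:       ", rollout())           # no worse, here strictly better (1.1)
```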
no code implementations • 11 Feb 2020 • Sushmita Bhattacharya, Sahil Badyal, Thomas Wheeler, Stephanie Gil, Dimitri Bertsekas
In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations.
no code implementations • 6 Oct 2019 • Dimitri Bertsekas
When $V$ is equal to the cost function $J_{\mu}$ of some known policy $\mu$ and there is only one aggregate state, our scheme is equivalent to the rollout algorithm based on $\mu$ (i.e., the result of a single policy improvement starting with the policy $\mu$).
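In generic notation (discounted problem with dynamics $f$, stage cost $g$, and discount factor $\alpha$; this is the standard relation the remark refers to, not a formula taken from the paper):

```latex
% The rollout policy based on mu is the result of a single policy improvement:
\tilde{\mu}(x) \in \arg\min_{u \in U(x)}
  E\bigl\{ g(x,u,w) + \alpha \, J_{\mu}\bigl( f(x,u,w) \bigr) \bigr\},
% so one-step lookahead with V = J_mu (and a single aggregate state)
% reproduces the rollout algorithm based on mu.
```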
1 code implementation • 30 Sep 2019 • Dimitri Bertsekas
The amount of local computation required at every stage by each agent is independent of the number of agents, while the amount of total computation (over all agents) grows linearly with the number of agents.
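A minimal sketch contrasting all-at-once and agent-by-agent control selection (the Q-factor below is an illustrative stand-in for a rollout evaluation with a base policy; the complexity pattern is the point):

```python
import itertools

m = 4                                        # number of agents
controls = [0, 1]                            # per-agent control set
target = [1, 0, 1, 1]                        # illustrative "best" joint control

def q_factor(u):
    """Illustrative Q-factor of a joint control (lower is better)."""
    return sum((ui - ti) ** 2 for ui, ti in zip(u, target))

# All-at-once minimization examines |controls|**m joint controls (exponential in m).
best_joint = min(itertools.product(controls, repeat=m), key=q_factor)

# Agent-by-agent minimization: agent i optimizes only its own control, with the
# preceding agents fixed at their already-chosen values and the remaining agents
# at a default (base-policy) value.  Each agent examines |controls| alternatives
# regardless of m, so the total work grows linearly with m.
chosen = [0] * m                             # default / base-policy controls
for i in range(m):
    chosen[i] = min(controls,
                    key=lambda u: q_factor(chosen[:i] + [u] + chosen[i + 1:]))
print(best_joint, tuple(chosen))
```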