no code implementations • 6 Jun 2024 • Johannes Müller, Semih Cayci
We study the error introduced by entropy regularization of infinite-horizon discrete discounted Markov decision processes.
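The regularization error in question can be illustrated numerically with a minimal sketch: soft (entropy-regularized) value iteration versus standard value iteration on a made-up two-state, two-action discounted MDP (all numbers below are illustrative assumptions, not from the paper). The gap between the two fixed points shrinks as the temperature `tau` decreases.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted MDP (illustrative only).
gamma = 0.9
r = np.array([[1.0, 0.0], [0.0, 0.5]])          # r[s, a]
P = np.array([[[0.8, 0.2], [0.2, 0.8]],          # P[s, a, s']
              [[0.5, 0.5], [0.9, 0.1]]])

def value_iteration(tau, iters=2000):
    """Soft value iteration; tau = 0 recovers the standard Bellman operator."""
    V = np.zeros(2)
    for _ in range(iters):
        Q = r + gamma * P @ V                    # Q[s, a]
        if tau > 0:
            # Stabilized log-sum-exp: soft maximum over actions.
            m = Q.max(axis=1, keepdims=True)
            V = (m + tau * np.log(np.exp((Q - m) / tau).sum(axis=1, keepdims=True))).ravel()
        else:
            V = Q.max(axis=1)
    return V

V_star = value_iteration(0.0)
for tau in (1.0, 0.1, 0.01):
    err = np.max(np.abs(value_iteration(tau) - V_star))
    print(f"tau={tau:5.2f}  regularization error={err:.4f}")
```

Since the soft maximum dominates the hard maximum, the regularized value is never smaller than the unregularized one, and the error decreases monotonically with the temperature.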
no code implementations • 28 May 2024 • Semih Cayci, Atilla Eryilmaz
In this paper, we study a natural policy gradient method based on recurrent neural networks (RNNs) for partially observable Markov decision processes, whereby RNNs are used for policy parameterization and policy evaluation to address the curse of dimensionality in non-Markovian reinforcement learning.
no code implementations • 19 Feb 2024 • Semih Cayci, Atilla Eryilmaz
We analyze recurrent neural networks trained with gradient descent in the supervised learning setting for dynamical systems, and prove that gradient descent can achieve optimality \emph{without} massive overparameterization.
no code implementations • 29 Dec 2022 • Batuhan Yardim, Semih Cayci, Matthieu Geist, Niao He
Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field.
no code implementations • 2 Jun 2022 • Semih Cayci, Niao He, R. Srikant
Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces.
no code implementations • 20 Feb 2022 • Semih Cayci, Niao He, R. Srikant
We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain.
no code implementations • 9 Jun 2021 • Semih Cayci, Yilin Zheng, Atilla Eryilmaz
In a wide variety of applications, including online advertising, contractual hiring, and wireless scheduling, the controller is constrained by a stringent budget on the available resources, which are consumed in a random amount by each action, and by a stochastic feasibility constraint that may impose important operational limitations on decision-making.
no code implementations • 8 Jun 2021 • Semih Cayci, Niao He, R. Srikant
Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits \emph{linear convergence} up to a function approximation error.
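Linear convergence of entropy-regularized NPG can already be seen in the one-state (bandit) case, where the natural gradient step with softmax parameterization has a closed multiplicative-weights form. The sketch below is an illustrative assumption (made-up arm rewards, temperature `lam`, step size `eta`), not the paper's function-approximation setting: the distance to the regularized optimum contracts by exactly `1 - eta * lam` per iteration.

```python
import numpy as np

r = np.array([1.0, 0.2, -0.5])   # illustrative arm rewards (one-state MDP)
lam = 0.5                        # entropy-regularization temperature
eta = 0.4                        # step size, chosen so that eta * lam < 1

pi_star = np.exp(r / lam)        # regularized optimum: softmax of r / lam
pi_star /= pi_star.sum()

pi = np.ones(3) / 3
gaps = []
for _ in range(30):
    d = np.log(pi) - np.log(pi_star)
    d -= d.mean()                # compare log-policies up to a constant shift
    gaps.append(np.abs(d).max())
    # Entropy-regularized NPG update (multiplicative-weights form):
    #   pi_{t+1}(a) ∝ pi_t(a)^(1 - eta*lam) * exp(eta * r(a))
    logits = (1.0 - eta * lam) * np.log(pi) + eta * r
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()

print(gaps[:4])  # successive ratios equal 1 - eta*lam = 0.8: linear convergence
```

In log-space the update is an affine contraction with factor `1 - eta * lam`, which is the geometric rate the printed gaps exhibit.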
no code implementations • 2 Mar 2021 • Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant
In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, \emph{Neural TD learning}.
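A minimal sketch of the object being analyzed, neural TD(0), is given below on a hypothetical five-state random-walk chain with a one-hidden-layer ReLU value network (architecture, step size, and rewards are all made up for illustration and are not the paper's setup). The semi-gradient updates drive the network's values toward the Bellman fixed point.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nH, gamma, lr = 5, 64, 0.9, 0.05

# Ring random walk with a single rewarding state (made up for the sketch).
P = np.zeros((nS, nS))
for s in range(nS):
    P[s, (s - 1) % nS] = P[s, (s + 1) % nS] = 0.5
r = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
V_true = np.linalg.solve(np.eye(nS) - gamma * P, r)  # Bellman fixed point

# One-hidden-layer ReLU net; with one-hot inputs, relu(W1 @ e_s) = relu(W1[:, s]).
W1 = rng.normal(scale=0.5, size=(nH, nS))
w2 = rng.normal(scale=0.1, size=nH)

def V_all():
    H = np.maximum(W1, 0.0)          # hidden activations for every state
    return w2 @ H, H

for _ in range(5000):
    V, H = V_all()
    delta = r + gamma * P @ V - V    # expected TD(0) errors, all states at once
    # Semi-gradient step: the bootstrapped target is treated as a constant.
    w2 = w2 + lr * H @ delta / nS
    W1 = W1 + lr * (w2[:, None] * (H > 0)) * delta[None, :] / nS

print(np.round(V_all()[0], 3), np.round(V_true, 3))
```

For brevity the sketch uses expected (synchronous) updates over all states rather than sampled transitions; the sampled version replaces `delta` with the TD error of a single observed transition.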
no code implementations • 30 Jun 2020 • Semih Cayci, Atilla Eryilmaz, R. Srikant
Time-constrained decision processes are ubiquitous in many fundamental applications in physics, biology, and computer science.
no code implementations • NeurIPS 2020 • Semih Cayci, Swati Gupta, Atilla Eryilmaz
Furthermore, as a consequence of certain ethical and economic concerns, the controller may impose deadlines on the completion of each task, and require fairness across different groups in the allocation of total time budget $B$.
no code implementations • 29 Feb 2020 • Semih Cayci, Atilla Eryilmaz, R. Srikant
We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.
no code implementations • 27 Nov 2018 • Semih Cayci, Atilla Eryilmaz
We study the problem of serving randomly arriving and delay-sensitive traffic over a multi-channel communication system with time-varying channel states and unknown statistics.