Search Results for author: Semih Cayci

Found 13 papers, 0 papers with code

Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes

no code implementations • 6 Jun 2024 • Johannes Müller, Semih Cayci

We study the error introduced by entropy regularization of infinite-horizon discrete discounted Markov decision processes.

Policy Gradient Methods

Recurrent Natural Policy Gradient for POMDPs

no code implementations • 28 May 2024 • Semih Cayci, Atilla Eryilmaz

In this paper, we study a natural policy gradient method based on recurrent neural networks (RNNs) for partially-observable Markov decision processes, whereby RNNs are used for policy parameterization and policy evaluation to address the curse of dimensionality in non-Markovian reinforcement learning.

Convergence of Gradient Descent for Recurrent Neural Networks: A Nonasymptotic Analysis

no code implementations • 19 Feb 2024 • Semih Cayci, Atilla Eryilmaz

We analyze recurrent neural networks trained with gradient descent in the supervised learning setting for dynamical systems, and prove that gradient descent can achieve optimality without massive overparameterization.

Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games

no code implementations • 29 Dec 2022 • Batuhan Yardim, Semih Cayci, Matthieu Geist, Niao He

Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field.

Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

no code implementations • 2 Jun 2022 • Semih Cayci, Niao He, R. Srikant

Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces.

Finite-Time Analysis of Natural Actor-Critic for POMDPs

no code implementations • 20 Feb 2022 • Semih Cayci, Niao He, R. Srikant

We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain.

A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback

no code implementations • 9 Jun 2021 • Semih Cayci, Yilin Zheng, Atilla Eryilmaz

In a wide variety of applications including online advertising, contractual hiring, and wireless scheduling, the controller is subject to a stringent budget constraint on the available resources, which are consumed in a random amount by each action, and a stochastic feasibility constraint that may impose important operational limitations on decision-making.

Decision Making, Scheduling

Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation

no code implementations • 8 Jun 2021 • Semih Cayci, Niao He, R. Srikant

Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to a function approximation error.

Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation

no code implementations • 2 Mar 2021 • Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant

In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, Neural TD learning.

Continuous-Time Multi-Armed Bandits with Controlled Restarts

no code implementations • 30 Jun 2020 • Semih Cayci, Atilla Eryilmaz, R. Srikant

Time-constrained decision processes have been ubiquitous in many fundamental applications in physics, biology and computer science.

Multi-Armed Bandits

Group-Fair Online Allocation in Continuous Time

no code implementations • NeurIPS 2020 • Semih Cayci, Swati Gupta, Atilla Eryilmaz

Furthermore, as a consequence of certain ethical and economic concerns, the controller may impose deadlines on the completion of each task, and require fairness across different groups in the allocation of total time budget $B$.

Cloud Computing, Decision Making, +2

Budget-Constrained Bandits over General Cost and Reward Distributions

no code implementations • 29 Feb 2020 • Semih Cayci, Atilla Eryilmaz, R. Srikant

We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.

Optimal Learning for Dynamic Coding in Deadline-Constrained Multi-Channel Networks

no code implementations • 27 Nov 2018 • Semih Cayci, Atilla Eryilmaz

We study the problem of serving randomly arriving and delay-sensitive traffic over a multi-channel communication system with time-varying channel states and unknown statistics.

Thompson Sampling
