1 code implementation • 2 Sep 2024 • Baturay Saglam, Dionysis Kalogerias
Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with respect to input actions.
2 code implementations • 10 Oct 2022 • Baturay Saglam, Doga Gurgunoglu, Suleyman S. Kozat
We introduce a novel deep reinforcement learning (DRL) approach to jointly optimize transmit beamforming and reconfigurable intelligent surface (RIS) phase shifts in a multiuser multiple-input single-output (MU-MISO) system, maximizing the sum downlink rate under the phase-dependent reflection amplitude model.
2 code implementations • 1 Oct 2022 • Baturay Saglam, Suleyman S. Kozat
In continuous control, exploration is often performed through undirected strategies in which parameters of the networks or selected actions are perturbed by random noise.
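As a rough illustration of the undirected action-noise strategy described above, the sketch below adds zero-mean Gaussian noise to each action dimension and clips the result to the valid range. The function name `perturb_action`, the noise scale `sigma`, and the bounds `low`/`high` are assumptions made for this example, not details taken from the paper.

```python
import random


def perturb_action(action, sigma=0.1, low=-1.0, high=1.0, rng=random):
    """Add zero-mean Gaussian noise to each action dimension, then clip.

    `sigma`, `low`, and `high` are illustrative assumptions; real tasks
    define their own action bounds and noise schedules.
    """
    noisy = []
    for a in action:
        a_noisy = a + rng.gauss(0.0, sigma)         # undirected random perturbation
        noisy.append(min(high, max(low, a_noisy)))  # keep the action within bounds
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    print(perturb_action([0.0, 0.5, -0.5], sigma=0.2, rng=rng))
```

The noise here is independent of the state and of what the agent has already learned, which is what makes the strategy undirected.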
1 code implementation • 1 Sep 2022 • Baturay Saglam, Furkan B. Mutlu, Dogan C. Cicek, Suleyman S. Kozat
Prioritized Experience Replay (PER), a widely studied technique in deep reinforcement learning (RL), allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error.
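A minimal sketch of the sampling rule described above, assuming the common proportional formulation in which the priority of a transition is its absolute TD error plus a small constant, raised to an exponent. The names `per_probabilities` and `sample_indices`, the exponent `alpha`, and the constant `eps` are assumptions for illustration and do not come from the paper's code.

```python
import random


def per_probabilities(td_errors, alpha=1.0, eps=1e-6):
    """Turn TD errors into sampling probabilities.

    priority_i = (|td_error_i| + eps) ** alpha, normalized to sum to 1.
    With alpha=1 the probabilities are (nearly) proportional to |TD error|;
    alpha=0 would recover uniform sampling. Both values are assumptions.
    """
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, alpha=1.0, seed=None):
    """Sample transition indices (with replacement) according to priority."""
    rng = random.Random(seed)
    probs = per_probabilities(td_errors, alpha)
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


if __name__ == "__main__":
    td_errors = [0.5, -1.5, 2.0]  # hypothetical TD errors for three stored transitions
    print(per_probabilities(td_errors))
    print(sample_indices(td_errors, batch_size=4, seed=0))
```

Transitions with larger absolute TD error are drawn more often, while the small `eps` keeps zero-error transitions from never being revisited.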
1 code implementation • 1 Aug 2022 • Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat
Compared to its on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly reusing previously gathered data.
1 code implementation • 27 Jul 2022 • Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat
Learning in high-dimensional continuous tasks is challenging, particularly when the experience replay memory is very limited.
no code implementations • 12 Nov 2021 • Dogan C. Cicek, Enes Duran, Baturay Saglam, Kagan Kaya, Furkan B. Mutlu, Suleyman S. Kozat
We show on OpenAI Gym continuous control environments that our algorithm matches or outperforms state-of-the-art off-policy policy gradient learning algorithms.
no code implementations • 2 Nov 2021 • Dogan C. Cicek, Enes Duran, Baturay Saglam, Furkan B. Mutlu, Suleyman S. Kozat
In addition, experience replay stores transitions generated by the agent's previous policies, which may deviate significantly from its most recent policy.
1 code implementation • 24 Sep 2021 • Baturay Saglam, Furkan Burak Mutlu, Dogan Can Cicek, Suleyman Serdar Kozat
We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias.
1 code implementation • 22 Sep 2021 • Baturay Saglam, Enes Duran, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat
We show that in deep actor-critic methods that aim to overcome the overestimation bias, if the reinforcement signals received by the agent have a high variance, a significant underestimation bias arises.