no code implementations • 1 May 2024 • Nitsan Soffair, Gilad Katz
Discounted algorithms often encounter evaluation errors due to their reliance on short-term estimates, which can impede their efficacy on simple, short-term tasks and impose an undesired temporal discount (\(\gamma\)).
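A minimal sketch of why the discount factor \(\gamma\) truncates the effective horizon: a unit reward at every step sums, under discounting, to at most \(1/(1-\gamma)\), so rewards far in the future contribute almost nothing when \(\gamma\) is small. The reward sequence below is illustrative, not from the paper.

```python
# Sketch: how the discount factor gamma truncates the effective horizon.
# With reward 1 at every step, the discounted return is bounded by
# 1 / (1 - gamma), so small gamma ignores distant rewards.

def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

rewards = [1.0] * 100  # unit reward for 100 steps

short_horizon = discounted_return(rewards, 0.5)   # ~2.0: only a few steps matter
long_horizon = discounted_return(rewards, 0.99)   # ~63.4: far longer effective horizon
print(short_horizon, long_horizon)
```

The gap between the two returns is the "undesired temporal discount" in action: the same reward stream is valued very differently depending on \(\gamma\).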
no code implementations • 8 Mar 2024 • Nitsan Soffair, Shie Mannor
DDPG is hindered by the overestimation bias problem, wherein its $Q$-estimates tend to overstate the actual $Q$-values.
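A small simulation of where the bias comes from: maximizing over noisy $Q$-estimates systematically overstates the true value, while taking the min over two independent critics (the clipped double-$Q$ idea popularized by TD3) counters it. The Gaussian-noise setup is an illustrative assumption, not the paper's experiment.

```python
# Sketch of overestimation bias: a max over noisy Q-estimates is biased
# upward; the min over two independent critics at the argmax action
# (clipped double Q-learning, as in TD3) counters that bias.
import random

random.seed(0)
true_q = 0.0             # the true value of every action is zero
n_actions, trials = 10, 10000

max_total, clipped_total = 0.0, 0.0
for _ in range(trials):
    # two independent noisy critics, one estimate per action
    q1 = [true_q + random.gauss(0, 1) for _ in range(n_actions)]
    q2 = [true_q + random.gauss(0, 1) for _ in range(n_actions)]
    max_total += max(q1)                      # naive target: biased upward
    best = max(range(n_actions), key=q1.__getitem__)
    clipped_total += min(q1[best], q2[best])  # min of two critics counters the bias

max_mean = max_total / trials
clipped_mean = clipped_total / trials
print(max_mean)      # well above the true value of 0
print(clipped_mean)  # close to (or below) 0
```

The naive estimate lands far above the true value of 0 because the max selects favorable noise; the clipped estimate does not, which is why conservative targets are the standard remedy for DDPG-style critics.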
no code implementations • 3 Feb 2024 • Nitsan Soffair, Shie Mannor
MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the problem of overestimation bias ($Q$-estimates overstating the true $Q$-values) inherent in conservative RL algorithms.
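A hypothetical sketch of optimism over a $Q$-ensemble (this is an assumed illustration of the general idea, not the paper's exact update rule): a conservative target takes the min over the ensemble, and an optimism bonus proportional to the ensemble's disagreement (max minus min) nudges the estimate back up. The hyperparameter `beta` and both functions are assumptions for illustration.

```python
# Hypothetical sketch (not the paper's exact rule): optimism via ensemble
# disagreement. The conservative target is the ensemble min; adding a
# bonus proportional to the ensemble spread (max - min) pushes the
# estimate back toward the true value.

def conservative_target(q_estimates):
    """Pessimistic target: min over the Q-ensemble."""
    return min(q_estimates)

def optimistic_target(q_estimates, beta=0.5):
    """Conservative target plus a disagreement bonus (beta is assumed)."""
    spread = max(q_estimates) - min(q_estimates)
    return min(q_estimates) + beta * spread

ensemble = [1.8, 2.0, 2.4]  # illustrative Q-estimates for one (s, a) pair
print(conservative_target(ensemble))  # 1.8
print(optimistic_target(ensemble))    # 1.8 + 0.5 * 0.6 = 2.1
```

The design intuition: where the ensemble agrees, the bonus vanishes and the target stays conservative; where it disagrees, the agent is granted optimism, encouraging exploration of uncertain state-action pairs.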
no code implementations • 3 Feb 2024 • Nitsan Soffair, Dotan Di-Castro, Orly Avner, Shie Mannor
We implement SQT on top of the TD3/TD7 code and test it against the state-of-the-art (SOTA) actor-critic algorithms DDPG, TD3, and TD7 on seven popular MuJoCo and Bullet tasks.
no code implementations • 3 Jan 2023 • Nitsan Soffair
The SOTA algorithms for addressing QDec-POMDP issues, QDec-FP and QDec-FPS, are unable to effectively tackle problems that involve different types of sensing agents.
no code implementations • 9 Nov 2022 • Nitsan Soffair
In the first stage, we solve a single-agent problem and get a policy.
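The first stage can be sketched on a toy problem: solve a single-agent MDP by value iteration and extract a greedy policy. The 3-state chain below is an illustrative assumption, not a QDec-POMDP instance, and `value_iteration` is a hypothetical helper.

```python
# Sketch of the first stage under stated assumptions: value iteration on a
# toy deterministic single-agent MDP, then a greedy policy extraction.

def value_iteration(n_states, actions, P, R, gamma=0.9, iters=200):
    """P[s][a] -> next state, R[s][a] -> reward (deterministic toy MDP)."""
    V = [0.0] * n_states
    for _ in range(iters):
        V = [max(R[s][a] + gamma * V[P[s][a]] for a in actions)
             for s in range(n_states)]
    # greedy policy with respect to the converged values
    return {s: max(actions, key=lambda a: R[s][a] + gamma * V[P[s][a]])
            for s in range(n_states)}

# 3-state chain: action 1 moves right; reaching state 2 from state 1 pays 1
P = {0: {0: 0, 1: 1}, 1: {0: 0, 1: 2}, 2: {0: 2, 1: 2}}
R = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}, 2: {0: 0.0, 1: 0.0}}
policy = value_iteration(3, [0, 1], P, R)
print(policy)  # states 0 and 1 should choose action 1 (move right)
```

In the multi-agent setting the resulting single-agent policy then serves as input to the later stages, which the abstract excerpt here does not detail.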