1 code implementation • 31 Jan 2024 • Lucile Ter-Minassian, Liran Szlak, Ehud Karavani, Chris Holmes, Yishai Shimoni
Interpretability and transparency are essential for incorporating causal effect models from observational data into policy decision-making.
no code implementations • 8 Dec 2021 • Liran Szlak, Kristoffer Aberg, Rony Paz
During probabilistic learning, organisms often apply a sub-optimal "probability-matching" strategy, where selection rates match reward probabilities, rather than engaging in the optimal "maximization" strategy, where the option with the highest reward probability is always selected.
no code implementations • 8 Dec 2021 • Liran Szlak, Ohad Shamir
Experience replay (Lin, 1993; Mnih et al., 2015) is a widely used technique to achieve efficient use of data and improved performance in RL algorithms.
no code implementations • 8 Dec 2021 • Liran Szlak, Ohad Shamir
A commonly used heuristic in RL is experience replay (e.g., Lin, 1993; Mnih et al., 2015), in which a learner stores and re-uses past trajectories as if they were sampled online.
no code implementations • ICML 2017 • Ohad Shamir, Liran Szlak
In this paper, we consider the applicability of this setting to convex online learning with delayed feedback, in which the feedback on the prediction made in round $t$ arrives with some delay $\tau$.
no code implementations • 9 Dec 2015 • Jonathan Rosenski, Ohad Shamir, Liran Szlak
We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward.