no code implementations • 11 Nov 2024 • Haolin Liu, Zakaria Mhammedi, Chen-Yu Wei, Julian Zimmert
First, we improve the $\mathrm{poly}(d, A, H)\, T^{5/6}$ regret bound of Zhao et al. (2024) to $\mathrm{poly}(d, A, H)\, T^{2/3}$ for the full-information unknown-transition setting, where $d$ is the rank of the transitions, $A$ is the number of actions, $H$ is the horizon length, and $T$ is the number of episodes.
no code implementations • 23 Oct 2024 • Philip Amortila, Dylan J. Foster, Nan Jiang, Akshay Krishnamurthy, Zakaria Mhammedi
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying ("latent") dynamics are comparatively simple.
no code implementations • 3 Oct 2024 • Zakaria Mhammedi
Existing projection-free methods based on the classical Frank-Wolfe algorithm achieve a suboptimal regret bound of $O(T^{3/4})$, while more recent separation-based approaches guarantee a regret bound of $O(\kappa \sqrt{T})$, where $\kappa$ denotes the asphericity of the feasible set, defined as the ratio of the radii of the containing and contained balls.
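"Projection-free" here means the algorithm touches the feasible set only through a linear-optimization (LO) oracle rather than a projection. A minimal offline sketch of the classical Frank-Wolfe step over the probability simplex, where the LO oracle simply picks the best vertex; the objective, step-size schedule, and all names are standard illustrative choices, not the paper's:

```python
import numpy as np

def lo_oracle_simplex(g):
    """LO oracle for the simplex: argmin over the set of <g, s> is a vertex."""
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

# Minimize f(x) = ||x - c||^2 over the simplex; c lies in the simplex,
# so the optimum is c itself and the optimal value is 0.
c = np.array([0.6, 0.3, 0.1])
x = np.array([1.0, 0.0, 0.0])           # start at a vertex

for t in range(500):
    grad = 2.0 * (x - c)
    s = lo_oracle_simplex(grad)          # one LO-oracle call per step
    gamma = 2.0 / (t + 2.0)              # standard Frank-Wolfe step size
    x = (1.0 - gamma) * x + gamma * s    # convex combination: stays feasible

gap = float(np.sum((x - c) ** 2))        # suboptimality, shrinks like O(1/T)
```

Note that feasibility is maintained by construction (each iterate is a convex combination of set points), which is exactly what lets such methods avoid projections.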
no code implementations • 1 Oct 2024 • Daniel Pfrommer, Swati Padmanabhan, Kwangjun Ahn, Jack Umenberger, Tobia Marcucci, Zakaria Mhammedi, Ali Jadbabaie
Recent work in imitation learning has shown that having an expert controller that is both suitably smooth and stable enables stronger guarantees on the performance of the learned controller.
no code implementations • 7 Sep 2024 • Zakaria Mhammedi
Designing sample-efficient and computationally feasible reinforcement learning (RL) algorithms is particularly challenging in environments with large or infinite state and action spaces.
no code implementations • 30 May 2024 • Ashok Cutkosky, Zakaria Mhammedi
We provide an online learning algorithm that obtains regret $G\|w_\star\|\sqrt{T\log(\|w_\star\|G\sqrt{T})} + \|w_\star\|^2 + G^2$ on $G$-Lipschitz convex losses for any comparison point $w_\star$ without knowing either $G$ or $\|w_\star\|$.
no code implementations • 23 Apr 2024 • Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin
We use local simulator access to unlock new statistical guarantees that were previously out of reach: we show that MDPs with low coverability (Xie et al., 2023) -- a general structural condition that subsumes Block MDPs and Low-Rank MDPs -- can be learned in a sample-efficient fashion with only $Q^{\star}$-realizability (realizability of the optimal state-action value function), whereas existing online RL algorithms require significantly stronger representation conditions.
no code implementations • NeurIPS 2023 • Zakaria Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin
A major challenge in reinforcement learning is to develop practical, sample-efficient algorithms for exploration in high-dimensional domains where generalization and function approximation are required.
no code implementations • 2 Jun 2023 • Daniel Pfrommer, Swati Padmanabhan, Kwangjun Ahn, Jack Umenberger, Tobia Marcucci, Zakaria Mhammedi, Ali Jadbabaie
Recent work in imitation learning has shown that having an expert controller that is both suitably smooth and stable enables stronger guarantees on the performance of the learned controller.
1 code implementation • 12 Apr 2023 • Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin
We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions.
no code implementations • 2 Nov 2022 • Zakaria Mhammedi, Khashayar Gatmiry
Typical algorithms for these settings, such as the Online Newton Step (ONS), can guarantee a $O(d\ln T)$ bound on their regret after $T$ rounds, where $d$ is the dimension of the feasible set.
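As a rough illustration, here is the Newton-style update at the heart of ONS, specialized to online squared losses where the curvature is available in closed form; the general algorithm instead accumulates gradient outer products and applies a generalized projection onto the feasible set, both omitted in this sketch (all names and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, eps = 3, 500, 1.0
x_true = np.array([0.5, -0.3, 0.2])   # hidden comparator generating the losses

x = np.zeros(d)
A = eps * np.eye(d)                    # regularized running curvature estimate

for _ in range(T):
    a = rng.standard_normal(d)
    grad = 2.0 * (a @ x - a @ x_true) * a   # gradient of (a.x - b)^2 at x
    A += 2.0 * np.outer(a, a)               # exact Hessian of the squared loss
    x = x - np.linalg.solve(A, grad)        # Newton step with running curvature

err = float(np.linalg.norm(x - x_true))
```

On this realizable stream the recursion coincides with regularized follow-the-leader (recursive least squares), so the iterate converges to the comparator; the $O(d \ln T)$ regret guarantee for general exp-concave losses requires the full algorithm.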
no code implementations • 17 Oct 2022 • Kwangjun Ahn, Zakaria Mhammedi, Horia Mania, Zhang-Wei Hong, Ali Jadbabaie
Recent approaches to data-driven MPC have used the simplest form of imitation learning known as behavior cloning to learn controllers that mimic the performance of MPC by online sampling of the trajectories of the closed-loop MPC system.
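In its simplest form, behavior cloning is just supervised regression of the expert's actions onto the visited states. A toy sketch with a linear expert policy and i.i.d. sampled states (the paper considers trajectories of the closed-loop MPC system; `K_expert` and all constants here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 4, 2, 200                           # state dim, action dim, samples
K_expert = rng.standard_normal((m, n)) * 0.5  # hypothetical expert feedback gain

# Record states and the expert's actions on them (i.i.d. states for simplicity;
# in MPC imitation these would come from sampled closed-loop trajectories).
X = rng.standard_normal((N, n))               # states
U = X @ K_expert.T                            # expert actions u = K x, noise-free

# Behavior cloning = least-squares regression of actions on states.
K_bc, *_ = np.linalg.lstsq(X, U, rcond=None)
K_bc = K_bc.T

err = float(np.abs(K_bc - K_expert).max())
```

With noise-free data and more samples than state dimensions, the regression recovers the expert gain exactly up to numerical precision; the interesting questions (smoothness, stability, distribution shift) arise precisely when these idealizations fail.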
no code implementations • 23 May 2022 • Zakaria Mhammedi
In this paper, we leverage recent results in parameter-free Online Learning, and develop an OCO algorithm that makes two calls to an LO Oracle per round and achieves the near-optimal $\widetilde{O}(\sqrt{T})$ regret whenever the feasible set is strongly convex.
no code implementations • 15 Feb 2022 • Zakaria Mhammedi, Alexander Rakhlin
In this paper, we build on the recent work of Luo et al. (2018) and present the first practical online portfolio selection algorithm with logarithmic regret whose per-round time and space complexities depend only logarithmically on the horizon.
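For context, online portfolio selection measures (logarithmic) regret in log-wealth against the best constant-rebalanced portfolio (CRP) in hindsight. The toy market below, with illustrative price relatives of my own choosing, shows why rebalancing is the right comparator: a 50/50 CRP compounds wealth while either single asset merely breaks even:

```python
import numpy as np

# Price relatives R[t, i] = price of asset i at round t+1 / price at round t.
R = np.array([[2.0, 0.5],
              [0.5, 2.0],
              [2.0, 0.5],
              [0.5, 2.0]])

def wealth(R, b):
    """Final wealth of a constant-rebalanced portfolio b,
    rebalancing back to proportions b every round."""
    return float(np.prod(R @ b))

w_split = wealth(R, np.array([0.5, 0.5]))  # 50/50: factor 1.25 every round
w_hold_a = wealth(R, np.array([1.0, 0.0])) # asset 1 only: 2 * 0.5 * 2 * 0.5
```

The algorithmic difficulty the paper addresses is competing with the best CRP efficiently; this snippet only illustrates the benchmark, not the algorithm.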
no code implementations • NeurIPS 2021 • Zakaria Mhammedi
Acquiring data is a difficult task in many applications of machine learning, and it is only natural to hope and expect that the population risk decreases monotonically (i.e., performance improves) as more data points are collected.
no code implementations • 10 Nov 2021 • Zakaria Mhammedi
However, the Frank-Wolfe algorithm and its variants do not achieve the optimal performance, in terms of regret or rate, for general convex sets.
no code implementations • 28 Nov 2020 • Zakaria Mhammedi
Acquiring data is a difficult task in many applications of machine learning, and it is only natural to hope and expect that the population risk decreases monotonically (i.e., performance improves) as more data points are collected.
no code implementations • NeurIPS 2020 • Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford
We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.
no code implementations • NeurIPS 2020 • Zakaria Mhammedi, Benjamin Guedj, Robert C. Williamson
Conditional Value at Risk (CVaR) is a family of "coherent risk measures" which generalize the traditional mathematical expectation.
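Concretely, the CVaR of a loss distribution at level $\alpha$ is the expected loss conditioned on falling in the worst $\alpha$-tail, and at $\alpha = 1$ it reduces to the plain expectation. A minimal empirical estimator under that convention (function name and data are illustrative, and conventions for $\alpha$ vary across the literature):

```python
import numpy as np

def empirical_cvar(losses, alpha):
    """Empirical CVaR at level alpha: the average of the worst (largest)
    alpha-fraction of the observed losses."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    k = max(1, int(np.ceil(alpha * len(losses))))
    return float(losses[:k].mean())

losses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cvar_20 = empirical_cvar(losses, 0.2)   # mean of the two largest losses
cvar_all = empirical_cvar(losses, 1.0)  # recovers the ordinary sample mean
```

The second call illustrates the "generalizes the expectation" point: taking the whole tail gives back the sample mean.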
no code implementations • 27 Feb 2020 • Zakaria Mhammedi, Wouter M. Koolen
We study Online Convex Optimization in the unbounded setting where neither predictions nor gradient are constrained.
1 code implementation • NeurIPS 2019 • Zakaria Mhammedi, Peter D. Grunwald, Benjamin Guedj
We present a new PAC-Bayesian generalization bound.
no code implementations • 27 Feb 2019 • Zakaria Mhammedi, Wouter M. Koolen, Tim van Erven
For MetaGrad, we further improve the computational efficiency of handling constraints on the domain of prediction, and we remove the need to specify the number of rounds in advance.
no code implementations • CVPR 2018 • Soumava Kumar Roy, Zakaria Mhammedi, Mehrtash Harandi
In this paper, we extend several popular optimization algorithms to the Riemannian (constrained) setting.
no code implementations • NeurIPS 2018 • Zakaria Mhammedi, Robert C. Williamson
For a given entropy $\Phi$, losses for which constant regret is achievable using the Generalized Aggregating Algorithm (GAA) are called $\Phi$-mixable.
no code implementations • 4 Mar 2017 • Xingjun Ma, Sudanthi Wijewickrema, Shuo Zhou, Yun Zhou, Zakaria Mhammedi, Stephen O'Leary, James Bailey
This paper aims to develop an efficient and effective method for generating real-time feedback in simulation-based training (SBT).
1 code implementation • ICML 2017 • Zakaria Mhammedi, Andrew Hellicar, Ashfaqur Rahman, James Bailey
Our contributions are as follows: we first show that constraining the transition matrix to be unitary is a special case of an orthogonal constraint.
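The orthogonal constraint in question can be realized, as in the paper, by writing the transition matrix as a product of Householder reflections. A minimal numerical check that such a product is orthogonal and therefore norm-preserving, which is the property that keeps recurrent gradients from exploding or vanishing; this illustrates the parametrisation's key property rather than reproducing the paper's exact construction:

```python
import numpy as np

def householder(v):
    """Householder reflection H = I - 2 v v^T / (v^T v); orthogonal and symmetric."""
    v = np.asarray(v, dtype=float)
    return np.eye(len(v)) - 2.0 * np.outer(v, v) / (v @ v)

rng = np.random.default_rng(2)
d = 5

# A product of Householder reflections is always an orthogonal matrix.
W = np.eye(d)
for _ in range(d):
    W = W @ householder(rng.standard_normal(d))

ortho_err = float(np.abs(W.T @ W - np.eye(d)).max())  # should be ~0
h = rng.standard_normal(d)
norm_gap = abs(float(np.linalg.norm(W @ h)) - float(np.linalg.norm(h)))
```

Because each reflection is parametrised by a single vector, the full matrix can be stored and trained with $O(d^2)$ parameters while remaining exactly orthogonal, rather than enforcing the constraint approximately.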