no code implementations • 11 Feb 2024 • Jeongyeol Kwon, Dohyun Kwon, Hanbaek Lyu
We study the complexity of finding stationary points with such a $y^*$-aware oracle: we propose a simple first-order method that converges to an $\epsilon$-stationary point using $O(\epsilon^{-6})$ and $O(\epsilon^{-4})$ accesses to first-order $y^*$-aware oracles.
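One way to read the $y^*$-aware oracle model is as a black box that, given the upper-level variable $x$, returns an approximate lower-level solution together with first-order information at that point. The toy sketch below only illustrates that interface; the oracle, the toy objectives, and the update rule (which exploits the coupling $dy^*/dx = I$ known in closed form for this toy) are assumptions for exposition, not the method or oracle model analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bilevel instance used only to exercise the oracle interface:
#   lower level:  y*(x) = argmin_y 0.5*||y - x||^2          =>  y*(x) = x
#   upper level:  f(x, y) = 0.5*||x - 1||^2 + 0.5*||y||^2,  so phi(x) = f(x, y*(x))
#   is minimized at x = 0.5.

def y_star_aware_oracle(x, err=1e-3):
    """Hypothetical first-order y*-aware oracle: returns an approximation of
    y*(x) and the partial gradients of f evaluated there."""
    y_hat = x + err * rng.standard_normal(x.shape)   # approximate lower-level solution
    grad_x = x - 1.0                                  # d f / d x at (x, y_hat)
    grad_y = y_hat                                    # d f / d y at (x, y_hat)
    return y_hat, grad_x, grad_y

def first_order_loop(x0, step=0.2, iters=300):
    """Illustrative loop: for this toy, dy*/dx is the identity, so grad_x + grad_y
    is the hyper-gradient of phi; a real method must recover this coupling from
    first-order oracle calls alone."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        _, grad_x, grad_y = y_star_aware_oracle(x)
        x = x - step * (grad_x + grad_y)
    return x

print(first_order_loop(np.zeros(3)))   # approaches [0.5, 0.5, 0.5]
```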
no code implementations • 11 Feb 2024 • Jeongyeol Kwon, Liu Yang, Robert Nowak, Josiah Hanna
Then, our main contributions are two-fold: (a) we demonstrate that the performance of reinforcement learning is strongly correlated with the prediction accuracy of future observations in partially observable environments, and (b) our approach can significantly improve over the end-to-end approach by preventing high-variance, noisy signals from the reinforcement learning objective from influencing the representation learning.
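A common way to realize claim (b) is to let only the observation-prediction loss train the encoder and to detach the encoder output before it reaches the reinforcement learning head, so the noisy RL gradients never touch the representation. The PyTorch snippet below is a generic sketch of this decoupling; the module sizes, the regression-style RL loss, and all names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 16, 4, 32

encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
predictor = nn.Linear(latent_dim, obs_dim)    # predicts the next observation
policy_head = nn.Linear(latent_dim, act_dim)  # RL head (placeholder)

obs = torch.randn(64, obs_dim)
next_obs = torch.randn(64, obs_dim)
rl_target = torch.randn(64, act_dim)          # placeholder RL regression target

z = encoder(obs)

# Representation learning: the prediction loss back-propagates into the encoder.
pred_loss = nn.functional.mse_loss(predictor(z), next_obs)

# RL objective: computed on a detached latent, so its (high-variance) gradient
# updates only the policy head and never the encoder.
rl_loss = nn.functional.mse_loss(policy_head(z.detach()), rl_target)

(pred_loss + rl_loss).backward()
```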
no code implementations • 11 Oct 2023 • Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis
In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction.
no code implementations • 4 Sep 2023 • Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak
When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an $\epsilon$-stationary point of the penalty function, using in total $O(\epsilon^{-3})$ and $O(\epsilon^{-7})$ accesses to first-order (stochastic) gradient oracles in the deterministic and noisy-oracle settings, respectively.
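The algorithm in this line of work is built on a penalty reformulation of the bilevel problem: the lower-level optimality constraint is replaced by a penalty of the form $\lambda\,(g(x,y) - \min_y g(x,y))$ added to the upper-level objective. The snippet below evaluates such a penalty proxy on a toy quadratic bilevel instance and minimizes it with plain gradient descent and finite differences; the toy objectives, inner solver, penalty weight, and step sizes are all illustrative assumptions, not the algorithm or parameters analyzed in the paper.

```python
# Toy smooth bilevel instance:
#   f(x, y) = 0.5*(x - 1)**2 + 0.5*y**2     (upper level)
#   g(x, y) = 0.5*(y - x)**2                (lower level), so y*(x) = x and
#   the bilevel solution is x = 0.5.
f = lambda x, y: 0.5 * (x - 1.0) ** 2 + 0.5 * y ** 2
g = lambda x, y: 0.5 * (y - x) ** 2

def penalty_value(x, lam, inner_steps=200, lr=0.1):
    """Penalty proxy  p_lam(x) = min_y [f(x,y) + lam*g(x,y)] - lam * min_y g(x,y),
    with both inner minimizations done by plain gradient descent (illustration only)."""
    y1 = 0.0                                          # tracks argmin_y [f + lam*g]
    for _ in range(inner_steps):
        y1 -= (lr / (1.0 + lam)) * (y1 + lam * (y1 - x))
    y2 = 0.0                                          # tracks argmin_y g (equals x here)
    for _ in range(inner_steps):
        y2 -= lr * (y2 - x)
    return (f(x, y1) + lam * g(x, y1)) - lam * g(x, y2)

def penalty_descent(x0=0.0, lam=50.0, steps=300, lr=0.2, eps=1e-4):
    """Gradient descent on x using a central finite difference of the penalty proxy."""
    x = x0
    for _ in range(steps):
        grad = (penalty_value(x + eps, lam) - penalty_value(x - eps, lam)) / (2 * eps)
        x -= lr * grad
    return x

print(penalty_descent())   # close to the toy's bilevel solution x = 0.5 (small penalty bias)
```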
no code implementations • 15 Jun 2023 • Haoyue Bai, Gregory Canal, Xuefeng Du, Jeongyeol Kwon, Robert Nowak, Yixuan Li
Modern machine learning models deployed in the wild can encounter both covariate and semantic shifts, giving rise to the problems of out-of-distribution (OOD) generalization and OOD detection respectively.
no code implementations • 26 Jan 2023 • Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak
Specifically, we show that F2SA converges to an $\epsilon$-stationary solution of the bilevel problem after $O(\epsilon^{-7/2})$, $O(\epsilon^{-5/2})$, and $O(\epsilon^{-3/2})$ iterations (each iteration using $O(1)$ samples) when stochastic noise is present in both level objectives, only in the upper-level objective, or absent (the deterministic setting), respectively.
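Roughly speaking, F2SA avoids second-order derivatives by running stochastic gradient steps on two lower-level variables and on the upper-level variable simultaneously while a penalty weight slowly grows. The loop below is a one-dimensional sketch of that alternating pattern with synthetic gradient noise; the toy objectives, step sizes, and penalty schedule are arbitrary choices for illustration and do not reproduce the paper's exact update rules or rates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy instance: f(x, y) = 0.5*(x - 1)^2 + 0.5*y^2,  g(x, y) = 0.5*(y - x)^2,
# so y*(x) = x and the bilevel solution is x = 0.5.
def noisy(v, std=0.01):
    """Add Gaussian noise to emulate a stochastic first-order oracle."""
    return v + std * rng.standard_normal()

def f2sa_style_loop(iters=5000, alpha=0.01, beta=0.1, lam0=1.0, lam_rate=0.01):
    x, y, z = 0.0, 0.0, 0.0          # y ~ argmin_y [f + lam*g],  z ~ argmin_y g
    for t in range(iters):
        lam = lam0 + lam_rate * t    # slowly increasing penalty weight
        # stochastic gradient steps on the two lower-level problems (first-order only)
        y -= (beta / (1 + lam)) * noisy(y + lam * (y - x))
        z -= beta * noisy(z - x)
        # stochastic gradient step on x for the penalty L = f + lam*(g(x,y) - g(x,z))
        grad_x = noisy((x - 1.0) + lam * ((x - y) - (x - z)))
        x -= (alpha / (1 + lam)) * grad_x
    return x

print(f2sa_style_loop())   # drifts toward the toy's bilevel solution x = 0.5
```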
no code implementations • 5 Oct 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\texttt{poly}(A) + \texttt{poly}(M, H)^{\min(M, H)})$ interactions.
no code implementations • 5 Oct 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among $M$ candidates and an agent interacts with the MDP throughout the episode for $H$ time steps.
no code implementations • 30 Jan 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a lower bound of $\tilde{\Omega}(\min(S, A) \cdot \alpha^2 / \epsilon^2)$ {\it per-user} interactions needed to learn an $\epsilon$-optimal policy for the good users.
no code implementations • NeurIPS 2021 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
We study the problem of learning a near-optimal policy in reward-mixing MDPs with two reward components.
no code implementations • NeurIPS 2021 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDPs).
no code implementations • 27 Jan 2021 • Jiacheng Zhuo, Jeongyeol Kwon, Nhat Ho, Constantine Caramanis
We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method when the true rank is unknown and the working rank is over-specified, a setting we refer to as over-parameterized matrix sensing.
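Factorized gradient descent writes the unknown PSD matrix as $X = UU^\top$ and runs gradient descent directly on the factor $U$, here with the factor rank deliberately larger than the true rank. The script below is a small self-contained illustration on random Gaussian measurements; the dimensions, number of measurements, step size, and initialization scale are arbitrary choices, not the ones analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r_true, k, m = 20, 2, 5, 1000   # ambient dim, true rank, over-specified rank, #measurements

# Ground-truth PSD matrix of rank r_true and random symmetric sensing matrices A_i.
U_star = rng.standard_normal((d, r_true))
X_star = U_star @ U_star.T
A = rng.standard_normal((m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2            # symmetrize each sensing matrix
y = np.einsum('mij,ij->m', A, X_star)         # noiseless measurements <A_i, X*>

def fgd(steps=800, eta=1e-3, init_scale=0.1):
    """Factorized gradient descent on L(U) = (1/2m) * sum_i (<A_i, U U^T> - y_i)^2."""
    U = init_scale * rng.standard_normal((d, k))
    for _ in range(steps):
        residual = np.einsum('mij,ij->m', A, U @ U.T) - y
        grad = (2.0 / m) * np.einsum('m,mij->ij', residual, A) @ U
        U -= eta * grad
    return U

U_hat = fgd()
err = np.linalg.norm(U_hat @ U_hat.T - X_star) / np.linalg.norm(X_star)
print(f"relative recovery error: {err:.3e}")
```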
no code implementations • 4 Jun 2020 • Jeongyeol Kwon, Nhat Ho, Constantine Caramanis
In the low SNR regime where the SNR is below $\mathcal{O}((d/n)^{1/4})$, we show that EM converges to a $\mathcal{O}((d/n)^{1/4})$ neighborhood of the true parameters, after $\mathcal{O}((n/d)^{1/2})$ iterations.
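A concrete way to see the iteration behind these rates: for the symmetric two-component mixed linear regression model $y = s\,\langle x, \theta^*\rangle + \mathcal{N}(0, \sigma^2)$ with a random sign $s$, the sample EM update has the closed form $\theta \leftarrow (\sum_i x_i x_i^\top)^{-1} \sum_i \tanh(y_i \langle x_i, \theta\rangle / \sigma^2)\, y_i x_i$. The snippet below iterates this update in a low-SNR configuration; the dimensions, sample size, initialization, and iteration count are arbitrary illustrative choices, and the printed comparison against $(d/n)^{1/4}$ is only a rough sanity check, not a reproduction of the paper's guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n, sigma = 10, 50_000, 1.0
theta_star = 0.1 * np.ones(d) / np.sqrt(d)     # ||theta*||/sigma = 0.1 < (d/n)^(1/4): low SNR

# Symmetric two-component mixed linear regression: y = s * <x, theta*> + sigma * noise.
X = rng.standard_normal((n, d))
s = rng.choice([-1.0, 1.0], size=n)
y = s * (X @ theta_star) + sigma * rng.standard_normal(n)

gram_inv = np.linalg.inv(X.T @ X)               # reused by every M-step

def em_step(theta):
    """E-step: soft 'sign' weights; M-step: weighted least squares (closed form)."""
    w = np.tanh(y * (X @ theta) / sigma**2)
    return gram_inv @ (X.T @ (w * y))

theta = rng.standard_normal(d) / np.sqrt(d)     # random initialization
for _ in range(500):
    theta = em_step(theta)

# The model is symmetric in theta* vs -theta*, so measure the error up to sign.
err = min(np.linalg.norm(theta - theta_star), np.linalg.norm(theta + theta_star))
print(f"final error {err:.3f}   vs   (d/n)^(1/4) = {(d/n) ** 0.25:.3f}")
```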
no code implementations • 2 Feb 2020 • Jeongyeol Kwon, Constantine Caramanis
A fundamental previous result established that separation of $\Omega(\sqrt{\log k})$ is necessary and sufficient for identifiability of the parameters with polynomial sample complexity (Regev and Vijayaraghavan, 2017).
no code implementations • 28 May 2019 • Jeongyeol Kwon, Constantine Caramanis
In particular, our results imply exact recovery as $\sigma \rightarrow 0$, in contrast to most previous local convergence results for EM, in which the statistical error scales with the norm of the parameters.
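To illustrate the exact-recovery behavior as $\sigma \to 0$, the sketch below runs the closed-form EM iteration for the symmetric two-component special case of mixed linear regression at several noise levels and reports the final error. All sizes, the noise grid, and the iteration count are arbitrary; this is a toy check of the qualitative claim, not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(1)

d, n = 10, 20_000
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)          # unit-norm ground truth

def em_error(sigma, iters=100):
    """Run EM for the symmetric two-component mixed linear regression model
    y = s * <x, theta*> + N(0, sigma^2) and return the final estimation error."""
    X = rng.standard_normal((n, d))
    s = rng.choice([-1.0, 1.0], size=n)
    y = s * (X @ theta_star) + sigma * rng.standard_normal(n)
    gram_inv = np.linalg.inv(X.T @ X)

    theta = rng.standard_normal(d)                # random initialization
    for _ in range(iters):
        w = np.tanh(y * (X @ theta) / sigma**2)   # E-step: soft component assignments
        theta = gram_inv @ (X.T @ (w * y))        # M-step: weighted least squares
    return min(np.linalg.norm(theta - theta_star), np.linalg.norm(theta + theta_star))

for sigma in [0.5, 0.1, 0.02]:
    print(f"sigma = {sigma:<5}  error = {em_error(sigma):.2e}")   # error shrinks with sigma
```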
no code implementations • 12 Oct 2018 • Jeongyeol Kwon, Wei Qian, Constantine Caramanis, Yudong Chen, Damek Davis
Recent results established that EM enjoys global convergence for Gaussian Mixture Models.