no code implementations • 23 Oct 2024 • Philip Amortila, Dylan J. Foster, Nan Jiang, Akshay Krishnamurthy, Zakaria Mhammedi
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying ("latent") dynamics are comparatively simple.
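As an illustrative formalization of this setting (a common choice in the rich-observation literature; the notation below is not taken from the paper), the agent sees an observation $x_h$ emitted from a simple latent state $z_h$, while rewards and transitions are governed by the latent state:

$$x_h \sim q(\cdot \mid z_h), \qquad z_{h+1} \sim P_{\mathrm{lat}}(\cdot \mid z_h, a_h), \qquad r_h = R(z_h, a_h),$$

so the statistical difficulty lies in decoding $z_h$ from $x_h$ rather than in the latent dynamics themselves.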
1 code implementation • 11 Mar 2024 • Philip Amortila, Dylan J. Foster, Akshay Krishnamurthy
We propose exploration objectives -- policy optimization objectives that enable downstream maximization of any reward function -- as a conceptual framework to systematize the study of exploration.
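One hedged way to make "enable downstream maximization of any reward function" concrete (an illustrative sketch; the notation below is not the paper's): the output of the exploration phase should let us extract, for every reward function $r$, a policy $\hat\pi_r$ satisfying

$$\max_{\pi \in \Pi} J_r(\pi) - J_r(\hat\pi_r) \le \varepsilon \quad \text{for all } r,$$

where $J_r(\pi)$ denotes the expected return of $\pi$ under reward $r$.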
no code implementations • 22 Jan 2024 • Philip Amortila, Tongyi Cao, Akshay Krishnamurthy
A pervasive phenomenon in machine learning applications is distribution shift, where training and deployment conditions for a machine learning model differ.
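In its simplest covariate-shift form (given here only to fix notation, not as the paper's exact setting), the input distribution changes while the conditional target distribution does not:

$$x \sim P_{\mathrm{train}} \text{ during training}, \qquad x \sim P_{\mathrm{test}} \text{ at deployment}, \qquad P(y \mid x) \text{ fixed},$$

so a predictor with small error under $P_{\mathrm{train}}$ may incur much larger error under $P_{\mathrm{test}}$.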
no code implementations • 18 Jan 2024 • Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie
The theories of offline and online reinforcement learning, despite having evolved in parallel, have begun to show signs of a possible unification, with algorithms and analysis techniques for one setting often having natural counterparts in the other.
no code implementations • 25 Jul 2023 • Philip Amortila, Nan Jiang, Csaba Szepesvári
Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation.
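As a hedged illustration of the shape such guarantees take (representative only; not a bound quoted from the paper): if the best achievable approximation error within the function class is $\varepsilon_{\mathrm{approx}}$, value-estimation bounds typically read

$$\big| \hat V - V^{\pi} \big| \;\lesssim\; \alpha \cdot \varepsilon_{\mathrm{approx}} \;+\; \text{statistical error},$$

where the amplification factor $\alpha$ can grow with problem parameters (for example, with the feature dimension or the effective horizon) rather than being an absolute constant; how small such factors can be made is precisely the kind of question at stake.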
no code implementations • 18 Jul 2022 • Philip Amortila, Nan Jiang, Dhruv Madeka, Dean P. Foster
Towards establishing the minimal number of expert queries needed, we show that, in the same setting, any learner whose exploration budget is polynomially bounded (in terms of $d, H,$ and $|\mathcal{A}|$) will require at least $\tilde\Omega(\sqrt{d})$ oracle calls to recover a policy competing with the expert's value function.
no code implementations • 3 Feb 2021 • Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári
We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map.
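The assumption that the optimal value function lies close to the span of a feature map is usually written as follows (illustrative notation): there exist parameters $\theta_1, \dots, \theta_H \in \mathbb{R}^d$ such that

$$\max_{h \in [H]}\; \max_{s}\; \big| V_h^*(s) - \langle \varphi_h(s), \theta_h \rangle \big| \;\le\; \eta,$$

for a feature map $\varphi_h : \mathcal{S} \to \mathbb{R}^d$ known to the planner and a misspecification level $\eta \ge 0$.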
no code implementations • 2 Nov 2020 • Philip Amortila, Nan Jiang, Tengyang Xie
Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with a linearly realizable value function and good feature coverage in the finite-horizon case.
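For context, the two conditions referenced here are typically phrased roughly as follows (illustrative notation, not quoted from either paper): value functions are linear in a known $d$-dimensional feature map, and the offline data distribution $\mu$ covers all feature directions,

$$Q_h^{\pi}(s,a) = \langle \varphi(s,a), \theta_h^{\pi} \rangle, \qquad \lambda_{\min}\!\Big(\mathbb{E}_{(s,a)\sim\mu}\big[\varphi(s,a)\varphi(s,a)^\top\big]\Big) \ge c > 0.$$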
no code implementations • 3 Oct 2020 • Gellért Weisz, Philip Amortila, Csaba Szepesvári
We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner.
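To make the setting concrete, below is a minimal, hypothetical Python sketch of a generative-model planner of the kind this setting concerns. It assumes a simulator(s, a) -> (reward, next_state) interface, a feature map phi(s, a) returning a $d$-dimensional vector, and a finite action set; it is not the algorithm studied in the paper.

import numpy as np

def local_lsvi_plan(simulator, phi, actions, s0, H, n_rollouts=200, reg=1e-3):
    """Illustrative local planner: random rollouts from the query state s0,
    then backward least-squares fits of Q_h(s, a) ~= <phi(s, a), theta_h>.
    Hypothetical sketch only; not the method analyzed in the paper."""
    d = phi(s0, actions[0]).shape[0]

    # Collect (s, a, r, s') transitions at each stage by rolling out from s0.
    data = [[] for _ in range(H)]
    for _ in range(n_rollouts):
        s = s0
        for h in range(H):
            a = actions[np.random.randint(len(actions))]
            r, s_next = simulator(s, a)
            data[h].append((s, a, r, s_next))
            s = s_next

    # Backward induction: fit theta_h by ridge regression on bootstrapped targets.
    thetas = [np.zeros(d) for _ in range(H + 1)]
    for h in reversed(range(H)):
        A, b = reg * np.eye(d), np.zeros(d)
        for (s, a, r, s_next) in data[h]:
            f = phi(s, a)
            v_next = max(phi(s_next, ap) @ thetas[h + 1] for ap in actions)
            A += np.outer(f, f)
            b += f * (r + v_next)
        thetas[h] = np.linalg.solve(A, b)

    # Output the greedy action at the query state.
    return max(actions, key=lambda a: float(phi(s0, a) @ thetas[0]))

The point of the linear-realizability assumption is to license exactly this kind of stage-by-stage least-squares fit, with targets bootstrapped from the next stage.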
no code implementations • ICML 2020 • Harsh Satija, Philip Amortila, Joelle Pineau
In standard RL, the agent is incentivized to explore any behavior as long as it maximizes rewards, but in the real world, undesired behavior can damage either the system or the agent in a way that breaks the learning process itself.
no code implementations • 27 Mar 2020 • Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare
We present a distributional approach to theoretical analyses of reinforcement learning algorithms with constant step-sizes.
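A small simulation (with a made-up two-state chain, purely illustrative of the phenomenon a distributional analysis targets) shows that with a constant step size the iterates of a sampling-based method such as TD(0) do not converge to a point but keep fluctuating, so it is their distribution, rather than the iterates themselves, that converges.

import numpy as np

rng = np.random.default_rng(0)

# Made-up two-state Markov reward process (illustrative only).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # transition probabilities
r = np.array([1.0, -1.0])           # expected reward in each state
gamma, alpha = 0.9, 0.1             # discount factor, constant step size

true_v = np.linalg.solve(np.eye(2) - gamma * P, r)   # exact value function

v, s, tail = np.zeros(2), 0, []
for t in range(50_000):
    s_next = rng.choice(2, p=P[s])
    v[s] += alpha * (r[s] + gamma * v[s_next] - v[s])   # TD(0), constant alpha
    s = s_next
    if t > 25_000:
        tail.append(v.copy())

tail = np.array(tail)
print("exact values:     ", true_v)
print("mean of iterates: ", tail.mean(axis=0))
print("std of iterates:  ", tail.std(axis=0))   # does not shrink to zero

The standard deviation of the iterates stays bounded away from zero however long the run is; describing the limiting law of such iterates is what a distributional viewpoint is suited to.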
no code implementations • 21 Jun 2018 • Philip Amortila, Guillaume Rabusseau
Graph Weighted Models (GWMs) have recently been proposed as a natural generalization of weighted automata over strings and trees to arbitrary families of labeled graphs (and hypergraphs).
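To fix ideas, the string special case that GWMs generalize, a weighted finite automaton computing $f(w) = \alpha^\top A_{w_1} \cdots A_{w_n} \omega$, fits in a few lines of Python; the parameter matrices below are arbitrary and only for illustration.

import numpy as np

# A two-state weighted finite automaton over the alphabet {'a', 'b'}:
# f(w) = alpha^T A[w_1] ... A[w_n] omega.
# GWMs extend this picture from strings (and trees) to labeled (hyper)graphs.
alpha = np.array([1.0, 0.0])     # initial weight vector
omega = np.array([0.0, 1.0])     # final weight vector
A = {
    'a': np.array([[0.5, 0.5],
                   [0.0, 1.0]]),
    'b': np.array([[1.0, 0.0],
                   [0.3, 0.7]]),
}

def wfa_value(word: str) -> float:
    v = alpha
    for symbol in word:
        v = v @ A[symbol]
    return float(v @ omega)

print(wfa_value("abba"))   # weight the automaton assigns to the string "abba"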