no code implementations • 19 Feb 2023 • Jonathan N. Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill
Whether this is possible for more realistic context distributions has remained an open and important question for tasks such as model selection.
no code implementations • 26 Nov 2022 • Abhi Gupta, Ted Moskovitz, David Alvarez-Melis, Aldo Pacchiano
Transferring knowledge across domains is one of the most fundamental problems in machine learning, but doing so effectively in the context of reinforcement learning remains largely an open problem.
no code implementations • 9 Nov 2022 • Andrew Wagenmaker, Aldo Pacchiano
Practical scenarios often motivate an intermediate setting: if we have some set of offline data and, in addition, may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an $\epsilon$-optimal policy?
no code implementations • 23 Oct 2022 • Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip J. Ball, Oleh Rybkin, Stephen J. Roberts, Tim Rocktäschel, Edward Grefenstette
We then present CASCADE, a novel approach for self-supervised exploration in this new setting.
no code implementations • 18 Oct 2022 • Abhishek Gupta, Aldo Pacchiano, Yuexiang Zhai, Sham M. Kakade, Sergey Levine
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications, but in practice the choice of reward function can be crucial for good results -- while in principle the reward only needs to specify what the task is, in reality practitioners often need to design more detailed rewards that provide the agent with some hints about how the task should be completed.
no code implementations • 26 Jul 2022 • Aldo Pacchiano, Drausin Wulsin, Robert A. Barton, Luis Voloch
The problem of how to genetically modify cells in order to maximize a certain cellular phenotype has taken center stage in drug development over the last few years (with, for example, genetically edited CAR-T, CAR-NK, and CAR-NKT cells entering cancer clinical trials).
no code implementations • 29 Jun 2022 • Aldo Pacchiano, Christoph Dann, Claudio Gentile
We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees.
no code implementations • 24 Jun 2022 • Aldo Pacchiano, Ofir Nachum, Nilesh Tripuraneni, Peter Bartlett
In contrast with previous work that has studied multi-task RL in other function approximation models, we show that, given a bilinear optimization oracle and finite state-action spaces, there exists a computationally efficient algorithm for multi-task MatrixRL via a reduction to quadratic programming.
no code implementations • 15 May 2022 • Tianyi Lin, Aldo Pacchiano, Yaodong Yu, Michael I. Jordan
Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback settings.
no code implementations • 21 Jan 2022 • Robert Müller, Aldo Pacchiano
We study meta-learning in Markov Decision Processes (MDP) with linear transition models in the undiscounted episodic setting.
no code implementations • NeurIPS 2021 • Aldo Pacchiano, Shaun Singh, Edward Chou, Alexander C. Berg, Jakob Foerster
The lender only observes whether a customer will repay a loan if the loan is issued to begin with, and thus the model's decisions affect what data is available to the lender for future decisions.
no code implementations • 8 Nov 2021 • Aldo Pacchiano, Peter Bartlett, Michael I. Jordan
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits.
no code implementations • 8 Nov 2021 • Aldo Pacchiano, Aadirupa Saha, Jonathan Lee
We consider the problem of preference-based reinforcement learning (PbRL), where, unlike traditional reinforcement learning, an agent receives feedback only in terms of a 1-bit (0/1) preference over a trajectory pair instead of absolute rewards for them.
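As an illustration of how such 1-bit feedback can drive learning, here is a minimal sketch of the Bradley-Terry preference model that is standard in the PbRL literature, where a learned reward model is scored against an observed trajectory preference. The names are hypothetical and this is the generic building block, not necessarily the estimator analyzed in the paper.

```python
import numpy as np

def bradley_terry_nll(r_hat, traj_a, traj_b, pref):
    """Negative log-likelihood of one 1-bit preference under a
    Bradley-Terry model: P(a preferred over b) = sigmoid(R(a) - R(b)).

    r_hat:  callable mapping a (state, action) pair to an estimated reward
    traj_a: list of (state, action) pairs for the first trajectory
    traj_b: list of (state, action) pairs for the second trajectory
    pref:   1 if trajectory a was preferred, 0 otherwise
    """
    ret_a = sum(r_hat(s, a) for s, a in traj_a)  # estimated return of a
    ret_b = sum(r_hat(s, a) for s, a in traj_b)  # estimated return of b
    p_a = 1.0 / (1.0 + np.exp(ret_b - ret_a))    # preference probability
    return -(pref * np.log(p_a) + (1 - pref) * np.log(1.0 - p_a))
```

Minimizing this loss over a dataset of preferred trajectory pairs recovers a reward model consistent with the observed 0/1 preferences.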
no code implementations • 4 Nov 2021 • Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano
Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains.
no code implementations • NeurIPS 2021 • Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta
We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.
no code implementations • 15 Jun 2021 • Dhruv Malik, Aldo Pacchiano, Vishwak Srinivasan, Yuanzhi Li
Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces.
no code implementations • NeurIPS 2021 • Niladri S. Chatterji, Aldo Pacchiano, Peter L. Bartlett, Michael I. Jordan
We study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode.
no code implementations • 21 May 2021 • Jeffrey Chan, Aldo Pacchiano, Nilesh Tripuraneni, Yun S. Song, Peter Bartlett, Michael I. Jordan
Standard approaches to decision-making under uncertainty focus on sequential exploration of the space of decisions.
no code implementations • NeurIPS 2021 • Aldo Pacchiano, Jonathan Lee, Peter Bartlett, Ofir Nachum
Since its introduction a decade ago, \emph{relative entropy policy search} (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains, not to mention providing algorithmic components used by many recently proposed reinforcement learning (RL) algorithms.
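For readers unfamiliar with REPS, the following is a minimal sketch of the classical episodic reweighting step: sample weights are proportional to exp(advantage / eta), with the temperature eta obtained by minimizing the REPS dual so that the KL divergence from the old policy stays below epsilon. It assumes advantage estimates are already available, and it illustrates the classical method rather than the specific variant analyzed in the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def reps_weights(advantages, epsilon=0.1):
    """Classical REPS step: reweight samples by exp(advantage / eta),
    choosing eta by minimizing the REPS dual so that the implied KL
    divergence from the old policy is at most epsilon."""
    adv = np.asarray(advantages, dtype=float)

    def dual(eta):
        # g(eta) = eta * epsilon + eta * log mean exp(adv / eta),
        # computed with a log-sum-exp shift for numerical stability.
        z = (adv - adv.max()) / eta
        return eta * epsilon + eta * np.log(np.mean(np.exp(z))) + adv.max()

    eta = minimize_scalar(dual, bounds=(1e-6, 1e6), method="bounded").x
    w = np.exp((adv - adv.max()) / eta)
    return w / w.sum()  # normalized weights for weighted policy fitting
```

The new policy is then fit by weighted maximum likelihood on the sampled actions using these weights.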
no code implementations • 8 Feb 2021 • Krzysztof Marcin Choromanski, Deepali Jain, Wenhao Yu, Xingyou Song, Jack Parker-Holder, Tingnan Zhang, Valerii Likhosherstov, Aldo Pacchiano, Anirban Santara, Yunhao Tang, Jie Tan, Adrian Weller
There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments.
2 code implementations • NeurIPS 2021 • Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael I. Jordan
In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control.
1 code implementation • 19 Jan 2021 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Qiuyi Zhang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang
In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters.
no code implementations • 6 Jan 2021 • Silvia Chiappa, Aldo Pacchiano
Whilst optimal transport (OT) is increasingly being recognized as a powerful and flexible approach for dealing with fairness issues, current OT fairness methods are confined to the use of discrete OT.
no code implementations • 24 Dec 2020 • Aldo Pacchiano, Christoph Dann, Claudio Gentile, Peter Bartlett
Finally, unlike recent efforts in model selection for linear stochastic bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment, rather than a stochastic one.
no code implementations • 19 Nov 2020 • Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill
Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees.
no code implementations • NeurIPS 2020 • Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alex Peysakhovich, Aldo Pacchiano, Jakob Foerster
In the era of ever-decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs).
no code implementations • ICML 2020 • Jonathan N. Lee, Aldo Pacchiano, Peter Bartlett, Michael I. Jordan
Maximum a posteriori (MAP) inference in discrete-valued Markov random fields is a fundamental problem in machine learning that involves identifying the most likely configuration of random variables given a distribution.
no code implementations • 21 Jun 2020 • Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts
The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL).
no code implementations • 17 Jun 2020 • Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang
We propose an upper-confidence bound algorithm for this problem, called optimistic-pessimistic linear bandit (OPLB), and prove an $\widetilde{\mathcal{O}}(\frac{d\sqrt{T}}{\tau-c_0})$ bound on its $T$-round regret, where the denominator is the difference between the constraint threshold and the cost of a known feasible action.
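A schematic of the optimistic-pessimistic idea: use an optimistic (upper-confidence) estimate of reward and a pessimistic (inflated) estimate of cost, and act greedily among actions whose pessimistic cost still meets the threshold $\tau$. This is a simplified sketch under a shared confidence width, not the paper's exact OPLB rule; all names are illustrative.

```python
import numpy as np

def oplb_style_action(features, theta_r, theta_c, V_inv, beta, tau):
    """Illustrative optimistic-pessimistic action selection in a linear
    bandit with a cost constraint: optimism for rewards, pessimism for
    costs, restricted to actions whose inflated cost is at most tau."""
    best_idx, best_val = None, -np.inf
    for i, x in enumerate(features):
        width = beta * np.sqrt(x @ V_inv @ x)  # shared confidence width
        ucb_reward = x @ theta_r + width       # optimistic reward estimate
        ucb_cost = x @ theta_c + width         # pessimistic (inflated) cost
        if ucb_cost <= tau and ucb_reward > best_val:
            best_idx, best_val = i, ucb_reward
    # If no action looks safely feasible, fall back to a known feasible
    # action (the setting assumes one exists, with cost c_0 < tau).
    return best_idx
```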
no code implementations • ICML Workshop LifelongML 2020 • Robert Müller, Jack Parker-Holder, Aldo Pacchiano
Meta-learning is a paradigm whereby an agent is trained with the specific goal of fast adaptation.
no code implementations • 9 Jun 2020 • Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan
Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion.
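As a generic illustration of the bandit-over-learners idea (explicitly not the strategy proposed in the paper), one can treat each base algorithm as an arm and run a textbook adversarial bandit method such as EXP3 over them:

```python
import numpy as np

class Exp3Selector:
    """Textbook EXP3 over a set of base learners, used here purely to
    illustrate online model selection: pick a learner, observe its
    reward, and exponentially reweight."""

    def __init__(self, n_learners, gamma=0.1, rng=None):
        self.w = np.ones(n_learners)
        self.gamma = gamma
        self.rng = rng or np.random.default_rng(0)

    def probs(self):
        p = (1 - self.gamma) * self.w / self.w.sum()
        return p + self.gamma / len(self.w)  # uniform exploration mix

    def choose(self):
        return self.rng.choice(len(self.w), p=self.probs())

    def update(self, i, reward):
        # Importance-weighted reward estimate for the chosen learner only.
        est = reward / self.probs()[i]
        self.w[i] *= np.exp(self.gamma * est / len(self.w))
```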
no code implementations • 8 Jun 2020 • Heinrich Jiang, Qijia Jiang, Aldo Pacchiano
Learning under one-sided feedback (i.e., where we only observe labels for examples on which we predicted positively) is a fundamental problem in machine learning -- applications include lending and recommendation systems.
no code implementations • ICML 2020 • Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani
We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$.
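A standard way to take descent steps while staying exactly on $O(d)$ is the Cayley retraction; the sketch below is offered as background on optimization over the orthogonal group, not as the paper's algorithm.

```python
import numpy as np

def cayley_descent_step(W, G, step=0.1):
    """One descent step on the orthogonal group O(d) via the Cayley
    transform, a standard retraction that keeps the iterate orthogonal.

    W: current point, a d x d orthogonal matrix
    G: Euclidean gradient of the loss at W
    """
    d = W.shape[0]
    A = G @ W.T - W @ G.T  # skew-symmetric tangent direction
    I = np.eye(d)
    return np.linalg.solve(I + (step / 2) * A, (I - (step / 2) * A) @ W)

# Sanity check: the update preserves orthogonality.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
W_next = cayley_descent_step(Q, rng.standard_normal((5, 5)))
assert np.allclose(W_next.T @ W_next, np.eye(5), atol=1e-6)
```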
no code implementations • 5 Mar 2020 • Aldo Pacchiano, Heinrich Jiang, Michael I. Jordan
Mode estimation is a classical problem in statistics with a wide range of applications in machine learning.
no code implementations • NeurIPS 2020 • Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari
Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high-probability regret guarantee.
no code implementations • ICML 2020 • Eric Mazumdar, Aldo Pacchiano, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan
The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.
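A minimal sketch of the underlying idea: when exact posterior sampling is intractable, Thompson sampling can instead draw its parameter sample from a Langevin chain targeting the posterior. The step size, iteration count, and gradient oracle below are illustrative placeholders; the paper's analysis prescribes specific choices.

```python
import numpy as np

def langevin_posterior_sample(grad_log_post, theta0, step=1e-2,
                              n_steps=200, rng=None):
    """Approximate posterior sample via the unadjusted Langevin
    algorithm: theta <- theta + step * grad log p(theta | data)
    + sqrt(2 * step) * Gaussian noise. The returned theta is then
    used as the sampled parameter in a Thompson sampling round."""
    rng = rng or np.random.default_rng(0)
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        theta += step * grad_log_post(theta)                 # drift toward the posterior mode
        theta += np.sqrt(2 * step) * rng.standard_normal(theta.shape)  # diffusion
    return theta
```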
no code implementations • ICML 2020 • Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts
Model-Based Reinforcement Learning (MBRL) offers a promising direction for sample-efficient learning, often achieving state-of-the-art results for continuous control tasks.
2 code implementations • NeurIPS 2020 • Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts
Exploration is a key problem in reinforcement learning, since agents can only learn from data they acquire in the environment.
no code implementations • 25 Sep 2019 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang
We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way.
no code implementations • 25 Sep 2019 • Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan
We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.
1 code implementation • ICLR 2020 • Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang
We introduce ES-MAML, a new framework for solving the model agnostic meta learning (MAML) problem based on Evolution Strategies (ES).
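The key building block is a zeroth-order gradient estimate: ES perturbs the parameters with Gaussian noise and uses function evaluations alone, with no backpropagation through the adaptation step. A minimal antithetic ES estimator is sketched below, where `f` would stand for the black-box post-adaptation return; this is the generic ES component, not the full ES-MAML procedure.

```python
import numpy as np

def es_grad(f, theta, sigma=0.1, n_pairs=16, rng=None):
    """Antithetic Evolution Strategies estimate of the gradient of the
    Gaussian smoothing of f at theta, using only function evaluations.

    f:     black-box objective (e.g., post-adaptation policy return)
    theta: parameter vector at which to estimate the gradient
    """
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(theta)
    for _ in range(n_pairs):
        eps = rng.standard_normal(theta.shape)
        # Paired +/- perturbations reduce the variance of the estimate.
        g += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return g / (2.0 * sigma * n_pairs)
```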
1 code implementation • 28 Jul 2019 • Ray Jiang, Aldo Pacchiano, Tom Stepleton, Heinrich Jiang, Silvia Chiappa
We propose an approach to fair classification that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances.
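A minimal sketch of the penalty in question, assuming a binary sensitive attribute: the empirical Wasserstein-1 distance between the classifier's score distributions on the two groups. The multi-group and training-loop details from the paper are omitted.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def w1_fairness_penalty(scores, groups):
    """Empirical Wasserstein-1 distance between classifier scores of two
    sensitive groups: the smaller it is, the closer the classifier's
    outputs are to being independent of group membership.

    scores: 1-D array of classifier outputs
    groups: 1-D binary array of sensitive attributes
    """
    scores, groups = np.asarray(scores), np.asarray(groups)
    return wasserstein_distance(scores[groups == 0], scores[groups == 1])

# Toy usage: a model that scores group 1 higher incurs a larger penalty.
rng = np.random.default_rng(0)
s = np.concatenate([rng.normal(0.4, 0.1, 500), rng.normal(0.6, 0.1, 500)])
g = np.concatenate([np.zeros(500), np.ones(500)])
print(w1_fairness_penalty(s, g))  # approximately 0.2
```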
no code implementations • 10 Jul 2019 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang
We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way.
no code implementations • 2 Jul 2019 • Jonathan N. Lee, Aldo Pacchiano, Michael I. Jordan
Maximum a posteriori (MAP) inference is a fundamental computational paradigm for statistical inference.
1 code implementation • ICML 2020 • Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan
We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.
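To make the idea concrete, here is a one-dimensional sketch: embed each rollout of a policy into a scalar behavioral statistic and compare policies by the Wasserstein-1 distance between their embedding distributions. The paper's latent behavioral space is richer than a scalar; `embed` is a hypothetical placeholder.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def behavioral_distance(rollouts_a, rollouts_b, embed):
    """Compare two policies by the Wasserstein-1 distance between the
    distributions of their rollout embeddings (1-D simplification).

    rollouts_a, rollouts_b: lists of trajectories from each policy
    embed: callable mapping a trajectory to a scalar behavioral embedding
    """
    e_a = np.array([embed(t) for t in rollouts_a])
    e_b = np.array([embed(t) for t in rollouts_b])
    return wasserstein_distance(e_a, e_b)
```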
no code implementations • 29 May 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang
We propose a new class of structured methods for Monte Carlo (MC) sampling, called DPPMC, designed for high-dimensional nonisotropic distributions where samples are correlated to reduce the variance of the estimator via determinantal point processes.
no code implementations • 7 Mar 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani
Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state-of-the-art methods for policy optimization problems in Robotics.
1 code implementation • NeurIPS 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang
ASEBO adapts to the geometry of the function and learns optimal sets of sensing directions, which are used to probe it on the fly.
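A simplified illustration of the adaptive-sensing idea (not ASEBO itself): bias the ES perturbation directions toward the top principal components of recent gradient estimates, while mixing in isotropic directions so the estimator can still explore off-subspace.

```python
import numpy as np

def adaptive_directions(past_grads, n_dirs, k=5, mix=0.5, rng=None):
    """Sample unit-norm ES sensing directions from a mixture of (a) the
    top-k principal subspace of recent gradient estimates and (b)
    isotropic Gaussian directions (simplified illustration).

    past_grads: (m, d) array of recent gradient estimates
    n_dirs: number of sensing directions to return
    """
    rng = rng or np.random.default_rng(0)
    _, d = past_grads.shape
    _, _, vt = np.linalg.svd(past_grads - past_grads.mean(0),
                             full_matrices=False)
    k = min(k, vt.shape[0])  # guard against having few gradients so far
    basis = vt[:k]           # rows span the estimated active subspace
    dirs = np.empty((n_dirs, d))
    for i in range(n_dirs):
        if rng.random() < mix:
            dirs[i] = basis.T @ rng.standard_normal(k)  # subspace sample
        else:
            dirs[i] = rng.standard_normal(d)            # isotropic sample
        dirs[i] /= np.linalg.norm(dirs[i])
    return dirs
```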
no code implementations • NeurIPS 2018 • Mark Rowland, Krzysztof M. Choromanski, François Chalus, Aldo Pacchiano, Tamas Sarlos, Richard E. Turner, Adrian Weller
Monte Carlo sampling in high-dimensional, low-sample settings is important in many machine learning tasks.
no code implementations • NeurIPS 2018 • Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan
In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.
no code implementations • 20 Nov 2018 • Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan
In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.
no code implementations • 27 Feb 2018 • Aldo Pacchiano, Niladri S. Chatterji, Peter L. Bartlett
We also study the full information setting when the underlying losses are kernel functions and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.
no code implementations • 12 Feb 2018 • Mohammed Amin Abdullah, Aldo Pacchiano, Moez Draief
We describe an application of Wasserstein distance to Reinforcement Learning.
no code implementations • 18 Feb 2015 • Aldo Pacchiano, Oliver Williams
Motivated by the problem of computing investment portfolio weightings we investigate various methods of clustering as alternatives to traditional mean-variance approaches.