no code implementations • 14 Apr 2024 • Dipendra Misra, Aldo Pacchiano, Robert E. Schapire

We study interactive learning in a setting where the agent has to generate a response (e.g., an action or trajectory) given a context and an instruction.

no code implementations • 29 Mar 2024 • Yilei Chen, Aldo Pacchiano, Ioannis Ch. Paschalidis

Up to low-order and logarithmic terms, $\mathrm{CAESAR}$ achieves a sample complexity $\tilde{O}\left(\frac{H^4}{\epsilon^2}\sum_{h=1}^H\max_{k\in[K]}\sum_{s, a}\frac{(d_h^{\pi^k}(s, a))^2}{\mu^*_h(s, a)}\right)$, where $d^{\pi}$ is the visitation distribution of policy $\pi$, $\mu^*$ is the optimal sampling distribution, and $H$ is the horizon.

1 code implementation • 16 Feb 2024 • Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury

Reinforcement Learning from Human Feedback (RLHF) is pivotal in aligning Large Language Models (LLMs) with human preferences.

no code implementations • 5 Feb 2024 • Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano, Ambuj Tewari

Both of these can be instrumental in speeding up learning and improving alignment.

no code implementations • 15 Jan 2024 • Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett

In the setting that the constraint is in expectation, we further specialize our results to multi-armed bandits and propose a computationally efficient algorithm for this setting with regret analysis.

no code implementations • NeurIPS 2023 • Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill

We study the problem of experiment planning with function approximation in contextual bandit problems.

no code implementations • 15 Aug 2023 • Elena Gal, Shaun Singh, Aldo Pacchiano, Ben Walker, Terry Lyons, Jakob Foerster

We introduce adversarial optimism (AdOpt) to directly address bias in the training set using adversarial domain adaptation.

1 code implementation • NeurIPS 2023 • Parnian Kassraie, Nicolas Emmenegger, Andreas Krause, Aldo Pacchiano

This allows us to develop ALEXP, which has an exponentially improved ($\log M$) dependence on $M$ for its regret.

no code implementations • 5 Jun 2023 • Aldo Pacchiano, Christoph Dann, Claudio Gentile

We consider model selection for sequential decision making in stochastic environments with bandit feedback, where a meta-learner has at its disposal a pool of base learners, and decides on the fly which action to take based on the policies recommended by each base learner.

no code implementations • 1 Jun 2023 • Sinong Geng, Aldo Pacchiano, Andrey Kolobov, Ching-An Cheng

We propose Heuristic Blending (HUBL), a simple performance-improving technique for a broad class of offline RL algorithms based on value bootstrapping.

no code implementations • 19 Feb 2023 • Jonathan N. Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill

Whether this is possible for more realistic context distributions has remained an open and important question for tasks such as model selection.

no code implementations • 26 Nov 2022 • Abhi Gupta, Ted Moskovitz, David Alvarez-Melis, Aldo Pacchiano

Transferring knowledge across domains is one of the most fundamental problems in machine learning, but doing so effectively in the context of reinforcement learning remains largely an open problem.

no code implementations • 9 Nov 2022 • Andrew Wagenmaker, Aldo Pacchiano

Practical scenarios often motivate an intermediate setting: if we have some set of offline data and, in addition, may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an $\epsilon$-optimal policy?

no code implementations • 23 Oct 2022 • Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip J. Ball, Oleh Rybkin, Stephen J. Roberts, Tim Rocktäschel, Edward Grefenstette

We then present CASCADE, a novel approach for self-supervised exploration in this new setting.

no code implementations • 18 Oct 2022 • Abhishek Gupta, Aldo Pacchiano, Yuexiang Zhai, Sham M. Kakade, Sergey Levine

Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications, but in practice the choice of reward function can be crucial for good results. While in principle the reward only needs to specify what the task is, in reality practitioners often need to design more detailed rewards that give the agent hints about how the task should be completed.

no code implementations • 26 Jul 2022 • Aldo Pacchiano, Drausin Wulsin, Robert A. Barton, Luis Voloch

The problem of how to genetically modify cells in order to maximize a certain cellular phenotype has taken center stage in drug development over the last few years (with, for example, genetically edited CAR-T, CAR-NK, and CAR-NKT cells entering cancer clinical trials).

no code implementations • 29 Jun 2022 • Aldo Pacchiano, Christoph Dann, Claudio Gentile

We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees.

no code implementations • 24 Jun 2022 • Aldo Pacchiano, Ofir Nachum, Nilesh Tripuraneni, Peter Bartlett

In contrast with previous work that has studied multi-task RL in other function approximation models, we show that, given a bilinear optimization oracle and finite state and action spaces, there exists a computationally efficient algorithm for multitask MatrixRL via a reduction to quadratic programming.

no code implementations • 15 May 2022 • Tianyi Lin, Aldo Pacchiano, Yaodong Yu, Michael I. Jordan

Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback settings.

no code implementations • 21 Jan 2022 • Robert Müller, Aldo Pacchiano

We study meta-learning in Markov Decision Processes (MDP) with linear transition models in the undiscounted episodic setting.

no code implementations • NeurIPS 2021 • Aldo Pacchiano, Shaun Singh, Edward Chou, Alexander C. Berg, Jakob Foerster

The lender only observes whether a customer will repay a loan if the loan is issued to begin with, and thus modeled decisions affect what data is available to the lender for future decisions.

no code implementations • 8 Nov 2021 • Aldo Pacchiano, Aadirupa Saha, Jonathan Lee

We consider the problem of preference-based reinforcement learning (PbRL), where, unlike traditional reinforcement learning, an agent receives feedback only as a 1-bit (0/1) preference over a trajectory pair instead of absolute rewards for them.
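Such 1-bit trajectory preferences are commonly modeled with a Bradley-Terry (logistic) link, in which the probability of preferring one trajectory grows with its return advantage. A minimal sketch of that standard model (an illustration of the feedback setting, not necessarily the paper's exact formulation):

```python
import math

def preference_prob(return_1, return_2):
    """Bradley-Terry / logistic model of a 1-bit trajectory preference:
    P(trajectory 1 preferred) = sigmoid(R(tau_1) - R(tau_2))."""
    return 1.0 / (1.0 + math.exp(-(return_1 - return_2)))

p = preference_prob(3.0, 1.0)      # higher-return trajectory preferred more often
p_tie = preference_prob(2.0, 2.0)  # equal returns -> a coin flip
```

Under this model a learner only ever observes samples of the 0/1 preference, never the returns themselves.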

no code implementations • 8 Nov 2021 • Aldo Pacchiano, Peter Bartlett, Michael I. Jordan

We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits.

no code implementations • 4 Nov 2021 • Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano

Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains.

no code implementations • NeurIPS 2021 • Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.

no code implementations • 15 Jun 2021 • Dhruv Malik, Aldo Pacchiano, Vishwak Srinivasan, Yuanzhi Li

Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces.

no code implementations • NeurIPS 2021 • Niladri S. Chatterji, Aldo Pacchiano, Peter L. Bartlett, Michael I. Jordan

We study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode.

no code implementations • 21 May 2021 • Jeffrey Chan, Aldo Pacchiano, Nilesh Tripuraneni, Yun S. Song, Peter Bartlett, Michael I. Jordan

Standard approaches to decision-making under uncertainty focus on sequential exploration of the space of decisions.

no code implementations • NeurIPS 2021 • Aldo Pacchiano, Jonathan Lee, Peter Bartlett, Ofir Nachum

Since its introduction a decade ago, \emph{relative entropy policy search} (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains, not to mention providing algorithmic components used by many recently proposed reinforcement learning (RL) algorithms.

no code implementations • 8 Feb 2021 • Krzysztof Marcin Choromanski, Deepali Jain, Wenhao Yu, Xingyou Song, Jack Parker-Holder, Tingnan Zhang, Valerii Likhosherstov, Aldo Pacchiano, Anirban Santara, Yunhao Tang, Jie Tan, Adrian Weller

There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments.

2 code implementations • NeurIPS 2021 • Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael I. Jordan

In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control.

2 code implementations • 19 Jan 2021 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Qiuyi Zhang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang

In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters.

no code implementations • 6 Jan 2021 • Silvia Chiappa, Aldo Pacchiano

Whilst optimal transport (OT) is increasingly being recognized as a powerful and flexible approach for dealing with fairness issues, current OT fairness methods are confined to the use of discrete OT.

no code implementations • 24 Dec 2020 • Aldo Pacchiano, Christoph Dann, Claudio Gentile, Peter Bartlett

Finally, unlike recent efforts in model selection for linear stochastic bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment, rather than a stochastic one.

no code implementations • 19 Nov 2020 • Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees.

no code implementations • NeurIPS 2020 • Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alex Peysakhovich, Aldo Pacchiano, Jakob Foerster

In the era of ever decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs).

no code implementations • ICML 2020 • Jonathan N. Lee, Aldo Pacchiano, Peter Bartlett, Michael I. Jordan

Maximum a posteriori (MAP) inference in discrete-valued Markov random fields is a fundamental problem in machine learning that involves identifying the most likely configuration of random variables given a distribution.

no code implementations • 21 Jun 2020 • Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL).
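The optimism principle mentioned above is easiest to see in the classic UCB1 algorithm for multi-armed bandits, which acts greedily with respect to an optimistic upper bound on each arm's mean reward. A minimal sketch (standard UCB1, not this paper's method):

```python
import math
import random

def ucb1(reward_fns, horizon, seed=0):
    """Minimal UCB1: pull each arm once, then repeatedly pick the arm with
    the highest optimistic index (empirical mean + confidence bonus)."""
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization round: pull each arm once
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += reward_fns[arm](rng)
    return counts

# Two Bernoulli arms with means 0.2 and 0.8: UCB1 concentrates on the better arm.
counts = ucb1([lambda rng: float(rng.random() < 0.2),
               lambda rng: float(rng.random() < 0.8)], horizon=2000)
```

The confidence bonus shrinks as an arm is pulled more, so under-explored arms look optimistically good and get tried.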

no code implementations • 17 Jun 2020 • Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove an $\widetilde{\mathcal{O}}(\frac{d\sqrt{T}}{\tau-c_0})$ bound on its $T$-round regret, where the denominator is the difference between the constraint threshold and the cost of a known feasible action.

no code implementations • ICML Workshop LifelongML 2020 • Robert Müller, Jack Parker-Holder, Aldo Pacchiano

Meta-learning is a paradigm whereby an agent is trained with the specific goal of fast adaptation.

no code implementations • 9 Jun 2020 • Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan

Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion.

no code implementations • 8 Jun 2020 • Heinrich Jiang, Qijia Jiang, Aldo Pacchiano

Learning under one-sided feedback (i.e., where we only observe the labels for examples we predicted positively on) is a fundamental problem in machine learning -- applications include lending and recommendation systems.

no code implementations • ICML 2020 • Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$.
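A standard building block for optimization on the orthogonal group $O(d)$ is to project the Euclidean gradient onto the tangent space (the skew-symmetric part) and retract the updated point back onto the manifold. A generic sketch of one such step using a QR retraction (an illustration of the geometry, not the paper's algorithm):

```python
import numpy as np

def orthogonal_step(X, euclidean_grad, lr=0.1):
    """One descent step that stays on the orthogonal group O(d):
    project the gradient onto the tangent space at X (skew-symmetric part),
    take a Euclidean step, then retract back onto O(d) via QR factorization."""
    A = X.T @ euclidean_grad
    riem = X @ (A - A.T) / 2              # Riemannian (tangent-space) gradient
    Q, R = np.linalg.qr(X - lr * riem)
    return Q * np.sign(np.diag(R))        # fix column signs for a proper retraction

X = np.eye(3)
X_next = orthogonal_step(X, np.random.default_rng(0).standard_normal((3, 3)))
orth_err = np.linalg.norm(X_next.T @ X_next - np.eye(3))  # should be ~machine epsilon
```

The QR retraction keeps every iterate exactly orthogonal up to floating-point error, with no explicit constraint handling.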

no code implementations • 5 Mar 2020 • Aldo Pacchiano, Heinrich Jiang, Michael I. Jordan

Mode estimation is a classical problem in statistics with a wide range of applications in machine learning.
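A simple baseline for the mode-estimation problem above is to bin the samples and return the midpoint of the most populated bin. A minimal sketch (a generic histogram baseline, not the paper's estimator):

```python
import random

def histogram_mode(samples, num_bins=20):
    """Estimate the mode of a 1-D distribution from samples by binning
    the data and returning the midpoint of the most populated bin."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / num_bins or 1.0   # guard against all-equal samples
    counts = [0] * num_bins
    for x in samples:
        i = min(int((x - lo) / width), num_bins - 1)  # clamp the max into last bin
        counts[i] += 1
    best = max(range(num_bins), key=counts.__getitem__)
    return lo + (best + 0.5) * width

# Gaussian samples centered at 3.0: the estimate should land near 3.0.
rng = random.Random(1)
data = [rng.gauss(3.0, 0.5) for _ in range(5000)]
est = histogram_mode(data)
```

Bandit formulations of this problem instead control which samples are drawn, trading off sampling cost against estimation accuracy.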

no code implementations • NeurIPS 2020 • Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee.

no code implementations • ICML 2020 • Eric Mazumdar, Aldo Pacchiano, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan

The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.

no code implementations • ICML 2020 • Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts

Model-Based Reinforcement Learning (MBRL) offers a promising direction for sample efficient learning, often achieving state of the art results for continuous control tasks.

2 code implementations • NeurIPS 2020 • Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts

Exploration is a key problem in reinforcement learning, since agents can only learn from data they acquire in the environment.

1 code implementation • ICLR 2020 • Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang

We introduce ES-MAML, a new framework for solving the model agnostic meta learning (MAML) problem based on Evolution Strategies (ES).
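Evolution Strategies, which ES-MAML builds on, estimates gradients from function evaluations alone via antithetic Gaussian perturbations. A minimal sketch of that standard estimator (illustrative; not the ES-MAML algorithm itself):

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, num_pairs=500, seed=0):
    """Antithetic Evolution Strategies gradient estimate:
    g ~= (1 / (2*sigma*n)) * sum_i [f(theta + sigma*eps_i) - f(theta - sigma*eps_i)] * eps_i.
    Uses only function evaluations of f -- no analytic gradient required."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(theta)
    for _ in range(num_pairs):
        eps = rng.standard_normal(theta.shape[0])
        g += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return g / (2 * sigma * num_pairs)

# Quadratic sanity check: the true gradient of sum(x^2) at theta is 2*theta.
theta = np.array([1.0, -2.0])
g = es_gradient(lambda x: float(np.sum(x ** 2)), theta)
```

Because the estimator treats `f` as a black box, the same machinery applies when `f` is a full RL episode return, which is what makes ES attractive for meta-learning objectives.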

no code implementations • 25 Sep 2019 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang

We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way.

no code implementations • 25 Sep 2019 • Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.

1 code implementation • 28 Jul 2019 • Ray Jiang, Aldo Pacchiano, Tom Stepleton, Heinrich Jiang, Silvia Chiappa

We propose an approach to fair classification that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances.
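For one-dimensional classifier scores, the Wasserstein-1 distance between two equal-size empirical samples reduces to the average absolute difference of matched order statistics. A minimal sketch of that computation (illustrative; the paper minimizes such distances between group-conditional output distributions):

```python
def wasserstein_1(xs, ys):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples:
    sort both samples and average the absolute differences of matched
    order statistics (the optimal coupling in 1-D is monotone)."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

group_a = [0.1, 0.4, 0.5, 0.9]
group_b = [x + 1.0 for x in group_a]   # same shape, shifted by 1
d_same = wasserstein_1(group_a, group_a)   # identical distributions -> 0
d_shift = wasserstein_1(group_a, group_b)  # a unit shift -> distance 1
```

Driving this distance to zero forces the score distributions of the two groups to coincide, which is the independence criterion the paper enforces.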

no code implementations • 10 Jul 2019 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang

We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way.

no code implementations • 2 Jul 2019 • Jonathan N. Lee, Aldo Pacchiano, Michael I. Jordan

Maximum a posteriori (MAP) inference is a fundamental computational paradigm for statistical inference.

1 code implementation • ICML 2020 • Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.

no code implementations • 29 May 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

We propose a new class of structured methods for Monte Carlo (MC) sampling, called DPPMC, designed for high-dimensional nonisotropic distributions where samples are correlated to reduce the variance of the estimator via determinantal point processes.

1 code implementation • NeurIPS 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

ASEBO adapts to the geometry of the function and learns, on the fly, optimal sets of sensing directions used to probe it.

no code implementations • 7 Mar 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani

Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state of the art methods for policy optimization problems in Robotics.

no code implementations • NeurIPS 2018 • Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.

no code implementations • NeurIPS 2018 • Mark Rowland, Krzysztof M. Choromanski, François Chalus, Aldo Pacchiano, Tamas Sarlos, Richard E. Turner, Adrian Weller

Monte Carlo sampling in high-dimensional, low-sample settings is important in many machine learning tasks.

no code implementations • 20 Nov 2018 • Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.

no code implementations • 27 Feb 2018 • Aldo Pacchiano, Niladri S. Chatterji, Peter L. Bartlett

We also study the full information setting when the underlying losses are kernel functions and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.

no code implementations • 12 Feb 2018 • Mohammed Amin Abdullah, Aldo Pacchiano, Moez Draief

We describe an application of Wasserstein distance to Reinforcement Learning.

no code implementations • 18 Feb 2015 • Aldo Pacchiano, Oliver Williams

Motivated by the problem of computing investment portfolio weightings we investigate various methods of clustering as alternatives to traditional mean-variance approaches.

Papers With Code is a free resource with all data licensed under CC-BY-SA.