Search Results for author: Alec Koppel

Found 53 papers, 3 papers with code

GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment

no code implementations 10 Oct 2024 Yuancheng Xu, Udari Madhushani Sehwag, Alec Koppel, Sicheng Zhu, Bang An, Furong Huang, Sumitra Ganesh

Traditional training-time methods finetune LLMs using human preference datasets but incur significant training costs and require repeated training to handle diverse user preferences.

Text Generation

SAIL: Self-Improving Efficient Online Alignment of Large Language Models

no code implementations 21 Jun 2024 Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Bedi, Furong Huang

Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences.

Bilevel Optimization

Compressed Online Learning of Conditional Mean Embedding

no code implementations 13 May 2024 Boya Hou, Sina Sanjari, Alec Koppel, Subhonmesh Bose

The conditional mean embedding (CME) encodes Markovian stochastic kernels through their actions on probability distributions embedded within the reproducing kernel Hilbert spaces (RKHS).

Operator learning

Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

no code implementations 18 Mar 2024 Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha

In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods.

Policy Gradient Methods

Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective

no code implementations 17 Mar 2024 Muhammad Aneeq uz Zaman, Alec Koppel, Mathieu Laurière, Tamer Başar

This MFTG NE is then shown to be $\mathcal{O}(1/M)$-NE for the finite population game where $M$ is a lower bound on the number of agents in each team.

Problem Decomposition Reinforcement Learning (RL)

Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

no code implementations 13 Mar 2024 Peihong Yu, Manav Mishra, Alec Koppel, Carl Busart, Priya Narayan, Dinesh Manocha, Amrit Bedi, Pratap Tokekar

Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space.

Efficient Exploration Multi-agent Reinforcement Learning +1

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

no code implementations 14 Feb 2024 Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang

Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.

Diversity Fairness +1

Learning Payment-Free Resource Allocation Mechanisms

no code implementations 18 Nov 2023 Sihan Zeng, Sujay Bhatt, Eleonora Kreacic, Parisa Hassanzadeh, Alec Koppel, Sumitra Ganesh

We consider the design of mechanisms that allocate limited resources among self-interested agents using neural networks.

Fairness

Byzantine-Resilient Decentralized Multi-Armed Bandits

no code implementations 11 Oct 2023 Jingxuan Zhu, Alec Koppel, Alvaro Velasquez, Ji Liu

In decentralized cooperative multi-armed bandits (MAB), each agent observes a distinct stream of rewards, and seeks to exchange information with others to select a sequence of arms so as to minimize its regret.

Multi-Armed Bandits Recommendation Systems

PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback

no code implementations 3 Aug 2023 Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang

We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback.

Bilevel Optimization Procedure Learning +2

Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate

no code implementations 27 Jun 2023 Zhan Gao, Aryan Mokhtari, Alec Koppel

Interestingly, our established non-asymptotic superlinear convergence rate demonstrates an explicit trade-off between the convergence speed and memory requirement, which to our knowledge, is the first of its kind.

A Gradient-based Approach for Online Robust Deep Neural Network Training with Noisy Labels

no code implementations 8 Jun 2023 Yifan Yang, Alec Koppel, Zheng Zhang

In this paper, we propose a novel gradient-based approach to enable the detection of noisy labels for the online learning of model parameters, named Online Gradient-based Robust Selection (OGRS).

Learning with noisy labels

Sharpened Lazy Incremental Quasi-Newton Method

1 code implementation 26 May 2023 Aakash Lahoti, Spandan Senapati, Ketan Rajawat, Alec Koppel

Specifically, they exhibit a superlinear rate with $O(d^2)$ cost in contrast to the linear rate of first-order methods with $O(d)$ cost and the quadratic rate of second-order methods with $O(d^3)$ cost.

Second-order methods

Scalable Multi-Agent Reinforcement Learning with General Utilities

no code implementations 15 Feb 2023 Donghao Ying, Yuhao Ding, Alec Koppel, Javad Lavaei

The objective is to find a localized policy that maximizes the average of the team's local utility functions without the full observability of each agent in the team.

Multi-agent Reinforcement Learning reinforcement-learning +2

Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

no code implementations 28 Jan 2023 Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha

Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection.

Reinforcement Learning (RL)

Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path

no code implementations 24 Aug 2022 Muhammad Aneeq uz Zaman, Alec Koppel, Sujay Bhatt, Tamer Başar

Given that the underlying Markov Decision Process (MDP) of the agent is communicating, we provide finite sample convergence guarantees in terms of convergence of the mean-field and control policy to the mean-field equilibrium.

reinforcement-learning Reinforcement Learning (RL)

FedBC: Calibrating Global and Local Models via Federated Learning Beyond Consensus

no code implementations 22 Jun 2022 Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh Manocha

In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program.

Federated Learning

Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies

no code implementations 12 Jun 2022 Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Pratap Tokekar, Dinesh Manocha

In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems.

continuous-control Continuous Control +1

Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

no code implementations 2 Jun 2022 Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Brian M. Sadler, Furong Huang, Pratap Tokekar, Dinesh Manocha

Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting where the transition model is Gaussian or Lipschitz, and they demand a posterior estimate whose representational complexity grows unbounded with time.

continuous-control Continuous Control +3

On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces

no code implementations 28 Jan 2022 Amrit Singh Bedi, Souradip Chakraborty, Anjaly Parayil, Brian Sadler, Pratap Tokekar, Alec Koppel

Doing so incurs a persistent bias that appears in the attenuation rate of the expected policy gradient norm, which is inversely proportional to the radius of the action space.

Occupancy Information Ratio: Infinite-Horizon, Information-Directed, Parameterized Policy Search

no code implementations 21 Jan 2022 Wesley A. Suttle, Alec Koppel, Ji Liu

In this work, we propose an information-directed objective for infinite-horizon reinforcement learning (RL), called the occupancy information ratio (OIR), inspired by the information ratio objectives used in previous information-directed sampling schemes for multi-armed bandits and Markov decision processes as well as recent advances in general utility RL.

Multi-Armed Bandits Reinforcement Learning (RL)

Online, Informative MCMC Thinning with Kernelized Stein Discrepancy

1 code implementation 18 Jan 2022 Cole Hawkins, Alec Koppel, Zheng Zhang

A fundamental challenge in Bayesian inference is efficient representation of a target distribution.

Bayesian Inference

Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming

no code implementations 22 Oct 2021 Alec Koppel, Amrit Singh Bedi, Bhargav Ganguly, Vaneet Aggarwal

We establish that the sample complexity to obtain near-globally optimal solutions matches tight dependencies on the cardinality of the state and action spaces, and exhibits classical scalings with respect to the network in accordance with multi-agent optimization.

Multi-agent Reinforcement Learning Reinforcement Learning (RL)

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

no code implementations 13 Sep 2021 Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal

To achieve that, we advocate the use of a randomized primal-dual approach to solve CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA), which is shown to exhibit $\tilde{\mathcal{O}}\left(1/\epsilon^2\right)$ sample complexity to achieve $\epsilon$-optimal cumulative reward with zero constraint violations.

Decision Making reinforcement-learning +1

Wasserstein-Splitting Gaussian Process Regression for Heterogeneous Online Bayesian Inference

no code implementations 26 Jul 2021 Michael E. Kepler, Alec Koppel, Amrit Singh Bedi, Daniel J. Stilwell

Gaussian processes (GPs) are a well-known nonparametric Bayesian inference technique, but they suffer from scalability problems for large sample sizes, and their performance can degrade for non-stationary or spatially heterogeneous data.

Bayesian Inference Gaussian Processes +1

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

no code implementations 15 Jun 2021 Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel

To close this gap, we step towards persistent exploration in continuous space through policy parameterizations defined by distributions with heavier tails, governed by a tail-index parameter $\alpha$, which increases the likelihood of jumping in state space.

continuous-control Continuous Control +1

MARL with General Utilities via Decentralized Shadow Reward Actor-Critic

no code implementations 29 May 2021 Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward".

Multi-agent Reinforcement Learning

Composable Learning with Sparse Kernel Representations

no code implementations 26 Mar 2021 Ekaterina Tolstaya, Ethan Stump, Alec Koppel, Alejandro Ribeiro

We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.

Sparse Representations of Positive Functions via First and Second-Order Pseudo-Mirror Descent

no code implementations 13 Nov 2020 Abhishek Chakraborty, Ketan Rajawat, Alec Koppel

We consider expected risk minimization problems when the range of the estimator is required to be nonnegative, motivated by the settings of maximum likelihood estimation (MLE) and trajectory optimization.

A Markov Decision Process Approach to Active Meta Learning

no code implementations 10 Sep 2020 Bingjia Wang, Alec Koppel, Vikram Krishnamurthy

In supervised learning, we fit a single statistical model to a given data set, assuming that the data is associated with a singular task, which yields well-tuned models for specific use, but does not adapt well to new contexts.

Meta-Learning Scheduling

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

no code implementations NeurIPS 2020 Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang

Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.

reinforcement-learning Reinforcement Learning +2

Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems

no code implementations 2 Jul 2020 Zhan Gao, Alec Koppel, Alejandro Ribeiro

Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics.

Stochastic Optimization

Efficient Large-Scale Gaussian Process Bandits by Believing only Informative Actions

no code implementations L4DC 2020 Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Alec Koppel

Experimentally, we observe state-of-the-art accuracy and complexity tradeoffs for GP bandit algorithms on various hyper-parameter tuning tasks, suggesting the merits of managing the complexity of GPs in bandit settings.

Bayesian Optimization

Consistent Online Gaussian Process Regression Without the Sample Complexity Bottleneck

no code implementations 23 Apr 2020 Alec Koppel, Hrusikesha Pradhan, Ketan Rajawat

Gaussian processes provide a framework for nonlinear nonparametric Bayesian inference widely applicable across science and engineering.

Bayesian Inference Gaussian Processes +1

Collaborative Beamforming Under Localization Errors: A Discrete Optimization Approach

no code implementations 27 Mar 2020 Erfaun Noorani, Yagiz Savas, Alec Koppel, John Baras, Ufuk Topcu, Brian M. Sadler

In particular, we formulate a discrete optimization problem to choose only a subset of agents to transmit the message signal so that the variance of the signal-to-noise ratio (SNR) received by the base station is minimized while the expected SNR exceeds a desired threshold.

Regret and Belief Complexity Trade-off in Gaussian Process Bandits via Information Thresholding

no code implementations 23 Mar 2020 Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Brian M. Sadler, Alec Koppel

Doing so permits us to precisely characterize the trade-off between regret bounds of GP bandit algorithms and complexity of the posterior distributions depending on the compression parameter $\epsilon$ for both discrete and continuous action sets.

Bayesian Optimization Decision Making +1

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

no code implementations 27 Feb 2020 Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.

reinforcement-learning Reinforcement Learning +1

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

no code implementations 18 Oct 2019 Harshat Kumar, Alec Koppel, Alejandro Ribeiro

Actor-critic algorithms combine the merits of both approaches by alternating between steps to estimate the value function and policy gradient updates.

reinforcement-learning Reinforcement Learning +1

Optimally Compressed Nonparametric Online Learning

no code implementations 25 Sep 2019 Alec Koppel, Amrit Singh Bedi, Ketan Rajawat, Brian M. Sadler

Batch training of machine learning models based on neural networks is now well established, whereas to date streaming methods are largely based on linear models.

Adaptive Kernel Learning in Heterogeneous Networks

no code implementations 1 Aug 2019 Hrusikesha Pradhan, Amrit Singh Bedi, Alec Koppel, Ketan Rajawat

We consider learning in decentralized heterogeneous networks: agents seek to minimize a convex functional that aggregates data across the network, while only having access to their local data streams.

Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies

no code implementations 19 Jun 2019 Kaiqing Zhang, Alec Koppel, Hao Zhu, Tamer Başar

Under a further strict saddle points assumption, this result establishes convergence to essentially locally-optimal policies of the underlying problem, and thus bridges the gap in existing literature on the convergence of PG methods.

Autonomous Driving Policy Gradient Methods +1

Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems

1 code implementation 19 Apr 2018 Alec Koppel, Ekaterina Tolstaya, Ethan Stump, Alejandro Ribeiro

We consider Markov Decision Problems defined over continuous state and action spaces, where an autonomous agent seeks to learn a map from its states to actions so as to maximize its long-term discounted accumulation of rewards.

Q-Learning Stochastic Optimization

Decentralized Online Learning with Kernels

no code implementations 11 Oct 2017 Alec Koppel, Santiago Paternain, Cedric Richard, Alejandro Ribeiro

That is, we establish that with constant step-size selections agents' functions converge to a neighborhood of the globally optimal one while satisfying the consensus constraints as the penalty parameter is increased.

General Classification Multi-class Classification +2

Parsimonious Online Learning with Kernels via Sparse Projections in Function Space

no code implementations 13 Dec 2016 Alec Koppel, Garrett Warnell, Ethan Stump, Alejandro Ribeiro

Despite their attractiveness, popular perception is that techniques for nonparametric function approximation do not scale to streaming data due to an intractable growth in the amount of storage they require.

General Classification

Proximity Without Consensus in Online Multi-Agent Optimization

no code implementations 17 Jun 2016 Alec Koppel, Brian M. Sadler, Alejandro Ribeiro

To do so, we depart from the canonical decentralized optimization framework where agreement constraints are enforced, and instead formulate a problem where each agent minimizes a global objective while enforcing network proximity constraints.

Multiagent Systems Systems and Control Computation

A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning

no code implementations 15 Jun 2016 Aryan Mokhtari, Alec Koppel, Alejandro Ribeiro

Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both the selection of blocks and the selection of elements of the training set.

Image Classification

Decentralized Dynamic Discriminative Dictionary Learning

no code implementations 3 May 2016 Alec Koppel, Garrett Warnell, Ethan Stump, Alejandro Ribeiro

We consider discriminative dictionary learning in a distributed online setting, where a network of agents aims to learn a common set of dictionary elements of a feature space and model parameters while sequentially receiving observations.

Dictionary Learning

Doubly Random Parallel Stochastic Methods for Large Scale Learning

no code implementations 22 Mar 2016 Aryan Mokhtari, Alec Koppel, Alejandro Ribeiro

Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both the selection of blocks and the selection of elements of the training set.
