Search Results for author: P. R. Kumar

Found 24 papers, 5 papers with code

Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

2 code implementations • 10 Jun 2022 • Ruida Zhou, Tao Liu, Dileep Kalathil, P. R. Kumar, Chao Tian

We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard constraints (constrained MDP), and max-min trade-off.

Fairness • Multi-Objective Reinforcement Learning • +1
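
For a concrete feel for the smooth concave scalarization mentioned in the snippet, here is a minimal sketch, not the paper's Anchor-Changing Regularized NPG: tabular softmax policy gradient (via finite differences) on a proportional-fairness, i.e., sum-of-logs, scalarization of two value functions. The toy MDP, step size, and start distribution are all assumptions.

```python
import numpy as np

# Toy 2-state, 2-action MDP with two reward functions (assumed example data).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s']
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[[1.0, 0.0], [0.0, 1.0]],   # R[k, s, a]: reward of objective k
              [[0.0, 1.0], [1.0, 0.0]]])
gamma, eta, eps = 0.9, 0.5, 1e-5

def values(theta):
    """Exact discounted value of the softmax policy, one entry per objective."""
    pi = np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)  # pi[s, a]
    Ppi = np.einsum('sa,sap->sp', pi, P)
    vals = []
    for k in range(2):
        r_pi = (pi * R[k]).sum(axis=1)
        v = np.linalg.solve(np.eye(2) - gamma * Ppi, r_pi)
        vals.append(v @ np.array([0.5, 0.5]))  # uniform start distribution
    return np.array(vals)

def scalarized(theta):
    # Proportional fairness: sum of logs of the per-objective values.
    return np.log(values(theta)).sum()

theta = np.zeros((2, 2))
for _ in range(200):
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):       # finite-difference gradient
        t = theta.copy(); t[idx] += eps
        grad[idx] = (scalarized(t) - scalarized(theta)) / eps
    theta += eta * grad

print("per-objective values:", values(theta))  # balanced, neither sacrificed
```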

Learning from Few Samples: Transformation-Invariant SVMs with Composition and Locality at Multiple Scales

1 code implementation • 27 Sep 2021 • Tao Liu, P. R. Kumar, Ruida Zhou, Xi Liu

Motivated by the problem of learning with small sample sizes, this paper shows how to incorporate into support-vector machines (SVMs) those properties that have made convolutional neural networks (CNNs) successful.
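
A common baseline for transformation invariance, shown here only as an illustrative sketch rather than the paper's composition-and-locality construction, is to augment the training set with transformed copies before fitting the SVM. The toy 1-D data, shift range, and kernel choice are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_sample(cls):
    """Toy 1-D 'image': a pattern whose position should not matter (assumed data)."""
    x = np.zeros(16)
    x[4:8] = 1.0 if cls == 0 else np.array([1.0, -1.0, 1.0, -1.0])
    return x + 0.05 * rng.normal(size=16)

X = np.array([make_sample(c) for c in (0, 1) for _ in range(10)])
y = np.array([c for c in (0, 1) for _ in range(10)])

def augment(X, y, shifts=(-2, -1, 1, 2)):
    """Add translated copies so the learned boundary is shift-invariant."""
    Xa = [X] + [np.roll(X, s, axis=1) for s in shifts]
    return np.vstack(Xa), np.tile(y, len(shifts) + 1)

Xa, ya = augment(X, y)
clf = SVC(kernel='rbf', gamma='scale').fit(Xa, ya)

# Test on samples shifted further than anything in the raw training set.
X_test = np.roll(np.array([make_sample(c) for c in (0, 1)]), 3, axis=1)
print(clf.predict(X_test))   # ideally [0, 1]
```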

Terra: Blockage Resilience in Outdoor mmWave Networks

1 code implementation • 25 Sep 2022 • Santosh Ganji, Jaewon Kim, P. R. Kumar

This allows the mobile to maintain time-synchronization with the base station, so that it can revert to the LoS path when the temporary blockage disappears.

Detect Ground Reflections
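
A schematic of the blockage-handling logic the snippet describes, as an assumed illustration rather than Terra's actual protocol: on an LoS blockage the mobile steers to a known reflected beam while staying synchronized, then reverts once the LoS path recovers. The thresholds and measurement interface are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class BeamState:
    on_los: bool = True

# Assumed thresholds (dB); not from the paper.
BLOCKAGE_DROP_DB = 15.0
RECOVERY_MARGIN_DB = 3.0

def step(state, los_snr_db, nlos_snr_db, los_baseline_db):
    """One beam-management decision per measurement report."""
    if state.on_los and los_snr_db < los_baseline_db - BLOCKAGE_DROP_DB:
        # Temporary blockage: switch to the reflected (NLoS) beam but keep
        # listening to the base station so time-synchronization is maintained.
        state.on_los = False
        return "switch_to_nlos"
    if not state.on_los and los_snr_db > nlos_snr_db + RECOVERY_MARGIN_DB:
        # Blockage cleared: revert to the stronger LoS path.
        state.on_los = True
        return "revert_to_los"
    return "stay"

s = BeamState()
for los, nlos in [(30, 20), (10, 20), (12, 20), (28, 20)]:
    print(step(s, los, nlos, los_baseline_db=30))
```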

Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

1 code implementation • NeurIPS 2023 • Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian

We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment.

reinforcement-learning • Reinforcement Learning (RL)
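
To make "robust against model mismatch" concrete, here is a tabular robust value-iteration sketch under an assumed R-contamination uncertainty set, in which an adversary may divert a fraction rho of the transition mass. The paper's contribution, a natural actor-critic with function approximation, is not reproduced here; the toy MDP is invented.

```python
import numpy as np

# Toy MDP (assumed data): P[s, a, s'], R[s, a].
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.6, 0.4], [0.1, 0.9]]])
R = np.array([[1.0, 0.5], [0.2, 0.8]])
gamma, rho = 0.9, 0.1   # rho = contamination level of the uncertainty set

def robust_value_iteration(P, R, n_iter=500):
    """Worst case over the R-contamination set {(1-rho)P + rho*q : q arbitrary}."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        # The adversary puts its rho mass on the lowest-value next state.
        worst = V.min()
        Q = R + gamma * ((1 - rho) * P @ V + rho * worst)
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

V, pi = robust_value_iteration(P, R)
print("robust V:", V, "robust policy:", pi)
```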

Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging

1 code implementation • 29 Oct 2018 • Ping-Chun Hsieh, Xi Liu, Anirban Bhattacharya, P. R. Kumar

Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection.

Decision Making • Multi-Armed Bandits
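
A toy illustration, not the paper's algorithm, of why heteroscedasticity matters under reneging: when a user departs once the outcome falls below a threshold, expected lifetime depends on an action's variance as much as on its mean. The Gaussian outcome models and the threshold are assumptions.

```python
import numpy as np
from scipy.stats import norm

# Two actions (assumed): similar means, very different noise scales.
means = np.array([1.0, 0.9])
stds  = np.array([1.0, 0.2])
theta = 0.5                        # user reneges when the outcome drops below theta

# Probability of surviving one round under each action.
p_stay = 1 - norm.cdf(theta, loc=means, scale=stds)

# With i.i.d. rounds, expected lifetime is geometric: 1 / (1 - p_stay).
lifetime = 1 / (1 - p_stay)
print(dict(zip(["high-mean/high-var", "low-mean/low-var"], lifetime)))
# The lower-mean but lower-variance action yields the longer expected lifetime.
```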

Throughput Optimal Decentralized Scheduling of Multi-Hop Networks with End-to-End Deadline Constraints: II Wireless Networks with Interference

no code implementations • 6 Sep 2017 • Rahul Singh, P. R. Kumar, Eytan Modiano

The key difference arises from the fact that in our set-up packets lose their utility once their "age" has crossed their deadline, thus making the task of optimizing timely throughput much more challenging than that of ensuring network stability.

Scheduling
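
A small sketch of the timely-throughput notion in the snippet, with an assumed single-link earliest-deadline-first scheduler rather than the paper's decentralized multi-hop policy: a packet delivered after its deadline earns nothing, so expired packets are simply dropped.

```python
import heapq

def timely_throughput(packets, horizon):
    """packets: list of (arrival, deadline); one transmission per slot.
    Earliest-deadline-first; only on-time deliveries count."""
    delivered, queue, i = 0, [], 0
    packets = sorted(packets)
    for t in range(horizon):
        while i < len(packets) and packets[i][0] <= t:
            heapq.heappush(queue, packets[i][1])   # heap ordered by deadline
            i += 1
        while queue and queue[0] < t:              # drop expired packets
            heapq.heappop(queue)
        if queue:
            heapq.heappop(queue)
            delivered += 1
    return delivered / horizon

# Assumed arrival pattern: (arrival_slot, deadline_slot).
pkts = [(0, 2), (0, 1), (1, 3), (4, 4), (4, 5), (4, 9)]
print("timely throughput:", timely_throughput(pkts, horizon=10))
```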

Belief Space Planning Simplified: Trajectory-Optimized LQG (T-LQG) (Extended Report)

no code implementations • 10 Aug 2016 • Mohammadhussein Rafieisakhaei, Suman Chakravorty, P. R. Kumar

Planning under motion and observation uncertainties requires solving a stochastic control problem in the space of feedback policies.

Robotics • Optimization and Control

Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits

no code implementations • 2 Jul 2019 • Xi Liu, Ping-Chun Hsieh, Anirban Bhattacharya, P. R. Kumar

To choose the bias-growth rate $\alpha(t)$ in RBMLE, we reveal the nontrivial interplay between $\alpha(t)$ and the regret bound, which applies to both Exponential Family bandits and sub-Gaussian/Exponential family bandits.

Multi-Armed Bandits
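
A numerical sketch of the reward-biasing principle for Bernoulli arms, not the paper's closed-form index: each arm's log-likelihood is biased by $\alpha(t)$ times the mean-reward parameter, and the arm whose biased maximum gains the most over its unbiased maximum is pulled. The choice $\alpha(t)=\sqrt{t}$, the grid maximization, and the pseudo-counts are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.4, 0.6, 0.8])        # assumed Bernoulli arms
K, T = len(true_means), 2000
S = np.ones(K); F = np.ones(K)                # success/failure pseudo-counts
grid = np.linspace(1e-3, 1 - 1e-3, 999)

def loglik(s, f, p):
    return s * np.log(p) + f * np.log(1 - p)

for t in range(1, T + 1):
    alpha = np.sqrt(t)                        # assumed bias-growth rate alpha(t)
    idx = np.empty(K)
    for i in range(K):
        ll = loglik(S[i], F[i], grid)
        # Biased maximum-likelihood value minus the unbiased maximum:
        # an RBMLE-style per-arm index, maximized numerically on a grid.
        idx[i] = (ll + alpha * grid).max() - ll.max()
    a = int(idx.argmax())
    if rng.random() < true_means[a]:
        S[a] += 1
    else:
        F[a] += 1

print("pull counts:", (S + F - 2).astype(int))  # the best arm should dominate
```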

Learning in Networked Control Systems

no code implementations • 21 Mar 2020 • Rahul Singh, P. R. Kumar

We design an adaptive controller (learning rule) for a networked control system (NCS) in which data packets containing control information are transmitted across a lossy wireless channel.

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

no code implementations • 8 Oct 2020 • Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandit problems as well as generalized linear bandit problems.

Computational Efficiency

Reward Biased Maximum Likelihood Estimation for Reinforcement Learning

no code implementations • 16 Nov 2020 • Akshay Mete, Rahul Singh, Xi Liu, P. R. Kumar

The Reward-Biased Maximum Likelihood Estimate (RBMLE) for adaptive control of Markov chains was proposed to overcome the central obstacle of what is variously called the fundamental "closed-loop identifiability problem" of adaptive control, the "dual control problem", or, contemporaneously, the "exploration vs. exploitation problem".

Multi-Armed Bandits • reinforcement-learning • +2

An Efficient Network Solver for Dynamic Simulation of Power Systems Based on Hierarchical Inverse Computation and Modification

no code implementations • 22 May 2021 • Lu Zhang, Bin Wang, Vivek Sarin, Weiping Shi, P. R. Kumar, Le Xie

In power system dynamic simulation, up to 90% of the computational time is devoted to solving the network equations, i.e., a set of linear equations.
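
One standard way to exploit that observation, shown as an illustration rather than the paper's hierarchical inverse scheme: factorize the network matrix once, reuse the factors across time steps, and absorb a local network modification with a Sherman-Morrison-Woodbury update instead of refactoring. The matrix and the rank of the change below are assumptions.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 200
Y = rng.normal(size=(n, n)) + n * np.eye(n)    # stand-in for the network matrix

lu = lu_factor(Y)                              # factorize once ...
for _ in range(5):                             # ... reuse across time steps
    b = rng.normal(size=n)
    x = lu_solve(lu, b)

# A local change (e.g., a line outage) is low-rank: Y' = Y + U @ V.T.
U = rng.normal(size=(n, 2)); V = rng.normal(size=(n, 2))

def solve_modified(b):
    """Sherman-Morrison-Woodbury: solve (Y + U V^T) x = b without refactoring."""
    YinvB = lu_solve(lu, b)
    YinvU = lu_solve(lu, U)
    small = np.eye(2) + V.T @ YinvU            # 2x2 capacitance matrix
    return YinvB - YinvU @ np.linalg.solve(small, V.T @ YinvB)

b = rng.normal(size=n)
err = np.linalg.norm((Y + U @ V.T) @ solve_modified(b) - b)
print("residual:", err)                        # ~1e-12: same answer, no refactor
```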

Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs

no code implementations • NeurIPS 2021 • Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian

We show that when a strictly safe policy is known, one can confine the system to zero constraint violation with arbitrarily high probability while keeping the reward regret of order $\tilde{\mathcal{O}}(\sqrt{K})$.

Safe Exploration

Policy Optimization for Constrained MDPs with Provable Fast Global Convergence

no code implementations • 31 Oct 2021 • Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian

We propose a new algorithm called policy mirror descent-primal dual (PMD-PD) algorithm that can provably achieve a faster $\mathcal{O}(\log(T)/T)$ convergence rate for both the optimality gap and the constraint violation.
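
A minimal tabular sketch of the mirror-descent primal-dual idea, not the PMD-PD algorithm itself (no modified Lagrangian, and none of its $\mathcal{O}(\log(T)/T)$ guarantees): the policy takes a multiplicative-weights step on the Lagrangian's Q-values while the dual variable ascends on the constraint violation. The toy constrained MDP and step sizes are assumed.

```python
import numpy as np

# Toy constrained MDP (assumed): maximize reward s.t. expected cost <= b.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.2, 0.8]]])      # P[s, a, s']
R = np.array([[1.0, 0.0], [0.5, 0.4]])        # reward
C = np.array([[0.8, 0.1], [0.3, 0.2]])        # cost
gamma, b, eta_pi, eta_lam = 0.9, 2.0, 0.05, 0.05
rho0 = np.array([0.5, 0.5])                   # start distribution

def policy_eval(pi, r):
    Ppi = np.einsum('sa,sap->sp', pi, P)
    v = np.linalg.solve(np.eye(2) - gamma * Ppi, (pi * r).sum(axis=1))
    return v, r + gamma * P @ v               # state values, Q-values

pi = np.full((2, 2), 0.5)
lam = 0.0
for _ in range(1000):
    _, qr = policy_eval(pi, R)
    vc, qc = policy_eval(pi, C)
    # Mirror-descent (multiplicative-weights) step on the Lagrangian Q-values.
    pi = pi * np.exp(eta_pi * (qr - lam * qc))
    pi /= pi.sum(axis=1, keepdims=True)
    # Dual ascent on the constraint violation.
    lam = max(0.0, lam + eta_lam * (rho0 @ vc - b))

vr, _ = policy_eval(pi, R)
vc, _ = policy_eval(pi, C)
print("reward:", rho0 @ vr, "cost:", rho0 @ vc, "(target <=", b, ")")
```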

Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems

no code implementations • 25 Jan 2022 • Akshay Mete, Rahul Singh, P. R. Kumar

We consider the problem of controlling an unknown stochastic linear system with quadratic costs, called the adaptive LQ control problem.

Thompson Sampling

On an Information and Control Architecture for Future Electric Energy Systems

no code implementations • 1 Jun 2022 • Le Xie, Tong Huang, P. R. Kumar, Anupam A. Thatte, Sanjoy K. Mitter

This paper presents considerations towards an information and control architecture for future electric energy systems driven by massive changes resulting from the societal goals of decarbonization and electrification.

Energy System Digitization in the Era of AI: A Three-Layered Approach towards Carbon Neutrality

no code implementations • 2 Nov 2022 • Le Xie, Tong Huang, Xiangtian Zheng, Yan Liu, Mengdi Wang, Vijay Vittal, P. R. Kumar, Srinivas Shakkottai, Yi Cui

The transition towards carbon-neutral electricity is one of the biggest game changers in addressing climate change, since it tackles the dual challenge of removing carbon emissions from the two largest emitting sectors: electricity and transportation.

Decision Making

TERRA: Beam Management for Outdoor mm-Wave Networks

no code implementations • 10 Jan 2023 • Santosh Ganji, Jaewon Kim, Romil Sonigra, P. R. Kumar

To avoid outage during transient pedestrian blockage of the LoS path, the mobile uses a reflected or NLoS path available in indoor environments.

Management

Bounded (O(1)) Regret Recommendation Learning via Synthetic Controls Oracle

no code implementations • 29 Jan 2023 • Enoch Hyunwook Kang, P. R. Kumar

In online exploration systems where users with fixed preferences repeatedly arrive, it has recently been shown that O(1), i.e., bounded regret, can be achieved when the system is modeled as a linear contextual bandit.

Recommendation Systems
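
For background only, not the paper's synthetic-controls construction: a standard LinUCB learner for the linear contextual bandit model the snippet refers to. Its regret grows like $\tilde{\mathcal{O}}(\sqrt{T})$ rather than O(1), which is the gap the snippet says a synthetic-controls oracle can close; the dimensions, noise level, and confidence width below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 3000
theta_star = rng.normal(size=d); theta_star /= np.linalg.norm(theta_star)

A_mat = np.eye(d); b_vec = np.zeros(d)        # ridge-regression statistics
beta = 1.0                                    # assumed confidence width
regret = 0.0
for t in range(T):
    contexts = rng.normal(size=(K, d))        # one feature vector per arm
    theta_hat = np.linalg.solve(A_mat, b_vec)
    Ainv = np.linalg.inv(A_mat)
    ucb = contexts @ theta_hat + beta * np.sqrt(
        np.einsum('kd,dc,kc->k', contexts, Ainv, contexts))
    a = int(ucb.argmax())
    x = contexts[a]
    reward = x @ theta_star + 0.1 * rng.normal()
    A_mat += np.outer(x, x); b_vec += reward * x
    regret += (contexts @ theta_star).max() - x @ theta_star

print("cumulative regret:", round(regret, 2))  # grows sublinearly, not O(1)
```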

Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs

no code implementations • 17 Oct 2023 • Yu-Heng Hung, Ping-Chun Hsieh, Akshay Mete, P. R. Kumar

We consider infinite-horizon linear Markov Decision Processes (MDPs), where the transition probabilities of the dynamic model can be linearly parameterized with the help of a predefined low-dimensional feature mapping.

Model-based Reinforcement Learning
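
To illustrate the linear parameterization in the snippet, the sketch below assumes P(s'|s, a) = phi(s, a)^T mu(s') for a known low-dimensional feature map phi, and recovers mu by least squares on one-hot next-state indicators. It is not the paper's value-biased estimator; the features and sample sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 3, 2, 4

# Known feature map phi[s, a] in R^d and ground-truth mu[d, s'] (assumed).
# Rows of mu are distributions and phi(s, a) are mixture weights, so the
# resulting P is a valid transition kernel that is exactly linear in phi.
phi = rng.random((S, A, d)); phi /= phi.sum(axis=2, keepdims=True)
mu = rng.random((d, S));     mu /= mu.sum(axis=1, keepdims=True)
P = np.einsum('sad,dp->sap', phi, mu)

# Collect transitions and regress empirical next-state indicators on features.
X, Y = [], []
for _ in range(20000):
    s, a = rng.integers(S), rng.integers(A)
    s_next = rng.choice(S, p=P[s, a])
    X.append(phi[s, a])
    Y.append(np.eye(S)[s_next])               # one-hot target for the regression
X, Y = np.array(X), np.array(Y)

mu_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # d x S estimate of mu
P_hat = np.einsum('sad,dp->sap', phi, mu_hat)
print("max model error:", np.abs(P_hat - P).max())  # small at this sample size
```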

Provable Policy Gradient Methods for Average-Reward Markov Potential Games

no code implementations • 9 Mar 2024 • Min Cheng, Ruida Zhou, P. R. Kumar, Chao Tian

We prove that both the independent policy gradient and the independent natural policy gradient algorithms converge globally to a Nash equilibrium under the average-reward criterion.

Policy Gradient Methods
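
A minimal stateless analogue, not the paper's average-reward Markov setting: in an identical-interest matrix game, the simplest potential game, independent softmax policy gradient by each player converges to a pure Nash equilibrium. The payoff matrix and step size are assumptions.

```python
import numpy as np

# Two-player identical-interest matrix game (assumed payoffs);
# the paper treats the much harder average-reward Markov case.
U = np.array([[3.0, 0.0],
              [0.0, 2.0]])                    # both players receive U[a1, a2]
eta = 0.5
theta = [np.zeros(2), np.zeros(2)]            # independent softmax parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(300):
    p1, p2 = softmax(theta[0]), softmax(theta[1])
    # Each player ascends the gradient of its OWN expected payoff, independently.
    g1 = U @ p2            # marginal payoff of each action for player 1
    g2 = U.T @ p1
    theta[0] += eta * (p1 * (g1 - p1 @ g1))   # softmax policy gradient
    theta[1] += eta * (p2 * (g2 - p2 @ g2))

print("player 1:", softmax(theta[0]).round(3),
      "player 2:", softmax(theta[1]).round(3))
# Converges to one of the pure Nash equilibria, (a1, a2) = (0, 0) or (1, 1).
```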
