Search Results for author: Sayak Ray Chowdhury

Found 22 papers, 2 papers with code

Provably Robust DPO: Aligning Language Models with Noisy Feedback

no code implementations • 1 Mar 2024 • Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan

Our experiments on IMDb sentiment generation and Anthropic's helpful-harmless dataset show that rDPO is robust to noise in preference labels compared to vanilla DPO and other heuristics proposed by practitioners.
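Since the snippet contrasts rDPO with vanilla DPO, the standard DPO loss on a single preference pair can be sketched as follows (a minimal illustration with hypothetical inputs; the paper's noise-robust rDPO correction is not shown here):

```python
import math

def dpo_loss(pi_w, pi_l, ref_w, ref_l, beta=0.1):
    """Vanilla DPO loss on one preference pair, given log-probabilities of the
    chosen (w) and rejected (l) responses under the policy and the reference
    model. A noisy preference label flips which response is treated as chosen,
    which is the failure mode a robust variant must withstand."""
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At initialization the policy equals the reference, so the margin is zero
# and the loss is log 2.
loss_at_init = dpo_loss(pi_w=-1.0, pi_l=-2.0, ref_w=-1.0, ref_l=-2.0)
```

Raising the policy's log-probability of the chosen response relative to the reference drives the loss below log 2, which is the gradient signal DPO exploits.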

Provably Sample Efficient RLHF via Active Preference Optimization

no code implementations • 16 Feb 2024 • Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury

Experimental evaluations on a human preference dataset validate \texttt{APO}'s efficacy as a sample-efficient and practical solution to data collection for RLHF, facilitating alignment of LLMs with human preferences in a cost-effective and scalable manner.

GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval

no code implementations • 31 Oct 2023 • Daman Arora, Anush Kini, Sayak Ray Chowdhury, Nagarajan Natarajan, Gaurav Sinha, Amit Sharma

Given a query and a document corpus, the information retrieval (IR) task is to output a ranked list of relevant documents.

Passage Retrieval • Re-Ranking • +1
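As a minimal, self-contained illustration of the IR task described above (the toy corpus and term-overlap scoring are hypothetical stand-ins, not the GAR-meets-RAG pipeline):

```python
# Toy corpus for illustration (hypothetical data).
corpus = {
    "d1": "neural retrieval models rank documents",
    "d2": "cooking recipes for pasta",
    "d3": "zero-shot information retrieval with rankers",
}

def score(query, doc):
    """Term-overlap score: fraction of query terms present in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def rank(query, corpus):
    """The IR task: return document ids sorted by descending relevance."""
    return sorted(corpus, key=lambda d: score(query, corpus[d]), reverse=True)

ranking = rank("zero-shot retrieval", corpus)  # most relevant document first
```

Real zero-shot retrievers replace the overlap score with learned dense or generative relevance models, but the input/output contract — query in, ranked list out — is the same.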

Differentially Private Reward Estimation with Preference Feedback

no code implementations • 30 Oct 2023 • Sayak Ray Chowdhury, Xingyu Zhou, Nagarajan Natarajan

Within a standard minimax estimation framework, we provide tight upper and lower bounds on the error in estimating $\theta^*$ under both local and central models of DP.

Adversarial Attack

Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards

no code implementations • 1 Jun 2023 • Yulian Wu, Xingyu Zhou, Sayak Ray Chowdhury, Di Wang

Under each framework, we consider both joint differential privacy (JDP) and local differential privacy (LDP) models.

Multi-Armed Bandits • reinforcement-learning

On Differentially Private Federated Linear Contextual Bandits

no code implementations • 27 Feb 2023 • Xingyu Zhou, Sayak Ray Chowdhury

We first establish privacy and regret guarantees under silo-level local differential privacy, which fixes the issues present in the state-of-the-art algorithm.

Multi-Armed Bandits

Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

no code implementations • 23 Jul 2022 • Debangshu Banerjee, Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan

Furthermore, while the previous result is shown to hold only in the asymptotic regime (as $n \to \infty$), our result for these "locally rich" action spaces is any-time.

Clustering • Model Selection

Model Selection in Reinforcement Learning with General Function Approximations

no code implementations • 6 Jul 2022 • Avishek Ghosh, Sayak Ray Chowdhury

We consider model selection for classic Reinforcement Learning (RL) environments -- Multi Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations.

Model Selection • Multi-Armed Bandits • +2

Distributed Differential Privacy in Multi-Armed Bandits

no code implementations • 12 Jun 2022 • Sayak Ray Chowdhury, Xingyu Zhou

This protocol achieves an $(\epsilon,\delta)$- or approximate-DP guarantee by sacrificing an additional additive $O\left(\frac{K\log T\sqrt{\log(1/\delta)}}{\epsilon}\right)$ cost in the $T$-step cumulative regret.

Multi-Armed Bandits
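Plugging concrete values into the additive regret cost quoted above makes its scaling tangible (this ignores the hidden constant in the $O(\cdot)$, so the number is illustrative only):

```python
import math

def additive_regret_cost(K, T, delta, eps):
    """Additive regret cost K * log(T) * sqrt(log(1/delta)) / eps,
    i.e. the O(.) expression from the snippet with its constant set to 1."""
    return K * math.log(T) * math.sqrt(math.log(1.0 / delta)) / eps

# Example: K = 10 arms, horizon T = 10^6, privacy budget (eps, delta) = (1.0, 1e-5).
cost = additive_regret_cost(K=10, T=10**6, delta=1e-5, eps=1.0)
```

Note the cost is only logarithmic in $T$, so the privacy overhead vanishes relative to the $\sqrt{T}$-order regret of standard bandit algorithms as the horizon grows.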

Shuffle Private Linear Contextual Bandits

no code implementations • 11 Feb 2022 • Sayak Ray Chowdhury, Xingyu Zhou

Prior work largely focuses on two trust models of DP: the central model, where a central server is responsible for protecting users' sensitive data, and the (stronger) local model, where information needs to be protected directly on the user's side.

Multi-Armed Bandits

Bregman Deviations of Generic Exponential Families

no code implementations • 18 Jan 2022 • Sayak Ray Chowdhury, Patrick Saux, Odalric-Ambrym Maillard, Aditya Gopalan

For the practitioner, we instantiate this novel bound for several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square, yielding explicit forms of the confidence sets and the Bregman information gain.

Differentially Private Regret Minimization in Episodic Markov Decision Processes

1 code implementation • 20 Dec 2021 • Sayak Ray Chowdhury, Xingyu Zhou

We study regret minimization in finite horizon tabular Markov decision processes (MDPs) under the constraints of differential privacy (DP).

Decision Making • Reinforcement Learning (RL)
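As generic background on the DP ingredient (this is the standard Laplace mechanism for count queries, not the paper's algorithm): releasing count statistics such as state-visitation counts in a tabular MDP under $\epsilon$-DP can be sketched as:

```python
import numpy as np

def privatize_counts(counts, eps, rng):
    """Release counts under eps-DP via the Laplace mechanism: each count has
    sensitivity 1, so adding Laplace(1/eps) noise suffices. Clipping negative
    values afterwards is post-processing and preserves the DP guarantee."""
    noisy = counts + rng.laplace(scale=1.0 / eps, size=counts.shape)
    return np.maximum(noisy, 0.0)

rng = np.random.default_rng(0)
visit_counts = np.array([120.0, 40.0, 7.0])  # hypothetical visitation counts
released = privatize_counts(visit_counts, eps=1.0, rng=rng)
```

An RL algorithm that only reads such privatized counts keeps a DP guarantee while paying a controlled cost in regret, which is the trade-off the paper quantifies.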

Adaptive Control of Differentially Private Linear Quadratic Systems

no code implementations • 26 Aug 2021 • Sayak Ray Chowdhury, Xingyu Zhou, Ness Shroff

In this paper, we study the problem of regret minimization in reinforcement learning (RL) under differential privacy constraints.

Reinforcement Learning (RL)

Model Selection for Generic Reinforcement Learning

no code implementations • 13 Jul 2021 • Avishek Ghosh, Sayak Ray Chowdhury, Kannan Ramchandran

We propose and analyze a novel algorithm, namely \emph{Adaptive Reinforcement Learning (General)} (\texttt{ARL-GEN}) that adapts to the smallest such family where the true transition kernel $P^*$ lies.

Model Selection • reinforcement-learning • +1

No-regret Algorithms for Multi-task Bayesian Optimization

no code implementations • 20 Aug 2020 • Sayak Ray Chowdhury, Aditya Gopalan

We consider multi-objective optimization (MOO) of an unknown vector-valued function in the non-parametric Bayesian optimization (BO) setting, with the aim being to learn points on the Pareto front of the objectives.

Bayesian Optimization

On Batch Bayesian Optimization

no code implementations • 4 Nov 2019 • Sayak Ray Chowdhury, Aditya Gopalan

We present two algorithms for Bayesian optimization in the batch feedback setting, based on Gaussian process upper confidence bound and Thompson sampling approaches, along with frequentist regret guarantees and numerical results.

Bayesian Optimization • Thompson Sampling
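The Gaussian process upper confidence bound ingredient mentioned above can be sketched for a single acquisition step (a simplified one-point illustration with a squared-exponential kernel and made-up data, not the paper's batch algorithms):

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential (SE) kernel between 1-d input vectors."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-3):
    """GP posterior mean and standard deviation at x_test under an SE prior."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    mu = Ks.T @ np.linalg.solve(K, y_train)
    var = np.diag(rbf(x_test, x_test)) - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

def ucb_pick(x_train, y_train, x_cand, beta=2.0):
    """GP-UCB acquisition: pick the candidate maximizing mean + beta * std."""
    mu, sigma = gp_posterior(x_train, y_train, x_cand)
    return x_cand[np.argmax(mu + beta * sigma)]
```

In the batch setting one must select several points before any feedback arrives, e.g. by hallucinating observations or sampling from the posterior (Thompson sampling); the acquisition above is the sequential building block.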

On Online Learning in Kernelized Markov Decision Processes

no code implementations • 4 Nov 2019 • Sayak Ray Chowdhury, Aditya Gopalan

We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques.

Thompson Sampling

Bayesian Optimization under Heavy-tailed Payoffs

1 code implementation • NeurIPS 2019 • Sayak Ray Chowdhury, Aditya Gopalan

We resolve this gap by developing novel Bayesian optimization algorithms, based on kernel approximation techniques, with regret bounds matching the lower bound in order for the SE kernel.

Bayesian Optimization

Online Learning in Kernelized Markov Decision Processes

no code implementations • 21 May 2018 • Sayak Ray Chowdhury, Aditya Gopalan

We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions.

Misspecified Linear Bandits

no code implementations • 23 Apr 2017 • Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan

Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features.

Learning-To-Rank

On Kernelized Multi-armed Bandits

no code implementations • ICML 2017 • Sayak Ray Chowdhury, Aditya Gopalan

We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown.

Multi-Armed Bandits
