no code implementations • 1 Mar 2024 • Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
Our experiments on IMDb sentiment generation and Anthropic's helpful-harmless dataset show that rDPO is robust to noise in preference labels compared to vanilla DPO and other heuristics proposed by practitioners.
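To give a flavor of the idea (a hypothetical sketch under one common debiasing construction, not necessarily the paper's exact estimator): if each preference label is flipped independently with a known rate `eps < 0.5`, the standard DPO logistic loss can be reweighted so that its expectation under noisy labels equals the clean loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dpo_loss(margin):
    # Standard DPO loss: -log sigmoid(beta * (reward margin of chosen over rejected)),
    # with beta folded into `margin` for simplicity.
    return -math.log(sigmoid(margin))

def robust_dpo_loss(margin, eps):
    # Debiased loss under label-flip rate eps (< 0.5): averaging this over
    # random label flips recovers the clean dpo_loss(margin).
    assert 0.0 <= eps < 0.5
    return ((1 - eps) * dpo_loss(margin) - eps * dpo_loss(-margin)) / (1 - 2 * eps)
```

At `eps = 0` this reduces exactly to the vanilla DPO loss; the `1/(1 - 2*eps)` factor pays for robustness with higher variance as the noise rate approaches 1/2.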
no code implementations • 16 Feb 2024 • Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury
Experimental evaluations on a human preference dataset validate \texttt{APO}'s efficacy as a sample-efficient and practical solution to data collection for RLHF, facilitating alignment of LLMs with human preferences in a cost-effective and scalable manner.
no code implementations • 31 Oct 2023 • Daman Arora, Anush Kini, Sayak Ray Chowdhury, Nagarajan Natarajan, Gaurav Sinha, Amit Sharma
Given a query and a document corpus, the information retrieval (IR) task is to output a ranked list of relevant documents.
no code implementations • 30 Oct 2023 • Sayak Ray Chowdhury, Xingyu Zhou, Nagarajan Natarajan
Within a standard minimax estimation framework, we provide tight upper and lower bounds on the error in estimating $\theta^*$ under both local and central models of DP.
no code implementations • 1 Jun 2023 • Yulian Wu, Xingyu Zhou, Sayak Ray Chowdhury, Di Wang
Under each framework, we consider both joint differential privacy (JDP) and local differential privacy (LDP) models.
no code implementations • 27 Feb 2023 • Xingyu Zhou, Sayak Ray Chowdhury
We first establish privacy and regret guarantees under silo-level local differential privacy, which fixes the issues present in the state-of-the-art algorithm.
no code implementations • 23 Jul 2022 • Debangshu Banerjee, Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan
Furthermore, while the previous result is shown to hold only in the asymptotic regime (as $n \to \infty$), our result for these "locally rich" action spaces is any-time.
no code implementations • 6 Jul 2022 • Avishek Ghosh, Sayak Ray Chowdhury
We consider model selection for classic Reinforcement Learning (RL) environments -- Multi Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations.
no code implementations • 12 Jun 2022 • Sayak Ray Chowdhury, Xingyu Zhou
This protocol achieves ($\epsilon,\delta$) or approximate-DP guarantee by sacrificing an additional additive $O\!\left(\!\frac{K\log T\sqrt{\log(1/\delta)}}{\epsilon}\!\right)\!$ cost in $T$-step cumulative regret.
no code implementations • 11 Feb 2022 • Sayak Ray Chowdhury, Xingyu Zhou
Prior work largely focuses on two trust models of DP: the central model, where a central server is responsible for protecting users' sensitive data, and the (stronger) local model, where information needs to be protected directly on the user's side.
no code implementations • 18 Jan 2022 • Sayak Ray Chowdhury, Patrick Saux, Odalric-Ambrym Maillard, Aditya Gopalan
For the practitioner, we instantiate this novel bound for several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square, yielding explicit forms of the confidence sets and the Bregman information gain.
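As an illustration of the flavor of such sets (a minimal sketch for the Bernoulli family only; the threshold below is a generic placeholder, not the paper's Bregman-information-gain term): a KL-divergence-based confidence set for a Bernoulli mean consists of all `q` with `n * KL(p_hat, q)` below a threshold, and its upper endpoint can be found by bisection.

```python
import math

def kl_bernoulli(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability.
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_upper_bound(p_hat, n, threshold):
    # Largest q >= p_hat with n * KL(p_hat, q) <= threshold, via bisection.
    lo, hi = p_hat, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if n * kl_bernoulli(p_hat, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo
```

Because the KL set adapts to the curvature of the family, it tightens faster near the boundary of [0, 1] than a sub-Gaussian (Hoeffding-style) interval would.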
1 code implementation • 20 Dec 2021 • Sayak Ray Chowdhury, Xingyu Zhou
We study regret minimization in finite horizon tabular Markov decision processes (MDPs) under the constraints of differential privacy (DP).
no code implementations • 26 Aug 2021 • Sayak Ray Chowdhury, Xingyu Zhou, Ness Shroff
In this paper, we study the problem of regret minimization in reinforcement learning (RL) under differential privacy constraints.
no code implementations • 13 Jul 2021 • Avishek Ghosh, Sayak Ray Chowdhury, Kannan Ramchandran
We propose and analyze a novel algorithm, namely \emph{Adaptive Reinforcement Learning (General)} (\texttt{ARL-GEN}) that adapts to the smallest such family where the true transition kernel $P^*$ lies.
no code implementations • 16 Nov 2020 • Sayak Ray Chowdhury, Rafael Oliveira
We consider the regret minimization problem in reinforcement learning (RL) in the episodic setting.
no code implementations • 20 Aug 2020 • Sayak Ray Chowdhury, Aditya Gopalan
We consider multi-objective optimization (MOO) of an unknown vector-valued function in the non-parametric Bayesian optimization (BO) setting, with the aim being to learn points on the Pareto front of the objectives.
no code implementations • 4 Nov 2019 • Sayak Ray Chowdhury, Aditya Gopalan
We present two algorithms for Bayesian optimization in the batch feedback setting, based on Gaussian process upper confidence bound and Thompson sampling approaches, along with frequentist regret guarantees and numerical results.
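A minimal numpy sketch of the GP-UCB building block (the generic acquisition rule, not the paper's batch variants or constants): fit a GP posterior with an SE kernel, then pick the candidate maximizing mean plus a confidence-width bonus.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential (SE) kernel matrix between point sets a (n,d) and b (m,d).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # GP posterior mean and std at candidate points Xs, given observations (X, y).
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.einsum('ij,ji->i', Ks.T @ Kinv, Ks)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def ucb_pick(X, y, Xs, beta=2.0):
    # GP-UCB acquisition: index of the candidate maximizing mean + beta * std.
    mu, sd = gp_posterior(X, y, Xs)
    return int(np.argmax(mu + beta * sd))
```

The `beta` multiplier trades off exploitation (high posterior mean) against exploration (high posterior uncertainty); a Thompson-sampling variant would instead draw one sample from the posterior and maximize that.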
no code implementations • 4 Nov 2019 • Sayak Ray Chowdhury, Aditya Gopalan
We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques.
1 code implementation • NeurIPS 2019 • Sayak Ray Chowdhury, Aditya Gopalan
We resolve this gap by developing novel Bayesian optimization algorithms, based on kernel approximation techniques, with regret bounds matching the lower bound in order for the SE kernel.
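One standard kernel-approximation technique in this space is random Fourier features (Rahimi & Recht, 2007); the sketch below is illustrative of that general idea and not necessarily the specific approximation the paper uses. A random feature map `phi` makes `phi(x) @ phi(x')` approximate the SE kernel, reducing GP inference to finite-dimensional Bayesian linear regression.

```python
import numpy as np

def rff_features(X, num_feats=500, ls=1.0, seed=0):
    # Random Fourier features approximating the SE kernel
    # k(x, x') = exp(-||x - x'||^2 / (2 * ls^2)).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / ls, size=(d, num_feats))  # spectral frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_feats)    # random phases
    return np.sqrt(2.0 / num_feats) * np.cos(X @ W + b)
```

The approximation error shrinks as `O(1/sqrt(num_feats))`, so the feature count controls the trade-off between fidelity to the exact SE kernel and computational cost.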
no code implementations • 21 May 2018 • Sayak Ray Chowdhury, Aditya Gopalan
We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions.
no code implementations • 23 Apr 2017 • Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan
Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features.
no code implementations • ICML 2017 • Sayak Ray Chowdhury, Aditya Gopalan
We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown.