no code implementations • 1 Mar 2024 • Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
Our experiments on IMDb sentiment generation and Anthropic's helpful-harmless dataset show that rDPO is robust to noise in preference labels compared to vanilla DPO and other heuristics proposed by practitioners.
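To give a flavor of the idea (a hypothetical sketch under one common debiasing construction, not necessarily the paper's exact estimator): if each preference label is flipped independently with a known rate `eps < 0.5`, the standard DPO logistic loss can be reweighted so that its expectation under noisy labels equals the clean loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dpo_loss(margin):
    # Standard DPO loss: -log sigmoid(beta * (reward margin of chosen over rejected)),
    # with beta folded into `margin` for simplicity.
    return -math.log(sigmoid(margin))

def robust_dpo_loss(margin, eps):
    # Debiased loss under label-flip rate eps (< 0.5): averaging this over
    # random label flips recovers the clean dpo_loss(margin).
    assert 0.0 <= eps < 0.5
    return ((1 - eps) * dpo_loss(margin) - eps * dpo_loss(-margin)) / (1 - 2 * eps)
```

At `eps = 0` this reduces exactly to the vanilla DPO loss; the `1/(1 - 2*eps)` factor pays for robustness with higher variance as the noise rate approaches 1/2.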
no code implementations • 16 Feb 2024 • Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury
Experimental evaluations on a human preference dataset validate \texttt{APO}'s efficacy as a sample-efficient and practical solution to data collection for RLHF, facilitating alignment of LLMs with human preferences in a cost-effective and scalable manner.
no code implementations • 31 Oct 2023 • Daman Arora, Anush Kini, Sayak Ray Chowdhury, Nagarajan Natarajan, Gaurav Sinha, Amit Sharma
Given a query and a document corpus, the information retrieval (IR) task is to output a ranked list of relevant documents.
no code implementations • 30 Oct 2023 • Sayak Ray Chowdhury, Xingyu Zhou, Nagarajan Natarajan
Within a standard minimax estimation framework, we provide tight upper and lower bounds on the error in estimating $\theta^*$ under both local and central models of DP.
no code implementations • 1 Jun 2023 • Yulian Wu, Xingyu Zhou, Sayak Ray Chowdhury, Di Wang
Under each framework, we consider both joint differential privacy (JDP) and local differential privacy (LDP) models.
no code implementations • 27 Feb 2023 • Xingyu Zhou, Sayak Ray Chowdhury
We first establish privacy and regret guarantees under silo-level local differential privacy, which fixes the issues present in the state-of-the-art algorithm.
no code implementations • 23 Jul 2022 • Debangshu Banerjee, Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan
Furthermore, while the previous result is shown to hold only in the asymptotic regime (as $n \to \infty$), our result for these "locally rich" action spaces is any-time.
no code implementations • 6 Jul 2022 • Avishek Ghosh, Sayak Ray Chowdhury
We consider model selection for classic Reinforcement Learning (RL) environments -- Multi Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations.
no code implementations • 12 Jun 2022 • Sayak Ray Chowdhury, Xingyu Zhou
This protocol achieves ($\epsilon,\delta$) or approximate-DP guarantee by sacrificing an additional additive $O\!\left(\!\frac{K\log T\sqrt{\log(1/\delta)}}{\epsilon}\!\right)\!$ cost in $T$-step cumulative regret.
no code implementations • 11 Feb 2022 • Sayak Ray Chowdhury, Xingyu Zhou
Prior work largely focuses on two trust models of DP: the central model, where a central server is responsible for protecting users' sensitive data, and the (stronger) local model, where information needs to be protected directly on the user's side.
no code implementations • 18 Jan 2022 • Sayak Ray Chowdhury, Patrick Saux, Odalric-Ambrym Maillard, Aditya Gopalan
For the practitioner, we instantiate this novel bound for several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square, yielding explicit forms of the confidence sets and the Bregman information gain.
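As an illustration of the flavor of such sets (a minimal sketch for the Bernoulli family only; the threshold below is a generic placeholder, not the paper's Bregman-information-gain term): a KL-divergence-based confidence set for a Bernoulli mean consists of all `q` with `n * KL(p_hat, q)` below a threshold, and its upper endpoint can be found by bisection.

```python
import math

def kl_bernoulli(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability.
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_upper_bound(p_hat, n, threshold):
    # Largest q >= p_hat with n * KL(p_hat, q) <= threshold, via bisection.
    lo, hi = p_hat, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if n * kl_bernoulli(p_hat, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo
```

Because the KL set adapts to the curvature of the family, it tightens faster near the boundary of [0, 1] than a sub-Gaussian (Hoeffding-style) interval would.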
1 code implementation • 20 Dec 2021 • Sayak Ray Chowdhury, Xingyu Zhou
We study regret minimization in finite horizon tabular Markov decision processes (MDPs) under the constraints of differential privacy (DP).
no code implementations • 26 Aug 2021 • Sayak Ray Chowdhury, Xingyu Zhou, Ness Shroff
In this paper, we study the problem of regret minimization in reinforcement learning (RL) under differential privacy constraints.
no code implementations • 13 Jul 2021 • Avishek Ghosh, Sayak Ray Chowdhury, Kannan Ramchandran
We propose and analyze a novel algorithm, namely \emph{Adaptive Reinforcement Learning (General)} (\texttt{ARL-GEN}) that adapts to the smallest such family where the true transition kernel $P^*$ lies.
no code implementations • 16 Nov 2020 • Sayak Ray Chowdhury, Rafael Oliveira
We consider the regret minimization problem in reinforcement learning (RL) in the episodic setting.
no code implementations • 20 Aug 2020 • Sayak Ray Chowdhury, Aditya Gopalan
We consider multi-objective optimization (MOO) of an unknown vector-valued function in the non-parametric Bayesian optimization (BO) setting, with the aim being to learn points on the Pareto front of the objectives.
no code implementations • 4 Nov 2019 • Sayak Ray Chowdhury, Aditya Gopalan
We present two algorithms for Bayesian optimization in the batch feedback setting, based on Gaussian process upper confidence bound and Thompson sampling approaches, along with frequentist regret guarantees and numerical results.
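A minimal numpy sketch of the GP-UCB building block (the generic acquisition rule, not the paper's batch variants or constants): fit a GP posterior with an SE kernel, then pick the candidate maximizing mean plus a confidence-width bonus.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential (SE) kernel matrix between point sets a (n,d) and b (m,d).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # GP posterior mean and std at candidate points Xs, given observations (X, y).
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.einsum('ij,ji->i', Ks.T @ Kinv, Ks)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def ucb_pick(X, y, Xs, beta=2.0):
    # GP-UCB acquisition: index of the candidate maximizing mean + beta * std.
    mu, sd = gp_posterior(X, y, Xs)
    return int(np.argmax(mu + beta * sd))
```

The `beta` multiplier trades off exploitation (high posterior mean) against exploration (high posterior uncertainty); a Thompson-sampling variant would instead draw one sample from the posterior and maximize that.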
no code implementations • 4 Nov 2019 • Sayak Ray Chowdhury, Aditya Gopalan
We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques.
1 code implementation • NeurIPS 2019 • Sayak Ray Chowdhury, Aditya Gopalan
We resolve this gap by developing novel Bayesian optimization algorithms, based on kernel approximation techniques, with regret bounds matching the lower bound in order for the SE kernel.
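One standard kernel-approximation technique in this space is random Fourier features (Rahimi & Recht, 2007); the sketch below is illustrative of that general idea and not necessarily the specific approximation the paper uses. A random feature map `phi` makes `phi(x) @ phi(x')` approximate the SE kernel, reducing GP inference to finite-dimensional Bayesian linear regression.

```python
import numpy as np

def rff_features(X, num_feats=500, ls=1.0, seed=0):
    # Random Fourier features approximating the SE kernel
    # k(x, x') = exp(-||x - x'||^2 / (2 * ls^2)).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / ls, size=(d, num_feats))  # spectral frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_feats)    # random phases
    return np.sqrt(2.0 / num_feats) * np.cos(X @ W + b)
```

The approximation error shrinks as `O(1/sqrt(num_feats))`, so the feature count controls the trade-off between fidelity to the exact SE kernel and computational cost.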
no code implementations • 21 May 2018 • Sayak Ray Chowdhury, Aditya Gopalan
We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions.
no code implementations • 23 Apr 2017 • Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan
Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features.
no code implementations • ICML 2017 • Sayak Ray Chowdhury, Aditya Gopalan
We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown.