Search Results for author: Abbas Kazerouni

Found 7 papers, 1 paper with code

Practical Policy Optimization with Personalized Experimentation

no code implementations • 30 Mar 2023 • Mia Garrard, Hanson Wang, Ben Letham, Shaun Singh, Abbas Kazerouni, Sarah Tan, Zehui Wang, Yin Huang, Yichun Hu, Chad Zhou, Norm Zhou, Eytan Bakshy

Many organizations measure treatment effects via an experimentation platform to evaluate the causal effect of product variations prior to full-scale deployment.

Multi-armed Bandits with Cost Subsidy

no code implementations • 3 Nov 2020 • Deeksha Sinha, Karthik Abinav Sankararaman, Abbas Kazerouni, Vashist Avadhanula

We then establish a fundamental lower bound on the performance of any online learning algorithm for this problem, highlighting its hardness relative to the classical MAB problem.

Multi-Armed Bandits • Thompson Sampling

Active Learning for Skewed Data Sets

no code implementations • 23 May 2020 • Abbas Kazerouni, Qi Zhao, Jing Xie, Sandeep Tata, Marc Najork

Furthermore, there is usually only a small amount of initial training data available when building machine-learned models to solve such problems.

Active Learning

Best Arm Identification in Generalized Linear Bandits

no code implementations • 20 May 2019 • Abbas Kazerouni, Lawrence M. Wein

Motivated by drug design, we consider the best-arm identification problem in generalized linear bandits.

Learning to Price with Reference Effects

no code implementations • 29 Aug 2017 • Abbas Kazerouni, Benjamin Van Roy

As a firm varies the price of a product, consumers exhibit reference effects, making purchase decisions based not only on the prevailing price but also on the product's price history.
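To make the idea concrete, here is a toy sketch of one common way such reference effects are modeled: the reference price as an exponentially smoothed price history, with demand shifted by the gap between the reference price and the posted price. The functional form and all parameter values are illustrative assumptions, not taken from the paper.

```python
# Toy reference-effect demand model (illustrative assumptions, not the paper's model).
def simulate_demand(prices, alpha=0.8, a=10.0, b=1.0, c=0.5, r0=5.0):
    r = r0  # initial reference price
    demands = []
    for p in prices:
        # A posted price below the reference price reads as a discount and lifts demand.
        demands.append(max(a - b * p + c * (r - p), 0.0))
        # Consumers update their reference price as a smoothed price history.
        r = alpha * r + (1 - alpha) * p
    return demands

print(simulate_demand([5.0, 6.0, 4.0, 4.0]))
```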

Thompson Sampling

A Tutorial on Thompson Sampling

2 code implementations • 7 Jul 2017 • Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that balances exploiting what is known to maximize immediate performance against investing to accumulate new information that may improve future performance.
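As a concrete illustration of this trade-off, here is a minimal sketch of Thompson sampling for a Bernoulli bandit with Beta priors; the arm probabilities and horizon are arbitrary example values, not anything from the tutorial itself.

```python
import numpy as np

rng = np.random.default_rng(0)
true_probs = [0.3, 0.5, 0.7]  # hypothetical arm success probabilities
successes = np.ones(len(true_probs))  # Beta(1, 1) uniform priors
failures = np.ones(len(true_probs))

for t in range(1000):
    # Sample a plausible success probability for each arm from its posterior,
    # then act greedily with respect to the samples (probability matching).
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))
    reward = rng.random() < true_probs[arm]  # Bernoulli reward
    successes[arm] += reward
    failures[arm] += 1 - reward

print(successes + failures - 2)  # pull counts concentrate on the best arm
```

Arms whose posteriors remain uncertain still get sampled occasionally, which is exactly the "investing to accumulate new information" half of the trade-off.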

Active Learning • Product Recommendation • +1

Conservative Contextual Linear Bandits

no code implementations • NeurIPS 2017 • Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi-Yadkori, Benjamin Van Roy

We prove an upper bound on the regret of CLUCB and show that it can be decomposed into two terms: 1) an upper bound on the regret of the standard linear UCB algorithm, which grows with the time horizon, and 2) a constant term, independent of the time horizon, that accounts for the loss incurred by being conservative in order to satisfy the safety constraint.
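To make the safety constraint concrete, here is a hedged sketch of a conservative step in the spirit of CLUCB: the optimistic (UCB) action is played only if a pessimistic bound on cumulative reward stays within a (1 - alpha) fraction of what the baseline policy would have earned, and otherwise the learner falls back to the baseline. The function and its arguments are schematic placeholders, not the paper's exact condition.

```python
def conservative_action(t, ucb_action, baseline_action,
                        cum_reward_lower_bound, baseline_mean, alpha=0.1):
    """Schematic conservative check: play the optimistic action only if,
    even pessimistically, total reward through round t stays within a
    (1 - alpha) fraction of the baseline's expected cumulative reward."""
    safe = cum_reward_lower_bound >= (1 - alpha) * (t + 1) * baseline_mean
    return ucb_action if safe else baseline_action

# Hypothetical round 99: pessimistic cumulative reward 42.0 vs. the baseline's
# expected 0.5 per round; 42.0 < 0.9 * 100 * 0.5 = 45.0, so fall back to baseline.
print(conservative_action(99, ucb_action=2, baseline_action=0,
                          cum_reward_lower_bound=42.0, baseline_mean=0.5))
```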

Decision Making • Marketing
