Search Results for author: Aleksandrs Slivkins

Found 47 papers, 4 papers with code

Can large language models explore in-context?

no code implementations22 Mar 2024 Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins

We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making.

Decision Making

Impact of Decentralized Learning on Player Utilities in Stackelberg Games

no code implementations29 Feb 2024 Kate Donahue, Nicole Immorlica, Meena Jagadeesan, Brendan Lucier, Aleksandrs Slivkins

To better understand such cases, we examine the learning dynamics of the two-agent system and the implications for each agent's objective.

Chatbot Recommendation Systems

Incentivized Exploration via Filtered Posterior Sampling

no code implementations20 Feb 2024 Anand Kalvit, Aleksandrs Slivkins, Yonatan Gur

We study "incentivized exploration" (IE) in social learning problems where the principal (a recommendation algorithm) can leverage information asymmetry to incentivize sequentially-arriving agents to take exploratory actions.

Multi-Armed Bandits

Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents

no code implementations13 Dec 2023 Seyed A. Esmaeili, Suho Shin, Aleksandrs Slivkins

We identify a class of MAB algorithms, which we call performance-incentivizing, that satisfy a collection of properties, and we show that they lead to mechanisms that incentivize top-level performance at equilibrium and are robust under any strategy profile.

Multi-Armed Bandits

Algorithmic Persuasion Through Simulation

no code implementations29 Nov 2023 Keegan Harris, Nicole Immorlica, Brendan Lucier, Aleksandrs Slivkins

After a fixed number of queries, the sender commits to a messaging policy and the receiver takes the action that maximizes her expected utility given the message she receives.
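
To make the receiver's side of this protocol concrete, below is a minimal sketch of a receiver who, given the posterior over states induced by a message, takes the action maximizing her expected utility. The states, actions, and utility table are illustrative assumptions, not details from the paper.

```python
def receiver_best_response(posterior, utility):
    """Action maximizing the receiver's expected utility under a posterior over states.

    posterior: dict mapping state -> probability (probabilities sum to 1).
    utility: dict mapping (action, state) -> the receiver's utility.
    """
    actions = {action for (action, _) in utility}
    def expected_utility(action):
        return sum(prob * utility[(action, state)] for state, prob in posterior.items())
    return max(actions, key=expected_utility)

# Toy example: after a message, the receiver believes the state is "good" w.p. 0.6.
u = {("accept", "good"): 1.0, ("accept", "bad"): -1.0,
     ("reject", "good"): 0.0, ("reject", "bad"): 0.0}
print(receiver_best_response({"good": 0.6, "bad": 0.4}, u))
```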

Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits

no code implementations13 Jun 2023 Lequn Wang, Akshay Krishnamurthy, Aleksandrs Slivkins

We consider offline policy optimization (OPO) in contextual bandits, where one is given a fixed dataset of logged interactions.

Multi-Armed Bandits
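
As background for the offline setting above, the following sketch shows the textbook inverse propensity scoring (IPS) estimator for evaluating a candidate policy on logged interactions; it is a standard building block, not the paper's pessimistic, oracle-efficient algorithm, and the log format and toy data are assumptions.

```python
def ips_value_estimate(logged_data, policy):
    """Inverse propensity scoring (IPS) estimate of a target policy's value.

    logged_data: list of (context, logged_action, reward, logging_propensity) tuples.
    policy: function mapping a context to the action the target policy would take.
    """
    total = 0.0
    for context, action, reward, propensity in logged_data:
        if policy(context) == action:
            total += reward / propensity  # reweight samples where the policies agree
    return total / len(logged_data)

# Toy log from a logger that chose between two actions uniformly at random.
log = [("x1", 0, 1.0, 0.5), ("x1", 1, 0.0, 0.5), ("x2", 1, 1.0, 0.5)]
print(ips_value_estimate(log, policy=lambda context: 1))
```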

Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression

no code implementations14 Nov 2022 Aleksandrs Slivkins, Karthik Abinav Sankararaman, Dylan J. Foster

We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.

Multi-Armed Bandits regression

Incentivizing Combinatorial Bandit Exploration

no code implementations1 Jun 2022 Xinyan Hu, Dung Daniel Ngo, Aleksandrs Slivkins, Zhiwei Steven Wu

The users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.

Thompson Sampling
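
For reference, here is a minimal sketch of standard Bernoulli Thompson Sampling, the primitive referenced by the tag above; it is the generic baseline, not the incentive-compatible variant studied in the paper, and the arm means and horizon are illustrative.

```python
import random

def thompson_sampling(true_means, horizon, rng=random.Random(0)):
    """Standard Bernoulli Thompson Sampling with Beta(1, 1) priors on each arm."""
    num_arms = len(true_means)
    alpha = [1] * num_arms  # posterior successes + 1
    beta = [1] * num_arms   # posterior failures + 1
    total_reward = 0
    for _ in range(horizon):
        # Sample a mean for each arm from its Beta posterior and play the argmax.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(num_arms)]
        arm = max(range(num_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Toy example: three arms with unknown success probabilities.
print(thompson_sampling([0.2, 0.5, 0.7], horizon=1000))
```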

Content Filtering with Inattentive Information Consumers

no code implementations27 May 2022 Ian Ball, James Bono, Justin Grana, Nicole Immorlica, Brendan Lucier, Aleksandrs Slivkins

We develop a model of content filtering as a game between the filter and the content consumer, where the latter incurs information costs for examining the content.

Misinformation Recommendation Systems

Bandits with Knapsacks beyond the Worst Case

no code implementations NeurIPS 2021 Karthik Abinav Sankararaman, Aleksandrs Slivkins

Third, we provide a general "reduction" from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits.

Multi-Armed Bandits

Exploration and Incentives in Reinforcement Learning

no code implementations28 Feb 2021 Max Simchowitz, Aleksandrs Slivkins

How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$?

reinforcement-learning Reinforcement Learning (RL)

Competing Bandits: The Perils of Exploration Under Competition

no code implementations20 Jul 2020 Guy Aridor, Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu

Users arrive one by one and choose between the two firms, so that each firm makes progress on its bandit problem only if it is chosen.

Multi-Armed Bandits

Adaptive Discretization for Adversarial Lipschitz Bandits

no code implementations22 Jun 2020 Chara Podimata, Aleksandrs Slivkins

We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.

Multi-Armed Bandits

Greedy Algorithm almost Dominates in Smoothed Contextual Bandits

no code implementations19 May 2020 Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu

Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to better decisions in the future.

Multi-Armed Bandits

The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity

no code implementations3 Feb 2020 Mark Sellke, Aleksandrs Slivkins

The performance loss due to incentives is therefore limited to the initial rounds when these data points are collected.

Multi-Armed Bandits Thompson Sampling

Bandits with Knapsacks beyond the Worst-Case

no code implementations1 Feb 2020 Karthik Abinav Sankararaman, Aleksandrs Slivkins

Third, we provide a general "reduction" from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits.

Multi-Armed Bandits

Corruption-robust exploration in episodic reinforcement learning

no code implementations20 Nov 2019 Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of stochastic bandits.

Multi-Armed Bandits reinforcement-learning +1

Introduction to Multi-Armed Bandits

1 code implementation15 Apr 2019 Aleksandrs Slivkins

This book provides a more introductory, textbook-like treatment of the subject.

Multi-Armed Bandits

Bayesian Exploration with Heterogeneous Agents

no code implementations19 Feb 2019 Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, Zhiwei Steven Wu

We consider Bayesian Exploration: a simple model in which the recommendation system (the "principal") controls the information flow to the users (the "agents") and strives to incentivize exploration via information asymmetry.

Recommendation Systems

Adversarial Bandits with Knapsacks

no code implementations28 Nov 2018 Nicole Immorlica, Karthik Abinav Sankararaman, Robert Schapire, Aleksandrs Slivkins

We suggest a new algorithm for the stochastic version, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work.

Multi-Armed Bandits Scheduling

The Externalities of Exploration and How Data Diversity Helps Exploitation

no code implementations1 Jun 2018 Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu

Returning to group-level effects, we show that under the same conditions, negative group externalities essentially vanish under the greedy algorithm.

Multi-Armed Bandits

Combinatorial Semi-Bandits with Knapsacks

no code implementations23 May 2017 Karthik Abinav Sankararaman, Aleksandrs Slivkins

We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks (BwK) and combinatorial semi-bandits.

Multi-Armed Bandits

Competing Bandits: Learning under Competition

no code implementations27 Feb 2017 Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu

Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information.

Multidimensional Dynamic Pricing for Welfare Maximization

no code implementations19 Jul 2016 Aaron Roth, Aleksandrs Slivkins, Jonathan Ullman, Zhiwei Steven Wu

We are able to apply this technique to the setting of unit-demand buyers, even though in that setting the goods are not divisible and the natural fractional relaxation of a unit-demand valuation is not strongly concave.

Bayesian Exploration: Incentivizing Exploration in Bayesian Games

no code implementations24 Feb 2016 Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis, Zhiwei Steven Wu

As a key technical tool, we introduce the concept of explorable actions, the actions which some incentive-compatible policy can recommend with non-zero probability.

Contextual Dueling Bandits

no code implementations23 Feb 2015 Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi

The first of these algorithms achieves particularly low regret, even when data is adversarial, although its time and space requirements are linear in the size of the policy space.

Resourceful Contextual Bandits

no code implementations27 Feb 2014 Ashwinkumar Badanidiyuru, John Langford, Aleksandrs Slivkins

We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items.

Multi-Armed Bandits

Bandits and Experts in Metric Spaces

no code implementations4 Dec 2013 Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal

In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric.
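
To illustrate the setting, here is a minimal sketch of the naive baseline for Lipschitz bandits: uniformly discretize the arm space [0, 1] and run a standard finite-armed strategy (epsilon-greedy here) on the grid. Adaptive "zooming" methods from this line of work improve on this baseline; the payoff function and grid size below are illustrative assumptions.

```python
import random

def uniform_grid_bandit(noisy_payoff, num_points, horizon, epsilon=0.1, rng=random.Random(0)):
    """Epsilon-greedy over a uniform grid of arms in [0, 1]."""
    arms = [i / (num_points - 1) for i in range(num_points)]
    counts = [0] * num_points
    means = [0.0] * num_points
    for _ in range(horizon):
        if rng.random() < epsilon:
            a = rng.randrange(num_points)  # explore a random grid point
        else:
            a = max(range(num_points), key=lambda i: means[i])  # exploit the best so far
        r = noisy_payoff(arms[a], rng)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # running average of rewards
    return arms[max(range(num_points), key=lambda i: means[i])]

# Toy Lipschitz payoff with a peak at x = 0.7, plus uniform noise in [0, 0.2].
payoff = lambda x, rng: 0.8 * max(0.0, 1 - abs(x - 0.7)) + 0.2 * rng.random()
print(uniform_grid_bandit(payoff, num_points=21, horizon=5000))
```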

Dynamic Ad Allocation: Bandits with Budgets

no code implementations1 Jun 2013 Aleksandrs Slivkins

We consider an application of multi-armed bandits to internet advertising (specifically, to dynamic ad allocation in the pay-per-click model, with uncertainty on the click probabilities).

Multi-Armed Bandits
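
A minimal sketch of the budgeted setting, assuming per-click costs and per-ad budgets: a UCB-style rule picks among ads whose remaining budget still covers a click, and an ad drops out once its budget is exhausted. This is an illustration of the setting, not the paper's algorithm, and all numbers are made up.

```python
import math
import random

def budgeted_ucb(click_probs, cost_per_click, budgets, horizon, rng=random.Random(0)):
    """UCB1 over ads; an ad becomes ineligible once its budget cannot cover another click."""
    n = len(click_probs)
    counts = [0] * n    # times each ad was shown
    clicks = [0] * n    # clicks received by each ad
    spend = [0.0] * n   # budget spent by each ad
    for t in range(1, horizon + 1):
        active = [a for a in range(n) if budgets[a] - spend[a] >= cost_per_click[a]]
        if not active:
            break  # every advertiser's budget is exhausted
        def index(a):
            if counts[a] == 0:
                return float("inf")  # show each ad at least once
            return clicks[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
        ad = max(active, key=index)
        counts[ad] += 1
        if rng.random() < click_probs[ad]:  # click probabilities are unknown to the algorithm
            clicks[ad] += 1
            spend[ad] += cost_per_click[ad]
    return clicks, spend

print(budgeted_ucb([0.10, 0.05], [1.0, 0.5], budgets=[20.0, 20.0], horizon=5000))
```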

Bandits with Knapsacks

no code implementations11 May 2013 Ashwinkumar Badanidiyuru, Robert Kleinberg, Aleksandrs Slivkins

As one example of a concrete application, we consider the problem of dynamic posted pricing with limited supply and obtain the first algorithm whose regret, with respect to the optimal dynamic policy, is sublinear in the supply.

Scheduling

Multi-armed bandits on implicit metric spaces

no code implementations NeurIPS 2011 Aleksandrs Slivkins

For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm.

General Classification Multi-Armed Bandits

Dynamic Pricing with Limited Supply

no code implementations20 Aug 2011 Moshe Babaioff, Shaddin Dughmi, Robert Kleinberg, Aleksandrs Slivkins

The performance guarantee for the same mechanism can be improved to $O(\sqrt{k} \log n)$, with a distribution-dependent constant, if $k/n$ is sufficiently small.

Multi-Armed Bandits

Contextual Bandits with Similarity Information

no code implementations23 Jul 2009 Aleksandrs Slivkins

A particularly simple way to represent similarity information in the contextual bandit setting is via a "similarity distance" between the context-arm pairs which gives an upper bound on the difference between the respective expected payoffs.

Multi-Armed Bandits
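
A minimal sketch of how such a similarity distance can be used: since the distance upper-bounds the gap between expected payoffs, each past observation yields an optimistic bound at any other context-arm pair. The distance function and data are illustrative assumptions, estimation error in the empirical means is ignored, and this is not the algorithm developed in the paper.

```python
def optimistic_estimate(target, observations, distance):
    """Upper bound on the expected payoff of a context-arm pair implied by observations.

    observations: list of ((context, arm), empirical_mean_payoff) pairs.
    distance: similarity distance D with |mu(p) - mu(q)| <= D(p, q).
    """
    bounds = [mean + distance(target, pair) for pair, mean in observations]
    return min(bounds) if bounds else 1.0  # payoffs assumed to lie in [0, 1]

# Toy example: contexts and arms are numbers in [0, 1]; D sums the coordinate gaps.
D = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])
obs = [((0.2, 0.5), 0.6), ((0.8, 0.5), 0.3)]
print(optimistic_estimate((0.25, 0.5), obs, D))
```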

Characterizing Truthful Multi-Armed Bandit Mechanisms

no code implementations12 Dec 2008 Moshe Babaioff, Yogeshwer Sharma, Aleksandrs Slivkins

We investigate how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful.

Multi-Armed Bandits in Metric Spaces

2 code implementations29 Sep 2008 Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal

In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric.

Multi-Armed Bandits
