no code implementations • 6 Jun 2022 • Kerem Bozgan, Cem Tekin

We consider the Pareto set identification (PSI) problem in multi-objective multi-armed bandits (MO-MAB) with contaminated reward observations.

no code implementations • 9 May 2022 • Artun Saday, İlker Demirel, Yiğit Yıldırım, Cem Tekin

We analyze the interplay between the algorithm parameters, a discernibility margin, regret, communication cost, and the arms' suboptimality gaps.

no code implementations • 13 Dec 2021 • Ilker Demirel, Mehmet Ufuk Ozdemir, Cem Tekin

In this work, we tackle a different critical task through the lens of linear stochastic bandits, where the aim is to keep the actions' outcomes close to a target level while respecting a two-sided safety constraint, which we call leveling.

no code implementations • 29 Nov 2021 • Sepehr Elahi, Baran Atalar, Sevda Öğüt, Cem Tekin

In federated multi-armed bandit problems, maximizing global reward while satisfying minimum privacy requirements to protect clients is the main goal.

1 code implementation • 26 Nov 2021 • Ilker Demirel, Ahmet Alparslan Celik, Cem Tekin

We propose ESCADA, a novel and generic multi-armed bandit (MAB) algorithm tailored for the leveling task, to make safe, personalized, and context-aware dose recommendations.

no code implementations • 23 Oct 2021 • Çağın Ararat, Cem Tekin

We introduce vector optimization problems with stochastic bandit feedback, in which preferences among designs are encoded by a polyhedral ordering cone $C$.

no code implementations • 5 Oct 2021 • Andi Nika, Sepehr Elahi, Cem Tekin

We consider a contextual bandit problem with a combinatorial action set and time-varying base arm availability.

no code implementations • 8 Sep 2021 • Mahed Abroshan, Kai Hou Yip, Cem Tekin, Mihaela van der Schaar

Secondly, such datasets are usually imperfect, additionally cursed with missing values in the attributes of features.

1 code implementation • 28 Aug 2020 • Andi Nika, Sepehr Elahi, Cem Tekin

We consider contextual combinatorial volatile multi-armed bandit (CCV-MAB), in which at each round, the learner observes a set of available base arms and their contexts, and then, selects a super arm that contains $K$ base arms in order to maximize its cumulative reward.

1 code implementation • 24 Jun 2020 • Andi Nika, Kerem Bozgan, Sepehr Elahi, Çağın Ararat, Cem Tekin

We consider the problem of optimizing a vector-valued objective function $\boldsymbol{f}$ sampled from a Gaussian Process (GP) whose index set is a well-behaved, compact metric space $({\cal X}, d)$ of designs.

no code implementations • 26 Jul 2019 • Alihan Hüyük, Cem Tekin

The algorithm we propose for the second setting also attains bounded regret for the multi-armed bandit with satisficing objectives.

no code implementations • 7 Jul 2019 • Alihan Hüyük, Cem Tekin

Influence maximization, adaptive routing, and dynamic spectrum allocation all require choosing the right action from a large set of alternatives.

no code implementations • 1 Jul 2019 • Eralp Turgay, Cem Bulucu, Cem Tekin

As our learning model, we consider a structured contextual multi-armed bandit (CMAB) with high-dimensional arm (action) and context (data) sets, where the rewards depend only on a few relevant dimensions of the joint context-arm set, possibly in a non-linear way.

no code implementations • NeurIPS 2019 • Xueru Zhang, Mohammad Mahdi Khalili, Cem Tekin, Mingyan Liu

Machine Learning (ML) models trained on data from multiple demographic groups can inherit representation disparity (Hashimoto et al., 2018) that may exist in the data: the model may be less favorable to groups contributing less to the training process; this in turn can degrade population retention in these groups over time, and exacerbate representation disparity in the long run.

no code implementations • 7 Sep 2018 • Alihan Hüyük, Cem Tekin

We analyze the regret of combinatorial Thompson sampling (CTS) for the combinatorial multi-armed bandit with probabilistically triggered arms under the semi-bandit feedback setting.

no code implementations • 11 Mar 2018 • Doruk Öner, Altuğ Karakurt, Atilla Eryilmaz, Cem Tekin

In this paper, we introduce the COmbinatorial Multi-Objective Multi-Armed Bandit (COMO-MAB) problem that captures the challenges of combinatorial and multi-objective online learning simultaneously.

no code implementations • 11 Mar 2018 • Eralp Turğay, Doruk Öner, Cem Tekin

Essentially, the contextual Pareto regret is the sum of the distances of the arms chosen by the learner to the context dependent Pareto front.
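The description above can be made concrete with a small sketch. It assumes that the "distance" of an arm to the Pareto front is the standard Pareto suboptimality gap (the smallest amount that must be added to every objective of the arm so that no other arm dominates it); the excerpt does not state the exact distance used, so this choice, along with the names `pareto_gaps` and `contextual_pareto_regret`, is illustrative only. Expected reward vectors are passed in directly, whereas a learner would only have estimates.

```python
from typing import List, Sequence, Tuple

MeanVectors = Sequence[Sequence[float]]  # one expected-reward vector per arm


def pareto_gaps(means: MeanVectors) -> List[float]:
    """Pareto suboptimality gap of each arm (assumed distance notion).

    gap_i = max(0, max_j min_d (means[j][d] - means[i][d])).
    Arms on the Pareto front have gap 0.
    """
    gaps = []
    for i, mu_i in enumerate(means):
        gap = 0.0
        for j, mu_j in enumerate(means):
            if i == j:
                continue
            # How far arm j beats arm i in its least favorable objective.
            margin = min(b - a for a, b in zip(mu_i, mu_j))
            gap = max(gap, margin)
        gaps.append(gap)
    return gaps


def contextual_pareto_regret(rounds: Sequence[Tuple[MeanVectors, int]]) -> float:
    """Sum of the chosen arms' gaps, one (means at context, chosen arm) per round."""
    return sum(pareto_gaps(means)[arm] for means, arm in rounds)


if __name__ == "__main__":
    # Hypothetical expected rewards of three arms at a single context.
    means = [[0.5, 0.5], [0.75, 0.25], [0.25, 0.25]]
    print(pareto_gaps(means))
    print(contextual_pareto_regret([(means, 2), (means, 0)]))
```

Because the mean vectors are supplied per round, the Pareto front is recomputed for each context, which is what makes the regret context dependent in this sketch.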

no code implementations • 18 Aug 2017 • Cem Tekin, Eralp Turgay

In this case, the optimal arm given a context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective.
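The selection rule in this excerpt is a lexicographic one, and a minimal sketch may help. It assumes the expected rewards of every arm at the given context are known, which a learner would not have in practice; the function name `lexicographic_best_arm` and the tie tolerance `tol` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def lexicographic_best_arm(
    dominant: Sequence[float],
    non_dominant: Sequence[float],
    tol: float = 1e-12,
) -> int:
    """Index of the arm that is best in the non-dominant objective
    among the arms that are best in the dominant objective."""
    if len(dominant) == 0 or len(dominant) != len(non_dominant):
        raise ValueError("need one value per arm in both objectives")
    best_dominant = max(dominant)
    # Arms whose dominant-objective reward ties the maximum (up to tol).
    candidates = [i for i, v in enumerate(dominant) if best_dominant - v <= tol]
    # Break the tie using the non-dominant objective.
    return max(candidates, key=lambda i: non_dominant[i])


if __name__ == "__main__":
    # Arms 0 and 1 tie in the dominant objective; arm 1 wins the tie-break.
    print(lexicographic_best_arm([0.9, 0.9, 0.5], [0.2, 0.7, 0.99]))
```

Note that arm 2 has the highest non-dominant reward in the usage example but is never considered, because it is not optimal in the dominant objective.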

no code implementations • 24 Jul 2017 • A. Ömer Sarıtaç, Cem Tekin

Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a class of upper confidence bound (UCB) policies, named Combinatorial UCB with exploration rate $\kappa$ (CUCB-$\kappa$), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling, achieve bounded regret.

no code implementations • 10 May 2017 • Sabrina Klos, Cem Tekin, Mihaela van der Schaar, Anja Klein

In our algorithm, a local controller (LC) in the mobile device of a worker regularly observes the worker's context, her/his decisions to accept or decline tasks and the quality in completing tasks.

no code implementations • 21 May 2016 • Nima Akbarzadeh, Cem Tekin

In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov Decision Process (MDP) with two actions (arms): a continuation action that moves the learner randomly over the state space around the current state; and a terminal action that moves the learner directly into one of the two terminal states (goal and dead-end state).

no code implementations • 23 Dec 2015 • Cem Tekin, Jinsung Yoon, Mihaela van der Schaar

Extracting actionable intelligence from distributed, heterogeneous, correlated and high-dimensional data sources requires run-time processing and learning both locally and globally.

no code implementations • 4 Aug 2015 • Cem Tekin, Mihaela van der Schaar

After the $stop$ action is taken, the learner collects a terminal reward, and observes the costs and terminal rewards associated with each step of the episode.

no code implementations • 29 Mar 2015 • Onur Atan, Cem Tekin, Mihaela van der Schaar

In the case in which rewards of all arms are deterministic functions of a single unknown parameter, we construct a greedy policy that achieves bounded regret, with a bound that depends on the single true parameter of the problem.

no code implementations • 7 Feb 2015 • Cem Tekin, Mihaela van der Schaar

A key challenge for such systems is to accurately predict what type of content each of its consumers prefers in a certain context, and adapt these predictions to the evolving consumers' preferences, contexts and content characteristics.

no code implementations • 5 Feb 2015 • Cem Tekin, Mihaela van der Schaar

We prove a general regret bound for our algorithm whose time order depends only on the maximum number of relevant dimensions among all the actions, which in the special case where the relevance relation is single-valued (a function), reduces to $\tilde{O}(T^{2(\sqrt{2}-1)})$; in the absence of a relevance relation, the best known contextual bandit algorithms achieve regret $\tilde{O}(T^{(D+1)/(D+2)})$, where $D$ is the full dimension of the context vector.

no code implementations • NeurIPS 2014 • Cem Tekin, Mihaela van der Schaar

When the relation is a function, i.e., the reward of an action only depends on the context of a single type, and the expected reward of an action is Lipschitz continuous in the context of its relevant type, we propose an algorithm that achieves $\tilde{O}(T^{\gamma})$ regret with high probability, where $\gamma=2/(1+\sqrt{2})$.
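As a quick check, the exponent $\gamma$ here and the exponent $2(\sqrt{2}-1)$ in the preceding entry are the same number. The short derivation below rationalizes the denominator and then, as an illustration only, compares $\gamma$ with the exponent $(D+1)/(D+2)$ quoted in that entry for algorithms that ignore the relevance relation; the threshold on $D$ is computed here and is not a claim taken from either abstract.

```latex
\begin{align*}
\gamma = \frac{2}{1+\sqrt{2}}
  &= \frac{2(\sqrt{2}-1)}{(\sqrt{2}+1)(\sqrt{2}-1)}
   = 2(\sqrt{2}-1) \approx 0.828, \\
\frac{D+1}{D+2} > 2(\sqrt{2}-1)
  &\iff D\,(3-2\sqrt{2}) > 4\sqrt{2}-5
   \iff D > 1+2\sqrt{2} \approx 3.83 .
\end{align*}
```

Under this comparison, the $\tilde{O}(T^{\gamma})$ time order is the smaller of the two for integer context dimensions $D \geq 4$.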

no code implementations • 13 Nov 2014 • SaiDhiraj Amuru, Cem Tekin, Mihaela van der Schaar, R. Michael Buehrer

We first present novel online learning algorithms to maximize the jamming efficacy against static transmitter-receiver pairs and prove that our learning algorithm converges to the optimal (in terms of the error rate inflicted at the victim and the energy used) jamming strategy.

no code implementations • 29 Oct 2014 • Onur Atan, Cem Tekin, Mihaela van der Schaar

Specifically, we prove that the parameter-free (worst-case) regret is sublinear in time, and decreases with the informativeness of the arms.

no code implementations • 26 Sep 2013 • Cem Tekin, Simpson Zhang, Mihaela van der Schaar

In contrast to centralized recommender systems, in which there is a single centralized seller who has access to the complete inventory of items as well as the complete record of sales and user information, in decentralized recommender systems each seller/learner only has access to the inventory of items and user information for its own products and not the products and user information of other sellers, but can get commission if it sells an item of another seller.

no code implementations • 21 Aug 2013 • Cem Tekin, Mihaela van der Schaar

At each moment of time, an instance characterized by a certain context may arrive at each learner; based on the context, the learner can select one of its own actions (which gives a reward and provides information) or request assistance from another learner.

no code implementations • 21 Aug 2013 • Cem Tekin, Mihaela van der Schaar

We model the problem of joint classification by distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem where each data instance is characterized by a specific context.

no code implementations • 2 Jul 2013 • Cem Tekin, Mihaela van der Schaar

Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources.

no code implementations • 15 May 2013 • Cem Tekin, Mingyan Liu

In an online contract selection problem, there is a seller that offers a set of contracts to sequentially arriving buyers whose types are drawn from an unknown distribution.

no code implementations • 20 Jul 2011 • Cem Tekin, Mingyan Liu

In an uncontrolled restless bandit problem, there is a finite set of arms, each of which when pulled yields a positive reward.

no code implementations • 14 Jul 2010 • Cem Tekin, Mingyan Liu

The player receives a state-dependent reward each time it plays an arm.
