no code implementations • 16 Aug 2023 • Artun Saday, Yaşar Cahit Yıldırım, Cem Tekin
In addition, we define a weaker notion of regret called robust satisficing regret, in which our algorithm achieves a sublinear upper bound independent of the amount of distribution shift.
no code implementations • 6 Jun 2022 • Kerem Bozgan, Cem Tekin
We consider the Pareto set identification (PSI) problem in multi-objective multi-armed bandits (MO-MAB) with contaminated reward observations.
no code implementations • 9 May 2022 • Ilker Demirel, Yigit Yildirim, Cem Tekin
We demonstrate Fed-MoM-UCB's effectiveness against the baselines in the presence of Byzantine attacks via experiments.
no code implementations • 13 Dec 2021 • Ilker Demirel, Mehmet Ufuk Ozdemir, Cem Tekin
In this work, we tackle a different critical task through the lens of linear stochastic bandits, where the aim is to keep the actions' outcomes close to a target level while respecting a two-sided safety constraint, which we call leveling.
no code implementations • 29 Nov 2021 • Sepehr Elahi, Baran Atalar, Sevda Öğüt, Cem Tekin
In federated multi-armed bandit problems, maximizing global reward while satisfying minimum privacy requirements to protect clients is the main goal.
1 code implementation • 26 Nov 2021 • Ilker Demirel, Ahmet Alparslan Celik, Cem Tekin
We propose ESCADA, a novel and generic multi-armed bandit (MAB) algorithm tailored for the leveling task, to make safe, personalized, and context-aware dose recommendations.
no code implementations • 23 Oct 2021 • Çağın Ararat, Cem Tekin
We introduce vector optimization problems with stochastic bandit feedback, in which preferences among designs are encoded by a polyhedral ordering cone $C$.
no code implementations • 5 Oct 2021 • Andi Nika, Sepehr Elahi, Cem Tekin
We consider a contextual bandit problem with a combinatorial action set and time-varying base arm availability.
no code implementations • 8 Sep 2021 • Mahed Abroshan, Kai Hou Yip, Cem Tekin, Mihaela van der Schaar
Secondly, such datasets are usually imperfect, additionally cursed with missing values in the attributes of features.
1 code implementation • ICLR 2021 • Alihan Hüyük, Daniel Jarrett, Cem Tekin, Mihaela van der Schaar
Understanding human behavior from observed data is critical for transparency and accountability in decision-making.
1 code implementation • 28 Aug 2020 • Andi Nika, Sepehr Elahi, Cem Tekin
We consider contextual combinatorial volatile multi-armed bandit (CCV-MAB), in which at each round, the learner observes a set of available base arms and their contexts, and then, selects a super arm that contains $K$ base arms in order to maximize its cumulative reward.
1 code implementation • 24 Jun 2020 • Andi Nika, Kerem Bozgan, Sepehr Elahi, Çağın Ararat, Cem Tekin
We consider the problem of optimizing a vector-valued objective function $\boldsymbol{f}$ sampled from a Gaussian Process (GP) whose index set is a well-behaved, compact metric space $({\cal X}, d)$ of designs.
no code implementations • 26 Jul 2019 • Alihan Hüyük, Cem Tekin
The algorithm we propose for the second setting also attains bounded regret for the multiarmed bandit with satisficing objectives.
no code implementations • 7 Jul 2019 • Alihan Hüyük, Cem Tekin
Influence maximization, adaptive routing, and dynamic spectrum allocation all require choosing the right action from a large set of alternatives.
no code implementations • 1 Jul 2019 • Eralp Turgay, Cem Bulucu, Cem Tekin
As our learning model, we consider a structured contextual multi-armed bandit (CMAB) with high-dimensional arm (action) and context (data) sets, where the rewards depend only on a few relevant dimensions of the joint context-arm set, possibly in a non-linear way.
no code implementations • NeurIPS 2019 • Xueru Zhang, Mohammad Mahdi Khalili, Cem Tekin, Mingyan Liu
Machine Learning (ML) models trained on data from multiple demographic groups can inherit representation disparity (Hashimoto et al., 2018) that may exist in the data: the model may be less favorable to groups contributing less to the training process; this in turn can degrade population retention in these groups over time, and exacerbate representation disparity in the long run.
no code implementations • 7 Sep 2018 • Alihan Hüyük, Cem Tekin
We analyze the regret of combinatorial Thompson sampling (CTS) for the combinatorial multi-armed bandit with probabilistically triggered arms under the semi-bandit feedback setting.
no code implementations • 11 Mar 2018 • Eralp Turğay, Doruk Öner, Cem Tekin
Essentially, the contextual Pareto regret is the sum of the distances of the arms chosen by the learner to the context dependent Pareto front.
no code implementations • 11 Mar 2018 • Doruk Öner, Altuğ Karakurt, Atilla Eryilmaz, Cem Tekin
In this paper, we introduce the COmbinatorial Multi-Objective Multi-Armed Bandit (COMO-MAB) problem that captures the challenges of combinatorial and multi-objective online learning simultaneously.
no code implementations • 18 Aug 2017 • Cem Tekin, Eralp Turgay
In this case, the optimal arm given a context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective.
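The selection rule in this sentence can be sketched for a single fixed context as follows. This is only an illustration of the stated definition, not the paper's learning algorithm: the function name `lexicographic_best_arm`, the tolerance `tol`, and the assumption that the expected rewards are already known are all hypothetical.

```python
from typing import Sequence


def lexicographic_best_arm(
    dominant: Sequence[float],
    non_dominant: Sequence[float],
    tol: float = 1e-12,
) -> int:
    """Return the index of the lexicographically optimal arm.

    dominant[i] and non_dominant[i] are the (assumed known) expected
    rewards of arm i in the dominant and non-dominant objectives for
    one fixed context. `tol` is a hypothetical tolerance for treating
    dominant rewards as tied.
    """
    if len(dominant) == 0 or len(dominant) != len(non_dominant):
        raise ValueError("need two non-empty reward lists of equal length")

    # Step 1: keep only the arms that maximize the dominant objective.
    best_dominant = max(dominant)
    candidates = [i for i, v in enumerate(dominant) if best_dominant - v <= tol]

    # Step 2: among those, pick the arm with the largest non-dominant reward.
    return max(candidates, key=lambda i: non_dominant[i])


if __name__ == "__main__":
    # Arms 0 and 1 tie in the dominant objective; arm 1 wins the tie-break.
    print(lexicographic_best_arm([0.9, 0.9, 0.5], [0.2, 0.7, 1.0]))
```

Note that arm 2 has the highest non-dominant reward in the usage example but is never considered, because it does not maximize the dominant objective.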
no code implementations • 24 Jul 2017 • A. Ömer Sarıtaç, Cem Tekin
Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a class of upper confidence bound (UCB) policies, named Combinatorial UCB with exploration rate $\kappa$ (CUCB-$\kappa$), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling, achieve bounded regret.
no code implementations • 10 May 2017 • Sabrina Klos, Cem Tekin, Mihaela van der Schaar, Anja Klein
In our algorithm, a local controller (LC) in the mobile device of a worker regularly observes the worker's context, her/his decisions to accept or decline tasks and the quality in completing tasks.
no code implementations • 21 May 2016 • Nima Akbarzadeh, Cem Tekin
In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov Decision Process (MDP) with two actions (arms): a continuation action that moves the learner randomly over the state space around the current state; and a terminal action that moves the learner directly into one of the two terminal states (goal and dead-end state).
no code implementations • 23 Dec 2015 • Cem Tekin, Jinsung Yoon, Mihaela van der Schaar
Extracting actionable intelligence from distributed, heterogeneous, correlated and high-dimensional data sources requires run-time processing and learning both locally and globally.
no code implementations • 4 Aug 2015 • Cem Tekin, Mihaela van der Schaar
After the $stop$ action is taken, the learner collects a terminal reward, and observes the costs and terminal rewards associated with each step of the episode.
no code implementations • 29 Mar 2015 • Onur Atan, Cem Tekin, Mihaela van der Schaar
In the case in which rewards of all arms are deterministic functions of a single unknown parameter, we construct a greedy policy that achieves bounded regret, with a bound that depends on the single true parameter of the problem.
no code implementations • 7 Feb 2015 • Cem Tekin, Mihaela van der Schaar
A key challenge for such systems is to accurately predict what type of content each of its consumers prefers in a certain context, and adapt these predictions to the evolving consumers' preferences, contexts and content characteristics.
no code implementations • 5 Feb 2015 • Cem Tekin, Mihaela van der Schaar
We prove a general regret bound for our algorithm whose time order depends only on the maximum number of relevant dimensions among all the actions, which in the special case where the relevance relation is single-valued (a function), reduces to $\tilde{O}(T^{2(\sqrt{2}-1)})$; in the absence of a relevance relation, the best known contextual bandit algorithms achieve regret $\tilde{O}(T^{(D+1)/(D+2)})$, where $D$ is the full dimension of the context vector.
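As a quick arithmetic check of the two time exponents quoted above (comparing exponents only, ignoring the constants and logarithmic factors hidden by $\tilde{O}$):

$$
2(\sqrt{2}-1) = \frac{2}{1+\sqrt{2}} \approx 0.828,
\qquad
\frac{D+1}{D+2} > 2(\sqrt{2}-1)
\iff D > \frac{4\sqrt{2}-5}{3-2\sqrt{2}} = 1+2\sqrt{2} \approx 3.83 .
$$

So, in terms of the exponent alone, the single-valued relevance bound is the smaller one for every integer $D \ge 4$ (for example, $D=4$ gives $5/6 \approx 0.833$), while for $D=3$ the general exponent $4/5 = 0.8$ is already slightly smaller.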
no code implementations • NeurIPS 2014 • Cem Tekin, Mihaela van der Schaar
When the relation is a function, i.e., the reward of an action depends only on the context of a single type, and the expected reward of an action is Lipschitz continuous in the context of its relevant type, we propose an algorithm that achieves $\tilde{O}(T^{\gamma})$ regret with high probability, where $\gamma=2/(1+\sqrt{2})$.
no code implementations • 13 Nov 2014 • SaiDhiraj Amuru, Cem Tekin, Mihaela van der Schaar, R. Michael Buehrer
We first present novel online learning algorithms to maximize the jamming efficacy against static transmitter-receiver pairs and prove that our learning algorithm converges to the optimal (in terms of the error rate inflicted at the victim and the energy used) jamming strategy.
no code implementations • 29 Oct 2014 • Onur Atan, Cem Tekin, Mihaela van der Schaar
Specifically, we prove that the parameter-free (worst-case) regret is sublinear in time, and decreases with the informativeness of the arms.
no code implementations • 26 Sep 2013 • Cem Tekin, Simpson Zhang, Mihaela van der Schaar
In contrast to centralized recommender systems, in which there is a single centralized seller who has access to the complete inventory of items as well as the complete record of sales and user information, in decentralized recommender systems each seller/learner only has access to the inventory of items and user information for its own products and not the products and user information of other sellers, but can get commission if it sells an item of another seller.
no code implementations • 21 Aug 2013 • Cem Tekin, Mihaela van der Schaar
We model the problem of joint classification by distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem in which each data instance is characterized by a specific context.
no code implementations • 21 Aug 2013 • Cem Tekin, Mihaela van der Schaar
At each time step, an instance characterized by a certain context may arrive at each learner; based on the context, the learner can select one of its own actions (which gives a reward and provides information) or request assistance from another learner.
no code implementations • 2 Jul 2013 • Cem Tekin, Mihaela van der Schaar
Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources.
no code implementations • 15 May 2013 • Cem Tekin, Mingyan Liu
In an online contract selection problem, there is a seller who offers a set of contracts to sequentially arriving buyers whose types are drawn from an unknown distribution.
no code implementations • 20 Jul 2011 • Cem Tekin, Mingyan Liu
In an uncontrolled restless bandit problem, there is a finite set of arms, each of which when pulled yields a positive reward.
no code implementations • 14 Jul 2010 • Cem Tekin, Mingyan Liu
The player receives a state-dependent reward each time it plays an arm.