Search Results for author: Yishay Mansour

Found 102 papers, 5 papers with code

Domain Adaptation with Multiple Sources

no code implementations NeurIPS 2008 Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh

The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain.

Domain Adaptation

Domain Adaptation: Learning Bounds and Algorithms

no code implementations 19 Feb 2009 Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh

This motivates our analysis of the problem of minimizing the empirical discrepancy for various loss functions for which we also give novel algorithms.

Domain Adaptation Generalization Bounds

Learning Bounds for Importance Weighting

no code implementations NeurIPS 2010 Corinna Cortes, Yishay Mansour, Mehryar Mohri

This paper presents an analysis of importance weighting for learning from finite samples and gives a series of theoretical and algorithmic results.

Learning Multiple Tasks using Shared Hypotheses

no code implementations NeurIPS 2012 Koby Crammer, Yishay Mansour

In this work we consider a setting where we have a very large number of related tasks with few examples from each individual task.

Generalization Bounds

Thompson Sampling for Complex Bandit Problems

no code implementations 3 Nov 2013 Aditya Gopalan, Shie Mannor, Yishay Mansour

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round.

Thompson Sampling

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

no code implementations 30 Sep 2014 Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir

This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions.

Multi-Armed Bandits

On the Complexity of Learning with Kernels

no code implementations 5 Nov 2014 Nicolò Cesa-Bianchi, Yishay Mansour, Ohad Shamir

In this paper, we study lower bounds on the error attainable by such methods as a function of the number of entries observed in the kernel matrix or the rank of an approximate kernel matrix.

Classification with Low Rank and Missing Data

no code implementations 14 Jan 2015 Elad Hazan, Roi Livni, Yishay Mansour

We consider classification and regression tasks where we have missing data and assume that the (clean) data resides in a low rank subspace.

Classification General Classification +1

Label Efficient Learning by Exploiting Multi-class Output Codes

no code implementations 10 Nov 2015 Maria Florina Balcan, Travis Dick, Yishay Mansour

We present a new perspective on the popular multi-class algorithmic techniques of one-vs-all and error correcting output codes.

Delay and Cooperation in Nonstochastic Bandits

no code implementations 15 Feb 2016 Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour, Alberto Minora

We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm, and prove that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\bigl(d+1 + \tfrac{K}{N}\alpha_{\le d}\bigr)(T\ln K)}$, where $\alpha_{\le d}$ is the independence number of the $d$-th power of the connected communication graph $G$.

Bayesian Exploration: Incentivizing Exploration in Bayesian Games

no code implementations 24 Feb 2016 Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis, Zhiwei Steven Wu

As a key technical tool, we introduce the concept of explorable actions, the actions which some incentive-compatible policy can recommend with non-zero probability.

Online Learning with Low Rank Experts

no code implementations 21 Mar 2016 Elad Hazan, Tomer Koren, Roi Livni, Yishay Mansour

We consider the problem of prediction with expert advice when the losses of the experts have low-dimensional structure: they are restricted to an unknown $d$-dimensional subspace.

Predicting Counterfactuals from Large Historical Data and Small Randomized Trials

no code implementations 24 Oct 2016 Nir Rosenfeld, Yishay Mansour, Elad Yom-Tov

The conventional way to answer this counterfactual question is to estimate the effect of the new treatment in comparison to that of the conventional treatment by running a controlled, randomized experiment.

counterfactual

Online Pricing with Strategic and Patient Buyers

no code implementations NeurIPS 2016 Michal Feldman, Tomer Koren, Roi Livni, Yishay Mansour, Aviv Zohar

We consider a seller with an unlimited supply of a single good, who is faced with a stream of $T$ buyers.

Automatic Representation for Lifetime Value Recommender Systems

no code implementations 23 Feb 2017 Assaf Hallak, Yishay Mansour, Elad Yom-Tov

The LTV approach considers the future implications of the item recommendation, and seeks to maximize the cumulative gain over time.

Recommendation Systems Reinforcement Learning (RL)

Bandits with Movement Costs and Adaptive Pricing

no code implementations 24 Feb 2017 Tomer Koren, Roi Livni, Yishay Mansour

In this setting, we give a new algorithm that establishes a regret of $\widetilde{O}(\sqrt{kT} + T/k)$, where $k$ is the number of actions and $T$ is the time horizon.

Competing Bandits: Learning under Competition

no code implementations 27 Feb 2017 Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu

Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information.

Efficient PAC Learning from the Crowd

no code implementations 21 Mar 2017 Pranjal Awasthi, Avrim Blum, Nika Haghtalab, Yishay Mansour

When a noticeable fraction of the labelers are perfect, and the rest behave arbitrarily, we show that any $\mathcal{F}$ that can be efficiently learned in the traditional realizable PAC model can be learned in a computationally efficient manner by querying the crowd, despite high amounts of noise in the responses.

Computational Efficiency PAC learning

Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues

no code implementations NeurIPS 2017 Noga Alon, Moshe Babaioff, Yannai A. Gonczarowski, Yishay Mansour, Shay Moran, Amir Yehudayoff

In this work we derive a variant of the classic Glivenko-Cantelli Theorem, which asserts uniform convergence of the empirical Cumulative Distribution Function (CDF) to the CDF of the underlying distribution.

Discriminative Learning of Prediction Intervals

no code implementations 16 Oct 2017 Nir Rosenfeld, Yishay Mansour, Elad Yom-Tov

Most current methods for constructing prediction intervals offer guarantees for a single new test point.

Prediction Intervals

Multi-Armed Bandits with Metric Movement Costs

no code implementations NeurIPS 2017 Tomer Koren, Roi Livni, Yishay Mansour

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions.

Multi-Armed Bandits

Are All Experts Equally Good? A Study of Analyst Earnings Estimates

no code implementations 13 May 2018 Amir Ban, Yishay Mansour

We note that this would make it possible to aggregate multiple predictions into a result that is more accurate than their consensus average, and that the improvement prospects grow with the amount of differentiation.

Online Linear Quadratic Control

no code implementations ICML 2018 Alon Cohen, Avinatan Hassidim, Tomer Koren, Nevena Lazic, Yishay Mansour, Kunal Talwar

We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses.

Improved Generalization Bounds for Adversarially Robust Learning

no code implementations 4 Oct 2018 Idan Attias, Aryeh Kontorovich, Yishay Mansour

For binary classification, the algorithm of Feige et al. (2015) uses a regret minimization algorithm and an ERM oracle as a black box; we adapt it for the multiclass and regression settings.

Binary Classification General Classification +3

Adversarial Online Learning with noise

no code implementations 22 Oct 2018 Alon Resler, Yishay Mansour

We present and study models of adversarial online learning where the feedback observed by the learner is noisy, and the feedback is either full information feedback or bandit feedback.

Learning to Screen

no code implementations NeurIPS 2019 Alon Cohen, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Shay Moran

(ii) In the second variant it is assumed that before the process starts, the algorithm has access to a training set of $n$ items drawn independently from the same unknown distribution (e.g., data of candidates from previous recruitment seasons).

Differentially Private Learning of Geometric Concepts

no code implementations 13 Feb 2019 Haim Kaplan, Yishay Mansour, Yossi Matias, Uri Stemmer

We present differentially private efficient algorithms for learning union of polygons in the plane (which are not necessarily convex).

PAC learning

Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret

no code implementations 17 Feb 2019 Alon Cohen, Tomer Koren, Yishay Mansour

We present the first computationally-efficient algorithm with $\widetilde O(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics.

Open-Ended Question Answering

Competitive ratio versus regret minimization: achieving the best of both worlds

no code implementations 7 Apr 2019 Amit Daniely, Yishay Mansour

Our end result is an online algorithm that can combine a "base" online algorithm, having a guaranteed competitive ratio, with a range of online algorithms that guarantee a small regret over any interval of time.

Online Convex Optimization in Adversarial Markov Decision Processes

no code implementations 19 May 2019 Aviv Rosenberg, Yishay Mansour

We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner.

Unknown mixing times in apprenticeship and reinforcement learning

no code implementations 23 May 2019 Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

We derive and analyze learning algorithms for apprenticeship learning, policy evaluation, and policy gradient for average reward criteria.

reinforcement-learning Reinforcement Learning (RL)

Efficient candidate screening under multiple tests and implications for fairness

no code implementations 27 May 2019 Lee Cohen, Zachary C. Lipton, Yishay Mansour

We analyze the optimal employer policy both when the employer sets a fixed number of tests per candidate and when the employer can set a dynamic policy, assigning further tests adaptively based on results from the previous tests.

Fairness

Top-k Combinatorial Bandits with Full-Bandit Feedback

no code implementations 28 May 2019 Idan Rejwan, Yishay Mansour

Top-k Combinatorial Bandits generalize multi-armed bandits, where at each round any subset of $k$ out of $n$ arms may be chosen and the sum of the rewards is gained.

Multi-Armed Bandits

ROI Maximization in Stochastic Online Decision-Making

no code implementations NeurIPS 2021 Nicolò Cesa-Bianchi, Tommaso Cesari, Yishay Mansour, Vianney Perchet

We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making.

Decision Making

Graph-based Discriminators: Sample Complexity and Expressiveness

no code implementations NeurIPS 2019 Roi Livni, Yishay Mansour

A function $g\in \mathcal{G}$ distinguishes between two distributions if the expected value of $g$ on a $k$-tuple of i.i.d. examples is (significantly) different on the two distributions.

Learning Theory

Thompson Sampling for Adversarial Bit Prediction

no code implementations 21 Jun 2019 Yuval Lewi, Haim Kaplan, Yishay Mansour

We also bound the regret of those sequences: the worst-case sequences have regret $O(\sqrt{T})$ and the best-case sequences have regret $O(1)$.

Thompson Sampling

Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

no code implementations NeurIPS 2019 Yogev Bar-On, Yishay Mansour

We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem.

Multi-Armed Bandits

Apprenticeship Learning via Frank-Wolfe

no code implementations 5 Nov 2019 Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

Specifically, we show that a variation of the FW method that is based on taking "away steps" achieves a linear rate of convergence when applied to AL and that a stochastic version of the FW algorithm can be used to avoid precise estimation of feature expectations.

Privately Learning Thresholds: Closing the Exponential Gap

no code implementations 22 Nov 2019 Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, Uri Stemmer

This problem has received much attention recently; unlike the non-private case, where the sample complexity is independent of the domain size and just depends on the desired accuracy and confidence, for private learning the sample complexity must depend on the domain size $X$ (even for approximate differential privacy).

Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function

no code implementations NeurIPS 2019 Aviv Rosenberg, Yishay Mansour

We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes.

Near-optimal Regret Bounds for Stochastic Shortest Path

no code implementations ICML 2020 Alon Cohen, Haim Kaplan, Yishay Mansour, Aviv Rosenberg

In this work we remove this dependence on the minimum cost---we give an algorithm that guarantees a regret bound of $\widetilde{O}(B_\star |S| \sqrt{|A| K})$, where $B_\star$ is an upper bound on the expected cost of the optimal policy, $S$ is the set of states, $A$ is the set of actions and $K$ is the number of episodes.

Reinforcement Learning (RL)

Prediction with Corrupted Expert Advice

no code implementations NeurIPS 2020 Idan Amir, Idan Attias, Tomer Koren, Roi Livni, Yishay Mansour

We revisit the fundamental problem of prediction with expert advice, in a setting where the environment is benign and generates losses stochastically, but the feedback observed by the learner is subject to a moderate adversarial corruption.

Adversarially Robust Streaming Algorithms via Differential Privacy

no code implementations NeurIPS 2020 Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Yossi Matias, Uri Stemmer

A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary.

Adversarial Robustness

Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity

no code implementations NeurIPS 2020 Haim Kaplan, Yishay Mansour, Uri Stemmer, Eliad Tsfadia

We present a differentially private learner for halfspaces over a finite grid $G$ in $\mathbb{R}^d$ with sample complexity $\approx d^{2.5}\cdot 2^{\log^*|G|}$, which improves the state-of-the-art result of [Beimel et al., COLT 2019] by a $d^2$ factor.

Sample Complexity of Uniform Convergence for Multicalibration

no code implementations NeurIPS 2020 Eliran Shabat, Lee Cohen, Yishay Mansour

There is a growing interest in societal concerns in machine learning systems, especially in fairness.

Fairness

Reinforcement Learning with Feedback Graphs

no code implementations NeurIPS 2020 Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

We study episodic reinforcement learning in Markov decision processes when the agent receives additional feedback per step in the form of several transition observations.

reinforcement-learning Reinforcement Learning (RL)

Stochastic Shortest Path with Adversarially Changing Costs

no code implementations 20 Jun 2020 Aviv Rosenberg, Yishay Mansour

Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost.

A Theory of Multiple-Source Adaptation with Limited Target Labeled Data

no code implementations 19 Jul 2020 Yishay Mansour, Mehryar Mohri, Jae Ro, Ananda Theertha Suresh, Ke Wu

We present a theoretical and algorithmic study of the multiple-source domain adaptation problem in the common scenario where the learner has access only to a limited amount of labeled target data, but has at its disposal a large amount of labeled data from multiple source domains.

Domain Adaptation Model Selection

Competing Bandits: The Perils of Exploration Under Competition

no code implementations 20 Jul 2020 Guy Aridor, Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu

Users arrive one by one and choose between the two firms, so that each firm makes progress on its bandit problem only if it is chosen.

Multi-Armed Bandits

Detecting malicious PDF using CNN

no code implementations ICLR 2020 Raphael Fettaya, Yishay Mansour

We show, using a data set of 90000 files downloadable online, that our approach maintains a high detection rate (94%) of PDF malware and even detects new malicious files, still undetected by most antiviruses.

Clustering Computer Security

Beyond Individual and Group Fairness

no code implementations 21 Aug 2020 Pranjal Awasthi, Corinna Cortes, Yishay Mansour, Mehryar Mohri

In the adversarial setting, we design efficient algorithms with competitive ratio guarantees.

Fairness

Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure

1 code implementation NeurIPS 2021 Aviv Rosenberg, Yishay Mansour

We study regret minimization in non-episodic factored Markov decision processes (FMDPs), where all existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance.

The Sparse Vector Technique, Revisited

no code implementations 2 Oct 2020 Haim Kaplan, Yishay Mansour, Uri Stemmer

This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be.

Adversarial Dueling Bandits

no code implementations 27 Oct 2020 Aadirupa Saha, Tomer Koren, Yishay Mansour

We introduce the problem of regret minimization in Adversarial Dueling Bandits.

Learning Adversarial Markov Decision Processes with Delayed Feedback

no code implementations 29 Dec 2020 Tal Lancewicki, Aviv Rosenberg, Yishay Mansour

We present novel algorithms based on policy optimization that achieve near-optimal high-probability regret of $\widetilde O ( \sqrt{K} + \sqrt{D} )$ under full-information feedback, where $K$ is the number of episodes and $D = \sum_{k} d^k$ is the total delay.

Recommendation Systems

Separating Adaptive Streaming from Oblivious Streaming

no code implementations 26 Jan 2021 Haim Kaplan, Yishay Mansour, Kobbi Nissim, Uri Stemmer

We present a streaming problem for which every adversarially-robust streaming algorithm must use polynomial space, while there exists a classical (oblivious) streaming algorithm that uses only polylogarithmic space.

Data Structures and Algorithms

Online Markov Decision Processes with Aggregate Bandit Feedback

no code implementations 31 Jan 2021 Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.

Minimax Regret for Stochastic Shortest Path

no code implementations NeurIPS 2021 Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg

In this work we show that the minimax regret for this setting is $\widetilde O(\sqrt{ (B_\star^2 + B_\star) |S| |A| K})$ where $B_\star$ is a bound on the expected cost of the optimal policy from any state, $S$ is the state space, and $A$ is the action space.

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

no code implementations 4 Jun 2021 Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.

Multi-Armed Bandits

Differentially Private Multi-Armed Bandits in the Shuffle Model

no code implementations NeurIPS 2021 Jay Tenenbaum, Haim Kaplan, Yishay Mansour, Uri Stemmer

We give an $(\varepsilon,\delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model with a distribution-dependent regret of $O\left(\left(\sum_{a\in [k]:\Delta_a>0}\frac{\log T}{\Delta_a}\right)+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$, and a distribution-independent regret of $O\left(\sqrt{kT\log T}+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$, where $T$ is the number of rounds, $\Delta_a$ is the suboptimality gap of the arm $a$, and $k$ is the total number of arms.

Multi-Armed Bandits

Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations

no code implementations NeurIPS 2021 Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

In this work, we consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy.

reinforcement-learning Reinforcement Learning (RL)

Optimal Rates for Random Order Online Optimization

no code implementations NeurIPS 2021 Uri Sherman, Tomer Koren, Yishay Mansour

We study online convex optimization in the random order model, recently proposed by Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented to the online algorithm in a uniformly random order.

Dueling Bandits with Team Comparisons

no code implementations NeurIPS 2021 Lee Cohen, Ulrike Schmidt-Kraepelin, Yishay Mansour

We introduce the dueling teams problem, a new online-learning setting in which the learner observes noisy comparisons of disjoint pairs of $k$-sized teams from a universe of $n$ players.

FriendlyCore: Practical Differentially Private Aggregation

no code implementations 19 Oct 2021 Eliad Tsfadia, Edith Cohen, Haim Kaplan, Yishay Mansour, Uri Stemmer

Differentially private algorithms for common metric aggregation tasks, such as clustering or averaging, often have limited practicality due to their complexity or to the large number of data points that is required for accurate results.

Clustering

Nonstochastic Bandits with Composite Anonymous Feedback

no code implementations 6 Dec 2021 Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Claudio Gentile, Yishay Mansour

We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way.

Cooperative Online Learning in Stochastic and Adversarial MDPs

no code implementations 31 Jan 2022 Tal Lancewicki, Aviv Rosenberg, Yishay Mansour

We study cooperative online learning in stochastic and adversarial Markov decision processes (MDPs).

Reinforcement Learning (RL)

Fair Wrapping for Black-box Predictions

1 code implementation 31 Jan 2022 Alexander Soen, Ibrahim Alabdulmohsin, Sanmi Koyejo, Yishay Mansour, Nyalleng Moorosi, Richard Nock, Ke Sun, Lexing Xie

We introduce a new family of techniques to post-process ("wrap") a black-box classifier in order to reduce its bias.

Fairness

Monotone Learning

no code implementations 10 Feb 2022 Olivier Bousquet, Amit Daniely, Haim Kaplan, Yishay Mansour, Shay Moran, Uri Stemmer

Our transformation readily implies monotone learners in a variety of contexts: for example it extends Pestov's result to classification tasks with an arbitrary number of labels.

Binary Classification Classification +1

A Characterization of Semi-Supervised Adversarially-Robust PAC Learnability

no code implementations 11 Feb 2022 Idan Attias, Steve Hanneke, Yishay Mansour

This shows that there is a significant benefit in semi-supervised robust learning even in the worst-case distribution-free model, and establishes a gap between the supervised and semi-supervised label complexities which is known not to hold in standard non-robust PAC learning.

PAC learning

Finding Safe Zones of policies Markov Decision Processes

no code implementations 23 Feb 2022 Lee Cohen, Yishay Mansour, Michal Moshkovitz

Given a policy of a Markov Decision Process, we define a SafeZone as a subset of states, such that most of the policy's trajectories are confined to this subset.

Benign Underfitting of Stochastic Gradient Descent

no code implementations 27 Feb 2022 Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data.

Learning Efficiently Function Approximation for Contextual MDP

no code implementations 2 Mar 2022 Orin Levy, Yishay Mansour

We study learning contextual MDPs using a function approximation for both the rewards and the dynamics.

Modeling Attrition in Recommender Systems with Departing Bandits

no code implementations 25 Mar 2022 Omer Ben-Porat, Lee Cohen, Liu Leqi, Zachary C. Lipton, Yishay Mansour

We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal.

Multi-Armed Bandits Recommendation Systems

Strategizing against Learners in Bayesian Games

no code implementations 17 May 2022 Yishay Mansour, Mehryar Mohri, Jon Schneider, Balasubramanian Sivan

We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy, while the other, the optimizer, is a rational utility maximizer.

What killed the Convex Booster?

no code implementations 19 May 2022 Yishay Mansour, Richard Nock, Robert C. Williamson

A landmark negative result of Long and Servedio established a worst-case spectacular failure of a supervised learning trio (loss, algorithm, model) otherwise praised for its high precision machinery.

Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

no code implementations 19 Jun 2022 Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.

reinforcement-learning Reinforcement Learning (RL)

Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP

no code implementations 22 Jul 2022 Orin Levy, Yishay Mansour

For the latter, our algorithm obtains a regret bound of $\widetilde{O}( (H+{1}/{p_{min}})H|S|^{3/2}\sqrt{|A|T\log(\max\{|\mathcal{G}|,|\mathcal{P}|\}/\delta)})$ with probability $1-\delta$, where $\mathcal{P}$ and $\mathcal{G}$ are finite and realizable function classes used to approximate the dynamics and rewards respectively, $p_{min}$ is the minimum reachability parameter, $S$ is the set of states, $A$ the set of actions, $H$ the horizon, and $T$ the number of episodes.

Regret Minimization and Convergence to Equilibria in General-sum Markov Games

no code implementations 28 Jul 2022 Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, Yishay Mansour

Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence.

Dueling Convex Optimization with General Preferences

no code implementations 27 Sep 2022 Aadirupa Saha, Tomer Koren, Yishay Mansour

We address the problem of convex optimization with dueling feedback, where the goal is to minimize a convex function given a weaker form of dueling feedback.

Eluder-based Regret for Stochastic Contextual MDPs

no code implementations 27 Nov 2022 Orin Levy, Asaf Cassel, Alon Cohen, Yishay Mansour

To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting.

regression

Concurrent Shuffle Differential Privacy Under Continual Observation

no code implementations 29 Jan 2023 Jay Tenenbaum, Haim Kaplan, Yishay Mansour, Uri Stemmer

the counter problem) and show that the concurrent shuffle model allows for significantly improved error compared to a standard (single) shuffle model.

Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

no code implementations 30 Jan 2023 Uri Sherman, Tomer Koren, Yishay Mansour

We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory conditions. We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback, featuring a combination of mirror-descent and least squares policy evaluation in an auxiliary MDP used to compute exploration bonuses. Our algorithm obtains an $\widetilde O(K^{6/7})$ regret bound, improving significantly over previous state-of-the-art of $\widetilde O (K^{14/15})$ in this setting.

reinforcement-learning Reinforcement Learning (RL)

Uniswap Liquidity Provision: An Online Learning Approach

1 code implementation 1 Feb 2023 Yogev Bar-On, Yishay Mansour

Decentralized Exchanges (DEXs) are new types of marketplaces leveraging Blockchain technology.

Pseudonorm Approachability and Applications to Regret Minimization

no code implementations 3 Feb 2023 Christoph Dann, Yishay Mansour, Mehryar Mohri, Jon Schneider, Balasubramanian Sivan

We then use that to show, modulo mild normalization assumptions, that there exists an $\ell_\infty$-approachability algorithm whose convergence is independent of the dimension of the original vectorial payoff.

On Differentially Private Online Predictions

no code implementations 27 Feb 2023 Haim Kaplan, Yishay Mansour, Shay Moran, Kobbi Nissim, Uri Stemmer

In this work we introduce an interactive variant of joint differential privacy towards handling online processes in which existing privacy definitions seem too restrictive.

Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation

no code implementations 2 Mar 2023 Orin Levy, Alon Cohen, Asaf Cassel, Yishay Mansour

To the best of our knowledge, our algorithm is the first efficient rate optimal regret minimization algorithm for adversarial CMDPs that operates under the minimal standard assumption of online function approximation.

regression

The tree reconstruction game: phylogenetic reconstruction using reinforcement learning

no code implementations 12 Mar 2023 Dana Azouri, Oz Granit, Michael Alburquerque, Yishay Mansour, Tal Pupko, Itay Mayrose

Our proposed method does not require likelihood calculation with every step, nor is it limited to greedy uphill moves in the likelihood space.

Q-Learning reinforcement-learning +1

Rate-Optimal Policy Optimization for Linear Markov Decision Processes

no code implementations 28 Aug 2023 Uri Sherman, Alon Cohen, Tomer Koren, Yishay Mansour

We study regret minimization in online episodic linear Markov Decision Processes, and obtain rate-optimal $\widetilde O (\sqrt K)$ regret where $K$ denotes the number of episodes.

Faster Convergence with Multiway Preferences

no code implementations 19 Dec 2023 Aadirupa Saha, Vitaly Feldman, Tomer Koren, Yishay Mansour

We next study $m$-multiway comparison ("battling") feedback, where the learner observes the argmin feedback of an $m$-subset of queried points, and show a convergence rate of $\widetilde O(\frac{d}{\min\{\log m, d\}\epsilon})$.

Principal-Agent Reward Shaping in MDPs

1 code implementation 30 Dec 2023 Omer Ben-Porat, Yishay Mansour, Michal Moshkovitz, Boaz Taitler

Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest.

Learnability Gaps of Strategic Classification

no code implementations 29 Feb 2024 Lee Cohen, Yishay Mansour, Shay Moran, Han Shao

We essentially show that any learnable class is also strategically learnable: we first consider a fully informative setting, where the manipulation structure (which is modeled by a manipulation graph $G^\star$) is known and during training time the learner has access to both the pre-manipulation data and post-manipulation data.

Classification Multi-Label Learning

Learning-Augmented Algorithms with Explicit Predictors

no code implementations 12 Mar 2024 Marek Elias, Haim Kaplan, Yishay Mansour, Shay Moran

Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data.

Scheduling
