Search Results for author: Mohammad Ghavamzadeh

Found 96 papers, 20 papers with code

Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors

2 code implementations 6 Feb 2022 Christina Göpfert, Alex Haig, Yinlam Chow, Chih-Wei Hsu, Ivan Vendrov, Tyler Lu, Deepak Ramachandran, Hubert Pham, Mohammad Ghavamzadeh, Craig Boutilier

Interactive recommender systems have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional recommender systems (e.g., clicks, item consumption, ratings).

Recommendation Systems

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

2 code implementations 25 May 2023 Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, MoonKyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee

We focus on diffusion models, defining the fine-tuning task as an RL problem, and updating the pre-trained text-to-image diffusion models using policy gradient to maximize the feedback-trained reward.
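
To make the recipe concrete, here is a minimal REINFORCE-style sketch of reward fine-tuning on a toy one-dimensional "generator"; the Gaussian sampler and the reward_model placeholder are illustrative stand-ins, not the paper's diffusion-model objective.

import numpy as np

# Toy REINFORCE-style reward fine-tuning: a 1-D Gaussian "generator" whose mean
# is updated with the policy gradient of a (placeholder) learned reward.
rng = np.random.default_rng(0)

def reward_model(x):
    # Placeholder for a reward trained from human feedback.
    return -(x - 2.0) ** 2

mu, sigma, lr = 0.0, 1.0, 0.05
for step in range(200):
    x = rng.normal(mu, sigma, size=64)              # sample "images"
    r = reward_model(x)
    baseline = r.mean()                              # variance-reduction baseline
    grad_logp = (x - mu) / sigma ** 2                # d/dmu log N(x; mu, sigma)
    mu += lr * np.mean((r - baseline) * grad_logp)   # REINFORCE update

print(f"fine-tuned mean ~ {mu:.2f} (reward peaks at 2.0)")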

reinforcement-learning Reinforcement Learning (RL)

Benchmarking Batch Deep Reinforcement Learning Algorithms

4 code implementations 3 Oct 2019 Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau

Widely-used deep reinforcement learning algorithms have been shown to fail in the batch setting--learning from a fixed data set without interaction with the environment.
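
For orientation, a generic batch Q-learning loop over a fixed dataset (no further environment interaction) might look like the toy sketch below; the chain MDP and constants are illustrative and do not correspond to any specific benchmarked algorithm.

import numpy as np

# Generic batch (offline) Q-learning on a fixed dataset of (s, a, r, s') tuples,
# with no further environment interaction -- the setting benchmarked in the paper.
rng = np.random.default_rng(1)
n_states, n_actions, gamma = 5, 2, 0.95

# Fixed dataset collected by some behavior policy (here: uniformly random).
dataset = []
for _ in range(2000):
    s, a = rng.integers(n_states), rng.integers(n_actions)
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    dataset.append((s, a, r, s_next))

Q = np.zeros((n_states, n_actions))
for _ in range(100):                       # fitted-Q-style sweeps over the fixed data
    for s, a, r, s_next in dataset:
        target = r + gamma * Q[s_next].max()
        Q[s, a] += 0.1 * (target - Q[s, a])

print("greedy policy from batch data:", Q.argmax(axis=1))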

Benchmarking Q-Learning +2

Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control

1 code implementation ICLR 2020 Nir Levine, Yin-Lam Chow, Rui Shu, Ang Li, Mohammad Ghavamzadeh, Hung Bui

A promising approach is to embed the high-dimensional observations into a lower-dimensional latent representation space, estimate the latent dynamics model, then utilize this model for control in the latent space.

Decision Making Open-Ended Question Answering +1

Mirror Descent Policy Optimization

1 code implementation ICLR 2022 Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh

Overall, MDPO is derived from the MD principles, offers a unified approach to viewing a number of popular RL algorithms, and performs better than or on-par with TRPO, PPO, and SAC in a number of continuous control tasks.
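
Roughly, the mirror-descent view amounts to repeatedly maximizing an advantage term regularized by a KL divergence to the previous policy. The sketch below shows that flavor for a single-state softmax policy with made-up advantages; the objective and constants are illustrative, not the paper's exact MDPO losses.

import numpy as np

# Flavor of a mirror-descent / KL-regularized policy update on a single-state
# softmax policy: maximize advantage minus a KL term to the previous policy.
adv = np.array([1.0, 0.2, -0.5])        # made-up advantages for 3 actions
theta = np.zeros(3)                      # policy logits
tau = 0.5                                # KL regularization strength (illustrative)

def softmax(z):
    z = z - z.max()
    return np.exp(z) / np.exp(z).sum()

for _ in range(10):                      # "mirror descent" iterations
    pi_old = softmax(theta)
    for _ in range(50):                  # inner gradient steps on the regularized objective
        pi = softmax(theta)
        # gradient of sum_a pi(a) * (adv(a) - tau * log(pi(a)/pi_old(a)))
        g = adv - tau * (np.log(pi) - np.log(pi_old))
        grad = pi * (g - np.dot(pi, g))  # softmax policy-gradient identity
        theta += 0.1 * grad

print("final policy:", softmax(theta).round(3))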

Continuous Control Reinforcement Learning (RL)

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

1 code implementation 6 Jun 2020 Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms.
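
For context, a gradient-TD-style update keeps a second weight vector alongside the value weights; the GTD2-flavored sketch below (toy feature stream, illustrative step sizes) conveys the mechanics, while the paper's proximal-gradient formulation refines this family in ways not shown here.

import numpy as np

# GTD2-style two-timescale update with linear features: a main weight vector
# theta and an auxiliary vector w that tracks the expected TD-error direction.
rng = np.random.default_rng(2)
d, gamma, alpha, beta = 4, 0.9, 0.01, 0.05
theta, w = np.zeros(d), np.zeros(d)

for _ in range(5000):
    phi = rng.normal(size=d)             # features of current state (toy stream)
    phi_next = rng.normal(size=d)        # features of next state
    r = phi.sum() * 0.1                  # toy reward
    delta = r + gamma * phi_next @ theta - phi @ theta   # TD error
    theta += alpha * (phi - gamma * phi_next) * (phi @ w)
    w += beta * (delta - phi @ w) * phi

print("learned weights:", theta.round(3))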

reinforcement-learning Reinforcement Learning (RL)

Robust Reinforcement Learning using Offline Data

1 code implementation 10 Aug 2022 Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters.

reinforcement-learning Reinforcement Learning (RL)

Predictive Coding for Locally-Linear Control

1 code implementation ICML 2020 Rui Shu, Tung Nguyen, Yin-Lam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung H. Bui

High-dimensional observations and unknown dynamics are major challenges when applying optimal control to many real-world decision making tasks.

Decision Making

Deep Bayesian Quadrature Policy Optimization

1 code implementation 28 Jun 2020 Akella Ravi Tej, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Anima Anandkumar, Yisong Yue

On the other hand, more sample efficient alternatives like Bayesian quadrature methods have received little attention due to their high computational complexity.

Continuous Control Policy Gradient Methods

Efficient Risk-Averse Reinforcement Learning

2 code implementations 10 May 2022 Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
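
A common risk measure in this setting is CVaR, the average of the worst alpha-fraction of returns; the snippet below is just a plain empirical CVaR estimator on simulated returns, not the paper's optimization method.

import numpy as np

# Empirical CVaR_alpha of returns: the average of the worst alpha-fraction of episodes.
def cvar(returns, alpha=0.05):
    returns = np.sort(np.asarray(returns))          # ascending: worst episodes first
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

rng = np.random.default_rng(3)
episode_returns = rng.normal(loc=10.0, scale=3.0, size=10_000)
print("mean return:", episode_returns.mean().round(2))
print("CVaR_0.05  :", cvar(episode_returns, 0.05).round(2))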

Autonomous Driving reinforcement-learning +1

A Lyapunov-based Approach to Safe Reinforcement Learning

1 code implementation NeurIPS 2018 Yin-Lam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints.

Decision Making reinforcement-learning +2

Lyapunov-based Safe Policy Optimization for Continuous Control

1 code implementation 28 Jan 2019 Yin-Lam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization algorithms that are based on a Lyapunov approach to solve them.
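
For comparison, the most common baseline for CMDPs is a Lagrangian relaxation rather than the Lyapunov construction used here; the toy sketch below illustrates that generic Lagrangian approach with made-up reward and cost functions, and is not the authors' method.

import numpy as np

# Generic Lagrangian relaxation of a CMDP (not the paper's Lyapunov method):
# maximize reward - lambda * (cost - limit), while lambda tracks constraint violation.
p, lam, limit = 0.5, 0.0, 1.0             # p = probability of a "risky" action
reward = lambda p: 2.0 * p                # toy objective: the risky action pays more
cost = lambda p: 3.0 * p                  # ...but also incurs more constraint cost

for _ in range(500):
    grad_p = 2.0 - lam * 3.0              # d/dp [reward(p) - lam * (cost(p) - limit)]
    p = np.clip(p + 0.01 * grad_p, 0.0, 1.0)
    lam = max(0.0, lam + 0.01 * (cost(p) - limit))   # dual ascent on the violation

print(f"p = {p:.2f}, reward = {reward(p):.2f}, cost = {cost(p):.2f} (limit {limit})")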

Continuous Control Robot Navigation

Policy-Aware Model Learning for Policy Gradient Methods

1 code implementation 28 Feb 2020 Romina Abachi, Mohammad Ghavamzadeh, Amir-Massoud Farahmand

This is in contrast to conventional model learning approaches, such as those based on maximum likelihood estimate, that learn a predictive model of the environment without explicitly considering the interaction of the model and the planner.

Model-based Reinforcement Learning Policy Gradient Methods

Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models

1 code implementation 2 Apr 2024 KyuYoung Kim, Jongheon Jeong, Minyong An, Mohammad Ghavamzadeh, Krishnamurthy Dvijotham, Jinwoo Shin, Kimin Lee

To investigate this issue in depth, we introduce the Text-Image Alignment Assessment (TIA2) benchmark, which comprises a diverse collection of text prompts, images, and human annotations.

Neural Lyapunov Redesign

1 code implementation 6 Jun 2020 Arash Mehrjou, Mohammad Ghavamzadeh, Bernhard Schölkopf

We provide theoretical results on the class of systems that can be treated with the proposed algorithm and empirically evaluate the effectiveness of our method using an exemplary dynamical system.

Bottleneck Conditional Density Estimation

1 code implementation ICML 2017 Rui Shu, Hung H. Bui, Mohammad Ghavamzadeh

We introduce a new framework for training deep generative models for high-dimensional conditional density estimation.

Density Estimation

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

1 code implementation NeurIPS 2019 Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor

In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with greedy policies -- acting by 1-step planning -- can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$.

Model-based Reinforcement Learning reinforcement-learning +1

Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage

1 code implementation 27 Oct 2023 Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

The goal of an offline reinforcement learning (RL) algorithm is to learn optimal polices using historical (offline) data, without access to the environment for online exploration.

Offline RL Reinforcement Learning (RL)

Model-Independent Online Learning for Influence Maximization

no code implementations ICML 2017 Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, Mark Schmidt

We consider influence maximization (IM) in social networks, which is the problem of maximizing the number of users that become aware of a product by selecting a set of "seed" users to expose the product to.

More Robust Doubly Robust Off-policy Evaluation

no code implementations ICML 2018 Mehrdad Farajtabar, Yin-Lam Chow, Mohammad Ghavamzadeh

In particular, we focus on the doubly robust (DR) estimators that consist of an importance sampling (IS) component and a performance model, and utilize the low (or zero) bias of IS and low variance of the model at the same time.
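
In the contextual-bandit case the DR estimator has a compact form: the model's value of the target policy plus an importance-weighted correction on the logged rewards. A minimal sketch with toy logged data follows; names and data are illustrative.

import numpy as np

# Doubly robust off-policy value estimate for a logged contextual-bandit dataset:
# V_DR = mean_i [ sum_a pi(a|x_i) Q_hat(x_i, a) + rho_i * (r_i - Q_hat(x_i, a_i)) ],
# where rho_i = pi(a_i | x_i) / mu(a_i | x_i) is the importance weight.
def doubly_robust(logged, q_hat, target_policy):
    vals = []
    for x, a, r, mu_prob in logged:
        pi = target_policy(x)                      # action probabilities under pi
        direct = np.dot(pi, q_hat(x))              # model-based (direct) term
        rho = pi[a] / mu_prob                      # importance weight
        vals.append(direct + rho * (r - q_hat(x)[a]))
    return float(np.mean(vals))

# Toy example: 2 actions; contexts are ignored by the toy reward model.
rng = np.random.default_rng(4)
logged = [(None, rng.integers(2), rng.random(), 0.5) for _ in range(1000)]
q_hat = lambda x: np.array([0.5, 0.5])             # crude reward model
target_policy = lambda x: np.array([0.2, 0.8])
print("DR estimate of V(pi):", round(doubly_robust(logged, q_hat, target_policy), 3))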

Multi-Armed Bandits Off-policy evaluation

Optimizing over a Restricted Policy Class in Markov Decision Processes

no code implementations 26 Feb 2018 Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, Nikos Vlassis

However, under a condition that is akin to the occupancy measures of the base policies having large overlap, we show that there exists an efficient algorithm that finds a policy that is almost as good as the best convex combination of the base policies.

Policy Gradient Methods

Robust Locally-Linear Controllable Embedding

no code implementations 15 Oct 2017 Ershad Banijamali, Rui Shu, Mohammad Ghavamzadeh, Hung Bui, Ali Ghodsi

We also propose a principled variational approximation of the embedding posterior that takes the future observation into account and thus makes the variational approximation more robust to noise.

Path Consistency Learning in Tsallis Entropy Regularized MDPs

no code implementations ICML 2018 Ofir Nachum, Yin-Lam Chow, Mohammad Ghavamzadeh

In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data.

Disentangling Dynamics and Content for Control and Planning

no code implementations 24 Nov 2017 Ershad Banijamali, Ahmad Khajenezhad, Ali Ghodsi, Mohammad Ghavamzadeh

In this paper, we study the problem of learning a controllable representation for high-dimensional observations of dynamical systems.

Active Learning for Accurate Estimation of Linear Models

no code implementations ICML 2017 Carlos Riquelme, Mohammad Ghavamzadeh, Alessandro Lazaric

We explore the sequential decision making problem where the goal is to estimate uniformly well a number of linear models, given a shared budget of random contexts independently sampled from a known distribution.

Active Learning Decision Making

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria

no code implementations 5 Dec 2015 Yin-Lam Chow, Mohammad Ghavamzadeh, Lucas Janson, Marco Pavone

In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences.

Decision Making Marketing +2

Conservative Contextual Linear Bandits

no code implementations NeurIPS 2017 Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi-Yadkori, Benjamin Van Roy

We prove an upper-bound on the regret of CLUCB and show that it can be decomposed into two terms: 1) an upper-bound for the regret of the standard linear UCB algorithm that grows with the time horizon and 2) a constant (does not grow with the time horizon) term that accounts for the loss of being conservative in order to satisfy the safety constraint.

Decision Making Marketing

Bayesian Reinforcement Learning: A Survey

no code implementations 14 Sep 2016 Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar

The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.

Bayesian Inference reinforcement-learning +1

Graphical Model Sketch

no code implementations 9 Feb 2016 Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun

Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables.

Personalized Advertisement Recommendation: A Ranking Approach to Address the Ubiquitous Click Sparsity Problem

no code implementations 6 Mar 2016 Sougata Chaudhuri, Georgios Theocharous, Mohammad Ghavamzadeh

We study the problem of personalized advertisement recommendation (PAR), which consists of a user visiting a system (website) and the system displaying one of $K$ ads to the user.

Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

no code implementations 16 Jul 2015 Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos

If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance.
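
That oracle allocation is easy to state: give each distribution a share of the sampling budget proportional to its variance, as in the tiny illustration below (variances are made up).

import numpy as np

# Oracle allocation when variances are known: samples proportional to variance.
variances = np.array([1.0, 4.0, 0.25])      # made-up per-distribution variances
budget = 300
allocation = np.round(budget * variances / variances.sum()).astype(int)
print("samples per distribution:", allocation)   # -> [ 57 229  14]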

Active Learning Multi-Armed Bandits

A Generalized Kernel Approach to Structured Output Learning

no code implementations 10 May 2012 Hachem Kadri, Mohammad Ghavamzadeh, Philippe Preux

Finally, we evaluate the performance of our KDE approach using both covariance and conditional covariance kernels on two structured output problems, and compare it to the state-of-the-art kernel-based structured output regression methods.

regression

Policy Gradient for Coherent Risk Measures

no code implementations NeurIPS 2015 Aviv Tamar, Yin-Lam Chow, Mohammad Ghavamzadeh, Shie Mannor

For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming.

Policy Gradient Methods

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

no code implementations 25 Mar 2014 Prashanth L. A., Mohammad Ghavamzadeh

For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize.

Decision Making

Algorithms for CVaR Optimization in MDPs

no code implementations NeurIPS 2014 Yin-Lam Chow, Mohammad Ghavamzadeh

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion.

Decision Making

Classification-based Approximate Policy Iteration: Experiments and Extended Discussions

no code implementations 2 Jul 2014 Amir-Massoud Farahmand, Doina Precup, André M. S. Barreto, Mohammad Ghavamzadeh

We introduce a general classification-based approximate policy iteration (CAPI) framework, which encompasses a large class of algorithms that can exploit regularities of both the value function and the policy space, depending on what is advantageous.

Classification General Classification

Approximate Modified Policy Iteration

no code implementations 14 May 2012 Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods.

General Classification

Risk-Sensitive Generative Adversarial Imitation Learning

no code implementations 13 Aug 2018 Jonathan Lacotte, Mohammad Ghavamzadeh, Yin-Lam Chow, Marco Pavone

We then derive two different versions of our RS-GAIL optimization problem that aim at matching the risk profiles of the agent and the expert w.r.t.

Imitation Learning

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

no code implementations NeurIPS 2018 Bo Liu, Tengyang Xie, Yangyang Xu, Mohammad Ghavamzadeh, Yin-Lam Chow, Daoming Lyu, Daesub Yoon

Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare.

Autonomous Driving Management

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

no code implementations 13 Nov 2018 Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore

Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.
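
The mechanism can be sketched directly from this description; in the toy Bernoulli-bandit code below we assume one pseudo reward of 0 and one of 1 per real observation, which is an illustrative choice rather than the paper's exact setting.

import numpy as np

# Bootstrap-based exploration in a Bernoulli bandit: pull the arm with the highest
# mean reward in a non-parametric bootstrap sample of its (pseudo-reward-augmented) history.
rng = np.random.default_rng(5)
true_means = [0.3, 0.5, 0.7]
history = [[] for _ in true_means]

def bootstrap_index(h):
    # Augment the history with pseudo rewards (assumed: one 0 and one 1 per observation).
    augmented = h + [0.0, 1.0] * max(1, len(h))
    sample = rng.choice(augmented, size=len(augmented), replace=True)
    return sample.mean()

for t in range(2000):
    arm = int(np.argmax([bootstrap_index(h) for h in history]))
    history[arm].append(float(rng.random() < true_means[arm]))

print("pulls per arm:", [len(h) for h in history])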

Multi-Armed Bandits

Actor-Critic Algorithms for Risk-Sensitive MDPs

no code implementations NeurIPS 2013 Prashanth L. A., Mohammad Ghavamzadeh

For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize.

Decision Making

Approximate Dynamic Programming Finally Performs Well in the Game of Tetris

no code implementations NeurIPS 2013 Victor Gabillon, Mohammad Ghavamzadeh, Bruno Scherrer

A close look at the literature of this game shows that while ADP algorithms, which have been (almost) entirely based on approximating the value function, have performed poorly in Tetris, methods that search directly in the space of policies by learning the policy parameters with a black-box optimizer, such as the cross-entropy (CE) method, have achieved the best reported results.

Multi-Bandit Best Arm Identification

no code implementations NeurIPS 2011 Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck

We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap).

Speedy Q-Learning

no code implementations NeurIPS 2011 Mohammad Ghavamzadeh, Hilbert J. Kappen, Mohammad G. Azar, Rémi Munos

We introduce a new convergent variant of Q-learning, called speedy Q-learning, to address the problem of slow convergence in the standard form of the Q-learning algorithm.

Q-Learning

LSTD with Random Projections

no code implementations NeurIPS 2010 Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric Maillard, Rémi Munos

We provide a thorough theoretical analysis of the LSTD with random projections and derive performance bounds for the resulting algorithm.
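
The mechanism, in brief: compress the high-dimensional features with a random Gaussian matrix and run ordinary LSTD in the low-dimensional space. A toy sketch with illustrative dimensions and data follows.

import numpy as np

# LSTD on randomly projected features: compress D-dimensional features to d << D
# with a random Gaussian matrix, then solve the usual LSTD linear system.
rng = np.random.default_rng(6)
D, d, gamma, n = 200, 10, 0.95, 5000

proj = rng.normal(size=(d, D)) / np.sqrt(d)        # random projection matrix
theta_true = rng.normal(size=D)

A, b = np.zeros((d, d)), np.zeros(d)
for _ in range(n):
    phi_big, phi_big_next = rng.normal(size=D), rng.normal(size=D)
    r = phi_big @ theta_true * 0.1                 # toy reward, linear in the features
    phi, phi_next = proj @ phi_big, proj @ phi_big_next
    A += np.outer(phi, phi - gamma * phi_next)     # LSTD statistics in projected space
    b += phi * r

theta = np.linalg.solve(A, b)                      # low-dimensional value-function weights
print("projected LSTD weights:", theta.round(2))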

reinforcement-learning Reinforcement Learning (RL)

Perturbed-History Exploration in Stochastic Multi-Armed Bandits

no code implementations 26 Feb 2019 Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier

Finally, we empirically evaluate PHE and show that it is competitive with state-of-the-art baselines.

Multi-Armed Bandits

Binary Classification with Bounded Abstention Rate

no code implementations 23 May 2019 Shubhanshu Shekhar, Mohammad Ghavamzadeh, Tara Javidi

We then propose a plug-in classifier that employs unlabeled samples to decide the region of abstention and derive an upper-bound on the excess risk of our classifier under standard Hölder smoothness and margin assumptions.
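
The plug-in idea is to abstain where the estimated class-probability is closest to 1/2, with the threshold chosen from unlabeled data so the abstention rate stays within budget; the probability estimator and data in the sketch below are toy stand-ins.

import numpy as np

# Plug-in binary classifier with a bounded abstention rate: abstain where the
# estimated eta(x) = P(Y=1|x) is near 1/2, with the threshold set from unlabeled data.
rng = np.random.default_rng(7)

def eta_hat(x):
    return 1.0 / (1.0 + np.exp(-3.0 * x))          # toy class-probability estimate

unlabeled = rng.normal(size=10_000)
margin = np.abs(eta_hat(unlabeled) - 0.5)
delta = 0.1                                        # abstention budget (10%)
tau = np.quantile(margin, delta)                   # abstain on the delta least-confident region

def predict(x):
    m = eta_hat(x)
    if abs(m - 0.5) < tau:
        return "abstain"
    return 1 if m >= 0.5 else 0

print([predict(x) for x in [-1.0, -0.02, 0.5]])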

Binary Classification Classification +1

Active Learning for Binary Classification with Abstention

no code implementations 1 Jun 2019 Shubhanshu Shekhar, Mohammad Ghavamzadeh, Tara Javidi

We construct and analyze active learning algorithms for the problem of binary classification with abstention.

Active Learning Binary Classification +2

Randomized Exploration in Generalized Linear Bandits

no code implementations 21 Jun 2019 Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution.
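
Concretely, for a logistic bandit (one instance of a GLM) this means fitting the MAP weights, approximating the posterior by a Gaussian with the inverse Hessian as covariance, sampling a parameter, and acting greedily. The sketch below uses an illustrative ridge prior and toy arm features, and is not the paper's exact algorithm.

import numpy as np

# Thompson sampling for a logistic bandit via the Laplace approximation:
# fit the MAP weights, approximate the posterior by N(theta_map, H^{-1}),
# sample a parameter vector, and pull the arm with the largest predicted reward.
rng = np.random.default_rng(8)
d, lam = 3, 1.0
arms = rng.normal(size=(5, d))                     # fixed arm feature vectors
theta_star = np.array([1.0, -0.5, 0.5])            # unknown true parameter
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X, y, counts = [], [], np.zeros(len(arms), dtype=int)
for t in range(500):
    if X:
        Xa, ya = np.array(X), np.array(y)
        theta = np.zeros(d)
        for _ in range(25):                        # Newton steps for the MAP (ridge-regularized)
            p = sigmoid(Xa @ theta)
            grad = Xa.T @ (p - ya) + lam * theta
            H = Xa.T @ (Xa * (p * (1 - p))[:, None]) + lam * np.eye(d)
            theta -= np.linalg.solve(H, grad)
        theta_sample = rng.multivariate_normal(theta, np.linalg.inv(H))   # Laplace posterior sample
    else:
        theta_sample = rng.normal(size=d)          # before any data, sample from the prior
    a = int(np.argmax(arms @ theta_sample))
    counts[a] += 1
    X.append(arms[a])
    y.append(float(rng.random() < sigmoid(arms[a] @ theta_star)))

print("pulls per arm:", counts, "| best arm:", int(np.argmax(arms @ theta_star)))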

Online Planning with Lookahead Policies

no code implementations NeurIPS 2020 Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor

This is the first work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning.

Multi-step Greedy Reinforcement Learning Algorithms

no code implementations ICML 2020 Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh

We derive model-free RL algorithms based on $\kappa$-PI and $\kappa$-VI in which the surrogate problem can be solved by any discrete or continuous action RL method, such as DQN and TRPO.

Continuous Control Game of Go +3

Adaptive Sampling for Estimating Multiple Probability Distributions

no code implementations 28 Oct 2019 Shubhanshu Shekhar, Tara Javidi, Mohammad Ghavamzadeh

We consider the problem of allocating samples to a finite set of discrete distributions in order to learn them uniformly well in terms of four common distance measures: $\ell_2^2$, $\ell_1$, $f$-divergence, and separation distance.

Improved Algorithms for Conservative Exploration in Bandits

no code implementations 8 Feb 2020 Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better/optimal policy under the constraint that during the learning process the performance is almost never worse than the performance of the baseline itself.

Marketing Recommendation Systems

Conservative Exploration in Reinforcement Learning

no code implementations 8 Feb 2020 Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward.

reinforcement-learning Reinforcement Learning (RL)

Active Model Estimation in Markov Decision Processes

no code implementations 6 Mar 2020 Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad Ghavamzadeh, Alessandro Lazaric

Using a number of simple domains with heterogeneous noise in their transitions, we show that our heuristic-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime, while achieving similar asymptotic performance as that of the original algorithm.

Common Sense Reasoning Efficient Exploration

Variational Model-based Policy Optimization

no code implementations 9 Jun 2020 Yin-Lam Chow, Brandon Cui, MoonKyung Ryu, Mohammad Ghavamzadeh

Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL.

Continuous Control Model-based Reinforcement Learning +1

Stochastic Bandits with Linear Constraints

no code implementations 17 Jun 2020 Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove an $\widetilde{\mathcal{O}}(\frac{d\sqrt{T}}{\tau-c_0})$ bound on its $T$-round regret, where the denominator is the difference between the constraint threshold and the cost of a known feasible action.

Multi-Armed Bandits

Control-Aware Representations for Model-based Reinforcement Learning

no code implementations ICLR 2021 Brandon Cui, Yin-Lam Chow, Mohammad Ghavamzadeh

We first formulate an LCE model to learn representations that are suitable to be used by a policy-iteration-style algorithm in the latent space.

Model-based Reinforcement Learning reinforcement-learning +2

Finite-Sample Analysis of Proximal Gradient TD Algorithms

no code implementations 6 Jun 2020 Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik

In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms.

Adaptive Sampling for Estimating Probability Distributions

no code implementations ICML 2020 Shubhanshu Shekhar, Tara Javidi, Mohammad Ghavamzadeh

We consider the problem of allocating a fixed budget of samples to a finite set of discrete distributions to learn them uniformly well (minimizing the maximum error) in terms of four common distance measures: $\ell_2^2$, $\ell_1$, $f$-divergence, and separation distance.

Variance-Reduced Off-Policy Memory-Efficient Policy Search

no code implementations 14 Sep 2020 Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu

To achieve variance-reduced off-policy-stable policy optimization, we propose an algorithm family that is memory-efficient, stochastically variance-reduced, and capable of learning from off-policy samples.

Reinforcement Learning (RL) Stochastic Optimization

Soft-Robust Algorithms for Batch Reinforcement Learning

no code implementations 30 Nov 2020 Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure.

Decision Making reinforcement-learning +1

Non-Stationary Latent Bandits

no code implementations 1 Dec 2020 Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier

The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from its interactions with the models.

Recommendation Systems Thompson Sampling

Adaptive Sampling for Minimax Fair Classification

no code implementations NeurIPS 2021 Shubhanshu Shekhar, Greg Fields, Mohammad Ghavamzadeh, Tara Javidi

Machine learning models trained on uncurated datasets can often end up adversely affecting inputs belonging to underrepresented groups.

Classification General Classification

Fixed-Budget Best-Arm Identification in Structured Bandits

no code implementations 9 Jun 2021 Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh

We analyze our algorithm in linear and generalized linear models (GLMs), and propose a practical implementation based on a G-optimal design.

Multi-Armed Bandits

Thompson Sampling with a Mixture Prior

no code implementations 10 Jun 2021 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution.
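
The two-stage sampling this implies: first draw which mixture component the environment came from (by its posterior weight), then draw the parameter from that component's posterior. A one-observation Gaussian sketch with illustrative constants follows.

import numpy as np

# Two-stage Thompson sampling with a Gaussian mixture prior over a 1-D parameter:
# sample which mixture component the environment came from, then sample the
# parameter from that component's (conjugately updated) posterior.
rng = np.random.default_rng(9)
weights = np.array([0.7, 0.3])                 # prior mixture weights (illustrative)
means = np.array([0.0, 2.0])                   # prior component means
stds = np.array([1.0, 0.5])                    # prior component std devs

# One Gaussian observation y with known noise updates the component posteriors...
y, obs_std = 1.8, 0.5
post_var = 1.0 / (1.0 / stds ** 2 + 1.0 / obs_std ** 2)
post_means = post_var * (means / stds ** 2 + y / obs_std ** 2)
# ...and the mixture weights, via each component's marginal likelihood of y.
marg_sd = np.sqrt(stds ** 2 + obs_std ** 2)
lik = np.exp(-0.5 * ((y - means) / marg_sd) ** 2) / marg_sd
post_weights = weights * lik / np.sum(weights * lik)

k = rng.choice(len(post_weights), p=post_weights)                 # stage 1: pick a component
theta_sample = rng.normal(post_means[k], np.sqrt(post_var[k]))    # stage 2: sample the parameter
print("posterior weights:", post_weights.round(3), "| Thompson sample:", round(theta_sample, 3))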

Decision Making Multi-Task Learning +3

Feature and Parameter Selection in Stochastic Linear Bandits

no code implementations 9 Jun 2021 Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh

In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb R^d$.

feature selection Model Selection

Hierarchical Bayesian Bandits

no code implementations 12 Nov 2021 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh

We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit.

Federated Learning Thompson Sampling

Lyapunov-based Safe Policy Optimization

no code implementations 27 Sep 2018 Yinlam Chow, Ofir Nachum, Mohammad Ghavamzadeh, Edgar Guzman-Duenez

In many reinforcement learning applications, it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that do not take the agent to certain undesirable situations.

Safe Policy Learning for Continuous Control

no code implementations 25 Sep 2019 Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that keep the agent in desirable situations, both during training and at convergence.

Continuous Control

Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning

no code implementations 25 Sep 2019 Yonathan Efroni, Manan Tomar, Mohammad Ghavamzadeh

In this work, we explore the benefits of multi-step greedy policies in model-free RL when employed in the framework of multi-step Dynamic Programming (DP): multi-step Policy and Value Iteration.

Continuous Control Game of Go +3

Deep Hierarchy in Bandits

no code implementations 3 Feb 2022 Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

We use this exact posterior to analyze the Bayes regret of HierTS in Gaussian bandits.

Thompson Sampling

Meta-Learning for Simple Regret Minimization

1 code implementation 25 Feb 2022 MohammadJavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

The Bayesian algorithm has access to a prior distribution over the meta-parameters and its meta simple regret over $m$ bandit tasks with horizon $n$ is a mere $\tilde{O}(m / \sqrt{n})$.

Meta-Learning

Collaborative Multi-agent Stochastic Linear Bandits

no code implementations 12 May 2022 Ahmadreza Moradipari, Mohammad Ghavamzadeh, Mahnoosh Alizadeh

We propose a distributed upper confidence bound (UCB) algorithm and prove a high probability bound on its $T$-round regret in which we include a linear growth of regret associated with each communication round.

Multi-Environment Meta-Learning in Stochastic Linear Bandits

no code implementations 12 May 2022 Ahmadreza Moradipari, Mohammad Ghavamzadeh, Taha Rajabzadeh, Christos Thrampoulidis, Mahnoosh Alizadeh

In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments.

Meta-Learning

A Mixture-of-Expert Approach to RL-based Dialogue Management

no code implementations 31 May 2022 Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, MoonKyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier

Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and ability to carry on rich conversations remain a challenge.

Attribute Dialogue Management +3

RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

no code implementations 9 Sep 2022 Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh, Reazul Russel

We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs.

Reinforcement Learning (RL) Safe Reinforcement Learning

Operator Splitting Value Iteration

no code implementations 25 Nov 2022 Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-Massoud Farahmand

We introduce new planning and reinforcement learning algorithms for discounted MDPs that utilize an approximate model of the environment to accelerate the convergence of the value function.

reinforcement-learning Reinforcement Learning (RL)

Multi-Task Off-Policy Learning from Bandit Feedback

no code implementations 9 Dec 2022 Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.

Learning-To-Rank Recommendation Systems

Aligning Text-to-Image Models using Human Feedback

no code implementations 23 Feb 2023 Kimin Lee, Hao Liu, MoonKyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Shixiang Shane Gu

Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.

Image Generation

A Review of Deep Learning for Video Captioning

no code implementations 22 Apr 2023 Moloud Abdar, Meenakshi Kollati, Swaraja Kuraparthi, Farhad Pourpanah, Daniel McDuff, Mohammad Ghavamzadeh, Shuicheng Yan, Abduallah Mohamed, Abbas Khosravi, Erik Cambria, Fatih Porikli

Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work in the fields of computer vision, natural language processing (NLP), linguistics, and human-computer interaction.

Dense Video Captioning Question Answering +3

On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes

no code implementations NeurIPS 2023 Jia Lin Hau, Erick Delage, Mohammad Ghavamzadeh, Marek Petrik

However, we show that these popular decompositions for Conditional-Value-at-Risk (CVaR) and Entropic-Value-at-Risk (EVaR) are inherently suboptimal regardless of the discretization level.

Reinforcement Learning (RL)

Private and Communication-Efficient Algorithms for Entropy Estimation

no code implementations 12 May 2023 Gecia Bravo-Hermsdorff, Róbert Busa-Fekete, Mohammad Ghavamzadeh, Andres Muñoz Medina, Umar Syed

For a joint distribution over many variables whose conditional independence is given by a tree, we describe algorithms for estimating Shannon entropy that require a number of samples that is linear in the number of variables, compared to the quadratic sample complexity of prior work.

Bayesian Regret Minimization in Offline Bandits

no code implementations 2 Jun 2023 Mohammad Ghavamzadeh, Marek Petrik, Guy Tennenholtz

We study how to make decisions that minimize Bayesian regret in offline linear bandits.

Factual and Personalized Recommendations using Language Models and Reinforcement Learning

no code implementations 9 Oct 2023 Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, Chih-Wei Hsu, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier

Recommender systems (RSs) play a central role in connecting users to content, products, and services, matching candidate items to users based on their preferences.

Language Modelling Recommendation Systems +1

Preference Elicitation with Soft Attributes in Interactive Recommendation

no code implementations 22 Oct 2023 Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-Wei Hsu, Mohammad Ghavamzadeh, Craig Boutilier

Leveraging concept activation vectors for soft attribute semantics, we develop novel preference elicitation methods that can accommodate soft attributes and bring together both item and attribute-based preference elicitation.

Attribute Recommendation Systems

Maximum Entropy Model Correction in Reinforcement Learning

no code implementations 29 Nov 2023 Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-Massoud Farahmand

We propose and theoretically analyze an approach for planning with an approximate model in reinforcement learning that can reduce the adverse impact of model error.

Density Estimation reinforcement-learning

Contextual Bandits with Stage-wise Constraints

no code implementations 15 Jan 2024 Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett

In the setting that the constraint is in expectation, we further specialize our results to multi-armed bandits and propose a computationally efficient algorithm for this setting with regret analysis.

Multi-Armed Bandits
