Search Results for author: Xuezhou Zhang

Found 25 papers, 6 papers with code

Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization

no code implementations 29 Jun 2022 Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang

Online influence maximization aims to maximize the influence spread of a content in a social network with unknown network model by selecting a few seed nodes.

Model-based Reinforcement Learning reinforcement-learning

Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks

no code implementations 22 Jun 2022 Shuoguang Yang, Xuezhou Zhang, Mengdi Wang

This paper studies the problem of distributed bilevel optimization over a network where agents can only communicate with neighbors, including examples from multi-task, multi-agent learning and federated learning.

Bilevel Optimization Federated Learning +2

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

no code implementations 5 Jun 2022 Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang

We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.

Learning Theory Machine Learning

Byzantine-Robust Online and Offline Distributed Reinforcement Learning

no code implementations 1 Jun 2022 Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu

We consider a distributed reinforcement learning setting where multiple agents separately explore the environment and communicate their experiences through a central server.


Provable Benefits of Representational Transfer in Reinforcement Learning

1 code implementation 29 May 2022 Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang

We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task.

reinforcement-learning Representation Learning

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

no code implementations 10 Feb 2022 Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang

We approach this problem using the Z-estimation theory and establish the following results: The FQE estimation error is asymptotically normal with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; The finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator.

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

no code implementations 31 Jan 2022 Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang

Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy.

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

1 code implementation 31 Jan 2022 Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.

reinforcement-learning Representation Learning

Representation Learning for Online and Offline RL in Low-rank MDPs

no code implementations ICLR 2022 Masatoshi Uehara, Xuezhou Zhang, Wen Sun

This work studies the question of Representation Learning in RL: how can we learn a compact low-dimensional representation such that on top of the representation we can perform RL procedures such as exploration and exploitation, in a sample efficient manner.

Offline RL Representation Learning

Corruption-Robust Offline Reinforcement Learning

no code implementations 11 Jun 2021 Xuezhou Zhang, Yiding Chen, Jerry Zhu, Wen Sun

Surprisingly, in this case, the knowledge of $\epsilon$ is necessary, as we show that being adaptive to unknown $\epsilon$ is impossible. This again contrasts with recent results on corruption-robust online RL and implies that robust offline RL is a strictly harder problem.

Adversarial Robustness Offline RL +1

Controllable and Diverse Text Generation in E-commerce

no code implementations 23 Feb 2021 Huajie Shao, Jun Wang, Haohong Lin, Xuezhou Zhang, Aston Zhang, Heng Ji, Tarek Abdelzaher

The algorithm is injected into a Conditional Variational Autoencoder (CVAE), allowing Apex to control both (i) the order of keywords in the generated sentences (conditioned on the input keywords and their order), and (ii) the trade-off between diversity and accuracy.

Text Generation

Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments

no code implementations 16 Feb 2021 Amin Rakhsha, Xuezhou Zhang, Xiaojin Zhu, Adish Singla

We study black-box reward poisoning attacks against reinforcement learning (RL), in which an adversary aims to manipulate the rewards to mislead a sequence of RL agents with unknown algorithms to learn a nefarious policy in an environment unknown to the adversary a priori.


Robust Policy Gradient against Strong Data Corruption

1 code implementation 11 Feb 2021 Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun

Our first result shows that no algorithm can find a better than $O(\epsilon)$-optimal policy under our attack model.

Continuous Control

Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners

no code implementations 5 Sep 2020 Yun-Shiuan Chuang, Xuezhou Zhang, Yuzhe Ma, Mark K. Ho, Joseph L. Austerweil, Xiaojin Zhu

To solve the machine teaching optimization problem, we use a deep learning approximation method which simulates learners in the environment and learns to predict how feedback affects the learner's internal states.


The Sample Complexity of Teaching-by-Reinforcement on Q-Learning

no code implementations 16 Jun 2020 Xuezhou Zhang, Shubham Kumar Bharti, Yuzhe Ma, Adish Singla, Xiaojin Zhu

Our TDim results provide the minimum number of samples needed for reinforcement learning, and we discuss their connections to standard PAC-style RL sample complexity and teaching-by-demonstration sample complexity results.

Q-Learning reinforcement-learning

Task-agnostic Exploration in Reinforcement Learning

no code implementations NeurIPS 2020 Xuezhou Zhang, Yuzhe Ma, Adish Singla

To address these challenges, we propose the task-agnostic RL framework: In the exploration phase, the agent first collects trajectories by exploring the MDP without the guidance of a reward function.

Efficient Exploration reinforcement-learning

Online Data Poisoning Attacks

no code implementations L4DC 2020 Xuezhou Zhang, Xiaojin Zhu, Laurent Lessard

We study data poisoning attacks in the online learning setting, where training data arrive sequentially, and the attacker is eavesdropping on the data stream and has the ability to contaminate the current data point to affect the online learning process.

Data Poisoning online learning +1

Neural Additive Models: Interpretable Machine Learning with Neural Nets

6 code implementations NeurIPS 2021 Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, Geoffrey Hinton

They perform similarly to existing state-of-the-art generalized additive models in accuracy, but are more flexible because they are based on neural nets instead of boosted trees.

Additive models Decision Making +2

Adaptive Reward-Poisoning Attacks against Reinforcement Learning

no code implementations ICML 2020 Xuezhou Zhang, Yuzhe Ma, Adish Singla, Xiaojin Zhu

In reward-poisoning attacks against reinforcement learning (RL), an attacker can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step, with the goal of forcing the RL agent to learn a nefarious policy.


Policy Poisoning in Batch Reinforcement Learning and Control

1 code implementation NeurIPS 2019 Yuzhe Ma, Xuezhou Zhang, Wen Sun, Xiaojin Zhu

We study a security threat to batch reinforcement learning and control where the attacker aims to poison the learned policy.


Online Data Poisoning Attack

no code implementations 5 Mar 2019 Xuezhou Zhang, Xiaojin Zhu, Laurent Lessard

We study data poisoning attacks in the online setting where training items arrive sequentially, and the attacker may perturb the current item to manipulate online learning.

Data Poisoning online learning +1

Axiomatic Interpretability for Multiclass Additive Models

2 code implementations 22 Oct 2018 Xuezhou Zhang, Sarah Tan, Paul Koch, Yin Lou, Urszula Chajewska, Rich Caruana

In the first part of this paper, we generalize a state-of-the-art GAM learning algorithm based on boosted trees to the multiclass setting, and show that this multiclass algorithm outperforms existing GAM learning algorithms and sometimes matches the performance of full complexity models such as gradient boosted trees.

Additive models

An Optimal Control Approach to Sequential Machine Teaching

no code implementations 15 Oct 2018 Laurent Lessard, Xuezhou Zhang, Xiaojin Zhu

Our key insight is to formulate sequential machine teaching as a time-optimal control problem.

Teacher Improves Learning by Selecting a Training Subset

no code implementations 25 Feb 2018 Yuzhe Ma, Robert Nowak, Philippe Rigollet, Xuezhou Zhang, Xiaojin Zhu

We call a learner super-teachable if a teacher can trim down an iid training set while making the learner learn even better.

General Classification

Training Set Debugging Using Trusted Items

no code implementations 24 Jan 2018 Xuezhou Zhang, Xiaojin Zhu, Stephen J. Wright

The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus improves learning.

Bilevel Optimization Machine Learning
