Search Results for author: Xuezhou Zhang

Found 31 papers, 7 papers with code

Training Set Debugging Using Trusted Items

no code implementations 24 Jan 2018 Xuezhou Zhang, Xiaojin Zhu, Stephen J. Wright

The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus improves learning.

BIG-bench Machine Learning Bilevel Optimization

Teacher Improves Learning by Selecting a Training Subset

no code implementations 25 Feb 2018 Yuzhe Ma, Robert Nowak, Philippe Rigollet, Xuezhou Zhang, Xiaojin Zhu

We call a learner super-teachable if a teacher can trim down an iid training set while making the learner learn even better.

General Classification regression

An Optimal Control Approach to Sequential Machine Teaching

no code implementations 15 Oct 2018 Laurent Lessard, Xuezhou Zhang, Xiaojin Zhu

Our key insight is to formulate sequential machine teaching as a time-optimal control problem.

Axiomatic Interpretability for Multiclass Additive Models

1 code implementation 22 Oct 2018 Xuezhou Zhang, Sarah Tan, Paul Koch, Yin Lou, Urszula Chajewska, Rich Caruana

In the first part of this paper, we generalize a state-of-the-art GAM learning algorithm based on boosted trees to the multiclass setting, and show that this multiclass algorithm outperforms existing GAM learning algorithms and sometimes matches the performance of full complexity models such as gradient boosted trees.

Additive models Binary Classification +1
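The entry above describes generalizing boosted-tree GAMs to the multiclass setting. As a rough illustration of the prediction form such a model uses (one additive score per class, softmaxed across classes), here is a minimal numpy sketch; the shape functions and class count are hypothetical stand-ins, supplied by hand rather than learned with boosted trees as in the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class MulticlassGAM:
    """Additive model: score_k(x) = bias_k + sum_j f_{j,k}(x_j).

    shape_fns[j][k] is a 1-D shape function for feature j and class k.
    In the paper these are learned (boosted trees); here they are
    hand-picked toy functions purely for illustration.
    """
    def __init__(self, shape_fns, bias):
        self.shape_fns = shape_fns   # list over features of lists over classes
        self.bias = np.asarray(bias, dtype=float)

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.tile(self.bias, (X.shape[0], 1))
        for j, fns in enumerate(self.shape_fns):
            for k, f in enumerate(fns):
                scores[:, k] += f(X[:, j])   # per-feature, per-class contribution
        return softmax(scores)

# toy example: 2 features, 3 classes
gam = MulticlassGAM(
    shape_fns=[
        [np.sin, np.cos, lambda x: 0.0 * x],
        [lambda x: x, lambda x: -x, lambda x: x ** 2],
    ],
    bias=[0.1, 0.0, -0.1],
)
probs = gam.predict_proba([[0.5, 1.0], [2.0, -1.0]])  # shape (2, 3)
```

Because each class score is a sum of one-dimensional functions, every feature's contribution to each class can be plotted directly, which is the interpretability property the paper builds on.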

Online Data Poisoning Attack

no code implementations 5 Mar 2019 Xuezhou Zhang, Xiaojin Zhu, Laurent Lessard

We study data poisoning attacks in the online setting where training items arrive sequentially, and the attacker may perturb the current item to manipulate online learning.

Data Poisoning Model Predictive Control +2

Policy Poisoning in Batch Reinforcement Learning and Control

1 code implementation NeurIPS 2019 Yuzhe Ma, Xuezhou Zhang, Wen Sun, Xiaojin Zhu

We study a security threat to batch reinforcement learning and control where the attacker aims to poison the learned policy.

reinforcement-learning Reinforcement Learning (RL)

Adaptive Reward-Poisoning Attacks against Reinforcement Learning

no code implementations ICML 2020 Xuezhou Zhang, Yuzhe Ma, Adish Singla, Xiaojin Zhu

In reward-poisoning attacks against reinforcement learning (RL), an attacker can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step, with the goal of forcing the RL agent to learn a nefarious policy.

reinforcement-learning Reinforcement Learning (RL)
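The attack model in the entry above perturbs each reward $r_t$ into $r_t+\delta_t$. A minimal sketch of that interface is an environment wrapper that adds a clipped perturbation to every reward; the toy `ChainEnv`, the `delta_fn` attacker policy, and the per-step budget here are all hypothetical illustrations, not the paper's construction (which studies how to *choose* $\delta_t$ adaptively).

```python
class ChainEnv:
    """Toy 5-state chain: action +1/-1 moves right/left, reward 1 at the end."""
    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):
        self.s = max(0, min(4, self.s + action))
        reward = 1.0 if self.s == 4 else 0.0
        return self.s, reward, self.s == 4

class RewardPoisoningWrapper:
    """Wraps an environment so each reward r_t becomes r_t + delta_t."""
    def __init__(self, env, delta_fn, budget=1.0):
        self.env = env
        self.delta_fn = delta_fn   # hypothetical attacker policy
        self.budget = budget       # per-step cap on |delta_t|

    def reset(self):
        self.state = self.env.reset()
        return self.state

    def step(self, action):
        next_state, reward, done = self.env.step(action)
        delta = self.delta_fn(self.state, action, reward)
        delta = max(-self.budget, min(self.budget, delta))  # clip to budget
        self.state = next_state
        return next_state, reward + delta, done

# demo: attacker tries to subtract 2 from every reward but is clipped to 0.5
env = RewardPoisoningWrapper(ChainEnv(), delta_fn=lambda s, a, r: -2.0, budget=0.5)
env.reset()
_, poisoned_r, _ = env.step(1)   # true reward 0.0, poisoned reward -0.5
```

The RL agent only ever sees the poisoned stream, which is what makes the attack-design question (and the defenses in the later entries) nontrivial.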

Neural Additive Models: Interpretable Machine Learning with Neural Nets

6 code implementations NeurIPS 2021 Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, Geoffrey Hinton

They perform similarly to existing state-of-the-art generalized additive models in accuracy, but are more flexible because they are based on neural nets instead of boosted trees.

Additive models BIG-bench Machine Learning +3
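Neural Additive Models keep the GAM structure but replace each boosted-tree shape function with a small neural net per feature, summing the outputs. A minimal numpy sketch of that architecture follows; the weights are random (untrained) and the hidden width is an arbitrary illustrative choice, so this shows the prediction structure only, not the paper's training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

class FeatureNet:
    """One tiny MLP per feature: R -> R (the NAM shape function)."""
    def __init__(self, hidden=16):
        self.W1 = rng.normal(0.0, 1.0, (1, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 1.0, (hidden,))

    def __call__(self, x):                     # x: (n,)
        h = np.maximum(0.0, x[:, None] @ self.W1 + self.b1)  # ReLU hidden layer
        return h @ self.w2                     # (n,)

class NAM:
    """Prediction = bias + sum over features of per-feature net outputs."""
    def __init__(self, n_features):
        self.nets = [FeatureNet() for _ in range(n_features)]
        self.bias = 0.0

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return self.bias + sum(net(X[:, j]) for j, net in enumerate(self.nets))

nam = NAM(n_features=3)
y = nam.predict(rng.normal(size=(5, 3)))       # shape (5,)
```

As with tree-based GAMs, each `FeatureNet` is a one-dimensional function that can be plotted, so interpretability is preserved while gaining the flexibility of neural nets.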

Online Data Poisoning Attacks

no code implementations L4DC 2020 Xuezhou Zhang, Xiaojin Zhu, Laurent Lessard

We study data poisoning attacks in the online learning setting, where training data arrive sequentially, and the attacker eavesdrops on the data stream and can contaminate the current data point to affect the online learning process.

Data Poisoning Model Predictive Control +2

The Sample Complexity of Teaching-by-Reinforcement on Q-Learning

no code implementations 16 Jun 2020 Xuezhou Zhang, Shubham Kumar Bharti, Yuzhe Ma, Adish Singla, Xiaojin Zhu

Our TDim results provide the minimum number of samples needed for reinforcement learning, and we discuss their connections to standard PAC-style RL sample complexity and teaching-by-demonstration sample complexity results.

Q-Learning reinforcement-learning +1

Task-agnostic Exploration in Reinforcement Learning

no code implementations NeurIPS 2020 Xuezhou Zhang, Yuzhe Ma, Adish Singla

To address these challenges, we propose the task-agnostic RL framework: in the exploration phase, the agent first collects trajectories by exploring the MDP without the guidance of a reward function.

Efficient Exploration reinforcement-learning +1

Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners

no code implementations 5 Sep 2020 Yun-Shiuan Chuang, Xuezhou Zhang, Yuzhe Ma, Mark K. Ho, Joseph L. Austerweil, Xiaojin Zhu

To solve the machine teaching optimization problem, we use a deep learning approximation method which simulates learners in the environment and learns to predict how feedback affects the learner's internal states.

Q-Learning

Robust Policy Gradient against Strong Data Corruption

1 code implementation 11 Feb 2021 Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun

Our first result shows that no algorithm can find a better than $O(\epsilon)$-optimal policy under our attack model.

Continuous Control

Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments

no code implementations 16 Feb 2021 Amin Rakhsha, Xuezhou Zhang, Xiaojin Zhu, Adish Singla

We study black-box reward poisoning attacks against reinforcement learning (RL), in which an adversary aims to manipulate the rewards to mislead a sequence of RL agents with unknown algorithms to learn a nefarious policy in an environment unknown to the adversary a priori.

reinforcement-learning Reinforcement Learning (RL)

Controllable and Diverse Text Generation in E-commerce

no code implementations 23 Feb 2021 Huajie Shao, Jun Wang, Haohong Lin, Xuezhou Zhang, Aston Zhang, Heng Ji, Tarek Abdelzaher

The algorithm is injected into a Conditional Variational Autoencoder (CVAE), allowing Apex to control both (i) the order of keywords in the generated sentences (conditioned on the input keywords and their order), and (ii) the trade-off between diversity and accuracy.

Text Generation

Corruption-Robust Offline Reinforcement Learning

no code implementations 11 Jun 2021 Xuezhou Zhang, Yiding Chen, Jerry Zhu, Wen Sun

Surprisingly, in this case, the knowledge of $\epsilon$ is necessary, as we show that being adaptive to unknown $\epsilon$ is impossible. This again contrasts with recent results on corruption-robust online RL and implies that robust offline RL is a strictly harder problem.

Adversarial Robustness Offline RL +2

Representation Learning for Online and Offline RL in Low-rank MDPs

no code implementations ICLR 2022 Masatoshi Uehara, Xuezhou Zhang, Wen Sun

This work studies the question of Representation Learning in RL: how can we learn a compact low-dimensional representation such that, on top of it, we can perform RL procedures such as exploration and exploitation in a sample-efficient manner?

Offline RL Representation Learning

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

1 code implementation 31 Jan 2022 Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.

reinforcement-learning Reinforcement Learning (RL) +1

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

no code implementations 31 Jan 2022 Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang

Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy.

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

no code implementations 10 Feb 2022 Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang

We approach this problem using the Z-estimation theory and establish the following results: The FQE estimation error is asymptotically normal with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; The finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator.

Off-policy evaluation

Provable Benefits of Representational Transfer in Reinforcement Learning

1 code implementation 29 May 2022 Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang

We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task.

reinforcement-learning Reinforcement Learning (RL) +1

Byzantine-Robust Online and Offline Distributed Reinforcement Learning

no code implementations 1 Jun 2022 Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu

We consider a distributed reinforcement learning setting where multiple agents separately explore the environment and communicate their experiences through a central server.

reinforcement-learning Reinforcement Learning (RL)

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

no code implementations 5 Jun 2022 Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang

We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.

BIG-bench Machine Learning Evolutionary Algorithms +2

Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks

no code implementations 22 Jun 2022 Shuoguang Yang, Xuezhou Zhang, Mengdi Wang

This paper studies the problem of distributed bilevel optimization over a network where agents can only communicate with neighbors, including examples from multi-task, multi-agent learning and federated learning.

Bilevel Optimization Federated Learning +3

Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization

no code implementations 29 Jun 2022 Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang

Online influence maximization aims to maximize the influence spread of a content in a social network with unknown network model by selecting a few seed nodes.

Model-based Reinforcement Learning reinforcement-learning +1

Representation Learning for General-sum Low-rank Markov Games

no code implementations 30 Oct 2022 Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang

To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates (non-linear) function approximation.

Representation Learning

Provable Defense against Backdoor Policies in Reinforcement Learning

1 code implementation 18 Nov 2022 Shubham Kumar Bharti, Xuezhou Zhang, Adish Singla, Xiaojin Zhu

Instead, our defense mechanism sanitizes the backdoor policy by projecting observed states to a 'safe subspace', estimated from a small number of interactions with a clean (non-triggered) environment.

reinforcement-learning Reinforcement Learning (RL)
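The defense in the entry above sanitizes observations by projecting them onto a "safe subspace" estimated from clean (non-triggered) interactions. A common way to estimate such a subspace is via the SVD of clean-state samples; the sketch below illustrates that idea with a synthetic low-rank example. The rank choice, the toy data, and the `trigger` vector are all hypothetical, and this is not the paper's exact estimator.

```python
import numpy as np

def estimate_safe_subspace(clean_states, rank):
    """Orthonormal basis (rank x d) from the top singular vectors of clean states."""
    S = np.asarray(clean_states, dtype=float)
    _, _, Vt = np.linalg.svd(S - S.mean(axis=0), full_matrices=False)
    return Vt[:rank]

def sanitize(state, basis, mean):
    """Project an observed state onto the estimated safe subspace."""
    centered = np.asarray(state, dtype=float) - mean
    return mean + basis.T @ (basis @ centered)

# demo: clean states lie in a 2-D subspace of R^5
rng = np.random.default_rng(1)
clean = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
mean = clean.mean(axis=0)
basis = estimate_safe_subspace(clean, rank=2)

trigger = np.array([0.0, 0.0, 0.0, 0.0, 10.0])   # hypothetical backdoor trigger
observed = clean[0] + trigger                    # triggered observation
safe = sanitize(observed, basis, mean)           # trigger's off-subspace part removed
```

Clean states pass through the projection essentially unchanged, while the component of the trigger orthogonal to the safe subspace is stripped away before the state reaches the (possibly backdoored) policy.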

Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP

no code implementations 21 Jun 2023 Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang

In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning.

Efficient Exploration Representation Learning

Improved Algorithms for Adversarial Bandits with Unbounded Losses

no code implementations 3 Oct 2023 Mingyu Chen, Xuezhou Zhang

We consider the Adversarial Multi-Armed Bandits (MAB) problem with unbounded losses, where the algorithms have no prior knowledge of the sizes of the losses.

Multi-Armed Bandits

Federated Multi-Level Optimization over Decentralized Networks

no code implementations 10 Oct 2023 Shuoguang Yang, Xuezhou Zhang, Mengdi Wang

Multi-level optimization has gained increasing attention in recent years, as it provides a powerful framework for solving complex optimization problems that arise in many fields, such as meta-learning, multi-player games, reinforcement learning, and nested composition optimization.

Distributed Optimization Meta-Learning +1

Scale-free Adversarial Reinforcement Learning

no code implementations 1 Mar 2024 Mingyu Chen, Xuezhou Zhang

This paper initiates the study of scale-free learning in Markov Decision Processes (MDPs), where the scale of rewards/losses is unknown to the learner.

reinforcement-learning
