no code implementations • 14 Nov 2024 • Wenlong Mou, Jian Qian
Specifically, the error upper bound of our estimator approaches the optimal variance achieved by TD, with an additional term depending on the exit probability of a selected subset of the state space.
no code implementations • 29 Oct 2024 • Jian Qian, Alexander Rakhlin, Nikita Zhivotovskiy
We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression, all characterized by unbounded losses in the setup where no assumptions are made on the magnitude of design vectors and the norm of the optimal vector of parameters.
no code implementations • 16 Oct 2024 • Zeyu Jia, Jian Qian, Alexander Rakhlin, Chen-Yu Wei
We show that a regret of $\Omega(\sqrt{d_\text{elu}\Lambda}+d_\text{elu})$ is unavoidable when $\sqrt{d_\text{elu}\Lambda}+d_\text{elu}\leq\sqrt{AT}$.
no code implementations • 7 Oct 2024 • Fan Chen, Dylan J. Foster, Yanjun Han, Jian Qian, Alexander Rakhlin, Yunbei Xu
Classical lower bound techniques -- such as Fano's method, Le Cam's method, and Assouad's lemma -- are central to the study of minimax risk in statistical estimation, yet are insufficient to provide tight lower bounds for \emph{interactive decision making} algorithms that collect data interactively (e.g., algorithms for bandits and reinforcement learning).
1 code implementation • 12 Sep 2024 • Jian Qian, Miao Sun, Ashley Lee, Jie Li, Shenglong Zhuo, Patrick Yin Chiang
The network consists of an input module that extracts and concatenates depth-map and RGB image features, a U-shaped encoder-decoder Transformer that extracts deep features, and a refinement module.
1 code implementation • 8 Jul 2024 • Jian Qian, Miao Sun, Sifan Zhou, Ziyu Zhao, Ruizhi Hun, Patrick Chiang
In Sub-SA, we design a submodular function that facilitates effective subset selection for annotation, and we show from a theoretical perspective that it satisfies monotonicity and submodularity.
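Monotonicity and submodularity are exactly the properties that make greedy selection provably effective. The sketch below shows generic (1 - 1/e)-greedy maximization over a toy coverage function; it is not Sub-SA's actual objective, and the function names and example universe are illustrative assumptions.

```python
def greedy_submodular(candidates, f, k):
    """Greedy maximization of a monotone submodular set function f.

    Returns a subset of size k whose value is within a (1 - 1/e)
    factor of the optimum, by the classical Nemhauser-Wolsey-Fisher
    guarantee for monotone submodular functions.
    """
    selected = []
    remaining = list(candidates)
    for _ in range(k):
        # Pick the element with the largest marginal gain.
        best = max(remaining, key=lambda x: f(selected + [x]) - f(selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example: a coverage function (monotone and submodular).
universe = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 4}}

def coverage(subset):
    covered = set()
    for s in subset:
        covered |= universe[s]
    return len(covered)

picked = greedy_submodular(list(universe), coverage, k=2)
```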
no code implementations • 5 Jul 2024 • Jian Qian, Bingyu Xie, Biao Wan, Minhao Li, Miao Sun, Patrick Yin Chiang
TimeLDM is composed of a variational autoencoder, which encodes time series into informative, smoothed latent representations, and a latent diffusion model that operates in the latent space to generate new latent samples.
no code implementations • 28 May 2024 • Jian Qian, Haichen Hu, David Simchi-Levi
In this paper, we introduce a reduction from CMDPs to offline density estimation under the realizability assumption, i.e., a model class M containing the true underlying CMDP is provided in advance.
no code implementations • 15 Apr 2024 • Dylan J. Foster, Yanjun Han, Jian Qian, Alexander Rakhlin
Our main results settle the statistical and computational complexity of online estimation in this framework.
no code implementations • 3 Apr 2022 • Ali Jadbabaie, Haochuan Li, Jian Qian, Yi Tian
In this paper, we study a linear bandit optimization problem in a federated setting where a large collection of distributed agents collaboratively learn a common linear bandit model.
no code implementations • 27 Dec 2021 • Dylan J. Foster, Sham M. Kakade, Jian Qian, Alexander Rakhlin
The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
no code implementations • 1 Mar 2021 • Avrim Blum, Steve Hanneke, Jian Qian, Han Shao
We study the problem of robust learning under clean-label data-poisoning attacks, where the attacker injects (an arbitrary set of) correctly-labeled examples to the training set to fool the algorithm into making mistakes on specific test instances at test time.
no code implementations • 15 Oct 2020 • Xuedong Shang, Han Shao, Jian Qian
We study two goals: (a) finding the arm with the minimum $\ell^\infty$-norm of relative losses with a given confidence level (which refers to fixed-confidence best-arm identification); (b) minimizing the $\ell^\infty$-norm of cumulative relative losses (which refers to regret minimization).
no code implementations • NeurIPS 2020 • Yi Tian, Jian Qian, Suvrit Sra
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components.
no code implementations • 30 Jan 2020 • Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
We investigate concentration inequalities for Dirichlet and Multinomial random variables.
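As a concrete illustration of the kind of inequality studied in this line of work, the classical L1 concentration bound of Weissman et al. for multinomial frequencies can be checked empirically. This is a generic textbook bound rather than the paper's own inequalities, and the parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])       # true multinomial parameter
n, d, eps, trials = 500, 3, 0.15, 2000

# Empirical probability that the L1 deviation exceeds eps.
counts = rng.multinomial(n, p, size=trials)   # shape (trials, d)
dev = np.abs(counts / n - p).sum(axis=1)
emp = (dev >= eps).mean()

# Weissman et al. (2003): P(||p_hat - p||_1 >= eps) <= (2^d - 2) exp(-n eps^2 / 2).
bound = (2 ** d - 2) * np.exp(-n * eps ** 2 / 2)
```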
1 code implementation • NeurIPS 2019 • Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
The exploration bonus is an effective approach to manage the exploration-exploitation trade-off in Markov Decision Processes (MDPs).
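A count-based bonus of the usual UCB flavor can be sketched in tabular form as follows. This is a generic illustration of the bonus mechanism, not the algorithm analyzed in the paper; the class name and parameters are hypothetical.

```python
import math
from collections import defaultdict

class BonusQ:
    """Tabular Q-learning with a count-based exploration bonus
    (a generic UCB-style sketch, not the paper's algorithm)."""

    def __init__(self, actions, c=1.0, alpha=0.1, gamma=0.95):
        self.q = defaultdict(float)   # Q-value estimates, keyed by (s, a)
        self.n = defaultdict(int)     # visit counts, keyed by (s, a)
        self.actions, self.c, self.alpha, self.gamma = actions, c, alpha, gamma

    def bonus(self, s, a, t):
        # Optimism shrinks as the pair (s, a) is visited more often.
        return self.c * math.sqrt(math.log(t + 1) / (self.n[(s, a)] + 1))

    def act(self, s, t):
        # Choose the action maximizing the optimistic value estimate.
        return max(self.actions, key=lambda a: self.q[(s, a)] + self.bonus(s, a, t))

    def update(self, s, a, r, s2):
        self.n[(s, a)] += 1
        target = r + self.gamma * max(self.q[(s2, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
```

The bonus term drives the agent toward rarely visited state-action pairs, which is the exploration-exploitation mechanism the entry refers to.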
2 code implementations • NeurIPS 2019 • Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White
Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning.
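The basic IS reweighting that this line of work builds on fits in a few lines: actions sampled from a behavior policy are reweighted by the ratio of target to behavior probabilities. The policies and rewards below are toy assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two policies over actions {0, 1}.
behavior = np.array([0.8, 0.2])   # data-collecting policy mu
target = np.array([0.3, 0.7])     # policy pi we want to evaluate
reward = np.array([0.0, 1.0])     # deterministic reward per action

# Sample actions from the behavior policy, then reweight by pi/mu.
actions = rng.choice(2, size=100_000, p=behavior)
weights = target[actions] / behavior[actions]
is_estimate = np.mean(weights * reward[actions])

true_value = np.dot(target, reward)   # expected reward under the target policy
```

The estimator is unbiased for `true_value`, though the ratio weights can have high variance when the policies differ sharply, which is the well-known drawback motivating alternative reweighting strategies.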
no code implementations • 11 Dec 2018 • Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
We introduce and analyse two algorithms for exploration-exploitation in discrete and continuous Markov Decision Processes (MDPs) based on exploration bonuses.