no code implementations • 22 Feb 2024 • Yuheng Zhang, Nan Jiang
We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantee avoids exponential dependence on the horizon.
no code implementations • 12 Feb 2024 • Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro
Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications.
no code implementations • 11 Feb 2024 • Chenlu Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang
In this work, we provide theoretical insights for a recently proposed learning paradigm, Nash learning from human feedback (NLHF), which considered a general preference model and formulated the alignment process as a game between two competitive LLMs.
no code implementations • 24 Nov 2023 • Yuheng Zhang, Pin Liu, Guojun Wang, Peiqiang Li, Wanyi Gu, Houji Chen, Xuelei Liu, Jinyao Zhu
Front-running attacks, a unique form of security threat, pose significant challenges to the integrity of blockchain transactions.
no code implementations • 6 Feb 2023 • Yuheng Zhang, Yu Bai, Nan Jiang
We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium -- such as Nash equilibrium and (Coarse) Correlated Equilibrium -- from an offline dataset pre-collected from the game.
Multi-agent Reinforcement Learning Reinforcement Learning (RL)
no code implementations • 4 Oct 2022 • Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang
For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t)$ with high probability, where $\alpha_t$ is the independence number of the feedback graph at round $t$.
1 code implementation • 2 Oct 2022 • Yikun Ban, Yuheng Zhang, Hanghang Tong, Arindam Banerjee, Jingrui He
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
2 code implementations • 11 Sep 2020 • Tianhao Wang, Yuheng Zhang, Ruoxi Jia
This paper studies defense mechanisms against model inversion (MI) attacks -- a type of privacy attacks aimed at inferring information about the training data distribution given the access to a target machine learning model.
no code implementations • 7 Aug 2020 • Haiping Zhu, Hongming Shan, Yuheng Zhang, Lingfu Che, Xiaoyang Xu, Junping Zhang, Jianbo Shi, Fei-Yue Wang
We propose a novel ordinal regression approach, termed Convolutional Ordinal Regression Forest or CORF, for image ordinal estimation, which can integrate ordinal regression and differentiable decision trees with a convolutional neural network for obtaining precise and stable global ordinal relationships.
1 code implementation • CVPR 2020 • Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, Dawn Song
This paper studies model-inversion attacks, in which the access to a model is abused to infer information about the training data.
no code implementations • 27 May 2019 • Haiping Zhu, Yuheng Zhang, Guohao Li, Junping Zhang, Hongming Shan
This paper proposes an ordinal distribution regression with a global and local convolutional neural network for gait-based age estimation.