no code implementations • 24 Feb 2025 • Yuheng Zhang, Dian Yu, Tao Ge, Linfeng Song, Zhichen Zeng, Haitao Mi, Nan Jiang, Dong Yu
Reinforcement learning from human feedback (RLHF) has demonstrated remarkable effectiveness in aligning large language models (LLMs) with human preferences.
no code implementations • 22 Dec 2024 • Dian Yu, Yuheng Zhang, Jiahao Xu, Tian Liang, Linfeng Song, Zhaopeng Tu, Haitao Mi, Dong Yu
We propose CaP, a novel approach that uses external tools to refine chain-of-thought (CoT) responses generated by the same or other LLMs.
no code implementations • 6 Dec 2024 • Yuheng Zhang, Yuan Yuan, Jingtao Ding, Jian Yuan, Yong Li
In this paper, we propose CoDiffMob, a diffusion method for urban mobility generation with collaborative noise priors, and we emphasize the critical role of noise in diffusion models for generating mobility data.
no code implementations • 21 Nov 2024 • Jingtao Ding, Yunke Zhang, Yu Shang, Yuheng Zhang, Zefang Zong, Jie Feng, Yuan Yuan, Hongyuan Su, Nian Li, Nicholas Sukiennik, Fengli Xu, Yong Li
The concept of world models has garnered significant attention due to advancements in multimodal large language models such as GPT-4 and video generation models such as Sora, which are central to the pursuit of artificial general intelligence.
no code implementations • 30 Jun 2024 • Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu
Specifically, we formulate the problem as a two-player game and propose a novel online algorithm, iterative Nash policy optimization (INPO).
no code implementations • 31 May 2024 • Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro
Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims to maximize unobservable rewards by interacting with an environment and observing reward-dependent feedback on the actions taken.
no code implementations • 22 Feb 2024 • Yuheng Zhang, Nan Jiang
We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantee avoids exponential dependence on the horizon.
no code implementations • 12 Feb 2024 • Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro
Bandits with feedback graphs are powerful online learning models that interpolate between the full-information and classic bandit problems, capturing many real-life applications.
1 code implementation • 11 Feb 2024 • Chenlu Ye, Wei Xiong, Yuheng Zhang, Hanze Dong, Nan Jiang, Tong Zhang
We investigate Reinforcement Learning from Human Feedback (RLHF) in the context of a general preference oracle.
no code implementations • 24 Nov 2023 • Yuheng Zhang, Pin Liu, Guojun Wang, Peiqiang Li, Wanyi Gu, Houji Chen, Xuelei Liu, Jinyao Zhu
Front-running attacks, a unique form of security threat, pose significant challenges to the integrity of blockchain transactions.
no code implementations • 6 Feb 2023 • Yuheng Zhang, Yu Bai, Nan Jiang
We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium, such as a Nash equilibrium or a (coarse) correlated equilibrium, from an offline dataset pre-collected from the game.
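In the simplest special case, a single-state, two-player zero-sum matrix game, how "approximate" an equilibrium is can be measured by the Nash (duality) gap. The Python sketch below assumes that setting, with a hypothetical helper `nash_gap` and a payoff matrix `A` that the row player maximizes. It only illustrates the equilibrium notion and is not the offline algorithm studied in the paper; general-sum Markov games and (coarse) correlated equilibria need more machinery.

```python
def nash_gap(A, x, y):
    """Nash (duality) gap of mixed strategies x (row player, maximizer) and
    y (column player, minimizer) in a zero-sum matrix game with payoffs A.
    Illustrative assumption: single state, two players, zero-sum.
    A gap of at most eps implies (x, y) is an eps-approximate Nash equilibrium."""
    m, n = len(A), len(A[0])
    # Best-response value for the row player against y: max_i (A y)_i
    row_best = max(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))
    # Best-response value for the column player against x: min_j (x^T A)_j
    col_best = min(sum(x[i] * A[i][j] for i in range(m)) for j in range(n))
    return row_best - col_best


pennies = [[1, -1], [-1, 1]]  # matching pennies
print(nash_gap(pennies, [0.5, 0.5], [0.5, 0.5]))  # 0.0 (exact equilibrium)
print(nash_gap(pennies, [1.0, 0.0], [0.5, 0.5]))  # 1.0 (row player is exploitable)
```

The gap is the sum of both players' best-response improvements, so it is zero exactly at a Nash equilibrium and small only when neither player can gain much by deviating.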
no code implementations • 4 Oct 2022 • Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang
For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t)$ with high probability, where $\alpha_t$ is the independence number of the feedback graph at round $t$.
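To make the role of $\alpha_t$ concrete, the Python sketch below computes the independence number of a tiny feedback graph by brute force and evaluates the shape of the bound $(\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t$, ignoring constants and logarithmic factors. It is not the paper's algorithm; the helper names `independence_number` and `regret_bound_shape` and the example graphs are illustrative assumptions. Edges are treated as undirected and self-loops are ignored when computing independent sets, which we assume here as the convention for feedback graphs.

```python
from itertools import combinations
import math


def independence_number(n, edges):
    """Size of the largest independent set on vertices 0..n-1 (brute force).
    Assumption: edge direction is ignored and self-loops do not count.
    Exponential in n, so only suitable for tiny illustrative graphs."""
    adj = {frozenset(e) for e in edges if e[0] != e[1]}
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if all(frozenset((u, v)) not in adj for u, v in combinations(subset, 2)):
                return k
    return 0


def regret_bound_shape(alphas):
    # sqrt(sum_t alpha_t) + max_t alpha_t, dropping constants and log factors
    return math.sqrt(sum(alphas)) + max(alphas)


K, T = 4, 100
bandit_edges = [(i, i) for i in range(K)]                  # each arm reveals only itself
full_edges = [(i, j) for i in range(K) for j in range(K)]  # every arm reveals every arm
alpha_bandit = independence_number(K, bandit_edges)
alpha_full = independence_number(K, full_edges)
print(alpha_bandit, regret_bound_shape([alpha_bandit] * T))  # 4 24.0
print(alpha_full, regret_bound_shape([alpha_full] * T))      # 1 11.0
```

A graph with only self-loops recovers the classic bandit setting ($\alpha_t = K$), while a complete graph recovers full information ($\alpha_t = 1$), the two extremes between which feedback graphs interpolate.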
1 code implementation • 2 Oct 2022 • Yikun Ban, Yuheng Zhang, Hanghang Tong, Arindam Banerjee, Jingrui He
We improve the theoretical and empirical performance of neural-network (NN)-based active learning algorithms for the non-parametric streaming setting.
2 code implementations • 11 Sep 2020 • Tianhao Wang, Yuheng Zhang, Ruoxi Jia
This paper studies defense mechanisms against model inversion (MI) attacks, a type of privacy attack aimed at inferring information about the training data distribution given access to a target machine learning model.
no code implementations • 7 Aug 2020 • Haiping Zhu, Hongming Shan, Yuheng Zhang, Lingfu Che, Xiaoyang Xu, Junping Zhang, Jianbo Shi, Fei-Yue Wang
We propose a novel ordinal regression approach, termed Convolutional Ordinal Regression Forest or CORF, for image ordinal estimation, which integrates ordinal regression and differentiable decision trees with a convolutional neural network to obtain precise and stable global ordinal relationships.
1 code implementation • CVPR 2020 • Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, Dawn Song
This paper studies model-inversion attacks, in which access to a model is abused to infer information about the training data.
no code implementations • 27 May 2019 • Haiping Zhu, Yuheng Zhang, Guohao Li, Junping Zhang, Hongming Shan
This paper proposes an ordinal distribution regression with a global and local convolutional neural network for gait-based age estimation.