Search Results for author: Yuheng Zhang

Found 17 papers, 4 papers with code

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

no code implementations 24 Feb 2025 Yuheng Zhang, Dian Yu, Tao Ge, Linfeng Song, Zhichen Zeng, Haitao Mi, Nan Jiang, Dong Yu

Reinforcement learning from human feedback (RLHF) has demonstrated remarkable effectiveness in aligning large language models (LLMs) with human preferences.

Teaching LLMs to Refine with Tools

no code implementations 22 Dec 2024 Dian Yu, Yuheng Zhang, Jiahao Xu, Tian Liang, Linfeng Song, Zhaopeng Tu, Haitao Mi, Dong Yu

We propose CaP, a novel approach that uses external tools to refine chain-of-thought (CoT) responses generated by the same or other LLMs.

Noise Matters: Diffusion Model-based Urban Mobility Generation with Collaborative Noise Priors

no code implementations 6 Dec 2024 Yuheng Zhang, Yuan Yuan, Jingtao Ding, Jian Yuan, Yong Li

In this paper, we propose CoDiffMob, a diffusion method for urban mobility generation with collaborative noise priors, and we emphasize the critical role of noise in diffusion models for generating mobility data.

Image Generation

Understanding World or Predicting Future? A Comprehensive Survey of World Models

no code implementations 21 Nov 2024 Jingtao Ding, Yunke Zhang, Yu Shang, Yuheng Zhang, Zefang Zong, Jie Feng, Yuan Yuan, Hongyuan Su, Nian Li, Nicholas Sukiennik, Fengli Xu, Yong Li

The concept of world models has garnered significant attention due to advancements in multimodal large language models such as GPT-4 and video generation models such as Sora, which are central to the pursuit of artificial general intelligence.

Autonomous Driving, Decision Making +1

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

no code implementations 30 Jun 2024 Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu

Specifically, we formulate the problem as a two-player game and propose a novel online algorithm, iterative Nash policy optimization (INPO).

Provably Efficient Interactive-Grounded Learning with Personalized Reward

no code implementations 31 May 2024 Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions.

Recommendation Systems

On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation

no code implementations 22 Feb 2024 Yuheng Zhang, Nan Jiang

We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantee avoids exponential dependence on the horizon.

Off-policy evaluation

Efficient Contextual Bandits with Uninformed Feedback Graphs

no code implementations 12 Feb 2024 Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications.

Multi-Armed Bandits, regression

Online Iterative Reinforcement Learning from Human Feedback with General Preference Model

1 code implementation 11 Feb 2024 Chenlu Ye, Wei Xiong, Yuheng Zhang, Hanze Dong, Nan Jiang, Tong Zhang

We investigate Reinforcement Learning from Human Feedback (RLHF) in the context of a general preference oracle.

FRAD: Front-Running Attacks Detection on Ethereum using Ternary Classification Model

no code implementations 24 Nov 2023 Yuheng Zhang, Pin Liu, Guojun Wang, Peiqiang Li, Wanyi Gu, Houji Chen, Xuelei Liu, Jinyao Zhu

Front-running attacks, a unique form of security threat, pose significant challenges to the integrity of blockchain transactions.

Offline Learning in Markov Games with General Function Approximation

no code implementations 6 Feb 2023 Yuheng Zhang, Yu Bai, Nan Jiang

We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium -- such as Nash equilibrium and (Coarse) Correlated Equilibrium -- from an offline dataset pre-collected from the game.

Multi-agent Reinforcement Learning, Reinforcement Learning (RL)

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

no code implementations 4 Oct 2022 Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang

For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t)$ with high probability, where $\alpha_t$ is the independence number of the feedback graph at round $t$.

Multi-Armed Bandits

Improved Algorithms for Neural Active Learning

1 code implementation 2 Oct 2022 Yikun Ban, Yuheng Zhang, Hanghang Tong, Arindam Banerjee, Jingrui He

We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.

Active Learning

Improving Robustness to Model Inversion Attacks via Mutual Information Regularization

2 code implementations 11 Sep 2020 Tianhao Wang, Yuheng Zhang, Ruoxi Jia

This paper studies defense mechanisms against model inversion (MI) attacks -- a type of privacy attack aimed at inferring information about the training data distribution given access to a target machine learning model.

Convolutional Ordinal Regression Forest for Image Ordinal Estimation

no code implementations 7 Aug 2020 Haiping Zhu, Hongming Shan, Yuheng Zhang, Lingfu Che, Xiaoyang Xu, Junping Zhang, Jianbo Shi, Fei-Yue Wang

We propose a novel ordinal regression approach, termed Convolutional Ordinal Regression Forest or CORF, for image ordinal estimation, which can integrate ordinal regression and differentiable decision trees with a convolutional neural network for obtaining precise and stable global ordinal relationships.

Age Estimation, Binary Classification +1

The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

1 code implementation CVPR 2020 Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, Dawn Song

This paper studies model-inversion attacks, in which the access to a model is abused to infer information about the training data.

Face Recognition, regression

Ordinal Distribution Regression for Gait-based Age Estimation

no code implementations 27 May 2019 Haiping Zhu, Yuheng Zhang, Guohao Li, Junping Zhang, Hongming Shan

This paper proposes an ordinal distribution regression with a global and local convolutional neural network for gait-based age estimation.

Age Estimation, regression
