Search Results for author: Wenpin Tang

Found 19 papers, 1 paper with code

Analysis of the Order Flow Auction under Proposer-Builder Separation

no code implementations · 17 Feb 2025 · Ruofei Ma, Wenpin Tang, David Yao

In this paper, we consider the impact of the order flow auction (OFA) in the context of the proposer-builder separation (PBS) mechanism from a game-theoretic perspective.

Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning

no code implementations · 3 Feb 2025 · Hanyang Zhao, Haoxian Chen, Ji Zhang, David D. Yao, Wenpin Tang

Reinforcement learning from human feedback (RLHF), which aligns a diffusion model with the input prompt, has become a crucial step in building reliable generative AI models.

Regret of exploratory policy improvement and $q$-learning

no code implementations · 2 Nov 2024 · Wenpin Tang, Xun Yu Zhou

We study the convergence of $q$-learning and related algorithms introduced by Jia and Zhou (J. Mach. Learn. Res., 2023).

Q-Learning
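
For context, the central object here is Jia and Zhou's continuous-time $q$-function, whose optimizer is a Gibbs (softmax) policy; a standard characterization, with temperature $\gamma$ (notation assumed here, not the paper's new result):

$$ \pi^*(a \mid t, x) \;=\; \frac{\exp\{q^*(t, x, a)/\gamma\}}{\int_{\mathcal{A}} \exp\{q^*(t, x, a')/\gamma\}\, da'} $$

The regret analysis then concerns how quickly exploratory policy improvement and $q$-learning updates approach this optimum.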

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

no code implementations · 5 Oct 2024 · Hanyang Zhao, Genta Indra Winata, Anirban Das, Shi-Xiong Zhang, David D. Yao, Wenpin Tang, Sambit Sahu

Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family.
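
Since the entry extends the DPO family, a minimal sketch of the baseline DPO objective may help; this is the standard loss of Rafailov et al. (2023), not RainbowPO's unified variant, and the tensor names are illustrative:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: a logistic loss on the scaled margin between
    policy-vs-reference log-ratios of preferred and dispreferred responses."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()
```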

Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning

no code implementations · 12 Sep 2024 · Hanyang Zhao, Haoxian Chen, Ji Zhang, David D. Yao, Wenpin Tang

Reinforcement learning from human feedback (RLHF) has been shown to be a promising direction for aligning generative models with human intent, and has also been explored in recent works for the alignment of diffusion generative models.

Reinforcement Learning +1
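
The "score as action" viewpoint can be summarized with the standard reverse-time sampling SDE, in which the learned score $s_\theta$ plays the role of the control (usual score-based SDE notation; the reward and regularization terms of the fine-tuning objective are omitted):

$$ dX_t \;=\; \big[ f(X_t, t) - g(t)^2\, s_\theta(X_t, t) \big]\, dt \;+\; g(t)\, d\bar{W}_t, $$

run backward in time, so that fine-tuning the score network amounts to a continuous-time stochastic control problem over the sampling trajectory.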

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

no code implementations · 23 May 2024 · Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang

Direct Preference Optimization (DPO) has recently emerged as a popular approach to improve reinforcement learning from human feedback (RLHF), leading to better techniques for fine-tuning large language models (LLMs).

Diversity

Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond

no code implementations · 10 Mar 2024 · Wenpin Tang

This paper aims to develop a rigorous treatment of the problem of entropy-regularized fine-tuning in the context of continuous-time diffusion models, which was recently proposed by Uehara et al. (arXiv:2402.15194, 2024).
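
In this line of work, fine-tuning is typically posed as the stochastic control problem below (a schematic only: $r$ is a terminal reward, $\mathbb{P}^u$ the controlled path measure, $\mathbb{P}^{\mathrm{pre}}$ the pretrained one, and $\beta > 0$ the regularization weight; the paper's exact formulation may differ):

$$ \max_{u} \;\; \mathbb{E}^{u}\big[ r(X_T) \big] \;-\; \beta\, \mathrm{KL}\big( \mathbb{P}^{u} \,\|\, \mathbb{P}^{\mathrm{pre}} \big). $$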

Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial

no code implementations · 12 Feb 2024 · Wenpin Tang, Hanyang Zhao

This is an expository article on score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDEs).

Reinforcement Learning
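
In the standard notation such a tutorial covers, data are noised by a forward SDE and the score is learned by denoising score matching (a textbook summary, not a quote from the article):

$$ dX_t = f(X_t, t)\, dt + g(t)\, dW_t, \qquad \min_\theta \; \mathbb{E}\Big[ \lambda(t)\, \big\| s_\theta(X_t, t) - \nabla_{X_t} \log p_{t|0}(X_t \mid X_0) \big\|^2 \Big], $$

after which sampling runs the time reversal of the forward SDE with $s_\theta$ in place of the true score $\nabla \log p_t$.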

Contractive Diffusion Probabilistic Models

no code implementations · 23 Jan 2024 · Wenpin Tang, Hanyang Zhao

Since accurate score matching cannot be guaranteed, we propose a new design criterion, namely the contraction of backward sampling, for diffusion probabilistic models (DPMs), leading to a novel class of contractive DPMs (CDPMs).
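
The appeal of contraction can be seen from a standard error-propagation bound (an illustrative computation, not the paper's statement): if each backward step is a contraction with constant $L < 1$ and incurs score error at most $\epsilon$, errors do not compound,

$$ \| x_K - x_K^\star \| \;\le\; \epsilon \sum_{k=0}^{K-1} L^{k} \;\le\; \frac{\epsilon}{1 - L}. $$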

Trading and wealth evolution in the Proof of Stake protocol

no code implementations · 3 Aug 2023 · Wenpin Tang

With the increasing adoption of the Proof of Stake (PoS) blockchain, it is timely to study the economy created by such blockchains.

PoS

Trading under the Proof-of-Stake Protocol -- a Continuous-Time Control Approach

no code implementations · 26 Jul 2022 · Wenpin Tang, David D. Yao

In particular, we show that when a participant is risk-neutral or risk-seeking, corresponding to the risk-adjusted valuation being a martingale or a sub-martingale, the optimal strategy must be to either buy all the time, sell all the time, or first buy and then sell, with both buying and selling executed at full capacity.

PoS
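
Schematically, with trading rate $u_t \in [-\bar{u}, \bar{u}]$ (notation assumed here), the optimal strategy described above is bang-bang:

$$ u_t^\star \;=\; \bar{u}\, \mathbf{1}\{ t \le \tau \} \;-\; \bar{u}\, \mathbf{1}\{ t > \tau \} \quad \text{for some } \tau \in [0, T], $$

with $\tau = T$ (always buy) and $\tau = 0$ (always sell) as the extreme cases.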

Stability of shares in the Proof of Stake Protocol -- Concentration and Phase Transitions

no code implementations · 5 Jun 2022 · Wenpin Tang

This paper is concerned with the stability of shares in a cryptocurrency where the new coins are issued according to the Proof of Stake protocol.

PoS
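
The stylized issuance dynamics in this line of work are those of a Pólya urn: each newly minted coin is awarded with probability proportional to current stake. A minimal simulation sketch (parameter values are illustrative, not from the paper):

```python
import random

def simulate_pos_shares(stakes, rounds, reward=1.0):
    """Evolve stakes when each block reward is won with probability
    proportional to the holder's current stake (Polya-urn dynamics)."""
    stakes = list(stakes)
    for _ in range(rounds):
        winner = random.choices(range(len(stakes)), weights=stakes)[0]
        stakes[winner] += reward  # newly minted coins go to the winner
    total = sum(stakes)
    return [s / total for s in stakes]

# Two holders starting at 60%/40% of all coins: shares fluctuate but
# settle down, consistent with the concentration phenomena studied here.
print(simulate_pos_shares([60.0, 40.0], rounds=100_000))
```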

Asset Selection via Correlation Blockmodel Clustering

no code implementations · 26 Mar 2021 · Wenpin Tang, Xiao Xu, Xun Yu Zhou

Finally, we conduct an empirical analysis to verify the performance of the algorithm.

Clustering
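
As a generic illustration only (not the authors' blockmodel algorithm), assets can be clustered on the standard correlation distance $d_{ij} = \sqrt{2(1 - \rho_{ij})}$ with off-the-shelf hierarchical clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_assets(returns, n_clusters):
    """Cluster assets from a (T, N) array of returns via correlation distance."""
    corr = np.corrcoef(returns, rowvar=False)               # N x N correlations
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))  # correlation distance
    np.fill_diagonal(dist, 0.0)                             # guard against rounding
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")  # labels in 1..n_clusters
```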

Simulated annealing from continuum to discretization: a convergence analysis via the Eyring--Kramers law

no code implementations · 3 Feb 2021 · Wenpin Tang, Xun Yu Zhou

We prove that the tail probability of the optimality gap decays polynomially in time (resp. in cumulative step size), and provide an explicit rate as a function of the model parameters.
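
The continuum object is the annealing Langevin diffusion with decreasing temperature (classically $\epsilon_t \approx c / \log t$; the schedule form here is assumed, not quoted from the paper):

$$ dX_t \;=\; -\nabla f(X_t)\, dt \;+\; \sqrt{2\, \epsilon_t}\; dW_t, \qquad \epsilon_t \downarrow 0, $$

and the Eyring--Kramers law, which gives the expected escape time from a well of depth $\Delta$ as order $e^{\Delta/\epsilon}$, is what drives the explicit rates.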

Learning an arbitrary mixture of two multinomial logits

no code implementations · 1 Jul 2020 · Wenpin Tang

In this paper, we consider mixtures of multinomial logistic models (MNL), which are known to $\epsilon$-approximate any random utility model.

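For reference, a single MNL with utilities $u_i$ chooses item $i$ from assortment $S$ with probability proportional to $e^{u_i}$, and the two-component mixture studied here draws from parameters $(u, v)$ with weights $(\lambda, 1 - \lambda)$ (symbols assumed):

$$ \mathbb{P}(i \mid S) \;=\; \lambda\, \frac{e^{u_i}}{\sum_{j \in S} e^{u_j}} \;+\; (1 - \lambda)\, \frac{e^{v_i}}{\sum_{j \in S} e^{v_j}}. $$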

Escaping Saddle Points Efficiently with Occupation-Time-Adapted Perturbations

no code implementations · 9 May 2020 · Xin Guo, Jiequn Han, Mahan Tajrobehkar, Wenpin Tang

Motivated by the super-diffusivity of self-repelling random walks, which has roots in statistical physics, this paper develops a new perturbation mechanism for optimization algorithms.
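
As a rough sketch of the idea only (the paper's actual mechanism and hyperparameters differ), classical perturbed gradient descent can be modified so that the injected noise grows with the time spent stuck near the current point, giving it a self-repelling flavor:

```python
import numpy as np

def occupation_adapted_pgd(grad, x0, lr=1e-2, steps=10_000,
                           g_tol=1e-3, base_radius=1e-2, seed=0):
    """Perturbed gradient descent whose perturbation amplitude scales
    with the occupation time (consecutive steps spent near a flat region)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    occupation = 0  # steps spent stuck where the gradient is small
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < g_tol:
            occupation += 1
            # noise amplitude grows with occupation time
            x = x + rng.normal(scale=base_radius * np.sqrt(occupation), size=x.shape)
        else:
            occupation = 0
            x = x - lr * g
    return x
```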

On identifiability and consistency of the nugget in Gaussian spatial process models

1 code implementation · 15 Aug 2019 · Wenpin Tang, Lu Zhang, Sudipto Banerjee

We formally establish results on the identifiability and consistency of the nugget in spatial models based upon the Gaussian process, within the framework of in-fill asymptotics, i.e., the sample size increases within a bounded sampling domain.

Spatial Interpolation, Statistics Theory
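
The model in question is the standard geostatistical decomposition (notation assumed): an observation at location $s$ is a smooth spatial process plus a measurement-error "nugget",

$$ y(s) = w(s) + \epsilon(s), \qquad w \sim \mathrm{GP}\big(0,\ \sigma^2 R_\phi \big), \qquad \epsilon(s) \overset{iid}{\sim} \mathcal{N}(0, \tau^2), $$

and the question is which of $\tau^2$, $\sigma^2$, $\phi$ can be consistently estimated as observations accumulate in a fixed, bounded domain.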

A class of stochastic games and moving free boundary problems

no code implementations · 10 Sep 2018 · Xin Guo, Wenpin Tang, Renyuan Xu

In this paper, we propose and analyze a class of $N$-player stochastic games that include finite-fuel stochastic games as a special case.
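
For context, "finite fuel" refers to singular controls whose total variation is capped by an initial fuel level $y$; schematically, for a controlled diffusion (notation assumed),

$$ X_t = x + \mu t + \sigma W_t + \xi_t, \qquad \| \xi \|_{\mathrm{TV}([0, t])} \le y \quad \forall\, t \ge 0, $$

where $\xi$ is the player's bounded-variation control.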
