1 code implementation • 10 Oct 2024 • Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang
The resulting dataset is easily integrated with existing direct alignment algorithms and is applicable to any preference dataset.
no code implementations • 26 May 2024 • Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model, one that simultaneously minimizes the maximum-likelihood estimation loss and a reward penalty term.
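The maximin structure described above can be sketched schematically (the notation below is ours, not the paper's exact formulation): the adversarial reward model trades off fit to the preference data against the value it assigns to the candidate policy.

```latex
\hat{\pi} \;=\; \arg\max_{\pi}\; \min_{r \in \mathcal{R}} \;\Big\{\,
  \mathbb{E}_{x \sim d,\; y \sim \pi(\cdot \mid x)}\big[ r(x, y) \big]
  \;+\; \eta \, \mathcal{L}_{\mathrm{MLE}}(r; \mathcal{D})
\,\Big\}
```

The inner minimization supplies the pessimism: among reward models consistent with the data (small \(\mathcal{L}_{\mathrm{MLE}}\)), the one least favorable to \(\pi\) is used to evaluate it, which discourages the policy from exploiting reward-model error.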
no code implementations • 11 Mar 2024 • Yufeng Zhang, Liyu Chen, Boyi Liu, Yingxiang Yang, Qiwen Cui, Yunzhe Tao, Hongxia Yang
Recent advances in reinforcement learning (RL) algorithms aim to enhance the performance of language models at scale.
1 code implementation • 25 Feb 2024 • Shenao Zhang, Sirui Zheng, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang
Specifically, we develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning, particularly when the difference between the ideal policy and the LLM-informed policy is small. Such a small difference suggests that the initial policy is close to optimal, reducing the need for further exploration.
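As a rough illustration of how LLM guidance can act as a regularizer in value-based RL, here is a generic KL-regularized backup (not LINVIT itself; `pi_llm` and `lam` are our own notation for the LLM-suggested policy and the regularization strength):

```python
import numpy as np

def kl_regularized_values(Q, pi_llm, lam):
    """KL-regularized state values: V(s) = lam * log sum_a pi_llm(a|s) * exp(Q(s,a)/lam).
    Small lam recovers max_a Q(s, a); large lam keeps values tied to the LLM prior."""
    return lam * np.log(np.sum(pi_llm * np.exp(Q / lam), axis=1))

def kl_regularized_policy(Q, pi_llm, lam):
    """Policy proportional to pi_llm(a|s) * exp(Q(s,a)/lam): the improvement step
    stays close (in KL) to the LLM-suggested policy."""
    w = pi_llm * np.exp(Q / lam)
    return w / w.sum(axis=1, keepdims=True)

# Toy problem: one state, two actions, uninformative LLM prior.
Q = np.array([[1.0, 0.0]])
pi_llm = np.array([[0.5, 0.5]])
```

With small `lam` the resulting policy concentrates on the higher-value action; with large `lam` it stays near `pi_llm`, which is the sense in which the LLM guidance constrains exploration.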
no code implementations • 28 Nov 2023 • Xiaohui Chen, Yongfei Liu, Yingxiang Yang, Jianbo Yuan, Quanzeng You, Li-Ping Liu, Hongxia Yang
Recent advancements in text-to-image (T2I) generative models have shown remarkable capabilities in producing diverse and imaginative visuals based on text prompts.
no code implementations • 10 Oct 2023 • Chau Pham, Boyi Liu, Yingxiang Yang, Zhengyu Chen, Tianyi Liu, Jianbo Yuan, Bryan A. Plummer, Zhaoran Wang, Hongxia Yang
Although natural language is an obvious choice for communication given LLMs' language understanding capabilities, the token sampling step needed when generating natural language poses a potential risk of information loss, as it uses only one token to represent the model's belief over the entire vocabulary.
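The information-loss point can be made concrete with a toy vocabulary (the numbers below are illustrative only): the full output distribution carries the model's uncertainty, while a sampled token collapses it to a one-hot message.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Toy belief over a 4-token vocabulary: tokens 0 and 1 are nearly tied.
logits = np.array([2.0, 1.9, -1.0, -1.0])
belief = softmax(logits)

# Sampling collapses the belief to a one-hot message: the near-tie between
# tokens 0 and 1 is invisible to the receiving agent.
rng = np.random.default_rng(0)
token = rng.choice(len(belief), p=belief)
one_hot = np.eye(len(belief))[token]
```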
no code implementations • NeurIPS 2020 • Yingxiang Yang, Negar Kiyavash, Le Song, Niao He
Macroscopic data aggregated from microscopic events are pervasive in machine learning, such as country-level COVID-19 infection statistics based on city-level data.
no code implementations • NeurIPS 2019 • Yingxiang Yang, Haoxiang Wang, Negar Kiyavash, Niao He
The nonparametric learning of positive-valued functions appears widely in machine learning, especially in the context of estimating intensity functions of point processes.
no code implementations • NeurIPS 2018 • Yingxiang Yang, Bo Dai, Negar Kiyavash, Niao He
Approximate Bayesian computation (ABC) is an important methodology for Bayesian inference when the likelihood function is intractable.
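A minimal rejection-ABC sketch shows the setting (the toy Gaussian example is ours, not the paper's method): draw parameters from the prior, simulate, and keep draws whose simulated summary statistic lies within a tolerance of the observed one.

```python
import numpy as np

def rejection_abc(observed, prior_sample, simulate, distance, eps, n_trials, rng):
    """Rejection ABC: keep prior draws whose simulated summary lies
    within eps of the observed summary, bypassing the likelihood."""
    accepted = []
    for _ in range(n_trials):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy example: infer the mean of a Gaussian with known scale,
# using the sample mean as the summary statistic.
rng = np.random.default_rng(0)
posterior = rejection_abc(
    observed=1.0,
    prior_sample=lambda rng: rng.uniform(-5, 5),
    simulate=lambda theta, rng: rng.normal(theta, 1.0, size=50).mean(),
    distance=lambda a, b: abs(a - b),
    eps=0.1,
    n_trials=5000,
    rng=rng,
)
```

The accepted draws approximate the posterior; shrinking `eps` tightens the approximation at the cost of a lower acceptance rate.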
no code implementations • 11 Mar 2018 • Yingxiang Yang, Adams Wei Yu, Zhaoran Wang, Tuo Zhao
We propose a nonparametric method for detecting nonlinear causal relationships within a set of multidimensional discrete time series, using sparse additive models (SpAMs).
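A crude sketch of the idea, using plain coordinate-descent lasso over hand-picked basis features as a stand-in for the spline-based group-sparse fitting in SpAMs (the toy data and all names below are ours): lagged predictors whose basis coefficients survive the sparsity penalty are flagged as causal parents.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent lasso; a crude stand-in for the group-sparse
    backfitting used to fit sparse additive models."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ (y - X @ beta + X[:, j] * beta[j])
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

# Toy pair of series: y depends nonlinearly on lagged x; y's own past is irrelevant.
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = np.sin(x[t - 1]) + 0.1 * rng.normal()

# Basis-expand the lagged predictors (hand-picked features standing in for splines).
Z = np.column_stack([x[:-1], x[:-1] ** 2, np.sin(x[:-1]),
                     y[:-1], y[:-1] ** 2, np.sin(y[:-1])])
beta = lasso_cd(Z, y[1:], lam=40.0)
x_drives_y = bool(np.abs(beta[:3]).max() > 1e-3)       # lagged-x features selected
y_self_drives_y = bool(np.abs(beta[3:]).max() > 1e-3)  # lagged-y features selected
```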
no code implementations • 25 Jan 2018 • Yingxiang Yang, Jalal Etesami, Niao He, Negar Kiyavash
In this paper, we design a nonparametric online algorithm for estimating the triggering functions of multivariate Hawkes processes.
no code implementations • NeurIPS 2017 • Yingxiang Yang, Jalal Etesami, Niao He, Negar Kiyavash
We develop a nonparametric and online learning algorithm that estimates the triggering functions of a multivariate Hawkes process (MHP).
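For reference, the objects being estimated are the triggering functions phi_ij in the MHP intensity; a minimal sketch of that intensity follows (the exponential kernels are purely for illustration, since the paper's estimator is nonparametric):

```python
import numpy as np

def hawkes_intensity(t, events, mu, phi):
    """Multivariate Hawkes intensity at time t:
    lambda_i(t) = mu[i] + sum_j sum_{s in events[j], s < t} phi[i][j](t - s)."""
    lam = np.array(mu, dtype=float)
    d = len(mu)
    for i in range(d):
        for j in range(d):
            lam[i] += sum(phi[i][j](t - s) for s in events[j] if s < t)
    return lam

# Exponential triggering functions phi_ij(u) = A[i, j] * exp(-u), chosen only
# to make the example concrete.
A = np.array([[0.5, 0.0], [0.3, 0.2]])
phi = [[(lambda u, a=A[i, j]: a * np.exp(-u)) for j in range(2)] for i in range(2)]
events = [[0.0, 1.0], [0.5]]  # past event times in each of the two dimensions
lam = hawkes_intensity(2.0, events, mu=[0.1, 0.1], phi=phi)
```

Each past event in dimension j excites future events in dimension i through phi_ij, which is what the online estimator recovers from observed event times.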
no code implementations • 22 Sep 2015 • Yingxiang Yang, Jalal Etesami, Negar Kiyavash
This paper addresses the problem of neighborhood selection for Gaussian graphical models.
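A minimal sketch of neighborhood selection in the Meinshausen-Buhlmann style (the chain-graph toy data is ours; the paper studies this problem setting, not this exact code): lasso-regress each variable on all the others and read neighbors off the nonzero coefficients.

```python
import numpy as np

def lasso(X, y, lam, n_iter=200):
    """Coordinate-descent lasso used for node-wise regression."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ (y - X @ beta + X[:, j] * beta[j])
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

def neighborhoods(X, lam):
    """Neighborhood selection: lasso-regress each variable on all others;
    nonzero coefficients mark edges of the Gaussian graphical model."""
    p = X.shape[1]
    nbrs = []
    for a in range(p):
        others = [b for b in range(p) if b != a]
        beta = lasso(X[:, others], X[:, a], lam)
        nbrs.append({others[k] for k in range(p - 1) if abs(beta[k]) > 1e-3})
    return nbrs

# Chain graph 0 - 1 - 2: each variable depends only on its chain neighbors.
rng = np.random.default_rng(1)
n = 2000
x0 = rng.normal(size=n)
x1 = 0.7 * x0 + 0.5 * rng.normal(size=n)
x2 = 0.7 * x1 + 0.5 * rng.normal(size=n)
nbrs = neighborhoods(np.column_stack([x0, x1, x2]), lam=100.0)
```

On this chain, node 1 should recover neighbors {0, 2} while nodes 0 and 2 each recover only {1}, reflecting the conditional independence of the endpoints given the middle node.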