1 code implementation • 23 Dec 2024 • Ermo Hua, Che Jiang, Xingtai Lv, Kaiyan Zhang, Ning Ding, Youbang Sun, Biqing Qi, Yuchen Fan, Xuekai Zhu, BoWen Zhou
Extending the context length of Language Models (LMs) by improving Rotary Position Embedding (RoPE) has become a trend.
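For context: RoPE encodes position by rotating each pair of query/key channels through a position-dependent angle, and most context-extension methods work by rescaling those rotation frequencies. A minimal sketch of the vanilla rotation (illustrative background, not this paper's specific method):

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply vanilla RoPE to x of shape (seq_len, dim); dim must be even."""
    seq_len, dim = x.shape
    # Channel-pair frequencies: theta_i = base^(-2i/dim).
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    pos = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(pos, inv_freq)          # (seq_len, dim // 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]              # rotate channel pairs
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```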
no code implementations • 19 Dec 2024 • Xuekai Zhu, Daixuan Cheng, Hengli Li, Kaiyan Zhang, Ermo Hua, Xingtai Lv, Ning Ding, Zhouhan Lin, Zilong Zheng, BoWen Zhou
Model collapse on synthetic data refers to the gradual decline in performance that results from iteratively training models on their own generated data.
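As a toy illustration of the phenomenon (not this paper's experiments), repeatedly fitting a Gaussian to a finite sample and then drawing the next generation's training data from that fit makes the fitted variance decay over generations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                       # small sample size per generation
data = rng.normal(0.0, 1.0, size=n)          # generation 0: "real" data
for gen in range(1, 51):
    mu, sigma = data.mean(), data.std()      # fit a simple model to current data
    data = rng.normal(mu, sigma, size=n)     # next generation trains only on samples
    if gen % 10 == 0:
        print(f"gen {gen:2d}: fitted std = {sigma:.3f}")  # shrinks on average
```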
1 code implementation • 2 Dec 2024 • Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, BoWen Zhou, Zhiyuan Liu, Hao Peng
The only assumption is to parameterize the outcome reward as the log-likelihood ratios of the policy and reference models, which can be optimized regardless of the specific choice of loss objectives.
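Concretely, the parameterization is r(y|x) = β log(π_θ(y|x) / π_ref(y|x)), which decomposes into per-token log-likelihood ratios; a minimal sketch (tensor shapes and names are illustrative):

```python
import torch
import torch.nn.functional as F

def implicit_token_rewards(policy_logits, ref_logits, tokens, beta=1.0):
    """Per-token reward r_t = beta * (log pi(y_t | ctx) - log pi_ref(y_t | ctx)).

    policy_logits, ref_logits: (seq_len, vocab_size); tokens: (seq_len,).
    Summing r_t over t recovers the sequence-level outcome reward
    beta * log(pi(y | x) / pi_ref(y | x)).
    """
    logp = F.log_softmax(policy_logits, dim=-1)
    logp_ref = F.log_softmax(ref_logits, dim=-1)
    idx = tokens.unsqueeze(-1)
    token_logp = logp.gather(-1, idx).squeeze(-1)
    token_logp_ref = logp_ref.gather(-1, idx).squeeze(-1)
    return beta * (token_logp - token_logp_ref)
```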
no code implementations • 24 Nov 2024 • Biqing Qi, Zhouyi Qian, Yiang Luo, Junqi Gao, Dong Li, Kaiyan Zhang, BoWen Zhou
Additionally, we propose a Condensation-Aggregation mechanism to cluster and eliminate redundant paths, facilitate improved information sharing among parent nodes, and ultimately enhance both the efficiency and quality of the reasoning process.
no code implementations • 6 Nov 2024 • Ning Ding, Shang Qu, Linhai Xie, Yifei Li, Zaoqu Liu, Kaiyan Zhang, Yibai Xiong, Yuxin Zuo, Zhangren Chen, Ermo Hua, Xingtai Lv, Youbang Sun, Yang Li, Dong Li, Fuchu He, BoWen Zhou
By automating complex proteomics analysis workflows and hypothesis generation, PROTEUS has the potential to considerably accelerate the pace of scientific discovery in proteomics research, enabling researchers to efficiently explore large-scale datasets and uncover biological insights.
1 code implementation • 4 Nov 2024 • Xingtai Lv, Ning Ding, Kaiyan Zhang, Ermo Hua, Ganqu Cui, BoWen Zhou
Improving the effectiveness and efficiency of large language models (LLMs) simultaneously is a critical yet challenging research goal.
no code implementations • 28 Oct 2024 • Wei-Nan Zhang, Yiming Cui, Kaiyan Zhang, Yifa Wang, Qingfu Zhu, Lingzhi Li, Ting Liu
To address this issue, in this paper we propose a static and dynamic attention-based approach to model the dialogue history and then generate open-domain multi-turn dialogue responses.
no code implementations • 26 Oct 2024 • Haoyu Song, Wei-Nan Zhang, Kaiyan Zhang, Ting Liu
To this end, we propose a novel stack-propagation framework for learning a generation and understanding pipeline. Specifically, the framework stacks a Transformer encoder and two Transformer decoders, where the first decoder models response generation and the second serves as a regularizer, jointly modeling response generation and consistency understanding.
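A structural sketch of that pipeline (layer counts, head counts, and the consistency head are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class StackPropagation(nn.Module):
    """Sketch: one encoder, two stacked decoders; inputs are (seq_len, batch)."""
    def __init__(self, vocab_size: int, d_model: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8), num_layers=4)
        self.decoder1 = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8), num_layers=4)
        self.decoder2 = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8), num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)   # generation
        self.consistency_head = nn.Linear(d_model, 3)   # e.g. entail/neutral/contradict

    def forward(self, context_ids, response_ids):
        memory = self.encoder(self.embed(context_ids))
        h1 = self.decoder1(self.embed(response_ids), memory)  # decoder 1: generation
        h2 = self.decoder2(h1, memory)   # decoder 2 stacked on decoder 1's states
        return (self.lm_head(h1),                    # generation logits
                self.lm_head(h2),                    # regularized generation logits
                self.consistency_head(h2.mean(0)))   # consistency prediction
```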
no code implementations • 15 Oct 2024 • Zhiyuan Ma, Yuzhu Zhang, Guoli Jia, Liangliang Zhao, Yichao Ma, Mingjie Ma, Gaofeng Liu, Kaiyan Zhang, Jianjun Li, BoWen Zhou
Among the most popular and sought-after generative models in recent years, diffusion models have sparked the interest of many researchers and steadily shown clear advantages in various generative tasks such as image synthesis, video generation, molecule design, 3D scene rendering, and multimodal generation, owing to their dense theoretical foundations and reliable application practices.
1 code implementation • 12 Jul 2024 • Biqing Qi, Kaiyan Zhang, Kai Tian, Haoxiang Li, Zhang-Ren Chen, Sihang Zeng, Ermo Hua, Hu Jinfang, BoWen Zhou
In this paper, we present a comprehensive evaluation of LLMs as biomedical hypothesis generators.
no code implementations • 11 Jul 2024 • Kaiyan Zhang, Biqing Qi, BoWen Zhou
In this perspective paper, we introduce the concept of Specialized Generalist Artificial Intelligence (SGAI or simply SGI) as a crucial milestone toward Artificial General Intelligence (AGI).
1 code implementation • 18 Jun 2024 • Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, BoWen Zhou
Our research underscores that the fundamental distinction between System 1 and System 2 lies in the uncertainty of next token predictions, where interventions by System 2 are crucial to support System 1.
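One operational reading of that claim is to gate the slower System-2 model on the fast model's next-token uncertainty; a hypothetical sketch (the entropy threshold and gating rule are assumptions, not the paper's protocol):

```python
import torch
import torch.nn.functional as F

def route_to_system2(logits: torch.Tensor, threshold: float = 2.0) -> bool:
    """Invoke the slower System-2 model only when the fast System-1 model
    is uncertain about its next token, measured by the entropy of the
    next-token distribution. logits: (vocab_size,); threshold is illustrative."""
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    return entropy.item() > threshold
```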
no code implementations • 8 Jun 2024 • Biqing Qi, Pengfei Li, Fangyuan Li, Junqi Gao, Kaiyan Zhang, BoWen Zhou
Inspired by intraspecific competition driving species evolution, we propose an Online Fast-Slow chasing DPO (OFS-DPO) for preference alignment, simulating competition through fast and slow chasing among models to facilitate rapid adaptation.
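One plausible rendering of fast-slow chasing (not the paper's exact update rule) pairs a fast learner that takes preference-loss gradient steps with a slow copy that trails it via an exponential moving average:

```python
import torch

@torch.no_grad()
def slow_chase(slow_model, fast_model, tau: float = 0.99):
    """The slow model trails the fast model via an exponential moving average."""
    for p_slow, p_fast in zip(slow_model.parameters(), fast_model.parameters()):
        p_slow.mul_(tau).add_(p_fast, alpha=1.0 - tau)

def fast_slow_step(fast_model, slow_model, loss_fn, batch, optimizer):
    """One hypothetical chasing step: the fast learner adapts to new
    preference data, then the slow learner chases it."""
    loss = loss_fn(fast_model, batch)   # e.g. a DPO-style preference loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    slow_chase(slow_model, fast_model)
```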
1 code implementation • 6 Jun 2024 • Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, Xingtai Lv, Hu Jinfang, Zhiyuan Liu, BoWen Zhou
Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains and are moving towards more specialized areas.
no code implementations • 27 May 2024 • Biqing Qi, Junqi Gao, Kaiyan Zhang, Dong Li, Jianxing Liu, Ligang Wu, BoWen Zhou
Experiments on long-range modeling tasks in autoregressive language modeling and Long Range Arena demonstrate the general effectiveness of the SMR mechanism for a series of SSM models.
1 code implementation • 20 May 2024 • Ermo Hua, Biqing Qi, Kaiyan Zhang, Yue Yu, Ning Ding, Xingtai Lv, Kai Tian, BoWen Zhou
To obtain a unified understanding, we interpret SFT and PO with two sub-processes -- Preference Estimation and Transition Optimization -- defined at token level within the Markov Decision Process (MDP) framework.
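In that token-level MDP view, the state is the prompt plus the generated prefix and the action is the next token, so, for example, the standard SFT objective becomes a sum of per-step log-likelihoods:

$$ s_t = (x,\, y_{<t}), \qquad a_t = y_t, \qquad \mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t} \log \pi_\theta(a_t \mid s_t) $$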
no code implementations • 5 Mar 2024 • Kaiyan Zhang, Jianyu Wang, Ermo Hua, Biqing Qi, Ning Ding, BoWen Zhou
With the advancement of language models (LMs), their exposure to private data is increasingly inevitable, and deploying them (especially smaller models) on personal devices, such as PCs and smartphones, has become a prevailing trend.
1 code implementation • 16 Jan 2024 • Xinwei Long, Jiali Zeng, Fandong Meng, Zhiyuan Ma, Kaiyan Zhang, BoWen Zhou, Jie Zhou
Knowledge retrieval with multi-modal queries plays a crucial role in supporting knowledge-intensive multi-modal applications.
1 code implementation • 10 Nov 2023 • Biqing Qi, Kaiyan Zhang, Haoxiang Li, Kai Tian, Sihang Zeng, Zhang-Ren Chen, BoWen Zhou
We subsequently evaluate the hypothesis generation capabilities of various top-tier instructed models in zero-shot, few-shot, and fine-tuning settings, including both closed and open-source LLMs.
1 code implementation • 24 Oct 2023 • Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, BoWen Zhou
Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks.
1 code implementation • 23 May 2023 • Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xinwei Long, Zhouhan Lin, BoWen Zhou
While large language models (LLMs) excel in various natural language processing tasks, their huge size and the inaccessibility of parameters present challenges for practical deployment.
1 code implementation • ACL 2021 • Haoyu Song, Yan Wang, Kaiyan Zhang, Wei-Nan Zhang, Ting Liu
Maintaining consistent personas is essential for dialogue agents.