Search Results for author: Kaiyan Zhang

Found 22 papers, 12 papers with code

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

1 code implementation • 23 Dec 2024 • Ermo Hua, Che Jiang, Xingtai Lv, Kaiyan Zhang, Ning Ding, Youbang Sun, Biqing Qi, Yuchen Fan, Xuekai Zhu, BoWen Zhou

Extending the context length of Language Models (LMs) by improving Rotary Position Embedding (RoPE) has become a trend.

Position
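
For context on the entry above: RoPE encodes position by rotating each (even, odd) channel pair of the query/key vectors through a position-dependent angle. Below is a minimal sketch of standard RoPE only, not the Fourier Position Embedding (FoPE) variant this paper proposes; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Standard Rotary Position Embedding (illustrative sketch, not FoPE).

    x: (seq_len, dim) query or key activations, dim even.
    Channel pair (2i, 2i+1) at position p is rotated by p * base**(-2i/dim).
    """
    seq_len, dim = x.shape
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    freqs = base ** (-np.arange(0, dim, 2) / dim)  # (dim/2,) per-pair frequencies
    angles = pos * freqs                           # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin             # 2-D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```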

How to Synthesize Text Data without Model Collapse?

no code implementations • 19 Dec 2024 • Xuekai Zhu, Daixuan Cheng, Hengli Li, Kaiyan Zhang, Ermo Hua, Xingtai Lv, Ning Ding, Zhouhan Lin, Zilong Zheng, BoWen Zhou

Model collapse in synthetic data indicates that iterative training on self-generated data leads to a gradual decline in performance.

Free Process Rewards without Process Labels

1 code implementation • 2 Dec 2024 • Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, BoWen Zhou, Zhiyuan Liu, Hao Peng

The only assumption is to parameterize the outcome reward as the log-likelihood ratios of the policy and reference models, which can be optimized regardless of the specific choice of loss objectives.

Math
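
The parameterization quoted above can be written out explicitly. A hedged reconstruction in my own notation (the paper's exact symbols may differ): because the sequence-level log-ratio decomposes into a sum of per-token log-ratios, a model trained only on outcome labels yields per-step rewards for free, matching the title.

```latex
% Outcome reward as a policy/reference log-likelihood ratio
% (assumed notation; \beta is a scaling factor):
r_\theta(y \mid x)
  = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
  = \beta \sum_{t=1}^{|y|} \log
      \frac{\pi_\theta(y_t \mid x, y_{<t})}{\pi_{\mathrm{ref}}(y_t \mid x, y_{<t})}
% so each summand acts as an implicit per-token (process) reward.
```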

Evolution of Thought: Diverse and High-Quality Reasoning via Multi-Objective Optimization

no code implementations • 24 Nov 2024 • Biqing Qi, Zhouyi Qian, Yiang Luo, Junqi Gao, Dong Li, Kaiyan Zhang, BoWen Zhou

Additionally, we propose a Condensation-Aggregation mechanism to cluster and eliminate redundant paths, facilitate improved information sharing among parent nodes, and ultimately enhance both the efficiency and quality of the reasoning process.

Diversity

Automating Exploratory Proteomics Research via Language Models

no code implementations • 6 Nov 2024 • Ning Ding, Shang Qu, Linhai Xie, Yifei Li, Zaoqu Liu, Kaiyan Zhang, Yibai Xiong, Yuxin Zuo, Zhangren Chen, Ermo Hua, Xingtai Lv, Youbang Sun, Yang Li, Dong Li, Fuchu He, BoWen Zhou

By automating complex proteomics analysis workflows and hypothesis generation, PROTEUS has the potential to considerably accelerate the pace of scientific discovery in proteomics research, enabling researchers to efficiently explore large-scale datasets and uncover biological insights.

scientific discovery

Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention

1 code implementation • 4 Nov 2024 • Xingtai Lv, Ning Ding, Kaiyan Zhang, Ermo Hua, Ganqu Cui, BoWen Zhou

Improving the effectiveness and efficiency of large language models (LLMs) simultaneously is a critical yet challenging research goal.

A Static and Dynamic Attention Framework for Multi Turn Dialogue Generation

no code implementations • 28 Oct 2024 • Wei-Nan Zhang, Yiming Cui, Kaiyan Zhang, Yifa Wang, Qingfu Zhu, Lingzhi Li, Ting Liu

To address this issue, in this paper, we propose a static and dynamic attention-based approach to model the dialogue history and then generate open-domain multi-turn dialogue responses.

Dialogue Generation

A Stack-Propagation Framework for Low-Resource Personalized Dialogue Generation

no code implementations • 26 Oct 2024 • Haoyu Song, Wei-Nan Zhang, Kaiyan Zhang, Ting Liu

To this end, we propose a novel stack-propagation framework for learning a generation and understanding pipeline. Specifically, the framework stacks a Transformer encoder and two Transformer decoders, where the first decoder models response generation and the second serves as a regularizer and jointly models response generation and consistency understanding.

Dialogue Generation • Response Generation
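
The stacked layout described above lends itself to a compact sketch. This is a hedged reconstruction from the abstract alone: layer counts, heads, the three-way consistency head, and all names are assumptions, and causal/padding masks are omitted for brevity.

```python
import torch
import torch.nn as nn

class StackPropagationDialogue(nn.Module):
    """Sketch of the encoder + two stacked decoders from the abstract.

    Decoder 1 models response generation; decoder 2 consumes decoder 1's
    hidden states and acts as a regularizer that jointly models generation
    and consistency understanding. Hyperparameters are illustrative.
    """

    def __init__(self, vocab_size: int, d_model: int = 512,
                 nhead: int = 8, num_layers: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        self.gen_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        self.reg_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.consistency_head = nn.Linear(d_model, 3)  # e.g. entail/neutral/contradict

    def forward(self, context_ids: torch.Tensor, response_ids: torch.Tensor):
        memory = self.encoder(self.embed(context_ids))
        h1 = self.gen_decoder(self.embed(response_ids), memory)  # generation
        h2 = self.reg_decoder(h1, memory)  # second decoder stacked on the first
        return self.lm_head(h1), self.consistency_head(h2.mean(dim=1))
```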

Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices

no code implementations • 15 Oct 2024 • Zhiyuan Ma, Yuzhu Zhang, Guoli Jia, Liangliang Zhao, Yichao Ma, Mingjie Ma, Gaofeng Liu, Kaiyan Zhang, Jianjun Li, BoWen Zhou

As one of the most popular and sought-after generative models in recent years, diffusion models have sparked the interest of many researchers and steadily shown excellent advantages in various generative tasks such as image synthesis, video generation, molecule design, 3D scene rendering and multimodal generation, relying on their solid theoretical principles and reliable application practices.

Image Generation • multimodal generation • +2

Towards Building Specialized Generalist AI with System 1 and System 2 Fusion

no code implementations • 11 Jul 2024 • Kaiyan Zhang, Biqing Qi, BoWen Zhou

In this perspective paper, we introduce the concept of Specialized Generalist Artificial Intelligence (SGAI or simply SGI) as a crucial milestone toward Artificial General Intelligence (AGI).

Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding

1 code implementation • 18 Jun 2024 • Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, BoWen Zhou

Our research underscores that the fundamental distinction between System 1 and System 2 lies in the uncertainty of next token predictions, where interventions by System 2 are crucial to support System 1.

Hallucination
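
The System 1 / System 2 observation above suggests a simple decoding loop: the small model drafts every token, and the large model steps in only when the small model's next-token distribution is uncertain. A toy sketch under assumed HuggingFace-style models that return `.logits`; the entropy gate, threshold, and greedy pick are illustrative choices, not the paper's exact method.

```python
import torch

@torch.no_grad()
def collaborative_decode(small_lm, large_lm, input_ids: torch.Tensor,
                         max_new_tokens: int = 64,
                         entropy_threshold: float = 3.0) -> torch.Tensor:
    """System 1 (small_lm) proposes tokens; System 2 (large_lm) intervenes
    when System 1's next-token entropy is high. Assumes batch size 1."""
    ids = input_ids
    for _ in range(max_new_tokens):
        logits = small_lm(ids).logits[:, -1, :]      # System 1 fast pass
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)
        if entropy.item() > entropy_threshold:       # uncertain -> slow path
            logits = large_lm(ids).logits[:, -1, :]  # System 2 intervention
        next_id = logits.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
    return ids
```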

Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing

no code implementations • 8 Jun 2024 • Biqing Qi, Pengfei Li, Fangyuan Li, Junqi Gao, Kaiyan Zhang, BoWen Zhou

Inspired by intraspecific competition driving species evolution, we propose an Online Fast-Slow chasing DPO (OFS-DPO) for preference alignment, simulating competition through fast and slow chasing among models to facilitate rapid adaptation.

Continual Learning

UltraMedical: Building Specialized Generalists in Biomedicine

1 code implementation • 6 Jun 2024 • Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, Xingtai Lv, Hu Jinfang, Zhiyuan Liu, BoWen Zhou

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains and are moving towards more specialized areas.

SMR: State Memory Replay for Long Sequence Modeling

no code implementations • 27 May 2024 • Biqing Qi, Junqi Gao, Kaiyan Zhang, Dong Li, Jianxing Liu, Ligang Wu, BoWen Zhou

Experiments on long-range modeling tasks in autoregressive language modeling and Long Range Arena demonstrate the general effectiveness of the SMR mechanism for a series of SSM models.

Language Modeling • Long-range modeling • +2

Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process

1 code implementation • 20 May 2024 • Ermo Hua, Biqing Qi, Kaiyan Zhang, Yue Yu, Ning Ding, Xingtai Lv, Kai Tian, BoWen Zhou

To obtain a unified understanding, we interpret SFT and PO with two sub-processes -- Preference Estimation and Transition Optimization -- defined at token level within the Markov Decision Process (MDP) framework.

CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following

no code implementations • 5 Mar 2024 • Kaiyan Zhang, Jianyu Wang, Ermo Hua, Biqing Qi, Ning Ding, BoWen Zhou

With the advancement of language models (LMs), their exposure to private data is increasingly inevitable, and their deployment (especially for smaller ones) on personal devices, such as PCs and smartphones, has become a prevailing trend.

Instruction Following

Generative Multi-Modal Knowledge Retrieval with Large Language Models

1 code implementation • 16 Jan 2024 • Xinwei Long, Jiali Zeng, Fandong Meng, Zhiyuan Ma, Kaiyan Zhang, BoWen Zhou, Jie Zhou

Knowledge retrieval with multi-modal queries plays a crucial role in supporting knowledge-intensive multi-modal applications.

Retrieval

Large Language Models are Zero Shot Hypothesis Proposers

1 code implementation • 10 Nov 2023 • Biqing Qi, Kaiyan Zhang, Haoxiang Li, Kai Tian, Sihang Zeng, Zhang-Ren Chen, BoWen Zhou

We subsequently evaluate the hypothesis generation capabilities of various top-tier instructed models in zero-shot, few-shot, and fine-tuning settings, including both closed and open-source LLMs.

scientific discovery

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model

1 code implementation • 24 Oct 2023 • Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, BoWen Zhou

Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks.

Clustering • Language Modeling • +2

PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning

1 code implementation • 23 May 2023 • Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xinwei Long, Zhouhan Lin, BoWen Zhou

While large language models (LLMs) excel in various natural language processing tasks, their huge size and the inaccessibility of parameters present challenges for practical deployment.

Arithmetic Reasoning • GSM8K • +1
