1 code implementation • 17 Dec 2024 • Mukai Li, Lei LI, Shansan Gong, Qi Liu
Towards this goal, we make principled design choices through extensive experiments spanning data curation, context-window extension, and context-window utilization: (1) we analyze data sources and length distributions to construct ETVLM, a data recipe that balances performance across scenarios; (2) we examine existing position-extending methods, identify their limitations, and propose M-RoPE++ as an enhanced approach; we also choose to instruction-tune only the backbone with mixed-source data; (3) we discuss how to better utilize extended context windows and propose hybrid-resolution training.
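The details of M-RoPE++ are not given in this excerpt; as a hedged illustration of the position-extending family it builds on, the sketch below applies plain position interpolation to standard RoPE, compressing position indices so a longer sequence maps back into the pretrained range. All names and parameters are illustrative assumptions, not the paper's implementation.

```python
import torch

def rope_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies for an attention head of width `dim`.
    return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

def interpolated_rope_angles(seq_len: int, dim: int, scale: float) -> torch.Tensor:
    # Position interpolation: divide positions by `scale` so a sequence
    # `scale`x longer than the pretraining window reuses trained positions.
    positions = torch.arange(seq_len, dtype=torch.float32) / scale
    return torch.outer(positions, rope_inv_freq(dim))  # (seq_len, dim // 2)

# e.g. stretch a model trained with a 4k window to 16k tokens (scale = 4).
angles = interpolated_rope_angles(seq_len=16384, dim=128, scale=4.0)
cos, sin = angles.cos(), angles.sin()  # used to rotate query/key channel pairs
```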
no code implementations • 24 Oct 2024 • Chenxin An, Jun Zhang, Ming Zhong, Lei LI, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong
Advancements in distributed training and efficient attention mechanisms have significantly expanded the context window sizes of large language models (LLMs).
1 code implementation • 23 Oct 2024 • Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models.
1 code implementation • 18 Oct 2024 • Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, Lingpeng Kong
Our work highlights the potential of diffusion-based approaches in advancing AI capabilities for sophisticated language understanding and problem-solving tasks.
1 code implementation • 27 Feb 2024 • Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong
The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length.
no code implementations • 21 Feb 2024 • Xueliang Zhao, Xinting Huang, Tingchen Fu, Qintong Li, Shansan Gong, Lemao Liu, Wei Bi, Lingpeng Kong
Multimodal reasoning stands as a pivotal capability for large vision-language models (LVLMs).
1 code implementation • 12 Feb 2024 • Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong
Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models.
1 code implementation • 9 Oct 2023 • Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
Diffusion models have gained prominence in generating high-quality sequences of text.
3 code implementations • 20 Jul 2023 • Chenxin An, Shansan Gong, Ming Zhong, Xingjian Zhao, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu
Recently, there has been growing interest in extending the context length of large language models (LLMs), aiming to effectively process long single-turn inputs or conversations with extensive histories.
1 code implementation • 9 Feb 2023 • Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong
Based on EVALM, we efficiently scale up the number of examples in both instruction tuning and in-context learning to explore the boundary of the benefits of more annotated data.
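As a rough sketch of what scaling up in-context examples looks like in practice (the prompt template and the character-budget truncation are assumptions for illustration, not EVALM's actual pipeline):

```python
def build_many_shot_prompt(examples, query, max_chars=200_000):
    # Pack as many (input, output) demonstrations as the budget allows,
    # then append the unanswered query. A char budget stands in for tokens.
    parts, used = [], 0
    for inp, out in examples:
        shot = f"Input: {inp}\nOutput: {out}\n\n"
        if used + len(shot) > max_chars:
            break
        parts.append(shot)
        used += len(shot)
    parts.append(f"Input: {query}\nOutput:")
    return "".join(parts)
```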
1 code implementation • 17 Oct 2022 • Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
Bringing together theoretical analysis and empirical evidence, we demonstrate the great potential of diffusion models in complex conditional language generation tasks.
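A minimal sketch of the continuous-diffusion step such seq2seq diffusion models rest on: token embeddings are corrupted by the forward process q(x_t | x_0) and a network learns to reverse it. DiffuSeq-style partial noising (keeping the source condition clean) appears only as a mask; the schedule and shapes are generic assumptions.

```python
import torch

def forward_diffuse(x0: torch.Tensor, t: torch.Tensor, alpha_bar: torch.Tensor,
                    target_mask: torch.Tensor) -> torch.Tensor:
    # q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I),
    # applied only where target_mask == 1 so the source part stays un-noised.
    a = alpha_bar[t].view(-1, 1, 1)                 # (batch, 1, 1)
    noised = a.sqrt() * x0 + (1 - a).sqrt() * torch.randn_like(x0)
    return torch.where(target_mask.unsqueeze(-1).bool(), noised, x0)

# Toy usage: batch of 2 sequences, length 8, embedding dim 16, 1000 steps.
alpha_bar = torch.cumprod(1 - torch.linspace(1e-4, 2e-2, 1000), dim=0)
x0 = torch.randn(2, 8, 16)                          # embedded [source; target]
mask = torch.cat([torch.zeros(2, 3), torch.ones(2, 5)], dim=1)  # target tokens
xt = forward_diffuse(x0, torch.randint(0, 1000, (2,)), alpha_bar, mask)
```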
no code implementations • 21 May 2022 • Kaijian Li, Shansan Gong, Kenny Q. Zhu
The Natural Language Inference (NLI) generation task is to generate a textual hypothesis given a textual premise and a logical relation between the two.
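To make the task interface concrete, a minimal sketch follows; the prompt template and the off-the-shelf t5-base checkpoint are illustrative assumptions, not the paper's setup, and an untuned model will not produce strong hypotheses.

```python
from transformers import pipeline

# Illustrative only: the paper's model and prompt format are not shown here.
generator = pipeline("text2text-generation", model="t5-base")

def generate_hypothesis(premise: str, relation: str) -> str:
    # `relation` is one of "entailment", "neutral", "contradiction".
    prompt = f"premise: {premise} relation: {relation} hypothesis:"
    return generator(prompt, max_new_tokens=32)[0]["generated_text"]

print(generate_hypothesis("A man plays a guitar on stage.", "entailment"))
```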
1 code implementation • 12 May 2022 • Shansan Gong, Kenny Q. Zhu
News recommendation for anonymous readers is a useful but challenging task for many news portals, where interactions between readers and articles are limited to a temporary login session.
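As a hedged baseline sketch of this session-based setting (a generic content-similarity ranker, not the paper's model): represent the anonymous session by the articles clicked so far and rank candidates against it using precomputed content embeddings.

```python
import numpy as np

def rank_candidates(session_clicks: list[int], candidates: list[int],
                    item_emb: np.ndarray) -> list[int]:
    # Represent the anonymous session as the mean embedding of its clicks,
    # then rank candidates by cosine similarity to that session vector.
    s = item_emb[session_clicks].mean(axis=0)
    s /= np.linalg.norm(s) + 1e-8
    c = item_emb[candidates]
    c = c / (np.linalg.norm(c, axis=1, keepdims=True) + 1e-8)
    order = np.argsort(-(c @ s))
    return [candidates[i] for i in order]

# Toy usage: 100 articles with 32-dim embeddings.
emb = np.random.randn(100, 32)
print(rank_candidates([3, 17, 42], [5, 9, 58, 77], emb))
```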