Search Results for author: Shansan Gong

Found 13 papers, 10 papers with code

GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models

1 code implementation · 17 Dec 2024 · Mukai Li, Lei LI, Shansan Gong, Qi Liu

Towards this goal, we make design choices through extensive experiments spanning data curation, context window extension, and context utilization: (1) we analyze data sources and length distributions to construct ETVLM, a data recipe that balances performance across scenarios; (2) we examine existing position-extension methods, identify their limitations, and propose M-RoPE++ as an enhanced approach; we also choose to instruction-tune only the backbone with mixed-source data; (3) we discuss how to better utilize extended context windows and propose hybrid-resolution training.

Long-range modeling

Why Does the Effective Context Length of LLMs Fall Short?

no code implementations · 24 Oct 2024 · Chenxin An, Jun Zhang, Ming Zhong, Lei LI, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong

Advancements in distributed training and efficient attention mechanisms have significantly expanded the context window sizes of large language models (LLMs).

Attribute

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

1 code implementation · 23 Oct 2024 · Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for generative text modeling, potentially addressing limitations of autoregressive (AR) models.

In-Context Learning, Language Modeling +1

Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning

1 code implementation · 18 Oct 2024 · Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, Lingpeng Kong

Our work highlights the potential of diffusion-based approaches in advancing AI capabilities for sophisticated language understanding and problem-solving tasks.

Training-Free Long-Context Scaling of Large Language Models

1 code implementation · 27 Feb 2024 · Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong

The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length.

16k

Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models

1 code implementation · 12 Feb 2024 · Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong

Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models.

Language Modeling, Language Modelling +1

L-Eval: Instituting Standardized Evaluation for Long Context Language Models

3 code implementations · 20 Jul 2023 · Chenxin An, Shansan Gong, Ming Zhong, Xingjian Zhao, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu

Recently, there has been growing interest in extending the context length of large language models (LLMs), aiming to effectively process long single-turn inputs or conversations with more extensive histories.

Instruction Following

In-Context Learning with Many Demonstration Examples

1 code implementation · 9 Feb 2023 · Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong

Based on EVALM, we scale up the size of examples efficiently in both instruction tuning and in-context learning to explore the boundary of the benefits from more annotated data.

16k, 8k +3

DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

1 code implementation · 17 Oct 2022 · Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

Bringing together theoretical analysis and empirical evidence, we demonstrate the great potential of diffusion models in complex conditional language generation tasks.

Diversity, Text Generation

Few-Shot Natural Language Inference Generation with PDD: Prompt and Dynamic Demonstration

no code implementations · 21 May 2022 · Kaijian Li, Shansan Gong, Kenny Q. Zhu

The Natural Language Inference Generation task is to generate a text hypothesis given a text premise and a logical relation between the two.

Data Augmentation, Natural Language Inference +1

Positive, Negative and Neutral: Modeling Implicit Feedback in Session-based News Recommendation

1 code implementation · 12 May 2022 · Shansan Gong, Kenny Q. Zhu

News recommendation for anonymous readers is a useful but challenging task for many news portals, where interactions between readers and articles are limited to a temporary login session.

News Recommendation, Session-Based Recommendations
