no code implementations • 17 Feb 2025 • Kun Luo, Zheng Liu, Peitian Zhang, Hongjin Qian, Jun Zhao, Kang Liu
The efficient processing of long context poses a serious challenge for large language models (LLMs).
1 code implementation • 9 Jan 2025 • Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, Zhicheng Dou
To address this limitation, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents.
Ranked #1 on Mathematical Reasoning on MATH500.
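The agentic RAG idea can be sketched as a reason-search-refine loop: the reasoner emits a search query when it lacks evidence, and retrieved documents are refined before being folded back into the reasoning state. The sketch below is purely illustrative; all functions are toy stubs, not Search-o1's implementation.

```python
# Hypothetical sketch of an agentic RAG loop in the spirit of Search-o1.
# reason_step, retrieve, and refine are illustrative stubs.

def reason_step(state):
    # Stub reasoner: asks for a search when no evidence has been gathered.
    if "evidence" not in state:
        return {"action": "search", "query": state["question"]}
    return {"action": "answer", "text": f"answer based on {state['evidence']}"}

def retrieve(query):
    # Stub retriever returning raw, possibly noisy documents.
    return [f"raw doc about {query}", "irrelevant doc"]

def refine(docs, query):
    # Reason-in-Documents-style refinement stands in here as a trivial
    # relevance filter over the retrieved documents.
    return [d for d in docs if query in d]

def solve(question):
    state = {"question": question}
    while True:
        step = reason_step(state)
        if step["action"] == "search":
            state["evidence"] = refine(retrieve(step["query"]), step["query"])
        else:
            return step["text"]

print(solve("q"))
```

The point of the loop structure is that retrieval is interleaved with reasoning rather than performed once up front.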
no code implementations • 17 Dec 2024 • Hongjin Qian, Zheng Liu, Peitian Zhang, Zhicheng Dou, Defu Lian
ACRE constructs a Bi-layer KV Cache for long contexts, where the layer-1 (L1) cache compactly captures global information, and the layer-2 (L2) cache provides detailed and localized information.
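The bi-layer idea can be illustrated with a toy cache: a compact layer-1 entry per context segment for global routing, and the full layer-2 entries fetched only for the segments the query selects. This is a minimal sketch of the two-layer lookup pattern, not ACRE's learned cache; the mean-pooled summary is a stand-in for its compaction.

```python
# Illustrative bi-layer cache (not ACRE's implementation): L1 holds one
# compact summary per segment, L2 holds the detailed per-segment entries.

def summarize(vectors):
    # L1 entry: mean vector of a segment (stand-in for learned compaction).
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class BiLayerCache:
    def __init__(self):
        self.l1 = []  # compact global entries, one per segment
        self.l2 = []  # detailed local entries, a list of vectors per segment

    def add_segment(self, vectors):
        self.l1.append(summarize(vectors))
        self.l2.append(vectors)

    def query(self, q, top_k=1):
        # Rank segments via the compact L1 cache, then return detailed
        # L2 entries only for the best-matching segments.
        order = sorted(range(len(self.l1)), key=lambda i: -dot(q, self.l1[i]))
        return [self.l2[i] for i in order[:top_k]]

cache = BiLayerCache()
cache.add_segment([[1.0, 0.0], [1.0, 0.2]])  # segment about topic A
cache.add_segment([[0.0, 1.0], [0.2, 1.0]])  # segment about topic B
detailed = cache.query([0.0, 1.0])           # query close to topic B
```

The design benefit is that most lookups touch only the small L1 cache, while detailed L2 information is materialized on demand.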
1 code implementation • 22 Sep 2024 • Yan Shu, Zheng Liu, Peitian Zhang, Minghao Qin, Junjie Zhou, Zhengyang Liang, Tiejun Huang, Bo Zhao
The VST module is trained via instruction fine-tuning, for which two optimization strategies are offered.
1 code implementation • 9 Sep 2024 • Hongjin Qian, Peitian Zhang, Zheng Liu, Kelong Mao, Zhicheng Dou
Retrieval-Augmented Generation (RAG) leverages retrieval tools to access external databases, thereby enhancing the generation quality of large language models (LLMs) through optimized context.
1 code implementation • 26 May 2024 • Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou
Compressing lengthy context is a critical but technically challenging problem.
no code implementations • 24 May 2024 • Hongjin Qian, Zheng Liu, Peitian Zhang, Kelong Mao, Yujia Zhou, Xu Chen, Zhicheng Dou
The learning and deployment of long-LLMs remain challenging problems despite recent progress.
1 code implementation • 30 Apr 2024 • Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou
We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning.
1 code implementation • 23 Apr 2024 • Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yuyao Zhang, Peitian Zhang, Yutao Zhu, Zhicheng Dou
We summarize the advancements in GR regarding model training, document identifiers, incremental learning, downstream-task adaptation, multi-modal GR, and generative recommendation, as well as progress in reliable response generation in terms of internal knowledge memorization, external knowledge augmentation, response generation with citations, and personal information assistance.
no code implementations • 18 Feb 2024 • Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang
2) Strong sample efficiency in training, which enables the embedding model to be learned in a cost-effective way.
3 code implementations • 5 Feb 2024 • Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, Zheng Liu
It can simultaneously perform the three common retrieval functionalities of embedding models, i.e. dense retrieval, multi-vector retrieval, and sparse retrieval, providing a unified model foundation for real-world IR applications.
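The three retrieval functionalities can be sketched as three scoring functions over a query/passage pair: a single pooled embedding for dense retrieval, per-token lexical weights for sparse retrieval, and late interaction over token embeddings for multi-vector retrieval. The vectors and weights below are toy values, not real model outputs.

```python
# Toy illustration of the three retrieval scores a unified embedding
# model can produce (illustrative values only).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dense_score(q_vec, p_vec):
    # Dense retrieval: similarity of single pooled embeddings.
    return dot(q_vec, p_vec)

def sparse_score(q_weights, p_weights):
    # Sparse retrieval: sum of weight products over shared tokens.
    return sum(w * p_weights[t] for t, w in q_weights.items() if t in p_weights)

def multi_vector_score(q_vecs, p_vecs):
    # Multi-vector (late interaction): each query token matches its best
    # passage token; the max-similarities are averaged over query tokens.
    return sum(max(dot(q, p) for p in p_vecs) for q in q_vecs) / len(q_vecs)

q_vec, p_vec = [0.6, 0.8], [0.8, 0.6]
q_w, p_w = {"cache": 1.2, "kv": 0.7}, {"cache": 0.9, "gpu": 0.4}
q_vs, p_vs = [[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]

print(dense_score(q_vec, p_vec))       # 0.96
print(sparse_score(q_w, p_w))          # 1.08  (only "cache" overlaps)
print(multi_vector_score(q_vs, p_vs))  # (0.9 + 0.8) / 2 = 0.85
```

A unified model exposes all three from one forward pass, so the scores can also be combined for hybrid ranking.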
1 code implementation • 15 Jan 2024 • Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang
Extensible Tokenization serves as middleware between the tokenized context and the LLM, transforming the raw token embeddings into extensible embeddings.
1 code implementation • 12 Jan 2024 • Yutao Zhu, Peitian Zhang, Chenghao Zhang, Yifei Chen, Binyu Xie, Zheng Liu, Ji-Rong Wen, Zhicheng Dou
Despite this, their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.
1 code implementation • 7 Jan 2024 • Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou
In this paper, we propose Activation Beacon, a plug-in module for transformer-based LLMs that targets effective, efficient, and flexible compression of long contexts.
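The compression idea can be illustrated with a toy condenser: long stretches of context activations are distilled into a much smaller number of "beacon" vectors that the model attends to instead. The mean-pooling below is only a stand-in for Activation Beacon's learned condensation.

```python
# Toy sketch of plug-in context compression (illustrative, not the
# paper's method): every `ratio` consecutive activation vectors are
# condensed into one beacon vector, shrinking the attended context.

def compress(activations, ratio):
    beacons = []
    for i in range(0, len(activations), ratio):
        chunk = activations[i:i + ratio]
        # Mean-pool the chunk into a single beacon vector.
        beacons.append([sum(v[j] for v in chunk) / len(chunk)
                        for j in range(len(chunk[0]))])
    return beacons

ctx = [[float(i)] for i in range(8)]  # 8 one-dimensional activations
print(compress(ctx, 4))               # [[1.5], [5.5]]
```

Because the module sits alongside the base model, the compression ratio can be varied without retraining the LLM itself, which is the flexibility the entry refers to.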
1 code implementation • 22 Nov 2023 • Shitao Xiao, Zheng Liu, Peitian Zhang, Xingrun Xing
Despite its simplicity, LM-Cocktail is surprisingly effective: the resulting model achieves strong empirical performance across the whole scope of general tasks while preserving superior capacity in its targeted domain.
1 code implementation • 11 Oct 2023 • Peitian Zhang, Shitao Xiao, Zheng Liu, Zhicheng Dou, Jian-Yun Nie
On the other hand, task-specific retrievers lack the required versatility, which hinders their performance across diverse retrieval-augmentation scenarios.
2 code implementations • 14 Sep 2023 • Shitao Xiao, Zheng Liu, Peitian Zhang, Niklas Muennighoff, Defu Lian, Jian-Yun Nie
Along with our resources on general Chinese embedding, we release our data and models for English text embeddings.
1 code implementation • 23 May 2023 • Peitian Zhang, Zheng Liu, Yujia Zhou, Zhicheng Dou, Fangchao Liu, Zhao Cao
On top of the term-set DocID, we propose a permutation-invariant decoding algorithm, with which the term set can be generated in any permutation yet will always lead to the corresponding document.
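The permutation-invariance idea can be shown with a toy DocID scheme: each document is identified by a set of terms, decoding only permits terms that can still complete some document's set, and the final set (not the order) resolves the document. This is a sketch of the property, not the paper's decoding algorithm.

```python
# Toy term-set DocIDs: a document is identified by a SET of terms, so
# generating the terms in any order yields the same document.

DOC_IDS = {
    frozenset({"neural", "index", "cluster"}): "doc-A",
    frozenset({"neural", "rank", "sparse"}): "doc-B",
}

def valid_next_terms(generated):
    # Constrained decoding: allow only terms that can still complete
    # some document's term set.
    candidates = set()
    for terms in DOC_IDS:
        if generated <= terms:
            candidates |= terms - generated
    return candidates

def decode(order):
    generated = set()
    for term in order:
        assert term in valid_next_terms(generated), f"invalid term {term}"
        generated.add(term)
    return DOC_IDS[frozenset(generated)]

# Two different permutations of the same term set reach the same document.
print(decode(["index", "neural", "cluster"]))  # doc-A
print(decode(["cluster", "index", "neural"]))  # doc-A
```

The set-valued identifier is what removes the arbitrary ordering constraint that sequence-valued DocIDs impose on generative retrieval.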
1 code implementation • 11 Oct 2022 • Peitian Zhang, Zheng Liu, Shitao Xiao, Zhicheng Dou, Jing Yao
Based on comprehensive experiments on popular retrieval benchmarks, we verify that clusters and terms indeed complement each other, enabling HI^2 to achieve lossless retrieval quality with competitive efficiency across various index settings.
no code implementations • 19 Aug 2022 • Yujia Zhou, Jing Yao, Zhicheng Dou, Ledell Wu, Peitian Zhang, Ji-Rong Wen
In order to unify these two stages, we explore a model-based indexer for document retrieval.
no code implementations • 12 Jan 2022 • Peitian Zhang, Zheng Liu
News feed recommendation is an important web service.
no code implementations • 13 Oct 2021 • Peitian Zhang, Zhicheng Dou, Jing Yao
The key to personalized news recommendation is to match the user's interests with the candidate news precisely and efficiently.
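The precise-yet-efficient matching requirement is commonly met with a two-tower design: user interests and candidate news are embedded in a shared space, so each candidate is scored with a single inner product. The sketch below uses toy vectors and is illustrative only, not the paper's model.

```python
# Toy two-tower matching for personalized news recommendation
# (illustrative values, not a trained model).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

user_interest = [0.9, 0.1]  # e.g. a user leaning toward sports
candidates = {
    "sports-news": [1.0, 0.0],
    "finance-news": [0.0, 1.0],
}

# Efficient scoring: one inner product per candidate, highest first.
ranked = sorted(candidates, key=lambda n: -dot(user_interest, candidates[n]))
print(ranked)  # ['sports-news', 'finance-news']
```

Because candidate embeddings can be precomputed and indexed, this factorized form is what makes large-scale matching efficient at serving time.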