Search Results for author: Peitian Zhang

Found 14 papers, 10 papers with code

From Matching to Generation: A Survey on Generative Information Retrieval

1 code implementation · 23 Apr 2024 · Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yuyao Zhang, Peitian Zhang, Yutao Zhu, Zhicheng Dou

We will summarize the advancements in generative retrieval (GR) regarding model training, document identifiers, incremental learning, downstream task adaptation, multi-modal GR, and generative recommendation, as well as progress in reliable response generation in terms of internal knowledge memorization, external knowledge augmentation, generating responses with citations, and personal information assistants.

Extensible Embedding: A Flexible Multipler For LLM's Context Length

no code implementations · 18 Feb 2024 · Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang

Strong sample efficiency in training, which enables the embedding model to be learned cost-effectively.

Language Modelling

BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

1 code implementation · 5 Feb 2024 · Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, Zheng Liu

It can simultaneously perform the three common retrieval functionalities of embedding models (dense retrieval, multi-vector retrieval, and sparse retrieval), which provides a unified model foundation for real-world IR applications.

Retrieval, Self-Knowledge Distillation
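The three functionalities named in the BGE M3 excerpt differ mainly in how a query-document score is computed. The sketch below is a minimal illustration, assuming the model has already produced one dense vector per text, one vector per token, and one weight per token. The function names, toy inputs, and combination weights are hypothetical, and this is not the FlagEmbedding API.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dense_score(q_vec: Vector, d_vec: Vector) -> float:
    # Dense retrieval: one embedding per text, compared by inner product.
    return dot(q_vec, d_vec)


def sparse_score(q_weights: Dict[str, float], d_weights: Dict[str, float]) -> float:
    # Sparse (lexical) retrieval: multiply the weights of terms present in
    # both texts and sum the products; unshared terms contribute nothing.
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)


def multi_vector_score(q_toks: List[Vector], d_toks: List[Vector]) -> float:
    # Multi-vector retrieval (late interaction): each query token keeps its
    # best-matching document token, and the maxima are averaged over the query.
    return sum(max(dot(q, d) for d in d_toks) for q in q_toks) / len(q_toks)


def hybrid_score(
    dense: float,
    sparse: float,
    multi: float,
    weights: Tuple[float, float, float] = (1.0, 0.3, 1.0),  # hypothetical weights
) -> float:
    # Weighted sum of the three scores so one model can rank with all of them.
    w_dense, w_sparse, w_multi = weights
    return w_dense * dense + w_sparse * sparse + w_multi * multi


if __name__ == "__main__":
    d = dense_score([1.0, 0.0], [0.5, 0.5])
    s = sparse_score({"beacon": 0.5, "context": 1.0}, {"context": 2.0, "window": 1.0})
    m = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]])
    print(d, s, m, hybrid_score(d, s, m))
```

Because all three scores are derived from one model's outputs, they can be used separately or combined; the weighting in `hybrid_score` is an illustrative choice, not a published setting.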

Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization

1 code implementation · 15 Jan 2024 · Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang

Extensible Tokenization serves as middleware between the tokenized context and the LLM, transforming the raw token embeddings into extensible embeddings.

Few-Shot Learning, Language Modelling
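The Extensible Tokenization excerpt describes turning raw token embeddings into a more compact sequence so that more context fits into the LLM's window. The sketch below only illustrates that shape change under an explicit assumption: it swaps in plain mean pooling over every `k` token embeddings, whereas the actual module is learned. `compress_embeddings` and `k` are hypothetical names.

```python
from typing import List

Vector = List[float]


def compress_embeddings(token_embs: List[Vector], k: int) -> List[Vector]:
    """Toy stand-in: mean-pool every k consecutive token embeddings into one.

    Assumption: the real module is learned; mean pooling here only shows the
    shape change from n token embeddings to ceil(n / k) compact embeddings.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    compressed: List[Vector] = []
    for start in range(0, len(token_embs), k):
        chunk = token_embs[start:start + k]  # the last chunk may be shorter than k
        dim = len(chunk[0])
        compressed.append([sum(v[i] for v in chunk) / len(chunk) for i in range(dim)])
    return compressed


if __name__ == "__main__":
    tokens = [[float(i), float(i)] for i in range(10)]
    print(len(compress_embeddings(tokens, 4)))  # 10 tokens -> 3 embeddings
```

In this sketch, a compression factor `k` lets a window that holds W embeddings cover roughly k·W raw tokens.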

INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning

1 code implementation · 12 Jan 2024 · Yutao Zhu, Peitian Zhang, Chenghao Zhang, Yifei Chen, Binyu Xie, Zhicheng Dou, Zheng Liu, Ji-Rong Wen

Despite this, their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.

document understanding, Information Retrieval, +2

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

1 code implementation · 7 Jan 2024 · Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou

Although the context window can be extended through fine-tuning, doing so incurs considerable cost at both training and inference time and has an unfavorable impact on the LLM's original capabilities.

4k, Language Modelling

LM-Cocktail: Resilient Tuning of Language Models via Model Merging

1 code implementation · 22 Nov 2023 · Shitao Xiao, Zheng Liu, Peitian Zhang, Xingrun Xing

Despite its simplicity, LM-Cocktail is surprisingly effective: the resulting model achieves strong empirical performance across the whole scope of general tasks while preserving superior capability in its targeted domain.

Language Modelling
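LM-Cocktail's merging step can be pictured as a weighted average of parameters across models, for example a fine-tuned model and its base model. The sketch below assumes the weights are supplied by hand (how LM-Cocktail derives them is not reproduced here) and represents each model as a hypothetical dictionary of flattened parameter lists; with a real framework the same loop would run over tensors.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened values


def merge_models(models: List[Params], weights: List[float]) -> Params:
    """Weighted average of several models' parameters, name by name."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1
    merged: Params = {}
    for name, values in models[0].items():
        # Every model is assumed to share the same parameter names and shapes.
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models))
            for i in range(len(values))
        ]
    return merged


if __name__ == "__main__":
    tuned = {"layer.weight": [3.0, 4.0]}  # hypothetical fine-tuned model
    base = {"layer.weight": [1.0, 2.0]}   # hypothetical base model
    print(merge_models([tuned, base], [3.0, 1.0]))  # {'layer.weight': [2.5, 3.5]}
```

Raising the base model's weight pulls the merged parameters back toward general-purpose behavior, while raising the fine-tuned model's weight favors the target domain.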

Retrieve Anything To Augment Large Language Models

1 code implementation · 11 Oct 2023 · Peitian Zhang, Shitao Xiao, Zheng Liu, Zhicheng Dou, Jian-Yun Nie

On the other hand, the task-specific retrievers lack the required versatility, hindering their performance across the diverse retrieval augmentation scenarios.

Knowledge Distillation, Retrieval

C-Pack: Packaged Resources To Advance General Chinese Embedding

3 code implementations · 14 Sep 2023 · Shitao Xiao, Zheng Liu, Peitian Zhang, Niklas Muennighoff

Along with our resources on general Chinese embedding, we release our data and models for English text embeddings.

Generative Retrieval via Term Set Generation

1 code implementation · 23 May 2023 · Peitian Zhang, Zheng Liu, Yujia Zhou, Zhicheng Dou, Fangchao Liu, Zhao Cao

On top of the term-set DocID, we propose a permutation-invariant decoding algorithm, with which the term set can be generated in any permutation yet will always lead to the corresponding document.

Information Retrieval, Natural Questions, +1
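The permutation-invariant property in the term-set excerpt follows from treating the DocID as a set: a decoding step only needs to check that the terms generated so far, plus the candidate term, remain a subset of some document's term set. The toy sketch below illustrates that constraint. `TermSetIndex`, `allowed_next`, and `decode` are hypothetical names, the index is hard-coded, and DocIDs are assumed unique; in a real generative retriever the same check could instead be applied as a vocabulary mask at each decoding step.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set


class TermSetIndex:
    """Hypothetical index mapping term-set DocIDs to document names."""

    def __init__(self, docids: Dict[str, FrozenSet[str]]):
        self.docids = docids

    def allowed_next(self, generated: Set[str]) -> Set[str]:
        # A term is allowed if the generated set plus that term is still a
        # subset of some document's term set; the order of terms plays no role.
        allowed: Set[str] = set()
        for terms in self.docids.values():
            if generated <= terms:
                allowed |= terms - generated
        return allowed

    def lookup(self, generated: Set[str]) -> Optional[str]:
        # A completed term set maps back to its document (DocIDs assumed unique).
        for doc, terms in self.docids.items():
            if terms == generated:
                return doc
        return None


def decode(index: TermSetIndex, emitted: Iterable[str]) -> Optional[str]:
    # Simulates a decoder emitting terms one at a time under the constraint.
    generated: Set[str] = set()
    for term in emitted:
        if term not in index.allowed_next(generated):
            return None  # invalid continuation: no DocID contains this set
        generated.add(term)
    return index.lookup(generated)  # None if the set is still incomplete


if __name__ == "__main__":
    index = TermSetIndex({
        "doc1": frozenset({"apple", "pie", "recipe"}),
        "doc2": frozenset({"apple", "phone", "review"}),
    })
    print(decode(index, ["recipe", "apple", "pie"]))  # doc1
    print(decode(index, ["pie", "recipe", "apple"]))  # doc1
    print(decode(index, ["apple", "phone", "pie"]))   # None
```

Both orderings of the same three terms reach `doc1`, which is the behavior the excerpt describes: any permutation of a valid term set leads to the corresponding document.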

Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval

1 code implementation · 11 Oct 2022 · Peitian Zhang, Zheng Liu, Shitao Xiao, Zhicheng Dou, Jing Yao

Based on comprehensive experiments on popular retrieval benchmarks, we verify that clusters and terms indeed complement each other, enabling HI² to achieve lossless retrieval quality with competitive efficiency across various index settings.

Knowledge Distillation, Quantization, +1
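The HI² excerpt says clusters and terms complement each other. One way to picture this is to index each document under both its embedding cluster and its salient terms, take the union of the posting lists a query reaches, and score only those candidates with dense vectors. This is a sketch under stated assumptions: the cluster assignments and term selections are hard-coded (HI² learns them, which is not shown), candidates are scored with full vectors for simplicity, and all names and data are hypothetical.

```python
from typing import Dict, List, Set

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gather_candidates(
    query_clusters: List[str],
    query_terms: List[str],
    cluster_postings: Dict[str, Set[str]],
    term_postings: Dict[str, Set[str]],
) -> Set[str]:
    # Union of the posting lists reached through clusters and through terms,
    # so a document missed by one channel can still be recalled by the other.
    candidates: Set[str] = set()
    for c in query_clusters:
        candidates |= cluster_postings.get(c, set())
    for t in query_terms:
        candidates |= term_postings.get(t, set())
    return candidates


def search(
    query_vec: Vector,
    doc_vecs: Dict[str, Vector],
    query_clusters: List[str],
    query_terms: List[str],
    cluster_postings: Dict[str, Set[str]],
    term_postings: Dict[str, Set[str]],
    k: int,
) -> List[str]:
    candidates = gather_candidates(query_clusters, query_terms, cluster_postings, term_postings)
    # Only the candidates are scored; higher score first, doc id breaks ties.
    ranked = sorted(candidates, key=lambda d: (-dot(query_vec, doc_vecs[d]), d))
    return ranked[:k]


if __name__ == "__main__":
    cluster_postings = {"c1": {"d1", "d2"}, "c2": {"d3"}}
    term_postings = {"beacon": {"d3", "d4"}}
    doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.9, 0.1],
                "d4": [0.2, 0.2], "d5": [1.0, 1.0]}
    print(search([1.0, 0.0], doc_vecs, ["c1"], ["beacon"],
                 cluster_postings, term_postings, k=2))  # ['d1', 'd3']
```

In the example, `d3` is reached only through the term `beacon` yet ranks second, showing how the term channel recovers a document the cluster channel missed. `d5` would score as high as `d1` but is never considered, because no posting list reached by the query contains it; that pruning is where the speed-up comes from.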

Ultron: An Ultimate Retriever on Corpus with a Model-based Indexer

no code implementations · 19 Aug 2022 · Yujia Zhou, Jing Yao, Zhicheng Dou, Ledell Wu, Peitian Zhang, Ji-Rong Wen

To unify these two stages (indexing and retrieval), we explore a model-based indexer for document retrieval.

Retrieval

Learning to Select Historical News Articles for Interaction based Neural News Recommendation

no code implementations · 13 Oct 2021 · Peitian Zhang, Zhicheng Dou, Jing Yao

The key to personalized news recommendation is to match the user's interests with the candidate news precisely and efficiently.

News Recommendation
