TSRankLLM: A Two-Stage Adaptation of LLMs for Text Ranking

28 Nov 2023 · Longhui Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang ·

Text ranking is a critical task in various information retrieval applications, and the recent success of pre-trained language models (PLMs), especially large language models (LLMs), has sparked interest in their application to text ranking. To eliminate the misalignment between PLMs and text ranking, fine-tuning with supervised ranking data has been widely explored. However, previous studies focus mainly on encoder-only and encoder-decoder PLMs, and decoder-only LLM research is still lacking. An exception to this is RankLLaMA, which suggests direct supervised fine-tuning (SFT) to explore LLaMA fully. In our work, we argue that a two-stage progressive paradigm would be more beneficial. First, we suggest continual pre-training (CPT) on LLMs by using a large-scale weakly-supervised corpus. Second, we perform SFT consistent with RankLLaMA, and propose an improved optimization strategy further. Our experimental results on multiple benchmarks demonstrate the superior performance of our method covering both in-domain and out-domain scenarios.

PDF Abstract