2 code implementations • 8 Apr 2024 • Longhui Zhang, Dingkun Long, Meishan Zhang, Yanzhao Zhang, Pengjun Xie, Min Zhang
Experimental results on Chinese sequence labeling datasets demonstrate that the improved BABERT variant outperforms the vanilla version, not only on these tasks but also more broadly across a range of Chinese natural language understanding tasks.
1 code implementation • 28 Nov 2023 • Longhui Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang
Text ranking is a critical task in various information retrieval applications, and the recent success of pre-trained language models (PLMs), especially large language models (LLMs), has sparked interest in their application to text ranking.
1 code implementation • 9 Nov 2023 • Yanzhao Zhang, Dingkun Long, Zehan Li, Pengjun Xie
Pre-trained language models (PLMs) have recently shown great success in the field of text representation.
1 code implementation • 12 Oct 2023 • Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang
Because embedding use cases span from English to other natural or programming languages, and from retrieval to classification and beyond, it is desirable to build a single unified embedding model rather than a dedicated one for each scenario.
no code implementations • 7 Aug 2023 • Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang
We present GTE, a general-purpose text embedding model trained with multi-stage contrastive learning.
no code implementations • 22 May 2023 • Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie
Recent studies have explored dense passage retrieval techniques built on pre-trained language models; among these, the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising.
2 code implementations • 27 Oct 2022 • Peijie Jiang, Dingkun Long, Yanzhao Zhang, Pengjun Xie, Meishan Zhang, Min Zhang
We apply BABERT to feature induction for Chinese sequence labeling tasks.
Ranked #1 on Chinese Word Segmentation on MSRA
Chinese Named Entity Recognition • Chinese Word Segmentation • +3 more tasks
1 code implementation • 27 Oct 2022 • Dingkun Long, Yanzhao Zhang, Guangwei Xu, Pengjun Xie
Pre-trained language models (PTMs) have been shown to yield powerful text representations for the dense passage retrieval task.
1 code implementation • 21 May 2022 • Yanzhao Zhang, Dingkun Long, Guangwei Xu, Pengjun Xie
Existing text retrieval systems with state-of-the-art performance usually adopt a retrieve-then-rerank architecture because of the high computational cost of pre-trained language models and the large size of the corpus.
Ranked #1 on Passage Re-Ranking on MS MARCO