Search Results for author: Dingkun Long

Found 17 papers, 10 papers with code

A Fine-Grained Domain Adaption Model for Joint Word Segmentation and POS Tagging

1 code implementation • EMNLP 2021 • Peijie Jiang, Dingkun Long, Yueheng Sun, Meishan Zhang, Guangwei Xu, Pengjun Xie

Self-training is one promising solution for it, which struggles to construct a set of high-quality pseudo training instances for the target domain.

Domain Adaptation POS +3

Chinese Sequence Labeling with Semi-Supervised Boundary-Aware Language Model Pre-training

2 code implementations • 8 Apr 2024 • Longhui Zhang, Dingkun Long, Meishan Zhang, Yanzhao Zhang, Pengjun Xie, Min Zhang

Experimental results on Chinese sequence labeling datasets demonstrate that the improved BABERT variant outperforms the vanilla version, not only on these tasks but also more broadly across a range of Chinese natural language understanding tasks.

Language Modelling Natural Language Understanding

TSRankLLM: A Two-Stage Adaptation of LLMs for Text Ranking

1 code implementation • 28 Nov 2023 • Longhui Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

Text ranking is a critical task in various information retrieval applications, and the recent success of pre-trained language models (PLMs), especially large language models (LLMs), has sparked interest in their application to text ranking.

Information Retrieval Retrieval

Text Representation Distillation via Information Bottleneck Principle

1 code implementation • 9 Nov 2023 • Yanzhao Zhang, Dingkun Long, Zehan Li, Pengjun Xie

Pre-trained language models (PLMs) have recently shown great success in the field of text representation.

Knowledge Distillation Retrieval +1

Language Models are Universal Embedders

1 code implementation • 12 Oct 2023 • Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

As such cases span from English to other natural or programming languages, from retrieval to classification and beyond, it is desirable to build a unified embedding model rather than dedicated ones for each scenario.

Code Search Language Modelling +2

Hybrid Retrieval and Multi-stage Text Ranking Solution at TREC 2022 Deep Learning Track

no code implementations • 23 Aug 2023 • Guangwei Xu, Yangzhao Zhang, Longhui Zhang, Dingkun Long, Pengjun Xie, Ruijie Guo

Large-scale text retrieval technology has been widely used in various practical business scenarios.

Document Ranking Language Modelling +3

Towards General Text Embeddings with Multi-stage Contrastive Learning

no code implementations • 7 Aug 2023 • Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang

We present GTE, a general-purpose text embedding model trained with multi-stage contrastive learning.

Contrastive Learning Unsupervised Pre-training

Challenging Decoder helps in Masked Auto-Encoder Pre-training for Dense Passage Retrieval

no code implementations • 22 May 2023 • Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie

Recently, various studies have been directed towards exploring dense passage retrieval techniques employing pre-trained language models, among which the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising.

Passage Retrieval Retrieval

Unsupervised Boundary-Aware Language Model Pretraining for Chinese Sequence Labeling

2 code implementations • 27 Oct 2022 • Peijie Jiang, Dingkun Long, Yanzhao Zhang, Pengjun Xie, Meishan Zhang, Min Zhang

We apply BABERT for feature induction of Chinese sequence labeling tasks.

Ranked #1 on Chinese Word Segmentation on MSRA

Chinese Named Entity Recognition Chinese Word Segmentation +3

Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval

1 code implementation • 27 Oct 2022 • Dingkun Long, Yanzhao Zhang, Guangwei Xu, Pengjun Xie

Pre-trained language models (PTMs) have been shown to yield powerful text representations for the dense passage retrieval task.

Language Modelling Masked Language Modeling +2

HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking

1 code implementation • 21 May 2022 • Yanzhao Zhang, Dingkun Long, Guangwei Xu, Pengjun Xie

Existing text retrieval systems with state-of-the-art performance usually adopt a retrieve-then-reranking architecture due to the high computational cost of pre-trained language models and the large corpus size.

Ranked #1 on Passage Re-Ranking on MS MARCO

Passage Ranking Passage Re-Ranking +2

Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval

1 code implementation • 7 Mar 2022 • Dingkun Long, Qiong Gao, Kuan Zou, Guangwei Xu, Pengjun Xie, Ruijie Guo, Jian Xu, Guanjun Jiang, Luxi Xing, Ping Yang

We find that the performance of retrieval models trained on general-domain datasets will inevitably decrease on specific domains.

Passage Retrieval Retrieval

Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference

no code implementations • 24 Oct 2020 • Haoyu Zhang, Dingkun Long, Guangwei Xu, Pengjun Xie, Fei Huang, Ji Wang

Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.

Keyphrase Extraction Representation Learning

Coupling Distant Annotation and Adversarial Training for Cross-Domain Chinese Word Segmentation

1 code implementation • ACL 2020 • Ning Ding, Dingkun Long, Guangwei Xu, Muhua Zhu, Pengjun Xie, Xiaobin Wang, Hai-Tao Zheng

In order to simultaneously alleviate these two issues, this paper proposes to couple distant annotation and adversarial training for cross-domain CWS.

Chinese Word Segmentation Sentence

Hierarchy-Aware Global Model for Hierarchical Text Classification

no code implementations • ACL 2020 • Jie Zhou, Chunping Ma, Dingkun Long, Guangwei Xu, Ning Ding, Haoyu Zhang, Pengjun Xie, Gongshen Liu

Hierarchical text classification is an essential yet challenging subtask of multi-label text classification with a taxonomic hierarchy.

General Classification Multi Label Text Classification +2

Understanding Feature Selection and Feature Memorization in Recurrent Neural Networks

no code implementations • 3 Mar 2019 • Bokang Zhu, Richong Zhang, Dingkun Long, Yongyi Mao

Gated models resolve this conflict by adaptively adjusting their state-update equations, whereas the vanilla RNN resolves it by assigning different tasks to different dimensions.

feature selection Memorization

Prototypical Recurrent Unit

no code implementations • 20 Nov 2016 • Dingkun Long, Richong Zhang, Yongyi Mao

For this purpose, we design a simple and controllable task, called the "memorization problem", where the networks are trained to memorize certain targeted information.

Memorization
