Search Results for author: Pengjun Xie

Found 41 papers, 27 papers with code

A Fine-Grained Domain Adaption Model for Joint Word Segmentation and POS Tagging

1 code implementation EMNLP 2021 Peijie Jiang, Dingkun Long, Yueheng Sun, Meishan Zhang, Guangwei Xu, Pengjun Xie

Self-training is one promising solution for domain adaptation, where the key challenge is to construct a set of high-quality pseudo training instances for the target domain.

Domain Adaptation · POS +1
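
A minimal, generic self-training sketch of the idea in the snippet above (not the paper's fine-grained adaptation model): a source-domain tagger labels target-domain sentences, and only high-confidence predictions are kept as pseudo training instances. The `tagger` interface and the confidence threshold are illustrative assumptions.

```python
def build_pseudo_instances(tagger, target_sentences, threshold=0.9):
    """tagger maps a sentence to (labels, confidence); the interface is hypothetical."""
    pseudo = []
    for sent in target_sentences:
        labels, confidence = tagger(sent)
        if confidence >= threshold:          # keep only high-confidence predictions
            pseudo.append((sent, labels))
    return pseudo

# toy tagger that labels every character and reports a fixed confidence
dummy_tagger = lambda sent: (["B-NN" if i == 0 else "I-NN" for i in range(len(sent))], 0.95)
print(build_pseudo_instances(dummy_tagger, ["今天天气好", "上海浦东"]))
```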

Exploring Lottery Prompts for Pre-trained Language Models

no code implementations 31 May 2023 Yulin Chen, Ning Ding, Xiaobin Wang, Shengding Hu, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie

Consistently scaling pre-trained language models (PLMs) imposes substantial burdens on model adaptation, necessitating more efficient alternatives to conventional fine-tuning.

Challenging Decoder helps in Masked Auto-Encoder Pre-training for Dense Passage Retrieval

no code implementations 22 May 2023 Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie

Recently, various studies have been directed towards exploring dense passage retrieval techniques employing pre-trained language models, among which the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising.

Passage Retrieval · Retrieval

GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark

no code implementations 11 May 2023 Dongyang Li, Ruixue Ding, Qiang Zhang, Zheng Li, Boli Chen, Pengjun Xie, Yao Xu, Xin Li, Ning Guo, Fei Huang, Xiaofeng He

With the rapid development of geographic applications, automated and intelligent models are needed to handle the large volume of geographic information.

Entity Alignment · Natural Language Understanding

COMBO: A Complete Benchmark for Open KG Canonicalization

1 code implementation 8 Feb 2023 Chengyue Jiang, Yong Jiang, Weiqi Wu, Yuting Zheng, Pengjun Xie, Kewei Tu

The subject and object noun phrases and the relations in an open KG suffer from severe redundancy and ambiguity, and therefore need to be canonicalized.

Open Knowledge Graph Canonicalization

MGeo: Multi-Modal Geographic Pre-Training Method

1 code implementation 11 Jan 2023 Ruixue Ding, Boli Chen, Pengjun Xie, Fei Huang, Xin Li, Qiang Zhang, Yao Xu

Single-modal pre-trained models (PTMs) can barely make use of the important geographic context (GC) and therefore have limited performance.

Language Modelling

Modeling Label Correlations for Ultra-Fine Entity Typing with Neural Pairwise Conditional Random Field

1 code implementation 3 Dec 2022 Chengyue Jiang, Yong Jiang, Weiqi Wu, Pengjun Xie, Kewei Tu

We use mean-field variational inference for efficient type inference on very large type sets and unfold it as a neural network module to enable end-to-end training.

Entity Typing · Variational Inference +1
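
As a rough illustration of the snippet above, the hedged sketch below unrolls a few mean-field updates over binary type variables with a pairwise compatibility matrix. The sigmoid update rule, shapes, and toy potentials are assumptions for illustration, not the paper's exact NPCRF formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_typing(unary, pairwise, n_iters=3):
    """Approximate marginals q[t] ~= P(type t | mention) under a pairwise CRF."""
    q = sigmoid(unary)                      # initialize from unary scores
    for _ in range(n_iters):                # each iteration = one unrolled "layer"
        q = sigmoid(unary + pairwise @ q)   # incorporate messages from correlated labels
    return q

# toy example: 4 candidate types where types 2 and 3 strongly co-occur
unary = np.array([2.0, -1.0, 0.5, -0.5])
pairwise = np.zeros((4, 4))
pairwise[2, 3] = pairwise[3, 2] = 1.5
print(mean_field_typing(unary, pairwise))
```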

Named Entity and Relation Extraction with Multi-Modal Retrieval

1 code implementation 3 Dec 2022 Xinyu Wang, Jiong Cai, Yong Jiang, Pengjun Xie, Kewei Tu, Wei Lu

MoRe contains a text retrieval module and an image-based retrieval module, which retrieve knowledge related to the input text and image from the knowledge corpus, respectively.

Multi-modal Named Entity Recognition · Named Entity Recognition +3

Few-shot Classification with Hypersphere Modeling of Prototypes

no code implementations 10 Nov 2022 Ning Ding, Yulin Chen, Ganqu Cui, Xiaobin Wang, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie

Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere.

Classification · Few-Shot Learning +1
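
A minimal sketch of the metric described above, assuming each prototype is parameterized by a center embedding and a radius; classification picks the prototype whose hypersphere surface is closest to the query embedding. This is illustrative only, not the paper's implementation.

```python
import numpy as np

def surface_distance(x, center, radius):
    """Distance from embedding x to the surface of a hypersphere prototype."""
    return abs(np.linalg.norm(x - center) - radius)

def classify(x, centers, radii):
    """Assign x to the prototype whose surface is nearest."""
    return int(np.argmin([surface_distance(x, c, r) for c, r in zip(centers, radii)]))

# toy example with two 3-dimensional hypersphere prototypes
centers = [np.zeros(3), np.full(3, 3.0)]
radii = [1.0, 0.5]
print(classify(np.array([0.2, 0.9, 0.1]), centers, radii))
```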

Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval

1 code implementation 27 Oct 2022 Dingkun Long, Yanzhao Zhang, Guangwei Xu, Pengjun Xie

Pre-trained language models (PTMs) have been shown to yield powerful text representations for the dense passage retrieval task.

Language Modelling · Masked Language Modeling +2

Entity-to-Text based Data Augmentation for various Named Entity Recognition Tasks

no code implementations 19 Oct 2022 Xuming Hu, Yong Jiang, Aiwei Liu, Zhongqiang Huang, Pengjun Xie, Fei Huang, Lijie Wen, Philip S. Yu

Data augmentation techniques have been used to alleviate the problem of scarce labeled data in various NER tasks (flat, nested, and discontinuous NER tasks).

Data Augmentation · named-entity-recognition +3

Forging Multiple Training Objectives for Pre-trained Language Models via Meta-Learning

2 code implementations 19 Oct 2022 Hongqiu Wu, Ruixue Ding, Hai Zhao, Boli Chen, Pengjun Xie, Fei Huang, Min Zhang

Multiple pre-training objectives compensate for the limited understanding capability of single-objective language modeling and serve the ultimate purpose of pre-trained language models (PrLMs): generalizing well across a wide range of scenarios.

Language Modelling · Meta-Learning

Domain-Specific NER via Retrieving Correlated Samples

1 code implementation COLING 2022 Xin Zhang, Yong Jiang, Xiaobin Wang, Xuming Hu, Yueheng Sun, Pengjun Xie, Meishan Zhang

Successful machine learning based Named Entity Recognition models can fail on texts from special domains, for instance Chinese addresses and e-commerce titles, which require adequate background knowledge.

Named Entity Recognition

Adversarial Self-Attention for Language Understanding

1 code implementation 25 Jun 2022 Hongqiu Wu, Ruixue Ding, Hai Zhao, Pengjun Xie, Fei Huang, Min Zhang

Deep neural models (e.g., Transformers) naturally learn spurious features, which create a "shortcut" between labels and inputs, thus impairing generalization and robustness.

Machine Reading Comprehension · Named Entity Recognition (NER) +4

HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking

1 code implementation 21 May 2022 Yanzhao Zhang, Dingkun Long, Guangwei Xu, Pengjun Xie

Existing text retrieval systems with state-of-the-art performance usually adopt a retrieve-then-reranking architecture due to the high computational cost of pre-trained language models and the large corpus size.

Passage Ranking · Passage Re-Ranking +2
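
For context, a hedged sketch of the generic retrieve-then-rerank pipeline the snippet refers to: a cheap dense retriever narrows the corpus to a candidate list, and a second stage fuses first-stage and reranker scores. The scoring functions and the fusion weight below are illustrative stand-ins; the actual HLATR reranker is a list-aware transformer and is not shown here.

```python
import numpy as np

def retrieve(query_vec, passage_vecs, k=100):
    """Stage 1: cheap dense retrieval by dot-product similarity."""
    scores = passage_vecs @ query_vec
    top = np.argsort(-scores)[:k]
    return top, scores[top]

def fuse_and_rerank(candidate_ids, retrieval_scores, reranker_scores, alpha=0.5):
    """Stage 2: combine first-stage and reranker scores over the candidate list."""
    fused = alpha * retrieval_scores + (1.0 - alpha) * reranker_scores
    return candidate_ids[np.argsort(-fused)]

# toy corpus of 5 passages with 4-dimensional embeddings
rng = np.random.default_rng(0)
passages, query = rng.random((5, 4)), rng.random(4)
ids, scores = retrieve(query, passages, k=3)
print(fuse_and_rerank(ids, scores, reranker_scores=rng.random(3)))
```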

Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval

1 code implementation 7 Mar 2022 Dingkun Long, Qiong Gao, Kuan Zou, Guangwei Xu, Pengjun Xie, Ruijie Guo, Jian Xu, Guanjun Jiang, Luxi Xing, Ping Yang

We find that the performance of retrieval models trained on general-domain data inevitably decreases on specific domains.

Passage Retrieval · Retrieval

AISHELL-NER: Named Entity Recognition from Chinese Speech

1 code implementation 17 Feb 2022 Boli Chen, Guangwei Xu, Xiaobin Wang, Pengjun Xie, Meishan Zhang, Fei Huang

Named Entity Recognition (NER) from speech is one of the Spoken Language Understanding (SLU) tasks, aiming to extract semantic information from the speech signal.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +5

Few-shot Learning with Big Prototypes

no code implementations 29 Sep 2021 Ning Ding, Yulin Chen, Xiaobin Wang, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie

A big prototype can be effectively modeled by two sets of learnable parameters: one is the center of the hypersphere, an embedding with the same dimension as the training examples; the other is the radius of the hypersphere.

Few-Shot Learning

Prompt-Learning for Fine-Grained Entity Typing

no code implementations 24 Aug 2021 Ning Ding, Yulin Chen, Xu Han, Guangwei Xu, Pengjun Xie, Hai-Tao Zheng, Zhiyuan Liu, Juanzi Li, Hong-Gee Kim

In this work, we investigate the application of prompt-learning on fine-grained entity typing in fully supervised, few-shot and zero-shot scenarios.

Entity Typing · Knowledge Probing +5

Counterfactual Inference for Text Classification Debiasing

1 code implementation ACL 2021 Chen Qian, Fuli Feng, Lijie Wen, Chunping Ma, Pengjun Xie

At inference time, given a factual input document, Corsair imagines its two counterfactual counterparts to distill and mitigate the two biases captured by the poisonous model.

Counterfactual Inference · Fairness +2
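
A heavily hedged, generic sketch of counterfactual-inference debiasing of the kind described above: subtract the model's prediction on a bias-only counterfactual input from its prediction on the factual input. The single subtraction with one weight is an illustrative simplification, not Corsair's exact two-counterfactual formulation.

```python
import numpy as np

def debiased_logits(factual_logits, counterfactual_logits, lam=1.0):
    """Keep what the factual document adds beyond the bias-only counterfactual."""
    return factual_logits - lam * counterfactual_logits

factual = np.array([2.0, 0.5])      # model(factual document)
bias_only = np.array([1.5, -0.2])   # model(counterfactual, bias-only document)
print(debiased_logits(factual, bias_only).argmax())
```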

Crowdsourcing Learning as Domain Adaptation: A Case Study on Named Entity Recognition

1 code implementation ACL 2021 Xin Zhang, Guangwei Xu, Yueheng Sun, Meishan Zhang, Pengjun Xie

Crowdsourcing is regarded as one prospective solution for effective supervised learning, aiming to build large-scale annotated training data with crowd workers.

Domain Adaptation · named-entity-recognition +3

Few-NERD: A Few-Shot Named Entity Recognition Dataset

5 code implementations ACL 2021 Ning Ding, Guangwei Xu, Yulin Chen, Xiaobin Wang, Xu Han, Pengjun Xie, Hai-Tao Zheng, Zhiyuan Liu

In this paper, we present Few-NERD, a large-scale human-annotated few-shot NER dataset with a hierarchy of 8 coarse-grained and 66 fine-grained entity types.

Few-shot NER · Named Entity Recognition

Probing BERT in Hyperbolic Spaces

1 code implementation ICLR 2021 Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing

We introduce a Poincare probe, a structural probe projecting these embeddings into a Poincare subspace with explicitly defined hierarchies.

Word Embeddings
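
For reference, the Poincaré ball distance that underlies probing in hyperbolic space, as a small self-contained sketch (illustrative; not the probe's training code):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit (Poincare) ball."""
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2)) + eps
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)

# the same Euclidean gap grows much larger near the boundary of the ball,
# which is what makes the space suitable for encoding tree-like hierarchies
print(poincare_distance(np.array([0.1, 0.0]), np.array([0.0, 0.1])))
print(poincare_distance(np.array([0.9, 0.0]), np.array([0.0, 0.9])))
```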

Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference

no code implementations 24 Oct 2020 Haoyu Zhang, Dingkun Long, Guangwei Xu, Pengjun Xie, Fei Huang, Ji Wang

Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.

Keyphrase Extraction · Representation Learning

Coupling Distant Annotation and Adversarial Training for Cross-Domain Chinese Word Segmentation

1 code implementation ACL 2020 Ning Ding, Dingkun Long, Guangwei Xu, Muhua Zhu, Pengjun Xie, Xiaobin Wang, Hai-Tao Zheng

In order to simultaneously alleviate these two issues, this paper proposes to couple distant annotation and adversarial training for cross-domain CWS.

Chinese Word Segmentation

Better Modeling of Incomplete Annotations for Named Entity Recognition

no code implementations NAACL 2019 Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, Linlin Li

Supervised approaches to named entity recognition (NER) are largely developed based on the assumption that the training data is fully annotated with named entity information.

named-entity-recognition · Named Entity Recognition +1

Neural Chinese Address Parsing

no code implementations NAACL 2019 Hao Li, Wei Lu, Pengjun Xie, Linlin Li

This paper introduces a new task, Chinese address parsing: mapping Chinese addresses into semantically meaningful chunks.

Structured Prediction
