1 code implementation • EMNLP 2021 • Peijie Jiang, Dingkun Long, Yueheng Sun, Meishan Zhang, Guangwei Xu, Pengjun Xie
Self-training is one promising solution, whose key challenge lies in constructing a set of high-quality pseudo training instances for the target domain.
no code implementations • 31 May 2023 • Yulin Chen, Ning Ding, Xiaobin Wang, Shengding Hu, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie
Consistently scaling pre-trained language models (PLMs) imposes substantial burdens on model adaptation, necessitating more efficient alternatives to conventional fine-tuning.
no code implementations • 22 May 2023 • Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie
Recently, various studies have been directed towards exploring dense passage retrieval techniques employing pre-trained language models, among which the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising.
no code implementations • 11 May 2023 • Dongyang Li, Ruixue Ding, Qiang Zhang, Zheng Li, Boli Chen, Pengjun Xie, Yao Xu, Xin Li, Ning Guo, Fei Huang, Xiaofeng He
With the rapid development of geographic applications, automated and intelligent models need to be designed to handle the large volume of information.
1 code implementation • 5 May 2023 • Zeqi Tan, Shen Huang, Zixia Jia, Jiong Cai, Yinghui Li, Weiming Lu, Yueting Zhuang, Kewei Tu, Pengjun Xie, Fei Huang, Yong Jiang
Also, we discover that the limited context length causes the retrieval knowledge to be invisible to the model.
Multilingual Named Entity Recognition • Named Entity Recognition
1 code implementation • 20 Feb 2023 • Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu, Yufeng Chen, Meishan Zhang, Yong Jiang, Wenjuan Han
Zero-shot information extraction (IE) aims to build IE systems from unannotated text.
1 code implementation • 8 Feb 2023 • Chengyue Jiang, Yong Jiang, Weiqi Wu, Yuting Zheng, Pengjun Xie, Kewei Tu
The subject and object noun phrases and the relations in an open KG suffer from severe redundancy and ambiguity and need to be canonicalized.
1 code implementation • 11 Jan 2023 • Ruixue Ding, Boli Chen, Pengjun Xie, Fei Huang, Xin Li, Qiang Zhang, Yao Xu
Single-modal PTMs can barely make use of the important GC and therefore have limited performance.
1 code implementation • 18 Dec 2022 • Chengyue Jiang, Wenyang Hui, Yong Jiang, Xiaobin Wang, Pengjun Xie, Kewei Tu
We also find that MCCE is very effective in fine-grained (130 types) and coarse-grained (9 types) entity typing.
Ranked #1 on Entity Typing on Open Entity
1 code implementation • 3 Dec 2022 • Chengyue Jiang, Yong Jiang, Weiqi Wu, Pengjun Xie, Kewei Tu
We use mean-field variational inference for efficient type inference on very large type sets and unfold it as a neural network module to enable end-to-end training.
Ranked #2 on Entity Typing on Open Entity
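The entry above describes unfolding mean-field variational inference into a neural network module for type inference over very large type sets. A minimal sketch of that idea follows; the function name, the unary/pairwise score shapes, and the fixed number of unrolled steps are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_type_inference(unary, pairwise, n_steps=3):
    """Unrolled mean-field updates over binary type variables.

    unary:    (T,) per-type scores from an encoder (assumed given)
    pairwise: (T, T) learned type-type compatibility matrix
    Each step refines q_i = sigmoid(unary_i + sum_j W_ij * q_j).
    Unrolling a fixed number of steps makes the whole procedure a
    differentiable module, so it can be trained end to end.
    """
    q = sigmoid(unary)  # initial marginals from unary scores alone
    for _ in range(n_steps):
        q = sigmoid(unary + pairwise @ q)
    return q
```

With a zero pairwise matrix the updates reduce to the unary marginals; a learned compatibility matrix lets correlated types reinforce or suppress each other across steps.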
1 code implementation • 3 Dec 2022 • Xinyu Wang, Jiong Cai, Yong Jiang, Pengjun Xie, Kewei Tu, Wei Lu
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge of the input text and image in the knowledge corpus respectively.
Ranked #1 on Multi-modal Named Entity Recognition on SNAP (MNER)
Multi-modal Named Entity Recognition • Named Entity Recognition
no code implementations • 10 Nov 2022 • Ning Ding, Yulin Chen, Ganqu Cui, Xiaobin Wang, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie
Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere.
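The distance from a data point to the surface of a hypersphere prototype, as described above, is straightforward to compute; this is a minimal numpy sketch, with the function name and a Euclidean embedding space as assumptions.

```python
import numpy as np

def hypersphere_distance(x, center, radius):
    """Distance from a data point to the surface of a hypersphere prototype.

    The prototype is parameterized by its center (an embedding with the
    same dimension as the data) and a scalar radius; metric-based
    classification picks the prototype whose surface is nearest:
        d(x) = | ||x - c||_2 - r |
    """
    return abs(np.linalg.norm(x - center) - radius)
```

Points inside and outside the sphere are handled uniformly by the absolute value, which is what makes the comparison cheaper than fitting a statistical model per class.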
1 code implementation • 27 Oct 2022 • Dingkun Long, Yanzhao Zhang, Guangwei Xu, Pengjun Xie
Pre-trained language models (PTMs) have been shown to yield powerful text representations for the dense passage retrieval task.
2 code implementations • 27 Oct 2022 • Peijie Jiang, Dingkun Long, Yanzhao Zhang, Pengjun Xie, Meishan Zhang, Min Zhang
We apply BABERT for feature induction of Chinese sequence labeling tasks.
Ranked #1 on Chinese Word Segmentation on MSRA
Chinese Named Entity Recognition • Chinese Word Segmentation
no code implementations • 19 Oct 2022 • Xuming Hu, Yong Jiang, Aiwei Liu, Zhongqiang Huang, Pengjun Xie, Fei Huang, Lijie Wen, Philip S. Yu
Data augmentation techniques have been used to alleviate the problem of scarce labeled data in various NER tasks (flat, nested, and discontinuous NER tasks).
2 code implementations • 19 Oct 2022 • Hongqiu Wu, Ruixue Ding, Hai Zhao, Boli Chen, Pengjun Xie, Fei Huang, Min Zhang
Multiple pre-training objectives compensate for the limited understanding capability of single-objective language modeling, serving the ultimate purpose of pre-trained language models (PrLMs): generalizing well across a wide range of scenarios.
1 code implementation • 27 Sep 2022 • Shen Huang, Yuchen Zhai, Xinwei Long, Yong Jiang, Xiaobin Wang, Yin Zhang, Pengjun Xie
Speech Entity Linking aims to recognize and disambiguate named entities in spoken languages.
1 code implementation • COLING 2022 • Xin Zhang, Yong Jiang, Xiaobin Wang, Xuming Hu, Yueheng Sun, Pengjun Xie, Meishan Zhang
Successful machine-learning-based Named Entity Recognition models can fail on texts from special domains, for instance Chinese addresses and e-commerce titles, which require adequate background knowledge.
1 code implementation • 25 Jun 2022 • Hongqiu Wu, Ruixue Ding, Hai Zhao, Pengjun Xie, Fei Huang, Min Zhang
Deep neural models (e.g., Transformers) naturally learn spurious features, which create a "shortcut" between the labels and inputs, thus impairing generalization and robustness.
Ranked #1 on Machine Reading Comprehension on DREAM
Machine Reading Comprehension • Named Entity Recognition (NER)
1 code implementation • 21 May 2022 • Yanzhao Zhang, Dingkun Long, Guangwei Xu, Pengjun Xie
Existing text retrieval systems with state-of-the-art performance usually adopt a retrieve-then-reranking architecture due to the high computational cost of pre-trained language models and the large corpus size.
Ranked #1 on Passage Re-Ranking on MS MARCO
1 code implementation • NAACL 2022 • Linzhi Wu, Pengjun Xie, Jie zhou, Meishan Zhang, Chunping Ma, Guangwei Xu, Min Zhang
Prior research has mainly resorted to heuristic rule-based constraints to reduce the noise for specific self-augmentation methods individually.
1 code implementation • ACL 2022 • Yongliang Shen, Xiaobin Wang, Zeqi Tan, Guangwei Xu, Pengjun Xie, Fei Huang, Weiming Lu, Yueting Zhuang
Each instance query predicts one entity, and by feeding all instance queries simultaneously, we can query all entities in parallel.
Ranked #1 on Nested Named Entity Recognition on GENIA
Chinese Named Entity Recognition • Named Entity Recognition
1 code implementation • 7 Mar 2022 • Dingkun Long, Qiong Gao, Kuan Zou, Guangwei Xu, Pengjun Xie, Ruijie Guo, Jian Xu, Guanjun Jiang, Luxi Xing, Ping Yang
We find that the performance of retrieval models trained on general-domain datasets inevitably degrades on specific domains.
1 code implementation • SemEval (NAACL) 2022 • Xinyu Wang, Yongliang Shen, Jiong Cai, Tao Wang, Xiaobin Wang, Pengjun Xie, Fei Huang, Weiming Lu, Yueting Zhuang, Kewei Tu, Wei Lu, Yong Jiang
Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
Multilingual Named Entity Recognition • Named Entity Recognition
1 code implementation • 17 Feb 2022 • Boli Chen, Guangwei Xu, Xiaobin Wang, Pengjun Xie, Meishan Zhang, Fei Huang
Named Entity Recognition (NER) from speech is a Spoken Language Understanding (SLU) task that aims to extract semantic information from the speech signal.
Automatic Speech Recognition (ASR)
no code implementations • 29 Sep 2021 • Ning Ding, Yulin Chen, Xiaobin Wang, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie
A big prototype can be effectively modeled by two sets of learnable parameters: one is the center of the hypersphere, an embedding with the same dimension as the training examples.
no code implementations • 24 Aug 2021 • Ning Ding, Yulin Chen, Xu Han, Guangwei Xu, Pengjun Xie, Hai-Tao Zheng, Zhiyuan Liu, Juanzi Li, Hong-Gee Kim
In this work, we investigate the application of prompt-learning on fine-grained entity typing in fully supervised, few-shot and zero-shot scenarios.
1 code implementation • ACL 2021 • Chen Qian, Fuli Feng, Lijie Wen, Chunping Ma, Pengjun Xie
At inference time, given a factual input document, Corsair imagines its two counterfactual counterparts to distill and mitigate the two biases captured by the poisonous model.
1 code implementation • ACL 2021 • Xin Zhang, Guangwei Xu, Yueheng Sun, Meishan Zhang, Pengjun Xie
Crowdsourcing is regarded as a promising solution for effective supervised learning, aiming to build large-scale annotated training data with crowd workers.
5 code implementations • ACL 2021 • Ning Ding, Guangwei Xu, Yulin Chen, Xiaobin Wang, Xu Han, Pengjun Xie, Hai-Tao Zheng, Zhiyuan Liu
In this paper, we present Few-NERD, a large-scale human-annotated few-shot NER dataset with a hierarchy of 8 coarse-grained and 66 fine-grained entity types.
Ranked #5 on Few-shot NER on Few-NERD (INTRA)
1 code implementation • ICLR 2021 • Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing
We introduce a Poincare probe, a structural probe projecting these embeddings into a Poincare subspace with explicitly defined hierarchies.
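The Poincare probe above projects embeddings into a Poincare subspace; the learned projection itself is part of the probe, but the geodesic distance used in that space is the standard Poincare-ball formula, sketched below (function name and epsilon guard are illustrative assumptions).

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points in the Poincare ball (norms < 1).

    d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))

    Distances blow up near the boundary of the ball, which is what lets
    tree-like hierarchies embed with low distortion: depth in the
    hierarchy maps to distance from the origin.
    """
    uu = np.dot(u, u)
    vv = np.dot(v, v)
    duv = np.dot(u - v, u - v)
    arg = 1.0 + 2.0 * duv / max((1.0 - uu) * (1.0 - vv), eps)
    return np.arccosh(arg)
```

A useful sanity check: the distance from the origin to a point x reduces to 2 artanh(||x||), growing without bound as x approaches the unit sphere.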
1 code implementation • ICLR 2021 • Ning Ding, Xiaobin Wang, Yao Fu, Guangwei Xu, Rui Wang, Pengjun Xie, Ying Shen, Fei Huang, Hai-Tao Zheng, Rui Zhang
This approach allows us to learn meaningful, interpretable prototypes for the final classification.
no code implementations • 24 Oct 2020 • Haoyu Zhang, Dingkun Long, Guangwei Xu, Pengjun Xie, Fei Huang, Ji Wang
Keyphrase extraction (KE) aims to distill a set of phrases that accurately express the concepts or topics covered in a given document.
1 code implementation • ACL 2020 • Ning Ding, Dingkun Long, Guangwei Xu, Muhua Zhu, Pengjun Xie, Xiaobin Wang, Hai-Tao Zheng
In order to simultaneously alleviate these two issues, this paper proposes to couple distant annotation and adversarial training for cross-domain CWS.
no code implementations • ACL 2020 • Jie Zhou, Chunping Ma, Dingkun Long, Guangwei Xu, Ning Ding, Haoyu Zhang, Pengjun Xie, Gongshen Liu
Hierarchical text classification is an essential yet challenging subtask of multi-label text classification with a taxonomic hierarchy.
1 code implementation • ACL 2019 • Ruixue Ding, Pengjun Xie, Xiaoyan Zhang, Wei Lu, Linlin Li, Luo Si
Gazetteers were shown to be useful resources for named entity recognition (NER).
no code implementations • NAACL 2019 • Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, Linlin Li
Supervised approaches to named entity recognition (NER) are largely developed based on the assumption that the training data is fully annotated with named entity information.
no code implementations • SEMEVAL 2019 • Xiaobin Wang, Chunping Ma, Huafei Zheng, Chu Liu, Pengjun Xie, Linlin Li, Luo Si
This paper describes DM-NLP's system for the toponym resolution task at SemEval 2019.
no code implementations • NAACL 2019 • Hao Li, Wei Lu, Pengjun Xie, Linlin Li
This paper introduces a new task, Chinese address parsing: mapping Chinese addresses into semantically meaningful chunks.
no code implementations • SEMEVAL 2018 • Chunping Ma, Huafei Zheng, Pengjun Xie, Chen Li, Linlin Li, Luo Si
This paper describes our submissions for SemEval-2018 Task 8: Semantic Extraction from CybersecUrity REports using NLP.
no code implementations • IJCNLP 2017 • Yi Yang, Pengjun Xie, Jun Tao, Guangwei Xu, Linlin Li, Luo Si
This paper introduces the Alibaba NLP team's system for IJCNLP 2017 shared task No.