Adaptive Threshold Selective Self-Attention for Chinese NER

Recently, the Transformer, which uses self-attention to encode context, has achieved great success in Chinese named entity recognition (NER) owing to its parallelism and its ability to model long-range dependencies. However, because self-attention fully connects every character to every other character, it may scatter the attention distribution and integrate information from irrelevant characters, which leads to misidentified entity boundaries. In this paper, we propose a data-driven Adaptive Threshold Selective Self-Attention (ATSSA) mechanism that dynamically selects the most relevant characters to enhance the Transformer architecture for Chinese NER. In ATSSA, an attention score threshold is automatically generated for each query; characters whose attention scores exceed the threshold are selected by the query and the others are discarded, which prevents irrelevant information from being integrated. Experiments on four benchmark Chinese NER datasets show that ATSSA improves the baseline model by 1.68 F1 points on average and achieves state-of-the-art performance.
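The per-query selection rule can be illustrated with a minimal NumPy sketch of a single attention head. The abstract does not say how the threshold is generated, so everything beyond "keep the characters scoring above a per-query threshold" is a hypothetical assumption made for illustration and is not the paper's method. This includes the function name `adaptive_threshold_attention`, the projection parameters `w` and `b`, the sigmoid gate, the choice to scale the threshold by each query's largest attention probability, and the renormalization of the kept weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_threshold_attention(Q, K, V, w, b):
    """Single-head selective self-attention with a per-query threshold.

    HYPOTHETICAL threshold generator (not from the paper): the threshold for
    query i is sigmoid(Q[i] @ w + b) times that query's largest attention
    probability. Kept weights are renormalized to sum to 1 (also an assumption).
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n, n) scaled dot-product scores
    probs = softmax(scores, axis=-1)               # ordinary attention distribution
    gate = 1.0 / (1.0 + np.exp(-(Q @ w + b)))      # (n,) per-query gate in (0, 1)
    row_max = probs.max(axis=-1, keepdims=True)    # (n, 1) top probability per query
    threshold = gate[:, None] * row_max            # (n, 1) per-query threshold
    keep = probs > threshold                       # characters selected by each query
    keep |= probs == row_max                       # always keep the top character
    selected = np.where(keep, probs, 0.0)          # discard unselected characters
    selected = selected / selected.sum(axis=-1, keepdims=True)
    return selected @ V, keep

# Usage: self-attention over 5 "characters" with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
out, keep = adaptive_threshold_attention(X, X, X, w=rng.normal(size=8), b=0.0)
print(out.shape)          # (5, 8)
print(keep.sum(axis=-1))  # how many characters each query kept
```

With a large `b` the gate approaches 1, so each query keeps only its single most relevant character. With a very negative `b` almost nothing is discarded and the head reduces to ordinary softmax attention. Scaling the threshold by the row maximum keeps it on the same scale as the probabilities it is compared against and guarantees that every query selects at least one character.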
