Hierarchically-Refined Label Attention Network for Sequence Labeling

IJCNLP 2019 · Leyang Cui, Yue Zhang

CRF has been used as a powerful model for statistical sequence labeling. For neural sequence labeling, however, BiLSTM-CRF does not always lead to better results compared with BiLSTM-softmax local classification. This may be because the simple Markov label transition model of CRF does not give much information gain over strong neural encoding. To better represent label sequences, we investigate a hierarchically-refined label attention network, which explicitly leverages label embeddings and captures potential long-term label dependency by giving each word incrementally refined label distributions with hierarchical attention. Results on POS tagging, NER and CCG supertagging show that the proposed model not only improves the overall tagging accuracy with a similar number of parameters, but also significantly speeds up training and testing compared to BiLSTM-CRF.
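
The idea of giving each word an incrementally refined label distribution can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes single-head scaled dot-product attention with no learned projections, and it replaces the BiLSTM re-encoding between layers with a simple additive update. The names `label_attention` and `refine` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(hidden, label_emb):
    """Attend from each word vector to every label embedding.

    Returns (dists, attended): dists[i] is a distribution over labels
    for word i, and attended[i] is the attention-weighted sum of label
    embeddings for word i.
    """
    d = len(label_emb[0])
    dists, attended = [], []
    for h in hidden:
        scores = [
            sum(hi * ei for hi, ei in zip(h, e)) / math.sqrt(d)
            for e in label_emb
        ]
        alpha = softmax(scores)
        dists.append(alpha)
        attended.append(
            [sum(a * e[j] for a, e in zip(alpha, label_emb)) for j in range(d)]
        )
    return dists, attended


def refine(hidden, label_emb, num_layers=2):
    """Stack label-attention layers; the last layer's attention
    distribution is used directly as the label distribution."""
    dists = []
    for layer in range(num_layers):
        dists, attended = label_attention(hidden, label_emb)
        if layer < num_layers - 1:
            # Assumption: additive update as a stand-in for re-encoding
            # the word and label representations with another encoder.
            hidden = [
                [h + a for h, a in zip(hv, av)]
                for hv, av in zip(hidden, attended)
            ]
    return dists


# Toy input: two labels and two words, all 2-dimensional.
label_emb = [[1.0, 0.0], [0.0, 1.0]]
hidden = [[2.0, 0.0], [0.0, 2.0]]

dists = refine(hidden, label_emb, num_layers=2)
predictions = [row.index(max(row)) for row in dists]
```

Each word is labeled independently from its own attention distribution, so no Viterbi-style decoding over label transitions is needed in this sketch.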


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| CCG Supertagging | CCGbank | BiLSTM-LAN | Accuracy | 94.7 | #4 |
| Named Entity Recognition (NER) | Ontonotes v5 (English) | BiLSTM-LAN | F1 | 88.16 | #20 |
| Part-Of-Speech Tagging | Penn Treebank | BiLSTM-LAN | Accuracy | 97.65 | #7 |
| Part-Of-Speech Tagging | UD | BiLSTM-LAN | Avg accuracy | 96.88 | #1 |

Methods