BERT got a Date: Introducing Transformers to Temporal Tagging
Temporal expressions in text play a significant role in language understanding, and correctly identifying them is fundamental to various retrieval and natural language processing systems. Previous works have slowly shifted from rule-based to neural architectures, which are capable of tagging expressions with higher accuracy. However, neural models cannot yet distinguish between different expression types at the same level as their rule-based counterparts. In this work, we aim to identify the most suitable transformer architecture for joint temporal tagging and type classification, as well as to investigate the effect of semi-supervised training on the performance of these systems. Based on our study of token classification variants and encoder-decoder architectures, we present a transformer encoder-decoder model built on the RoBERTa language model as our best performing system. By supplementing training resources with weakly labeled data from rule-based systems, our model surpasses previous works in temporal tagging and type classification, especially on rare classes. Our code and pre-trained models are available at: https://github.com/satya77/Transformer_Temporal_Tagger
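The token classification variants studied here cast joint temporal tagging and type classification as sequence labeling: each token receives a BIO label carrying one of the four TIMEX3 expression types (DATE, TIME, DURATION, SET) from the TimeML standard. The following minimal sketch shows how span-level TIMEX3 annotations can be converted to such labels; the helper name and data layout are illustrative assumptions, not taken from the paper's repository.

```python
# TIMEX3 expression types defined by the TimeML standard.
TIMEX3_TYPES = ("DATE", "TIME", "DURATION", "SET")

def to_bio_labels(tokens, spans):
    """Convert token-level TIMEX3 spans to BIO labels for token classification.

    tokens: list of token strings
    spans:  list of (start_token, end_token_exclusive, timex_type) triples
    """
    labels = ["O"] * len(tokens)
    for start, end, ttype in spans:
        if ttype not in TIMEX3_TYPES:
            raise ValueError(f"unknown TIMEX3 type: {ttype}")
        labels[start] = f"B-{ttype}"          # first token of the expression
        for i in range(start + 1, end):       # remaining tokens inside it
            labels[i] = f"I-{ttype}"
    return labels

tokens = ["She", "arrived", "on", "June", "3", ",", "2021", "."]
spans = [(3, 7, "DATE")]  # "June 3 , 2021" is a single DATE expression
print(to_bio_labels(tokens, spans))
# → ['O', 'O', 'O', 'B-DATE', 'I-DATE', 'I-DATE', 'I-DATE', 'O']
```

With labels in this form, a transformer encoder with a per-token classification head can predict detection (B/I/O) and type (DATE/TIME/DURATION/SET) jointly, which is what the strict/relaxed detection and type metrics below evaluate.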
Temporal Tagging on TempEval-3 (per-metric global rank in parentheses):

Model | Strict Detection (Pr.) | Strict Detection (Re.) | Strict Detection (F1) | Relaxed Detection (Pr.) | Relaxed Detection (Re.) | Relaxed Detection (F1) | Type
---|---|---|---|---|---|---|---
R2R | 96.37 (#1) | 96.37 (#1) | 96.37 (#1) | 100 (#1) | 100 (#1) | 100 (#1) | 90.43 (#1)
BERT-base | 81.83 (#4) | 79.56 (#4) | 80.67 (#4) | 91.37 (#3) | 88.84 (#3) | 90.08 (#4) | 82.00 (#4)
B2B | 94.11 (#2) | 81.01 (#3) | 87.07 (#2) | 100 (#1) | 86.09 (#4) | 92.52 (#3) | 83.79 (#3)
DateBERT | 82.72 (#3) | 85.79 (#2) | 84.21 (#3) | 90.95 (#4) | 94.35 (#2) | 92.60 (#2) | 86.21 (#2)