Synthetic Data Generation and Multi-Task Learning for Extracting Temporal Information from Health-Related Narrative Text

Extracting temporal information is critical to process health-related text. Temporal information extraction is a challenging task for language models because it requires processing both texts and numbers. Moreover, the fundamental challenge is how to obtain a large-scale training dataset. To address this, we propose a synthetic data generation algorithm. Also, we propose a novel multi-task temporal information extraction model and investigate whether multi-task learning can contribute to performance improvement by exploiting additional training signals with the existing training data. For experiments, we collected a custom dataset containing unstructured texts with temporal information of sleep-related activities. Experimental results show that utilising synthetic data can improve the performance when the augmentation factor is 3. The results also show that when multi-task learning is used with an appropriate amount of synthetic data, the performance can significantly improve from 82. to 88.6 and from 83.9 to 91.9 regarding micro-and macro-average exact match scores of normalised time prediction, respectively.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here