A New Public Corpus for Clinical Section Identification: MedSecId

Section identification is the process of demarcating and labeling the sections of a document. Such sections help readers search for information and place specific topics in context. The goal of this work is to segment clinical medical documentation into its sections. The primary contribution of this work is MedSecId, a publicly available set of 2,002 fully annotated medical notes from the MIMIC-III database. We include several baselines, source code, a pretrained model, and an analysis of the data that uses principal component analysis to show a relationship between medical concepts across sections.
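Section identification, as described above, has two parts: finding where each section starts and ends, and assigning each section a label. The sketch below shows the simplest version of both. It is a rule-based splitter that treats a line beginning with a known header name followed by a colon as the start of a new section. The header names in `HEADER_LABELS`, the label strings, and the function `identify_sections` are hypothetical illustrations. They are not MedSecId's annotation scheme and not the paper's baselines.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical header-to-label map; MedSecId's actual label set may differ.
HEADER_LABELS = {
    "chief complaint": "chief-complaint",
    "history of present illness": "history-of-present-illness",
    "past medical history": "past-medical-history",
    "discharge diagnosis": "discharge-diagnosis",
}

# Candidate header: letters and spaces at the start of a line, then a colon.
HEADER_RE = re.compile(r"^\s*([A-Za-z ]+?)\s*:")


def _flush(sections: List[Tuple[str, str]], label: str, body: List[str]) -> None:
    """Append the current section if it contains any non-blank text."""
    text = "\n".join(body).strip()
    if text:
        sections.append((label, text))


def identify_sections(note: str) -> List[Tuple[str, str]]:
    """Split a note into (label, text) pairs using colon-terminated headers."""
    sections: List[Tuple[str, str]] = []
    label = "unlabeled"  # text before the first recognized header
    body: List[str] = []
    for line in note.splitlines():
        match = HEADER_RE.match(line)
        name: Optional[str] = match.group(1).lower() if match else None
        if match and name in HEADER_LABELS:
            # A recognized header closes the current section and opens a new one.
            _flush(sections, label, body)
            label = HEADER_LABELS[name]
            # Keep any text that follows the header on the same line.
            body = [line[match.end():]]
        else:
            body.append(line)
    _flush(sections, label, body)
    return sections


if __name__ == "__main__":
    note = (
        "Chief Complaint: chest pain\n"
        "History of Present Illness:\n"
        "65 y/o male with substernal pain.\n"
        "Discharge Diagnosis: NSTEMI"
    )
    for section_label, text in identify_sections(note):
        print(section_label, "->", text)
```

A splitter like this cannot recover sections whose headers are missing or worded differently from the map. Learned models trained on annotated notes are intended to close that gap.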


Datasets


Introduced in the Paper:

MedSecId

Used in the Paper:

MIMIC-III

Results from the Paper


Task | Dataset | Model | Metric | Value | Global rank
Classification | MedSecId | BiLSTM-CRF 1 shot | Micro-F1 | 82.2 | #2
Clinical Section Identification | MedSecId | MedSecId 1 shot | Micro-F1 | 95.5 | #2
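Both results are reported as micro-F1. Micro-F1 pools true positives, false positives, and false negatives across all labels and then computes a single F1 score. The sketch below shows that calculation. The function name `micro_f1` and the optional `ignore` set are assumptions for illustration. For example, `ignore` could hold a hypothetical "no section" label. The abstract does not say whether predictions are scored per token, per line, or per section. When no label is ignored and every item carries exactly one label, micro-F1 reduces to accuracy.

```python
from typing import Iterable, Optional, Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[str],
             ignore: Optional[Iterable[str]] = None) -> float:
    """Micro-averaged F1 over per-item labels, pooling counts across classes."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    skip = set(ignore or ())  # labels excluded from scoring (hypothetical)
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p:
            if g not in skip:
                tp += 1
        else:
            # A wrong prediction is a false positive for the predicted label
            # and a false negative for the gold label.
            if p not in skip:
                fp += 1
            if g not in skip:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["a", "a", "b", "c"]
    pred = ["a", "b", "b", "c"]
    print(micro_f1(gold, pred))  # 3 TP, 1 FP, 1 FN
```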
