Romanian micro-blogging named entity recognition including health-related entities

SMM4H (COLING) 2022 · Vasile Pais, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Carol Luca Gasan, Roxana Micu

This paper introduces a manually annotated dataset for named entity recognition (NER) in Romanian micro-blogging text. It contains gold annotations for 9 classes of entities and expressions: persons, locations, organizations, time expressions, legal references, disorders, chemicals, medical devices, and anatomical parts. In addition, word embedding models computed on a larger micro-blogging corpus are made available. Finally, several NER models are trained and their performance is evaluated on the newly introduced corpus.
