no code implementations • LREC 2020 • Svetla Koeva, Nikola Obreshkov, Martin Yalamov
The paper presents the Bulgarian MARCELL corpus, part of a recently developed multilingual corpus representing the national legislation of seven European countries, and the NLP pipeline that turns the web-crawled data into a structured, linguistically annotated dataset.
no code implementations • CLIB 2020 • Nikola Obreshkov, Martin Yalamov, Svetla Koeva
The evaluation shows a slight advantage for the basic method, which makes it appropriate, as the categorisation should be a module of an NLP pipeline for Bulgarian that continuously feeds and annotates the Bulgarian MARCELL corpus with newly issued legislative documents.