no code implementations • LREC 2020 • Tamás Váradi, Svetla Koeva, Martin Yamalov, Marko Tadić, Bálint Sass, Bartłomiej Nitoń, Maciej Ogrodniczuk, Piotr Pęzik, Verginica Barbu Mititelu, Radu Ion, Elena Irimia, Maria Mitrofan, Vasile Păiș, Dan Tufiș, Radovan Garabík, Simon Krek, Andraz Repar, Matjaž Rihtar, Janez Brank
This article presents the current outcomes of the MARCELL CEF Telecom project aiming to collect and deeply annotate a large comparable corpus of legal documents.
no code implementations • LREC 2020 • Balázs Indig, Bálint Sass, Iván Mittelholcz
When a module is put into xtsv, all functionalities of the system are immediately available for that module, and the module can be a part of an xtsv pipeline.
no code implementations • RANLP 2019 • Bálint Sass
We implemented this as an effective data structure, and developed an algorithm based on this structure to discover essential verbal expressions from corpus data.
1 code implementation • WS 2019 • Balázs Indig, Bálint Sass, Eszter Simon, Iván Mittelholcz, Noémi Vadász, Márton Makrai
We present a more efficient version of the e-magyar NLP pipeline for Hungarian called emtsv.
no code implementations • LREC 2014 • Csaba Oravecz, Tamás Váradi, Bálint Sass
The paper reports on the development of the Hungarian Gigaword Corpus (HGC), an extended new edition of the Hungarian National Corpus, with upgraded and redesigned linguistic annotation and an increased size of 1.5 billion tokens.