no code implementations • WS 2019 • Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen
In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.
1 code implementation • COLING 2018 • Massimo Lusetti, Tatyana Ruzsics, Anne Göhring, Tanja Samardžić, Elisabeth Stark
Text normalization has been addressed with a variety of methods, most successfully with character-level statistical machine translation (CSMT).
no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain
We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.
no code implementations • CONLL 2017 • Tatyana Ruzsics, Tanja Samardžić
Learning internal word structure has recently been recognized as an important step in various multilingual processing tasks and in theoretical language comparison.
no code implementations • WS 2017 • Tanja Samardžić, Mirjana Starović, Željko Agić, Nikola Ljubešić
The paper documents the procedure of building a new Universal Dependencies (UDv2) treebank for Serbian starting from an existing Croatian UDv1 treebank and taking into account the other Slavic UD annotation guidelines.
no code implementations • COLING 2016 • Nikola Ljubešić, Tanja Samardžić, Curdin Derungs
In this paper we present a newly developed tool that enables researchers interested in spatial variation of language to define a geographic perimeter of interest, collect data from the Twitter streaming API published in that perimeter, filter the obtained data by language and country, define and extract variables of interest and analyse the extracted variables by one spatial statistic and two spatial visualisations.
no code implementations • WS 2016 • Christian Bentz, Tatyana Ruzsics, Alexander Koplenig, Tanja Samardžić
Language complexity is an intriguing phenomenon argued to play an important role in both language learning and processing.
no code implementations • LREC 2016 • Tanja Samardžić, Maja Miličević
Focusing on Croatian and Serbian, in this paper we propose a novel framework for automatic classification of their verb types into a number of fine-grained aspectual classes based on the observable morphology of verb forms.
no code implementations • LREC 2016 • Tanja Samardžić, Yves Scherrer, Elvira Glaser
Swiss dialects of German are, unlike most dialects of well standardised languages, widely used in everyday communication.
no code implementations • LREC 2012 • Andrea Gesmundo, Tanja Samardžić
We present a novel tool for morphological analysis of Serbian, which is a low-resource language with rich morphology.