Search Results for author: Tanja Samardžić

Found 15 papers, 1 paper with code

A Report on the Third VarDial Evaluation Campaign

no code implementations • WS 2019 • Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen

In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.

Dialect Identification, Morphological Analysis

Encoder-Decoder Methods for Text Normalization

1 code implementation • COLING 2018 • Massimo Lusetti, Tatyana Ruzsics, Anne Göhring, Tanja Samardžić, Elisabeth Stark

Text normalization has been addressed with a variety of methods, most successfully with character-level statistical machine translation (CSMT).

Machine Translation, Translation

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign

no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain

We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.

Dependency Parsing, Dialect Identification

Neural Sequence-to-sequence Learning of Internal Word Structure

no code implementations • CONLL 2017 • Tatyana Ruzsics, Tanja Samardžić

Learning internal word structure has recently been recognized as an important step in various multilingual processing tasks and in theoretical language comparison.

Language Modelling, Machine Translation, +1

Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages

no code implementations • WS 2017 • Tanja Samardžić, Mirjana Starović, Željko Agić, Nikola Ljubešić

The paper documents the procedure of building a new Universal Dependencies (UDv2) treebank for Serbian starting from an existing Croatian UDv1 treebank and taking into account the other Slavic UD annotation guidelines.

TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data

no code implementations • COLING 2016 • Nikola Ljubešić, Tanja Samardžić, Curdin Derungs

In this paper we present a newly developed tool that enables researchers interested in spatial variation of language to define a geographic perimeter of interest, collect data from the Twitter streaming API published in that perimeter, filter the obtained data by language and country, define and extract variables of interest and analyse the extracted variables by one spatial statistic and two spatial visualisations.

A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora

no code implementations • WS 2016 • Christian Bentz, Tatyana Ruzsics, Alexander Koplenig, Tanja Samardžić

Language complexity is an intriguing phenomenon argued to play an important role in both language learning and processing.

Machine Translation, Word Alignment

A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora

no code implementations • LREC 2016 • Tanja Samardžić, Maja Miličević

Focusing on Croatian and Serbian, in this paper we propose a novel framework for automatic classification of their verb types into a number of fine-grained aspectual classes based on the observable morphology of verb forms.

General Classification