no code implementations • 30 Apr 2024 • Kirill Milintsevich, Gaël Dias, Kairit Sirts
This paper explores the impact of incorporating sentiment, emotion, and domain-specific lexicons into a transformer-based model for depression symptom estimation.
no code implementations • 30 Apr 2024 • Aleksei Dorkin, Kairit Sirts
We present an information-retrieval-based reverse dictionary system using modern pre-trained language models and approximate nearest neighbor search algorithms.
no code implementations • 23 Apr 2024 • Aleksei Dorkin, Kairit Sirts
This study evaluates three different lemmatization approaches to Estonian: generative character-level models, pattern-based word-level classification models, and rule-based morphological analysis.
no code implementations • 19 Apr 2024 • Aleksei Dorkin, Kairit Sirts
We present our submission to the unconstrained subtask of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages for morphological annotation, POS-tagging, lemmatization, and character- and word-level gap-filling.
no code implementations • 1 Mar 2024 • Kirill Milintsevich, Kairit Sirts, Gaël Dias
This paper addresses the quality of annotations in mental health datasets used for NLP-based depression level estimation from social media texts.
1 code implementation • EACL 2021 • Kirill Milintsevich, Kairit Sirts
We also compare with other methods of integrating external data into lemmatization and show that our enhanced system performs considerably better than a simple lexicon extension method based on the Stanza system, and it achieves complementary improvements w.r.t.
1 code implementation • 16 Nov 2020 • Kairit Sirts, Kairit Peekman
Texts obtained from the web are noisy and do not necessarily follow the orthographic sentence and word boundary rules.
no code implementations • NoDaLiDa 2021 • Hasan Tanvir, Claudia Kittask, Sandra Eiche, Kairit Sirts
This paper presents EstBERT, a large pretrained transformer-based language-specific BERT model for Estonian.
no code implementations • 1 Oct 2020 • Claudia Kittask, Kirill Milintsevich, Kairit Sirts
Recently, large pre-trained language models, such as BERT, have reached state-of-the-art performance in many natural language processing tasks, but for many languages, including Estonian, BERT models are not yet available.
1 code implementation • CONLL 2018 • Alexander Tkachenko, Kairit Sirts
Neural morphological tagging has been regarded as an extension to the POS tagging task, treating each morphological tag as a monolithic label and ignoring its internal structure.
no code implementations • 16 Oct 2018 • Alexander Tkachenko, Kairit Sirts
Secondly, we complement these models with the analyses generated by a rule-based Estonian morphological analyser (MA) VABAMORF, thus performing a soft morphological disambiguation.
no code implementations • 11 Oct 2018 • Faiz Ali Shah, Kairit Sirts, Dietmar Pfahl
Our experiments show that having annotated training reviews from the test app is not necessary, although including them in the training set helps to improve recall.
no code implementations • CONLL 2017 • Kairit Sirts, Olivier Piguet, Mark Johnson
Idea density (ID) has been used in two different versions: propositional idea density (PID) counts the expressed ideas and can be applied to any text, while semantic idea density (SID) counts pre-defined information content units and is naturally more applicable to normative domains, such as picture description tasks.
1 code implementation • WS 2017 • Avo Muromägi, Kairit Sirts, Sven Laur
The results show that while ordinary least squares regression performs poorly in our experiments, using orthogonal Procrustes to combine several word embedding models into an ensemble model leads to 7-10% relative improvements over the mean result of the initial models in synonym tests and 19-47% in analogy tests.
1 code implementation • NAACL 2016 • Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, Mark Johnson
Knowledge bases of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks.
no code implementations • CONLL 2016 • Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, Mark Johnson
Knowledge bases are useful resources for many natural language processing tasks; however, they are far from complete.
no code implementations • TACL 2013 • Kairit Sirts, Sharon Goldwater
This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation.