Search Results for author: Kairit Sirts

Found 23 papers, 5 papers with code

Evaluating Lexicon Incorporation for Depression Symptom Estimation

no code implementations • 30 Apr 2024 • Kirill Milintsevich, Gaël Dias, Kairit Sirts

This paper explores the impact of incorporating sentiment, emotion, and domain-specific lexicons into a transformer-based model for depression symptom estimation.

Sõnajaht: Definition Embeddings and Semantic Search for Reverse Dictionary Creation

no code implementations • 30 Apr 2024 • Aleksei Dorkin, Kairit Sirts

We present an information retrieval based reverse dictionary system using modern pre-trained language models and approximate nearest neighbors search algorithms.

Comparison of Current Approaches to Lemmatization: A Case Study in Estonian

no code implementations • 23 Apr 2024 • Aleksei Dorkin, Kairit Sirts

This study evaluates three different lemmatization approaches to Estonian: generative character-level models, pattern-based word-level classification models, and rule-based morphological analysis.

Classification Lemmatization +1

TartuNLP @ SIGTYP 2024 Shared Task: Adapting XLM-RoBERTa for Ancient and Historical Languages

no code implementations • 19 Apr 2024 • Aleksei Dorkin, Kairit Sirts

We present our submission to the unconstrained subtask of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages for morphological annotation, POS-tagging, lemmatization, character- and word-level gap-filling.

Lemmatization POS +1

Your Model Is Not Predicting Depression Well And That Is Why: A Case Study of PRIMATE Dataset

no code implementations • 1 Mar 2024 • Kirill Milintsevich, Kairit Sirts, Gaël Dias

This paper addresses the quality of annotations in mental health datasets used for NLP-based depression level estimation from social media texts.

Enhancing Sequence-to-Sequence Neural Lemmatization with External Resources

1 code implementation • EACL 2021 • Kirill Milintsevich, Kairit Sirts

We also compare with other methods of integrating external data into lemmatization and show that our enhanced system performs considerably better than a simple lexicon extension method based on the Stanza system, and it achieves complementary improvements w. r. t.

Data Augmentation LEMMA +1

Evaluating Sentence Segmentation and Word Tokenization Systems on Estonian Web Texts

1 code implementation • 16 Nov 2020 • Kairit Sirts, Kairit Peekman

Texts obtained from the web are noisy and do not necessarily follow the orthographic sentence and word boundary rules.

Segmentation Sentence +1

EstBERT: A Pretrained Language-Specific BERT for Estonian

no code implementations • NoDaLiDa 2021 • Hasan Tanvir, Claudia Kittask, Sandra Eiche, Kairit Sirts

This paper presents EstBERT, a large pretrained transformer-based language-specific BERT model for Estonian.

Morphological Tagging named-entity-recognition +5

Evaluating Multilingual BERT for Estonian

no code implementations • 1 Oct 2020 • Claudia Kittask, Kirill Milintsevich, Kairit Sirts

Recently, large pre-trained language models, such as BERT, have reached state-of-the-art performance in many natural language processing tasks, but for many languages, including Estonian, BERT models are not yet available.

Morphological Tagging NER +3

Modeling Composite Labels for Neural Morphological Tagging

1 code implementation • CONLL 2018 • Alexander Tkachenko, Kairit Sirts

Neural morphological tagging has been regarded as an extension to the POS tagging task, treating each morphological tag as a monolithic label and ignoring its internal structure.

Morphological Tagging POS +2

Neural Morphological Tagging for Estonian

no code implementations • 16 Oct 2018 • Alexander Tkachenko, Kairit Sirts

Secondly, we complement these models with the analyses generated by a rule-based Estonian morphological analyser (MA) VABAMORF, thus performing a soft morphological disambiguation.

Morphological Disambiguation Morphological Tagging +1

The Impact of Annotation Guidelines and Annotated Data on Extracting App Features from App Reviews

no code implementations • 11 Oct 2018 • Faiz Ali Shah, Kairit Sirts, Dietmar Pfahl

Our experiments show that having annotated training reviews from the test app is not necessary, although including them in the training set helps to improve recall.

Idea density for predicting Alzheimer's disease from transcribed speech

no code implementations • CONLL 2017 • Kairit Sirts, Olivier Piguet, Mark Johnson

ID has been used in two different versions: propositional idea density (PID) counts the expressed ideas and can be applied to any text while semantic idea density (SID) counts pre-defined information content units and is naturally more applicable to normative domains, such as picture description tasks.

Clustering

Linear Ensembles of Word Embedding Models

1 code implementation • WS 2017 • Avo Muromägi, Kairit Sirts, Sven Laur

The results show that while using the ordinary least squares regression performs poorly in our experiments, using orthogonal Procrustes to combine several word embedding models into an ensemble model leads to 7-10% relative improvements over the mean result of the initial models in synonym tests and 19-47% in analogy tests.

regression Word Embeddings
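
The orthogonal Procrustes combination mentioned in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `procrustes_rotation` and `ensemble`, the iteration count, and the use of a running mean as the shared target are all assumptions, and the sketch presumes every model is a NumPy matrix with the same vocabulary order and the same dimensionality.

```python
import numpy as np


def procrustes_rotation(source, target):
    """Return the orthogonal matrix R that minimizes ||source @ R - target||_F."""
    # Closed-form solution: take the SVD of source^T target and drop the singular values.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def ensemble(models, n_iter=10):
    """Hypothetical ensemble: rotate each model onto a shared target, then average."""
    # Start from the plain mean of the unaligned models (an assumption of this sketch).
    target = np.mean(models, axis=0)
    for _ in range(n_iter):
        # Align every model to the current target and recompute the target as their mean.
        rotated = [m @ procrustes_rotation(m, target) for m in models]
        target = np.mean(rotated, axis=0)
    return target


# Toy data: one random embedding matrix and a rotated copy of it.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 8))                  # 50 "words", 8 dimensions
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))     # random orthogonal matrix
rotated_copy = base @ q

rotation = procrustes_rotation(rotated_copy, base)
combined = ensemble([base, rotated_copy])
```

Because the rotation is constrained to be orthogonal, distances and angles within each model are preserved, which is one plausible reason it might behave differently from an unconstrained least-squares mapping.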