1 code implementation • 5 Feb 2024 • Luise Häuser, Gerhard Jäger, Taraka Rama, Johann-Mattis List, Alexandros Stamatakis
In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees.
1 code implementation • 11 Apr 2022 • Çağrı Çöltekin, Taraka Rama
We present an analysis of eight measures used for quantifying morphological complexity of natural languages.
1 code implementation • 25 Feb 2021 • Taraka Rama, Sowmya Vajjala
Our results indicate that while fine-tuned embeddings are useful for multilingual proficiency modeling, none of the features achieve consistently best performance for all dimensions of language proficiency.
no code implementations • COLING 2020 • Taraka Rama, Lisa Beinborn, Steffen Eger
We probe the layers in multilingual BERT (mBERT) for phylogenetic and geographic language signals across 100 languages and compute language distances based on the mBERT representations.
1 code implementation • CONLL 2020 • Chundra Cathcart, Taraka Rama
This paper seeks to uncover patterns of sound change across Indo-Aryan languages using an LSTM encoder-decoder architecture.
no code implementations • WS 2019 • Brad Aiken, Jared Kelly, Alexis Palmer, Suleyman Olcay Polat, Taraka Rama, Rodney Nielsen
While our system results are dramatically below the average system submitted for the shared task evaluation campaign, our method is (we suspect) unique in its minimal reliance on labeled training data.
no code implementations • WS 2019 • Stig Johan Berggren, Taraka Rama, Lilja Øvrelid
In this paper we present first results for the task of Automated Essay Scoring for Norwegian learner language.
1 code implementation • ACL 2019 • Taraka Rama, Johann-Mattis List
We present a fully automated workflow for phylogenetic reconstruction on large datasets, consisting of two novel methods, one for fast detection of cognates and one for fast Bayesian phylogenetic inference.
no code implementations • WS 2018 • Aleksandrs Berdicevskis, Çağrı Çöltekin, Katharina Ehret, Kilu von Prince, Daniel Ross, Bill Thompson, Chunxiao Yan, Vera Demberg, Gary Lupyan, Taraka Rama, Christian Bentz
We evaluate corpus-based measures of linguistic complexity obtained using Universal Dependencies (UD) treebanks.
no code implementations • WS 2018 • Çağrı Çöltekin, Taraka Rama
This paper describes our systems for the Social Media Mining for Health Applications (SMM4H) shared task.
no code implementations • WS 2018 • Taraka Rama, Pål Brekke, Øystein Nytrø, Lilja Øvrelid
In this article, we describe the development of annotation guidelines for family history information in Norwegian clinical text.
no code implementations • CONLL 2018 • Taraka Rama
We present and evaluate two similarity-dependent Chinese Restaurant Process (sd-CRP) algorithms on the task of automated cognate detection.
no code implementations • 13 Sep 2018 • Çağrı Çöltekin, Taraka Rama
We experimented with a number of different models, including recurrent and convolutional networks, Poisson regression, support vector regression, and L1 and L2 regularized linear regression.
no code implementations • COLING 2018 • Çağrı Çöltekin, Taraka Rama, Verena Blaschke
This paper describes our systems for the VarDial 2018 evaluation campaign.
no code implementations • COLING 2018 • Taraka Rama, Søren Wichmann
The assumption here is that the value of n for which the quartet distance begins to stabilize is also the value at which the quality of the tree ceases to improve.
no code implementations • SEMEVAL 2018 • Çağrı Çöltekin, Taraka Rama
This paper describes our participation in the SemEval-2018 task Multilingual Emoji Prediction.
no code implementations • 9 May 2018 • Taraka Rama
The root age of the Indo-European family has tended to decrease from an age that supported the Anatolian origin hypothesis to an age that supports the Steppe origin hypothesis with the application of new models (Chang et al., 2015).
1 code implementation • WS 2018 • Sowmya Vajjala, Taraka Rama
The Common European Framework of Reference (CEFR) guidelines describe language proficiency of learners on a scale of 6 levels.
1 code implementation • NAACL 2018 • Taraka Rama, Johann-Mattis List, Johannes Wahle, Gerhard Jäger
We evaluate the performance of state-of-the-art algorithms for automatic cognate detection by measuring how useful automatically inferred cognates are for phylogenetic inference compared with classical, manually annotated cognate sets.
no code implementations • WS 2017 • Taraka Rama, Çağrı Çöltekin
In the speech track, an LDA classifier based only on i-vectors performed better than a combination system using text features from speech transcriptions and i-vectors.
Ranked #1 on Native Language Identification on italki NLI
no code implementations • WS 2017 • Çağrı Çöltekin, Taraka Rama
This paper describes our systems and results for the VarDial 2017 shared tasks.
no code implementations • WS 2017 • Taraka Rama, Çağrı Çöltekin, Pavel Sofroniev
This paper presents a computational analysis of Gondi dialects spoken in central India.
no code implementations • 16 Feb 2017 • Taraka Rama, Johannes Wahle, Pavel Sofroniev, Gerhard Jäger
In this paper we explore the use of unsupervised methods for detecting cognates in multilingual word lists.
no code implementations • COLING 2016 • Taraka Rama
In this paper, we present phoneme-level Siamese convolutional networks for the task of pairwise cognate identification.
no code implementations • WS 2016 • Çağrı Çöltekin, Taraka Rama
This paper describes the systems we experimented with for the Discriminating between Similar Languages (DSL) shared task 2016.
no code implementations • WS 2016 • Taraka Rama, Çağrı Çöltekin
Computational approaches for dialectometry employed Levenshtein distance to compute an aggregate similarity between two dialects belonging to a single language group.
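The aggregate-similarity idea in the entry above can be sketched as follows. This is a minimal illustration only, assuming each dialect is a hypothetical dict that maps a concept to a single transcribed word form, and assuming length-normalized Levenshtein distance averaged over the concepts both lists share; the normalization and data format used in the paper are not stated here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer word (0.0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def aggregate_distance(dialect_a: dict[str, str], dialect_b: dict[str, str]) -> float:
    """Mean normalized distance over shared concepts (hypothetical data format).

    Assumption: one word form per concept; concepts missing from either list are skipped.
    """
    shared = sorted(set(dialect_a) & set(dialect_b))
    if not shared:
        raise ValueError("no shared concepts")
    total = sum(normalized_levenshtein(dialect_a[c], dialect_b[c]) for c in shared)
    return total / len(shared)
```

For example, with made-up word lists `{"water": "pani", "hand": "kai"}` and `{"water": "pani", "hand": "kay"}`, the per-concept distances are 0 and 1/3, so the aggregate distance is 1/6. Normalizing by word length keeps long words from dominating the average.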
no code implementations • 19 Oct 2016 • Taraka Rama
In this paper, we introduce a threshold-free approach, motivated by the Chinese Restaurant Process, for cognate clustering.
no code implementations • 17 May 2016 • Taraka Rama
In this paper, we explore the use of convolutional networks (ConvNets) for the purpose of cognate identification.
no code implementations • 1 Sep 2014 • Taraka Rama
In this empirical study, I compare various tree distance measures -- originally developed in computational biology for the purpose of tree comparison -- for the purpose of parser evaluation.
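As an illustration of the kind of measure this study compares, the sketch below applies a Robinson-Foulds-style distance (the size of the symmetric difference between two trees' cluster sets) to parse trees, treating each constituent as the span of token positions it covers. The nested-tuple tree encoding and the function names are hypothetical, and this is not necessarily one of the exact measures evaluated in the paper.

```python
from typing import Union

# Hypothetical encoding: a leaf is a token string, an internal node is a tuple of children.
Tree = Union[str, tuple]


def spans(tree: Tree) -> set[tuple[int, int]]:
    """Collect (start, end) token spans of all constituents covering 2+ tokens."""
    result: set[tuple[int, int]] = set()

    def walk(node: Tree, start: int) -> int:
        # Returns the index just past the last token covered by `node`.
        if isinstance(node, str):
            return start + 1
        end = start
        for child in node:
            end = walk(child, end)
        if end - start > 1:  # single-token spans carry no bracketing information
            result.add((start, end))
        return end

    walk(tree, 0)
    return result


def rf_distance(gold: Tree, predicted: Tree) -> int:
    """Number of spans that occur in exactly one of the two trees."""
    return len(spans(gold) ^ spans(predicted))
```

With `gold = (("the", "dog"), ("saw", ("a", "cat")))` and `predicted = ((("the", "dog"), "saw"), ("a", "cat"))`, each tree has one span the other lacks ((2, 5) versus (0, 3)), so the distance is 2. Unlike labeled bracketing scores, this count ignores constituent labels entirely.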
no code implementations • 11 Aug 2014 • Taraka Rama
The contribution of this paper is twofold: the use of subsequence features for cognate identification, and the use of the resulting cognate judgments for phylogenetic inference.
no code implementations • LREC 2014 • Lars Borin, Anju Saxena, Taraka Rama, Bernard Comrie
Like many other research fields, linguistics is entering the age of big data.
no code implementations • 20 Jan 2014 • Taraka Rama, Karthik Gali, Avinesh PVS
In this paper we explore various parameter settings of the state-of-the-art Statistical Machine Translation system to improve translation quality for a 'distant' language pair such as English-Hindi.
no code implementations • 4 Jan 2014 • Taraka Rama, Lars Borin
We investigate whether the sizes of three different N-gram distributions of the world's language families obey a power law.
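One common way to probe such a claim, shown here only as a hedged sketch, is to estimate the power-law exponent with the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)), over the n values at or above x_min. This is not necessarily the procedure used in the paper; `x_min` is assumed to be chosen beforehand, and a full test would also need a goodness-of-fit check.

```python
import math
from typing import Sequence


def powerlaw_alpha_mle(sizes: Sequence[float], x_min: float) -> float:
    """Continuous MLE of alpha for p(x) proportional to x**(-alpha), x >= x_min.

    Only values >= x_min enter the fit; x_min is an assumed, pre-chosen cutoff.
    """
    tail = [x for x in sizes if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum
```

For instance, the values `[1, e, e**2]` with `x_min = 1` give log ratios summing to 3 over three points, so the estimate is 1 + 3/3 = 2. A maximum-likelihood estimate is used here rather than a straight-line fit on a log-log plot, since the latter is known to give biased exponents.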
no code implementations • 3 Jan 2014 • Taraka Rama, Sudheer Kolachina, Lakshmi Bai B
In this paper we examine the usefulness of two classes of algorithms widely used in genetics, Distance Methods and Discrete Character Methods (Felsenstein and Felsenstein 2003), for predicting the family relationships among a set of related languages and, therefore, diachronic language change.