Search Results for author: Taraka Rama

Found 38 papers, 7 papers with code

Are Sounds Sound for Phylogenetic Reconstruction?

1 code implementation · 5 Feb 2024 · Luise Häuser, Gerhard Jäger, Taraka Rama, Johann-Mattis List, Alexandros Stamatakis

In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees.

What do complexity measures measure? Correlating and validating corpus-based measures of morphological complexity

1 code implementation · 11 Apr 2022 · Çağrı Çöltekin, Taraka Rama

We present an analysis of eight measures used for quantifying morphological complexity of natural languages.

Are pre-trained text representations useful for multilingual and multi-dimensional language proficiency modeling?

1 code implementation · 25 Feb 2021 · Taraka Rama, Sowmya Vajjala

Our results indicate that while fine-tuned embeddings are useful for multilingual proficiency modeling, none of the features achieve consistently best performance for all dimensions of language proficiency.

Probing Multilingual BERT for Genetic and Typological Signals

no code implementations · COLING 2020 · Taraka Rama, Lisa Beinborn, Steffen Eger

We probe the layers in multilingual BERT (mBERT) for phylogenetic and geographic language signals across 100 languages and compute language distances based on the mBERT representations.

Regression

Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping

1 code implementation · CoNLL 2020 · Chundra Cathcart, Taraka Rama

This paper seeks to uncover patterns of sound change across Indo-Aryan languages using an LSTM encoder-decoder architecture.

Word Embeddings

Sigmorphon 2019 Task 2 system description paper: Morphological analysis in context for many languages, with supervision from only a few

no code implementations · WS 2019 · Brad Aiken, Jared Kelly, Alexis Palmer, Suleyman Olcay Polat, Taraka Rama, Rodney Nielsen

While our system results are dramatically below the average system submitted for the shared task evaluation campaign, our method is (we suspect) unique in its minimal reliance on labeled training data.

Lemmatization, Morphological Analysis

An Automated Framework for Fast Cognate Detection and Bayesian Phylogenetic Inference in Computational Historical Linguistics

1 code implementation · ACL 2019 · Taraka Rama, Johann-Mattis List

We present a fully automated workflow for phylogenetic reconstruction on large datasets, consisting of two novel methods, one for fast detection of cognates and one for fast Bayesian phylogenetic inference.

Iterative development of family history annotation guidelines using a synthetic corpus of clinical text

no code implementations · WS 2018 · Taraka Rama, Pål Brekke, Øystein Nytrø, Lilja Øvrelid

In this article, we describe the development of annotation guidelines for family history information in Norwegian clinical text.

Similarity Dependent Chinese Restaurant Process for Cognate Identification in Multilingual Wordlists

no code implementations · CoNLL 2018 · Taraka Rama

We present and evaluate two similarity-dependent Chinese Restaurant Process (sd-CRP) algorithms on the task of automated cognate detection.

Clustering, Language Identification

Tübingen-Oslo system: Linear regression works the best at Predicting Current and Future Psychological Health from Childhood Essays in the CLPsych 2018 Shared Task

no code implementations · 13 Sep 2018 · Çağrı Çöltekin, Taraka Rama

We experimented with a number of different models, including recurrent and convolutional networks, Poisson regression, support vector regression, and L1 and L2 regularized linear regression.

Regression

Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies

no code implementations · COLING 2018 · Taraka Rama, Søren Wichmann

The assumption here is that the value of n for which the quartet distance begins to stabilize is also the value at which the quality of the tree ceases to improve.

Bayesian Inference

Three tree priors and five datasets: A study of the effect of tree priors in Indo-European phylogenetics

no code implementations · 9 May 2018 · Taraka Rama

The root age of the Indo-European family has tended to decrease from an age that supported the Anatolian origin hypothesis to an age that supports the Steppe origin hypothesis with the application of new models (Chang et al., 2015).

Experiments with Universal CEFR Classification

1 code implementation · WS 2018 · Sowmya Vajjala, Taraka Rama

The Common European Framework of Reference (CEFR) guidelines describe language proficiency of learners on a scale of 6 levels.

Classification, General Classification

Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics?

1 code implementation · NAACL 2018 · Taraka Rama, Johann-Mattis List, Johannes Wahle, Gerhard Jäger

We evaluate the performance of state-of-the-art algorithms for automatic cognate detection by measuring how useful automatically inferred cognates are for phylogenetic inference, compared with classical, manually annotated cognate sets.

Fewer features perform well at Native Language Identification task

no code implementations · WS 2017 · Taraka Rama, Çağrı Çöltekin

In the speech track, an LDA classifier based only on i-vectors performed better than a combination system using text features from speech transcriptions and i-vectors.

Native Language Identification

Fast and unsupervised methods for multilingual cognate clustering

no code implementations · 16 Feb 2017 · Taraka Rama, Johannes Wahle, Pavel Sofroniev, Gerhard Jäger

In this paper we explore the use of unsupervised methods for detecting cognates in multilingual word lists.

Clustering

Siamese Convolutional Networks for Cognate Identification

no code implementations · COLING 2016 · Taraka Rama

In this paper, we present phoneme-level Siamese convolutional networks for the task of pairwise cognate identification.

Discriminating Similar Languages with Linear SVMs and Neural Networks

no code implementations · WS 2016 · Çağrı Çöltekin, Taraka Rama

This paper describes the systems we experimented with for participating in the 2016 shared task on Discriminating between Similar Languages (DSL).

Language Identification

LSTM Autoencoders for Dialect Analysis

no code implementations · WS 2016 · Taraka Rama, Çağrı Çöltekin

Computational approaches to dialectometry have employed Levenshtein distance to compute an aggregate similarity between two dialects belonging to a single language group.

Dimensionality Reduction

Chinese Restaurant Process for cognate clustering: A threshold free approach

no code implementations · 19 Oct 2016 · Taraka Rama

In this paper, we introduce a threshold-free approach to cognate clustering, motivated by the Chinese Restaurant Process.

Clustering

Siamese convolutional networks based on phonetic features for cognate identification

no code implementations · 17 May 2016 · Taraka Rama

In this paper, we explore the use of convolutional networks (ConvNets) for the purpose of cognate identification.

Empirical Evaluation of Tree distances for Parser Evaluation

no code implementations · 1 Sep 2014 · Taraka Rama

In this empirical study, I compare various tree distance measures, originally developed in computational biology for comparing trees, for the purpose of parser evaluation.

Gap-weighted subsequences for automatic cognate identification and phylogenetic inference

no code implementations · 11 Aug 2014 · Taraka Rama

The contribution of this paper is the use of subsequence features for cognate identification and the use of the resulting cognate judgments for phylogenetic inference.

Does Syntactic Knowledge help English-Hindi SMT?

no code implementations · 20 Jan 2014 · Taraka Rama, Karthik Gali, Avinesh PVS

In this paper we explore various parameter settings of the state-of-the-art statistical machine translation system to improve translation quality for a 'distant' language pair such as English-Hindi.

Machine Translation, Translation

Properties of phoneme N-grams across the world's language families

no code implementations · 4 Jan 2014 · Taraka Rama, Lars Borin

We investigate whether the sizes of three different N-gram distributions of the world's language families obey a power law.

Quantitative methods for Phylogenetic Inference in Historical Linguistics: An experimental case study of South Central Dravidian

no code implementations · 3 Jan 2014 · Taraka Rama, Sudheer Kolachina, Lakshmi Bai B

In this paper we examine the usefulness of two classes of algorithms widely used in genetics, Distance Methods and Discrete Character Methods (Felsenstein and Felsenstein 2003), for predicting the family relationships among a set of related languages and, by extension, diachronic language change.
