Search Results for author: Taraka Rama

Found 38 papers, 7 papers with code

Are Sounds Sound for Phylogenetic Reconstruction?

1 code implementation • 5 Feb 2024 • Luise Häuser, Gerhard Jäger, Taraka Rama, Johann-Mattis List, Alexandros Stamatakis

In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees.


What do complexity measures measure? Correlating and validating corpus-based measures of morphological complexity

1 code implementation • 11 Apr 2022 • Çağrı Çöltekin, Taraka Rama

We present an analysis of eight measures used for quantifying morphological complexity of natural languages.


Are pre-trained text representations useful for multilingual and multi-dimensional language proficiency modeling?

1 code implementation • 25 Feb 2021 • Taraka Rama, Sowmya Vajjala

Our results indicate that while fine-tuned embeddings are useful for multilingual proficiency modeling, none of the features achieve consistently best performance for all dimensions of language proficiency.


Probing Multilingual BERT for Genetic and Typological Signals

no code implementations • COLING 2020 • Taraka Rama, Lisa Beinborn, Steffen Eger

We probe the layers in multilingual BERT (mBERT) for phylogenetic and geographic language signals across 100 languages and compute language distances based on the mBERT representations.

regression


Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping

1 code implementation • CONLL 2020 • Chundra Cathcart, Taraka Rama

This paper seeks to uncover patterns of sound change across Indo-Aryan languages using an LSTM encoder-decoder architecture.

Word Embeddings


Sigmorphon 2019 Task 2 system description paper: Morphological analysis in context for many languages, with supervision from only a few

no code implementations • WS 2019 • Brad Aiken, Jared Kelly, Alexis Palmer, Suleyman Olcay Polat, Taraka Rama, Rodney Nielsen

While our system results are dramatically below the average system submitted for the shared task evaluation campaign, our method is (we suspect) unique in its minimal reliance on labeled training data.

Lemmatization Morphological Analysis +5


Regression or classification? Automated Essay Scoring for Norwegian

no code implementations • WS 2019 • Stig Johan Berggren, Taraka Rama, Lilja Øvrelid

In this paper we present first results for the task of Automated Essay Scoring for Norwegian learner language.

Automated Essay Scoring BIG-bench Machine Learning +5


An Automated Framework for Fast Cognate Detection and Bayesian Phylogenetic Inference in Computational Historical Linguistics

1 code implementation • ACL 2019 • Taraka Rama, Johann-Mattis List

We present a fully automated workflow for phylogenetic reconstruction on large datasets, consisting of two novel methods, one for fast detection of cognates and one for fast Bayesian phylogenetic inference.


Using Universal Dependencies in cross-linguistic complexity research

no code implementations • WS 2018 • Aleksandrs Berdicevskis, Çağrı Çöltekin, Katharina Ehret, Kilu von Prince, Daniel Ross, Bill Thompson, Chunxiao Yan, Vera Demberg, Gary Lupyan, Taraka Rama, Christian Bentz

We evaluate corpus-based measures of linguistic complexity obtained using Universal Dependencies (UD) treebanks.


Drug-Use Identification from Tweets with Word and Character N-Grams

no code implementations • WS 2018 • Çağrı Çöltekin, Taraka Rama

This paper describes our systems in social media mining for health applications (SMM4H) shared task.

Text Classification


Iterative development of family history annotation guidelines using a synthetic corpus of clinical text

no code implementations • WS 2018 • Taraka Rama, Pål Brekke, Øystein Nytrø, Lilja Øvrelid

In this article, we describe the development of annotation guidelines for family history information in Norwegian clinical text.


Tübingen-Oslo system at SIGMORPHON shared task on morphological inflection. A multi-tasking multilingual sequence to sequence model.

no code implementations • CONLL 2018 • Taraka Rama, Çağrı Çöltekin

Data Augmentation Morphological Inflection


Similarity Dependent Chinese Restaurant Process for Cognate Identification in Multilingual Wordlists

no code implementations • CONLL 2018 • Taraka Rama

We present and evaluate two similarity dependent Chinese Restaurant Process (sd-CRP) algorithms at the task of automated cognate detection.

Clustering Language Identification


Tübingen-Oslo system: Linear regression works the best at Predicting Current and Future Psychological Health from Childhood Essays in the CLPsych 2018 Shared Task

no code implementations • 13 Sep 2018 • Çağrı Çöltekin, Taraka Rama

We experimented with a number of different models, including recurrent and convolutional networks, Poisson regression, support vector regression, and L1 and L2 regularized linear regression.

regression


Tübingen-Oslo Team at the VarDial 2018 Evaluation Campaign: An Analysis of N-gram Features in Language Variety Identification

no code implementations • COLING 2018 • Çağrı Çöltekin, Taraka Rama, Verena Blaschke

This paper describes our systems for the VarDial 2018 evaluation campaign.

Dialect Identification Document Classification


Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies

no code implementations • COLING 2018 • Taraka Rama, Søren Wichmann

The assumption here is that the value of n for which the quartet distance begins to stabilize is also the value at which the quality of the tree ceases to improve.

Bayesian Inference


Tübingen-Oslo at SemEval-2018 Task 2: SVMs perform better than RNNs in Emoji Prediction

no code implementations • SEMEVAL 2018 • Çağrı Çöltekin, Taraka Rama

This paper describes our participation in the SemEval-2018 task Multilingual Emoji Prediction.

Document Classification General Classification +5


Three tree priors and five datasets: A study of the effect of tree priors in Indo-European phylogenetics

no code implementations • 9 May 2018 • Taraka Rama

The root age of the Indo-European family has tended to decrease from an age that supported the Anatolian origin hypothesis to an age that supports the Steppe origin hypothesis with the application of new models (Chang et al., 2015).


Experiments with Universal CEFR Classification

1 code implementation • WS 2018 • Sowmya Vajjala, Taraka Rama

The Common European Framework of Reference (CEFR) guidelines describe language proficiency of learners on a scale of 6 levels.

Classification General Classification


Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics?

1 code implementation • NAACL 2018 • Taraka Rama, Johann-Mattis List, Johannes Wahle, Gerhard Jäger

We evaluate the performance of state-of-the-art algorithms for automatic cognate detection by comparing how useful automatically inferred cognates are for the task of phylogenetic inference compared to classical manually annotated cognate sets.


Fewer features perform well at Native Language Identification task

no code implementations • WS 2017 • Taraka Rama, Çağrı Çöltekin

In the speech track, an LDA classifier based only on i-vectors performed better than a combination system using text features from speech transcriptions and i-vectors.

Ranked #1 on Native Language Identification on italki NLI

Native Language Identification


Tübingen system in VarDial 2017 shared task: experiments with language identification and cross-lingual parsing

no code implementations • WS 2017 • Çağrı Çöltekin, Taraka Rama

This paper describes our systems and results on VarDial 2017 shared tasks.

Dependency Parsing Language Identification +1


Computational analysis of Gondi dialects

no code implementations • WS 2017 • Taraka Rama, Çağrı Çöltekin, Pavel Sofroniev

This paper presents a computational analysis of Gondi dialects spoken in central India.

Word Alignment


Fast and unsupervised methods for multilingual cognate clustering

no code implementations • 16 Feb 2017 • Taraka Rama, Johannes Wahle, Pavel Sofroniev, Gerhard Jäger

In this paper we explore the use of unsupervised methods for detecting cognates in multilingual word lists.

Clustering


A Telugu treebank based on a grammar book

no code implementations • WS 2017 • Taraka Rama, Sowmya Vajjala

Dependency Parsing


Siamese Convolutional Networks for Cognate Identification

no code implementations • COLING 2016 • Taraka Rama

In this paper, we present phoneme level Siamese convolutional networks for the task of pair-wise cognate identification.


Discriminating Similar Languages with Linear SVMs and Neural Networks

no code implementations • WS 2016 • Çağrı Çöltekin, Taraka Rama

This paper describes the systems we experimented with for participating in the discriminating between similar languages (DSL) shared task 2016.

Language Identification


LSTM Autoencoders for Dialect Analysis

no code implementations • WS 2016 • Taraka Rama, Çağrı Çöltekin

Computational approaches for dialectometry employed Levenshtein distance to compute an aggregate similarity between two dialects belonging to a single language group.

Dimensionality Reduction


Chinese Restaurant Process for cognate clustering: A threshold free approach

no code implementations • 19 Oct 2016 • Taraka Rama

In this paper, we introduce a threshold free approach, motivated from Chinese Restaurant Process, for the purpose of cognate clustering.

Clustering


Siamese convolutional networks based on phonetic features for cognate identification

no code implementations • 17 May 2016 • Taraka Rama

In this paper, we explore the use of convolutional networks (ConvNets) for the purpose of cognate identification.


Automatic cognate identification with gap-weighted string subsequences.

no code implementations • HLT 2015 • Taraka Rama

Information Retrieval Retrieval +1


Empirical Evaluation of Tree distances for Parser Evaluation

no code implementations • 1 Sep 2014 • Taraka Rama

In this empirical study, I compare various tree distance measures -- originally developed in computational biology for the purpose of tree comparison -- for the purpose of parser evaluation.


Gap-weighted subsequences for automatic cognate identification and phylogenetic inference

no code implementations • 11 Aug 2014 • Taraka Rama

The contribution of this paper is the use of subsequence features for cognate identification and to employ the cognate judgments for phylogenetic inference.


Linguistic landscaping of South Asia using digital language resources: Genetic vs. areal linguistics

no code implementations • LREC 2014 • Lars Borin, Anju Saxena, Taraka Rama, Bernard Comrie

Like many other research fields, linguistics is entering the age of big data.


Does Syntactic Knowledge help English-Hindi SMT?

no code implementations • 20 Jan 2014 • Taraka Rama, Karthik Gali, Avinesh PVS

In this paper we explore various parameter settings of the state-of-art Statistical Machine Translation system to improve the quality of the translation for a 'distant' language pair like English-Hindi.

Machine Translation Translation


Properties of phoneme N-grams across the world's language families

no code implementations • 4 Jan 2014 • Taraka Rama, Lars Borin

We investigate if the sizes of three different N-gram distributions of the world's language families obey a power law.


Quantitative methods for Phylogenetic Inference in Historical Linguistics: An experimental case study of South Central Dravidian

no code implementations • 3 Jan 2014 • Taraka Rama, Sudheer Kolachina, Lakshmi Bai B

In this paper we examine the usefulness of two classes of algorithms Distance Methods, Discrete Character Methods (Felsenstein and Felsenstein 2003) widely used in genetics, for predicting the family relationships among a set of related languages and therefore, diachronic language change.


How Good are Typological Distances for Determining Genealogical Relationships among Languages?

no code implementations • COLING 2012 • Taraka Rama, Prasanth Kolachina

