Imputing typological values via phylogenetic inference

EMNLP (SIGTYP) 2020  ·  Gerhard Jäger ·

This paper describes a workflow to impute missing values in a typological database, a sub- set of the World Atlas of Language Structures (WALS). Using a world-wide phylogeny de- rived from lexical data, the model assumes a phylogenetic continuous time Markov chain governing the evolution of typological val- ues. Data imputation is performed via a Max- imum Likelihood estimation on the basis of this model. As back-off model for languages whose phylogenetic position is unknown, a k- nearest neighbor classification based on geo- graphic distance is performed.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here