1 code implementation • NAACL (SIGTYP) 2022 • Johann-Mattis List, Ekaterina Vylomova, Robert Forkel, Nathan Hill, Ryan Cotterell
This study describes the structure and the results of the SIGTYP 2022 shared task on the prediction of cognate reflexes from multilingual wordlists.
1 code implementation • 7 May 2024 • Arne Rubehn, Jessica Nieder, Robert Forkel, Johann-Mattis List
When comparing speech sounds across languages, scholars often make use of feature representations of individual sounds in order to determine fine-grained sound similarities.
1 code implementation • 5 Feb 2024 • Luise Häuser, Gerhard Jäger, Taraka Rama, Johann-Mattis List, Alexandros Stamatakis
In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees.
2 code implementations • 5 Feb 2024 • Jessica Nieder, Johann-Mattis List
Closely related languages show linguistic similarities that allow speakers of one language to understand speakers of another language without having actively learned it.
1 code implementation • 19 Oct 2023 • Johann-Mattis List, Nathan W. Hill, Robert Forkel, Frederic Blum
Despite the inherently fuzzy nature of reconstructions in historical linguistics, most scholars do not represent their uncertainty when proposing proto-forms.
no code implementations • 9 Aug 2023 • Julius Steuer, Badr Abdullah, Johann-Mattis List, Dietrich Klakow
Training data for our PLMs consists of word lists with a maximum of 1000 entries per language.
1 code implementation • 31 Mar 2023 • Frederic Blum, Johann-Mattis List
Sound correspondence patterns form the basis of cognate detection and phonological reconstruction in historical language comparison.
2 code implementations • 1 Feb 2023 • Johann-Mattis List
The past years have seen a drastic rise in studies devoted to the investigation of colexification patterns in individual language families in particular and the languages of the world in general.
1 code implementation • 1 Feb 2023 • John E. Miller, Johann-Mattis List
Language contact is a pervasive phenomenon reflected in the borrowing of words from donor to recipient languages.
1 code implementation • LChange (ACL) 2022 • Johann-Mattis List, Robert Forkel, Nathan W. Hill
Computational approaches in historical linguistics have been increasingly applied during the past decade and many new methods that implement parts of the traditional comparative method have been proposed.
1 code implementation • LREC 2020 • Robert Forkel, Johann-Mattis List
With cldfbench, we introduce a framework for the retro-standardization of legacy data and the curation of new datasets that drastically simplifies the creation of CLDF by providing a consistent, reproducible workflow that rigorously supports version control and long term archiving of research data and code.
1 code implementation • ACL 2019 • Taraka Rama, Johann-Mattis List
We present a fully automated workflow for phylogenetic reconstruction on large datasets, consisting of two novel methods, one for fast detection of cognates and one for fast Bayesian phylogenetic inference.
2 code implementations • CL 2019 • Johann-Mattis List
By excluding those patterns that occur in only a few cognate sets, the core of regularly recurring sound correspondences can be inferred.
1 code implementation • NAACL 2018 • Taraka Rama, Johann-Mattis List, Johannes Wahle, Gerhard Jäger
We evaluate the performance of state-of-the-art algorithms for automatic cognate detection by comparing the usefulness of automatically inferred cognates with that of classical, manually annotated cognate sets for the task of phylogenetic inference.
no code implementations • EACL 2017 • Gerhard Jäger, Johann-Mattis List, Pavel Sofroniev
Most current approaches in phylogenetic linguistics require as input multilingual word lists partitioned into sets of etymologically related words (cognates).
1 code implementation • EACL 2017 • Johann-Mattis List
The paper presents the Etymological DICtionary ediTOR (EDICTOR), a free, interactive, web-based tool designed to aid historical linguists in creating, editing, analysing, and publishing etymological datasets.
1 code implementation • LREC 2016 • Johann-Mattis List, Michael Cysouw, Robert Forkel
We present an attempt to link the large amount of different concept lists which are used in the linguistic literature, ranging from Swadesh lists in historical linguistics to naming tests in clinical studies and psycholinguistics.
1 code implementation • LREC 2014 • Johann-Mattis List, Jelena Prokić
In the last two decades, alignment analyses have become an important technique in quantitative historical linguistics and dialectology.