TuGebic is a corpus of spontaneous speech recordings from Turkish-German bilinguals. The participants were adult Turkish-German bilinguals who, at the time of recording in the first half of the 1990s, lived in either Germany or Turkey. The data were manually tokenised and normalised, and all proper names (names of participants and of places mentioned in the conversations) were replaced with pseudonyms. Automatic language identification was performed at the token level, which made it possible to establish the proportion of words from each language.
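Once every token carries a language label, the proportion of words from each language is simply each label's count divided by the total number of tokens. The sketch below illustrates that arithmetic only. It assumes tokens arrive as hypothetical `(token, label)` pairs with invented labels such as `"TR"` and `"DE"`. It does not reproduce TuGebic's actual file format, label set, or language identification method.

```python
from collections import Counter


def language_proportions(tagged_tokens):
    """Return the share of tokens assigned to each language label.

    `tagged_tokens` is a list of (token, label) pairs. The label names used
    here ("TR", "DE") are hypothetical, not TuGebic's annotation scheme.
    """
    # Count how many tokens received each label.
    counts = Counter(label for _, label in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Divide each label's count by the total number of tokens.
    return {label: n / total for label, n in counts.items()}


# An invented code-switched utterance, for illustration only.
utterance = [("ich", "DE"), ("gestern", "DE"), ("okula", "TR"), ("gittim", "TR"), ("ya", "TR")]
print(language_proportions(utterance))  # {'DE': 0.4, 'TR': 0.6}
```

If a labelling scheme also includes categories such as mixed-language or language-neutral tokens, they would appear as additional keys in the result. Whether to include them in the denominator is a design choice that this sketch does not settle.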