no code implementations • NAACL (CALCS) 2021 • Şaziye Betül Özateş, Özlem Çetinoğlu
Morphological tagging of code-switching (CS) data becomes more challenging, especially when the language pairs composing the CS data have different morphological representations.
no code implementations • LREC 2022 • Özlem Çetinoğlu, Antje Schweitzer
In this paper, we describe the anonymisation process of a Turkish-German code-switching corpus, namely SAGT, which consists of speech data and a treebank that is built on its transcripts.
1 code implementation • ACL (GeBNLP) 2021 • Agnieszka Falenska, Özlem Çetinoğlu
Potential gender biases existing in Wikipedia’s content can contribute to biased behaviors in a variety of downstream NLP systems.
1 code implementation • EMNLP (WNUT) 2021 • Rob van der Goot, Alan Ramponi, Arkaitz Zubiaga, Barbara Plank, Benjamin Muller, Iñaki San Vicente Roncal, Nikola Ljubešić, Özlem Çetinoğlu, Rahmad Mahendra, Talha Çolakoğlu, Timothy Baldwin, Tommaso Caselli, Wladimir Sidorenko
This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation.
1 code implementation • Findings (NAACL) 2022 • Şaziye Özateş, Arzucan Özgür, Tunga Gungor, Özlem Çetinoğlu
Code-switching dependency parsing stands as a challenging task due to both the scarcity of necessary resources and the structural difficulties embedded in code-switched languages.
no code implementations • 11 Apr 2022 • Çağrı Çöltekin, A. Seza Doğruöz, Özlem Çetinoğlu
This paper presents a comprehensive survey of corpora and lexical resources available for Turkish.
no code implementations • 3 Nov 2020 • Manuela Sanguinetti, Lauren Cassidy, Cristina Bosco, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis.
no code implementations • EMNLP 2020 • Manuel Mager, Özlem Çetinoğlu, Katharina Kann
Canonical morphological segmentation consists of dividing words into their standardized morphemes.
no code implementations • 1 Jun 2020 • Rob van der Goot, Özlem Çetinoğlu
Lexical normalization, the translation of non-canonical data to standard language, has been shown to improve the performance of many natural language processing tasks on social media.
no code implementations • NAACL 2019 • Manuel Mager, Özlem Çetinoğlu, Katharina Kann
Language identification for code-switching (CS), the phenomenon of alternating between two or more languages in conversations, has traditionally been approached under the assumption of a single language per token.
no code implementations • WS 2016 • Özlem Çetinoğlu, Sarah Schulz, Ngoc Thang Vu
This paper addresses challenges of Natural Language Processing (NLP) on non-canonical multilingual data in which two or more languages are mixed.