no code implementations • LREC 2022 • Fatih Beyhan, Buse Çarık, İnanç Arın, Ayşecan Terzioğlu, Berrin Yanikoglu, Reyyan Yeniterzi
We present a machine learning system for automatic detection of hate speech in Turkish, along with a hate speech dataset consisting of tweets collected in two separate domains.
no code implementations • ACL (CASE) 2021 • Furkan Çelik, Tuğberk Dalkılıç, Fatih Beyhan, Reyyan Yeniterzi
This paper summarizes our group’s efforts in the multilingual protest news detection shared task, which is organized as a part of the Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) Workshop.
no code implementations • gwll (LREC) 2022 • Merve Doğan, Ceren Oksal, Arife Betül Yenice, Fatih Beyhan, Reyyan Yeniterzi, Olcay Taner Yildiz
This paper aims to connect WordNet and Wikipedia by linking synsets from the Turkish WordNet KeNet with Wikipedia, and thus provide a better machine-readable dictionary for building NLP models with rich data.
no code implementations • 12 Feb 2024 • Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, Alexis Moinet, Sri Karlapati, Ewa Muszyńska, Haohan Guo, Bartosz Putrycz, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman
Echoing the widely-reported "emergent abilities" of large language models when trained on increasing volumes of data, we show that BASE TTS variants built with 10K+ hours and 500M+ parameters begin to demonstrate natural prosody on textually complex sentences.
no code implementations • 21 Nov 2022 • Ali Hürriyetoğlu, Osman Mutlu, Fırat Duruşan, Onur Uca, Alaeddin Selçuk Gürel, Benjamin Radford, Yaoyao Dai, Hansi Hettiarachchi, Niklas Stoehr, Tadashi Nomoto, Milena Slavcheva, Francielle Vargas, Aaqib Javid, Fatih Beyhan, Erdem Yörük
The CASE 2022 extension consists of expanding the test data with more data in previously available languages, namely, English, Hindi, Portuguese, and Spanish, and adding new test data in Mandarin, Turkish, and Urdu for Sub-task 1, document classification.
no code implementations • SemEval (NAACL) 2022 • Buse Çarık, Fatih Beyhan, Reyyan Yeniterzi
This paper describes the system proposed by Sabancı University Natural Language Processing Group in the SemEval-2022 MultiCoNER task.
1 code implementation • 18 Mar 2022 • Ali Hürriyetoğlu, Osman Mutlu, Fatih Beyhan, Fırat Duruşan, Ali Safaya, Reyyan Yeniterzi, Erdem Yörük
We propose a dataset for event coreference resolution, which is based on random samples drawn from multiple sources, languages, and countries.