1 code implementation • 24 Oct 2023 • Chaewon Park, Soohwan Kim, Kyubyong Park, Kunwoo Park
This resource is the largest offensive language corpus in Korean and is the first to offer target-specific ratings on a three-point Likert scale, enabling the detection of hate expressions in Korean across varying degrees of offensiveness.
no code implementations • 4 Jun 2023 • Hyunwoong Ko, Kichang Yang, Minho Ryu, Taekyoon Choi, Seungmu Yang, Jiwung Hyun, Sungho Park, Kyubyong Park
This paper presents our work in developing the Polyglot Korean models, which propose some steps towards addressing the non-English language performance gap in multilingual language models.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Kyubyong Park, Joohong Lee, Seongbo Jang, Dawoon Jung
Typically, tokenization is the very first step in most text processing work.
1 code implementation • 28 Apr 2020 • Kyubyong Park
Korean is a morphologically rich language.
1 code implementation • 10 Apr 2020 • Yo Joong Choe, Jiyeon Ham, Kyubyong Park
Invariant risk minimization (IRM) (Arjovsky et al., 2019) is a recently proposed framework designed for learning predictors that are invariant to spurious correlations across different training environments.
3 code implementations • Findings of the Association for Computational Linguistics 2020 • Jiyeon Ham, Yo Joong Choe, Kyubyong Park, Ilji Choi, Hyungjoon Soh
Although several benchmark datasets for natural language inference (NLI) and semantic textual similarity (STS) have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language.
Natural Language Inference
Natural Language Understanding
1 code implementation • 7 Apr 2020 • Kyubyong Park, Seanie Lee
Conversion of Chinese graphemes to phonemes (G2P) is an essential component in Mandarin Chinese Text-To-Speech (TTS) systems.
Ranked #2 on Polyphone disambiguation on CPP
2 code implementations • LREC 2020 • Yo Joong Choe, Kyubyong Park, Dongwoo Kim
We wrap our dataset and model in an easy-to-use Python library, which supports downloading and retrieving top-k word translations in any of the supported language pairs as well as computing top-k word translations for custom parallel corpora.
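As a rough illustration of what "computing top-k word translations for custom parallel corpora" can involve, the sketch below ranks target words for each source word by sentence-level co-occurrence, normalized by how often the target word appears overall. This is a simplified stand-in and not the library's actual scoring method or API; the function name `top_k_translations`, the whitespace tokenization, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def top_k_translations(pairs, k=3):
    """Return {source_word: [up to k candidate target words]}.

    `pairs` is an iterable of (source_sentence, target_sentence) strings.
    Hypothetical helper: a crude co-occurrence heuristic, not the
    method used by the library described above.
    """
    cooc = defaultdict(Counter)  # source word -> counts of co-occurring target words
    tgt_freq = Counter()         # number of sentence pairs containing each target word

    for src, tgt in pairs:
        # Naive tokenization (assumption): lowercase and split on whitespace.
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        tgt_freq.update(tgt_words)
        for s in src_words:
            cooc[s].update(tgt_words)

    table = {}
    for s, counts in cooc.items():
        # Score = co-occurrence count / target frequency, so that very
        # common target words are penalized. Ties are broken by raw
        # count, then alphabetically, to keep the output deterministic.
        ranked = sorted(
            counts.items(),
            key=lambda kv: (-kv[1] / tgt_freq[kv[0]], -kv[1], kv[0]),
        )
        table[s] = [t for t, _ in ranked[:k]]
    return table


if __name__ == "__main__":
    toy_corpus = [
        ("the cat", "le chat"),
        ("the dog", "le chien"),
        ("a cat", "un chat"),
    ]
    print(top_k_translations(toy_corpus, k=2))
```

On real corpora a heuristic like this is noisy; it is meant only to show the shape of the input (sentence pairs) and output (a ranked list per source word).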
1 code implementation • LREC 2020 • Kyubyong Park, Yo Joong Choe, Jiyeon Ham
Jejueo was classified as critically endangered by UNESCO in 2010.
2 code implementations • WS 2019 • Yo Joong Choe, Jiyeon Ham, Kyubyong Park, Yeoil Yoon
The resulting parallel corpora are subsequently used to pre-train Transformer models.
Ranked #17 on Grammatical Error Correction on BEA-2019 (test)
no code implementations • 17 Apr 2019 • Jaechang Lim, Seongok Ryu, Kyubyong Park, Yo Joong Choe, Jiyeon Ham, Woo Youn Kim
Accurate prediction of drug-target interaction (DTI) is essential for in silico drug design.
1 code implementation • 27 Mar 2019 • Kyubyong Park, Thomas Mulc
We describe our development of CSS10, a collection of single speaker speech datasets for ten languages.