no code implementations • EMNLP 2020 • Sangwhan Moon, Naoaki Okazaki
Large-scale pre-trained language models have shown groundbreaking performance improvements for transfer learning in the domain of natural language processing.
no code implementations • LREC 2022 • Sangwhan Moon, Won Ik Cho, Hye Joo Han, Naoaki Okazaki, Nam Soo Kim
As this problem originates from the conventional scheme used when creating POS tagging corpora, we propose an improvement to the existing scheme that makes it friendlier to generative tasks.
no code implementations • 22 Feb 2024 • Marco Cognetta, Vilém Zouhar, Sangwhan Moon, Naoaki Okazaki
In Tokenization and the Noiseless Channel (Zouhar et al., 2023a), Rényi efficiency is suggested as an intrinsic mechanism for evaluating a tokenizer: for NLP tasks, the tokenizer which leads to the highest Rényi efficiency of the unigram distribution should be chosen.
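For readers skimming the abstract, a minimal sketch of the metric may help. Assuming the definition from Zouhar et al. (2023a) as I read it, Rényi efficiency is the Rényi entropy of order α of a corpus's unigram token distribution, normalized by the log of the vocabulary size, with α = 2.5 being the order that paper reports as correlating well with downstream quality. The toy corpus and candidate tokenizers below are hypothetical, and this is not the authors' implementation:

```python
import math
from collections import Counter

def renyi_efficiency(tokens, alpha=2.5):
    """Renyi entropy of order `alpha` of the unigram token distribution,
    normalized by log(vocabulary size). Assumes alpha != 1 (the alpha -> 1
    limit recovers Shannon entropy)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if len(probs) < 2:
        return 0.0  # a single-type vocabulary carries no information
    # H_alpha(p) = 1 / (1 - alpha) * log(sum_i p_i^alpha)
    h_alpha = math.log(sum(p ** alpha for p in probs)) / (1 - alpha)
    return h_alpha / math.log(len(probs))

# Decision rule from the abstract: among candidate tokenizations of the
# same text, prefer the one with the highest Renyi efficiency.
corpus = "the cat sat on the mat"
candidates = {
    "whitespace": corpus.split(),
    "characters": [ch for ch in corpus if ch != " "],
}
best = max(candidates, key=lambda name: renyi_efficiency(candidates[name]))
print(best, {n: round(renyi_efficiency(t), 3) for n, t in candidates.items()})
```

In practice this would be computed over a large corpus rather than a toy sentence; the snippet only illustrates the selection rule the abstract describes.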
no code implementations • LREC 2022 • Hwichan Kim, Sangwhan Moon, Naoaki Okazaki, Mamoru Komachi
Training a model using North Korean data is the most straightforward approach to solving this problem, but there is insufficient data to train NMT models.
1 code implementation • LREC 2022 • Won Ik Cho, Sangwhan Moon, Jong In Kim, Seok Min Kim, Nam Soo Kim
Paraphrasing is often performed with less concern for controlled style conversion.
no code implementations • EMNLP (NLP-OSS) 2020 • Won Ik Cho, Sangwhan Moon, YoungSook Song
Korean is often referred to as a low-resource language in the research community.
no code implementations • LREC 2020 • Sangwhan Moon, Naoaki Okazaki
In the context of multilingual language model pre-training, vocabulary size for languages with a broad set of potential characters is an unsolved problem.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Won Ik Cho, Young Ki Moon, Sangwhan Moon, Seok Min Kim, Nam Soo Kim
Modern dialog managers face the challenge of meeting common user expectations of human-level conversational skill, including but not limited to discourse with no clear objective.