Integrated Eojeol Embedding for Erroneous Sentence Classification in Korean Chatbots

13 Apr 2020  ·  DongHyun Choi, IlNam Park, Myeong Cheol Shin, EungGyun Kim, Dong Ryeol Shin ·

This paper attempts to analyze the Korean sentence classification system for a chatbot. Sentence classification is the task of classifying an input sentence based on predefined categories. However, spelling or space error contained in the input sentence causes problems in morphological analysis and tokenization. This paper proposes a novel approach of Integrated Eojeol (Korean syntactic word separated by space) Embedding to reduce the effect that poorly analyzed morphemes may make on sentence classification. It also proposes two noise insertion methods that further improve classification performance. Our evaluation results indicate that the proposed system classifies erroneous sentences more accurately than the baseline system by 17%p.0

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here