Towards Semantic Noise Cleansing of Categorical Data based on Semantic Infusion

6 Feb 2020  ·  Rishabh Gupta, Rajesh N Rao ·

Semantic Noise affects text analytics activities for the domain-specific industries significantly. It impedes the text understanding which holds prime importance in the critical decision making tasks. In this work, we formalize semantic noise as a sequence of terms that do not contribute to the narrative of the text. We look beyond the notion of standard statistically-based stop words and consider the semantics of terms to exclude the semantic noise. We present a novel Semantic Infusion technique to associate meta-data with the categorical corpus text and demonstrate its near-lossless nature. Based on this technique, we propose an unsupervised text-preprocessing framework to filter the semantic noise using the context of the terms. Later we present the evaluation results of the proposed framework using a web forum dataset from the automobile-domain.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here