Preprocessing Methods and Pipelines of Data Mining: An Overview

20 Jun 2019 · Canchen Li

Data mining is about obtaining new knowledge from existing datasets. However, the data in those datasets can be scattered, noisy, and even incomplete. Although much effort is spent on developing or fine-tuning data mining models to make them more robust to noise in the input data, the quality of their results still strongly depends on the quality of that data. This article starts with an overview of the data mining pipeline, briefly introducing the procedures in a data mining task. It then gives an overview of data preprocessing techniques, which are categorized as data cleaning, data transformation, and data preprocessing. Detailed preprocessing methods, as well as their influence on data mining models, are also covered.
