Tabular Data Imputation: Choose KNN over Deep Learning
As databases are ubiquitous nowadays, missing values constitute a pervasive problem for data analysis. Over the last 70 years, various imputation algorithms for tabular data have been developed and shown to be useful for estimating missing values. In addition, the recent infatuation with artificial neural networks has led to the development of complex and powerful algorithms for data imputation. This study is the first to compare state-of-the-art deep-learning models with the well-established KNN algorithm (1951). Using real-world and generated datasets in various missing-data scenarios, we claim that the good old KNN algorithm is still competitive with (nay, better than) powerful deep-learning algorithms for tabular data imputation. This work advocates for an appropriate and reasonable use of machine learning, in a world where overconsumption, performance and speed unfortunately often prevail over sustainability and common sense.
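To make the idea of KNN imputation concrete, here is a minimal pure-Python sketch. It is not the experimental setup of the study: the function name `knn_impute`, the choice `k=2`, the distance rescaling, and the toy data are all illustrative assumptions. Each missing entry is filled with the mean of that column over the k nearest rows, where distance is computed only over the columns that both rows have observed.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows where each None is replaced by the mean of
    that column over the k nearest rows that have the column observed.

    Distance is Euclidean over columns observed in both rows, rescaled by
    (total columns / compared columns) so that rows compared on fewer
    columns are not unfairly favoured. This is an assumed convention.
    """
    n_cols = len(rows[0])

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing to compare on
        squared = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(squared * n_cols / len(shared))

    imputed = [list(r) for r in rows]  # leave the input untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows with column j observed
            # and at least one column in common with the current row.
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = distance(row, other)
                if d != math.inf:
                    donors.append((d, other[j]))
            donors.sort(key=lambda pair: pair[0])
            nearest = donors[:k]
            if nearest:  # otherwise the entry stays None
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy table: the second row is missing its middle value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.2],
    [8.0, 9.0, 10.0],
    [0.9, 2.2, 2.9],
]
filled = knn_impute(data, k=2)
```

In this toy table the two rows closest to the incomplete one are the first and the last, so the missing entry is filled with the average of their middle values, while the distant third row is ignored. Distances are always computed on the original observed values, never on previously imputed ones.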