Performance Evaluation of Linear Regression Algorithm in Cluster Environment

14 Sep 2020  ·  Cinantya Paramita, Fauzi Adi Rafrastara, Usman Sudibyo, R. I. W. Agung Wibowo ·

Cluster computing was introduced to replace the superiority of super computers. Cluster computing is able to overcome the problems that cannot be effectively dealt with supercomputers. In this paper, we are going to evaluate the performance of cluster computing by executing one of data mining techniques in the cluster environment. The experiment will attempt to predict the flight delay by using linear regression algorithm with apache spark as a framework for cluster computing. The result shows that, by involving 5 PCs in cluster environment with equal specifications can increase the performance of computation up to 39.76% compared to the standalone one. Attaching more nodes to the cluster can make the process become faster significantly.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods