Investigating the effect of binning on causal discovery

23 Feb 2022  ·  Andrew Colt Deckert, Erich Kummerfeld ·

Binning (a.k.a. discretization) of numerically continuous measurements is a wide-spread but controversial practice in data collection, analysis, and presentation. The consequences of binning have been evaluated for many different kinds of data analysis methods, however so far the effect of binning on causal discovery algorithms has not been directly investigated. This paper reports the results of a simulation study that examined the effect of binning on the Greedy Equivalence Search (GES) causal discovery algorithm. Our findings suggest that unbinned continuous data often result in the highest search performance, but some exceptions are identified. We also found that binned data are more sensitive to changes in sample size and tuning parameters, and identified some interactive effects between sample size, binning, and tuning parameter on performance.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here