Modeling Tabular data using Conditional GAN

Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes, whereas discrete columns are sometimes imbalanced, making the modeling difficult. Existing statistical and deep neural network models fail to properly model this type of data. We design CTGAN, which uses a conditional generative adversarial network to address these challenges. To aid in a fair and thorough comparison, we design a benchmark with 7 simulated and 8 real datasets and several Bayesian network baselines. CTGAN outperforms Bayesian methods on most of the real datasets, whereas other deep learning methods could not.
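The two difficulties named in the abstract can be made concrete with a toy table. The sketch below is only an illustration under stated assumptions, not the authors' method: the data is made up, the helper names (`sample_uniform`, `sample_conditional`) are hypothetical, and picking every category with equal probability is a simplification chosen for this example. It shows that a bimodal column has almost no rows near its mean, and that choosing a category first and then a matching row exposes a model to the rare category far more often than uniform row sampling does.

```python
import random
from collections import Counter

random.seed(0)

# Toy table (hypothetical data): one continuous column with two modes,
# one discrete column where "rare" makes up about 5% of the rows.
rows = []
for _ in range(2000):
    label = "rare" if random.random() < 0.05 else "common"
    mode_center = 0.0 if random.random() < 0.5 else 10.0
    rows.append((random.gauss(mode_center, 1.0), label))

# Multi-modality: the column mean falls between the two modes,
# where hardly any rows actually lie.
mean_value = sum(value for value, _ in rows) / len(rows)
near_mean = sum(1 for value, _ in rows if 4.0 < value < 6.0)

def sample_uniform(rows, n):
    """Draw n rows uniformly at random, as plain minibatch sampling would."""
    return [random.choice(rows) for _ in range(n)]

def sample_conditional(rows, n):
    """Pick a category first, then a row that matches it.

    Every category is picked with equal probability (an assumption of
    this sketch), so the rare category fills about half of the batch.
    """
    by_label = {}
    for value, label in rows:
        by_label.setdefault(label, []).append((value, label))
    labels = sorted(by_label)
    return [random.choice(by_label[random.choice(labels)]) for _ in range(n)]

uniform_counts = Counter(label for _, label in sample_uniform(rows, 1000))
conditional_counts = Counter(label for _, label in sample_conditional(rows, 1000))

print("column mean:", round(mean_value, 2), "rows near the mean:", near_mean)
print("uniform sampling:", dict(uniform_counts))
print("conditional sampling:", dict(conditional_counts))
```

A single Gaussian fitted to the continuous column would place most of its mass where the data has none, and a model trained on uniformly sampled batches would see the rare category in only a small fraction of its examples.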

NeurIPS 2019
Task: Tabular Data Generation

Dataset                    Model      Metric                 Value   Global Rank
Adult Census Income        CTGAN      LR Accuracy            83.2    #4
Adult Census Income        CTGAN      DT Accuracy            81.32   #5
Adult Census Income        CTGAN      RF Accuracy            83.53   #4
Adult Census Income        CTGAN      Parameters (M)         0.302   #3
Adult Census Income        CopulaGAN  LR Accuracy            80.61   #5
Adult Census Income        CopulaGAN  DT Accuracy            76.29   #6
Adult Census Income        CopulaGAN  RF Accuracy            80.46   #6
Adult Census Income        CopulaGAN  Parameters (M)         0.300   #2
Adult Census Income        TVAE       LR Accuracy            80.53   #6
Adult Census Income        TVAE       DT Accuracy            82.8    #4
Adult Census Income        TVAE       RF Accuracy            83.48   #5
Adult Census Income        TVAE       Parameters (M)         0.053   #1
California Housing Prices  TVAE       Parameters (M)         0.045   #1
California Housing Prices  TVAE       RF Mean Squared Error  0.35    #3
California Housing Prices  TVAE       LR Mean Squared Error  0.65    #5
California Housing Prices  TVAE       DT Mean Squared Error  0.45    #3
California Housing Prices  CopulaGAN  Parameters (M)         0.201   #3
California Housing Prices  CopulaGAN  RF Mean Squared Error  0.99    #6
California Housing Prices  CopulaGAN  LR Mean Squared Error  0.98    #6
California Housing Prices  CopulaGAN  DT Mean Squared Error  1.19    #6
California Housing Prices  CTGAN      Parameters (M)         0.197   #2
California Housing Prices  CTGAN      RF Mean Squared Error  0.62    #5
California Housing Prices  CTGAN      LR Mean Squared Error  0.61    #4
California Housing Prices  CTGAN      DT Mean Squared Error  0.82    #5
Diabetes                   CTGAN      LR Accuracy            0.5093  #5
Diabetes                   CTGAN      DT Accuracy            0.4973  #5
Diabetes                   CTGAN      RF Accuracy            0.5223  #5
Diabetes                   CTGAN      Parameters (M)         9.6     #4
Diabetes                   CopulaGAN  LR Accuracy            0.4027  #6
Diabetes                   CopulaGAN  DT Accuracy            0.385   #6
Diabetes                   CopulaGAN  RF Accuracy            0.3759  #6
Diabetes                   CopulaGAN  Parameters (M)         9.4     #3
Diabetes                   TVAE       LR Accuracy            0.5634  #4
Diabetes                   TVAE       DT Accuracy            0.5330  #4
Diabetes                   TVAE       RF Accuracy            0.5517  #4
Diabetes                   TVAE       Parameters (M)         0.359   #1
HELOC                      TVAE       LR Accuracy            71.04   #3
HELOC                      TVAE       DT Accuracy            76.39   #3
HELOC                      TVAE       RF Accuracy            77.24   #3
HELOC                      TVAE       Parameters (M)         62      #4
HELOC                      CopulaGAN  LR Accuracy            42.03   #6
HELOC                      CopulaGAN  DT Accuracy            42.36   #6
HELOC                      CopulaGAN  RF Accuracy            42.35   #6
HELOC                      CopulaGAN  Parameters (M)         0.276   #1
HELOC                      CTGAN      LR Accuracy            57.72   #5
HELOC                      CTGAN      DT Accuracy            61.34   #5
HELOC                      CTGAN      RF Accuracy            62.35   #5
HELOC                      CTGAN      Parameters (M)         0.277   #2
SICK                       TVAE       LR Accuracy            94.7    #4
SICK                       TVAE       DT Accuracy            95.39   #3
SICK                       TVAE       RF Accuracy            94.91   #4
SICK                       TVAE       Parameters (M)         0.046   #1
SICK                       CopulaGAN  LR Accuracy            94.57   #5
SICK                       CopulaGAN  DT Accuracy            93.77   #5
SICK                       CopulaGAN  RF Accuracy            94.57   #5
SICK                       CopulaGAN  Parameters (M)         0.226   #3
SICK                       CTGAN      LR Accuracy            94.44   #6
SICK                       CTGAN      DT Accuracy            92.05   #6
SICK                       CTGAN      RF Accuracy            94.57   #5
SICK                       CTGAN      Parameters (M)         0.222   #2
Travel                     TVAE       LR Accuracy            79.58   #3
Travel                     TVAE       DT Accuracy            81.68   #3
Travel                     TVAE       RF Accuracy            81.68   #3
Travel                     TVAE       Parameters (M)         0.036   #1
Travel                     CTGAN      LR Accuracy            73.3    #5
Travel                     CTGAN      DT Accuracy            73.3    #6
Travel                     CTGAN      RF Accuracy            71.41   #6
Travel                     CTGAN      Parameters (M)         0.155   #2
Travel                     CopulaGAN  LR Accuracy            73.3    #5
Travel                     CopulaGAN  DT Accuracy            73.61   #5
Travel                     CopulaGAN  RF Accuracy            73.3    #5
Travel                     CopulaGAN  Parameters (M)         0.157   #3
