Genome-AC-GAN: Enhancing Synthetic Genotype Generation through Auxiliary Classification

2024 · Shaked Ahronoviz, Ilan Gronau

In recent years, there have been increasing attempts to develop computational methods for generating synthetic genomic data that mimic real genomic datasets. Artificial genomes (AGs) generated by these methods have emerged as a promising solution to the privacy concerns raised by public genomic datasets and as a means of providing adequate representation of under-sampled populations. However, existing methods for generating AGs have very limited ability to capture the features of different sub-populations within a larger cohort. In this study, we propose a novel method called the Genome Auxiliary Classifier Generative Adversarial Network (Genome-AC-GAN), which generates AGs tailored to specific sub-populations. We conducted experiments to evaluate the performance of the Genome-AC-GAN and to compare the AGs it generates with real genomic data as well as with AGs generated by previously published methods. The Genome-AC-GAN outperforms the other methods and faithfully models population structure, which existing methods do not capture adequately. We also demonstrate the use of AGs generated by the Genome-AC-GAN to augment training sets for classifying genomes into populations. These experiments demonstrate the benefits of AGs in enhancing classification accuracy, especially for under-sampled and closely related populations.
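To make the "auxiliary classifier" idea in the abstract concrete, here is a minimal sketch of the generic AC-GAN formulation: the generator is conditioned on a class label, and the discriminator is scored both on real-versus-synthetic and on the class label. This is not the authors' implementation. The function names, the one-hot encoding of the population label, and the latent dimension are illustrative assumptions, and the loss terms follow the standard AC-GAN objective rather than anything stated in the abstract.

```python
import math
import random


def one_hot(index, num_classes):
    """Encode a population index as a one-hot vector (illustrative choice)."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def generator_input(population_index, num_populations, latent_dim, rng):
    """Build a conditional generator input: Gaussian noise + population label.

    Hypothetical helper: the abstract does not say how the label is fed in.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return noise + one_hot(population_index, num_populations)


def ac_gan_losses(p_real_given_real, p_real_given_fake,
                  p_class_given_real, p_class_given_fake):
    """Generic AC-GAN loss terms for one real and one synthetic sample.

    source term L_S = log P(real | x_real) + log P(fake | x_fake)
    class term  L_C = log P(c | x_real)    + log P(c | x_fake)
    The discriminator maximizes L_S + L_C; the generator maximizes L_C - L_S.
    Both are returned as quantities to minimize.
    """
    source_ll = math.log(p_real_given_real) + math.log(1.0 - p_real_given_fake)
    class_ll = math.log(p_class_given_real) + math.log(p_class_given_fake)
    return {
        "discriminator": -(source_ll + class_ll),
        "generator": -(class_ll - source_ll),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    z = generator_input(population_index=2, num_populations=5,
                        latent_dim=10, rng=rng)
    print(len(z), z[-5:])
    print(ac_gan_losses(0.9, 0.2, 0.8, 0.7))
```

In a setup like this, synthetic genomes for a chosen sub-population would be requested by fixing the label part of the input and resampling the noise, which is how such samples could be added to an under-sampled class in a training set.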

