Genome-wide nucleotide-resolution model of single-strand break site reveals species evolutionary hierarchy

21 Aug 2022 · Sheng Xu, Junkang Wei, Yu Li

Single-strand breaks (SSBs) are the major form of DNA damage in the genome. They arise spontaneously as the outcome of genotoxin exposure and as intermediates of DNA transactions. SSBs play a crucial role in various biological processes and show a non-random distribution across the genome. Several SSB detection approaches, such as S1 END-seq and SSiNGLe-ILM, have emerged to characterize the genomic landscape of SSBs at nucleotide resolution. However, these sequencing-based methods are costly and infeasible for large-scale analysis of diverse species. We therefore propose SSBlazer, the first computational approach to this problem: an explainable and scalable deep learning framework for genome-wide, nucleotide-resolution SSB site prediction. We demonstrate that SSBlazer accurately predicts SSB sites and effectively reduces false positives by constructing an imbalanced dataset that simulates the realistic SSB distribution. Model interpretation analysis reveals that SSBlazer captures two critical features: the pattern of individual CpG sites in their genomic context and the TGCC motif in the center region. In addition, SSBlazer is a lightweight model that generalizes robustly in cross-species evaluation, which enables large-scale, genome-wide application to diverse species. Strikingly, the putative SSB genomic landscapes of 216 vertebrates reveal a negative correlation between SSB frequency and evolutionary hierarchy, suggesting that genome integrity tends to increase during evolution.
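
The abstract does not describe SSBlazer's input encoding or architecture, so the following is only a minimal sketch of how a sequence window around a candidate site might be prepared and inspected for the two features named above (CpG dinucleotides and a central TGCC motif). It is not the authors' implementation. The function names, the one-hot encoding, the window length, and the `flank` parameter are all assumptions made for illustration.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order for the one-hot encoding


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (len, 4) matrix; non-ACGT bases (e.g. N) stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            mat[pos, index[base]] = 1.0
    return mat


def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G) in the window."""
    return seq.upper().count("CG")


def has_central_motif(seq: str, motif: str = "TGCC", flank: int = 5) -> bool:
    """Return True if `motif` lies within `flank` bases of the window centre.

    `flank` is a hypothetical parameter; the abstract only says "center region".
    """
    seq = seq.upper()
    centre = len(seq) // 2
    start = max(0, centre - flank)
    end = min(len(seq), centre + flank)
    return motif in seq[start:end]


# Hypothetical 20-nt window centred on a candidate break site.
window = "ACGTTACGATGCCTTACGAA"
features = one_hot(window)            # array of shape (20, 4)
n_cpg = count_cpg(window)             # number of CpG dinucleotides
central = has_central_motif(window)   # whether TGCC sits near the centre
```

A matrix like `features` is the kind of input a sequence-based deep learning model typically consumes, while `n_cpg` and `central` are simple hand-computed checks that could be compared against what a model interpretation analysis highlights.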
