The design and construction of reference pangenome graphs
The recent advances in sequencing technologies enables the assembly of individual genomes to the reference quality. How to integrate multiple genomes from the same species and to make the integrated representation accessible to biologists remain an open challenge. Here we propose a graph-based data model and associated formats to represent multiple genomes while preserving the coordinate of the linear reference genome. We implemented our ideas in the minigraph toolkit and demonstrate that we can efficiently construct a pangenome graph and compactly encode tens of thousands of structural variants missing from the current reference genome.
PDF Abstract