Scalable Bayesian divergence time estimation with ratio transformations
Divergence time estimation is crucial for dating biologically important events, from species divergence to viral transmissions in space and time. With the advent of high-throughput sequencing, recent Bayesian phylogenetic studies have analyzed hundreds to thousands of sequences. Such large-scale analyses challenge divergence time reconstruction because they require inference on highly correlated internal node heights, which often becomes computationally infeasible. To overcome this limitation, we explore a ratio transformation that maps the original N - 1 internal node heights into a space of one height parameter and N - 2 ratio parameters. To make analyses scalable, we develop a collection of linear-time algorithms to compute the gradient and Jacobian-associated terms of the log-likelihood with respect to these ratios. We then apply Hamiltonian Monte Carlo sampling with the ratio transform in a Bayesian framework to learn the divergence times in four pathogenic virus phylogenies: West Nile virus, rabies virus, Lassa virus and Ebola virus. Our method both resolves a mixing issue in the West Nile virus example and improves inference efficiency by at least 5-fold for the Lassa and rabies virus examples. For the Ebola virus example, our method makes it computationally feasible to incorporate mixed-effects molecular clock models, confirms the findings from the original study, and reveals clearer multimodal distributions of the divergence times of some clades of interest.
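As a rough illustration of the ratio idea, the sketch below works in the simplest setting, which is an assumption made here and not stated in the abstract: all tips are sampled at the same time (height 0), so each non-root internal node's ratio is its height divided by its parent's height. The transform used in the paper for serially sampled tips may be defined differently. The function names and the dictionary-based tree encoding are hypothetical.

```python
import math

def heights_to_ratios(parent, heights, root):
    """Map N - 1 internal node heights to (root height, N - 2 ratios).

    Assumes all tips sit at height 0, so ratio = height / parent height.
    """
    ratios = {n: h / heights[parent[n]] for n, h in heights.items() if n != root}
    return heights[root], ratios

def ratios_to_heights(parent, root_height, ratios, root, preorder):
    """Invert the map in one preorder pass (parents before children)."""
    heights = {root: root_height}
    for n in preorder:
        if n != root:
            heights[n] = ratios[n] * heights[parent[n]]
    return heights

def log_jacobian(parent, heights, root):
    """Log-determinant of d(heights) / d(root height, ratios).

    In preorder the Jacobian is triangular with diagonal entries equal to
    the parent height, so the log-determinant is a sum of log parent heights.
    """
    return sum(math.log(heights[parent[n]]) for n in heights if n != root)

# Toy caterpillar tree with 4 tips and 3 internal nodes: R -> A -> B.
parent = {"A": "R", "B": "A"}
heights = {"R": 10.0, "A": 6.0, "B": 3.0}

root_height, ratios = heights_to_ratios(parent, heights, "R")
recovered = ratios_to_heights(parent, root_height, ratios, "R", ["R", "A", "B"])
log_det = log_jacobian(parent, heights, "R")
```

Each function visits every internal node once, which is the sense in which this toy version runs in linear time. Because every ratio lies in (0, 1) whenever a child is strictly younger than its parent, a sampler can propose ratios without violating the ordering constraints among node heights.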