1 code implementation • 8 Sep 2023 • Saksham Bassi, Giulio Duregon, Siddhartha Jalagam, David Roth
In light of recent successes in large-scale audio pretraining, we revisit the performance comparison between two-stage and end-to-end models. We find that audio-based language models pretrained using weak self-supervised objectives match or exceed the performance of similarly trained two-stage models, and further, that the choice of pretraining objective substantially affects a model's ability to be adapted to the disfluency removal task.
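To make the architectural comparison concrete, below is a minimal sketch of the two pipeline styles being compared. The function names, the toy rule-based filler filter, and the stand-in transcribers are illustrative assumptions, not the paper's models, which are pretrained audio language models.

```python
def two_stage(audio_features: list[float]) -> str:
    """Two-stage pipeline: transcribe first, then remove disfluencies from the text."""
    transcript = asr_transcribe(audio_features)   # stage 1: speech -> verbatim text
    return remove_disfluencies(transcript)        # stage 2: text -> fluent text

def end_to_end(audio_features: list[float]) -> str:
    """End-to-end pipeline: a single adapted model maps audio straight to fluent text."""
    return fluent_transcribe(audio_features)

# --- toy stand-ins so the sketch runs; real systems use pretrained audio LMs ---
def asr_transcribe(audio_features: list[float]) -> str:
    return "i uh i want to um book a flight"

def fluent_transcribe(audio_features: list[float]) -> str:
    return "i want to book a flight"

FILLERS = {"uh", "um", "er"}

def remove_disfluencies(text: str) -> str:
    out: list[str] = []
    prev = None
    for word in text.split():
        if word in FILLERS or word == prev:       # drop fillers and immediate repeats
            continue
        out.append(word)
        prev = word
    return " ".join(out)

if __name__ == "__main__":
    audio = [0.0] * 16000                         # placeholder waveform features
    print(two_stage(audio))    # "i want to book a flight"
    print(end_to_end(audio))   # "i want to book a flight"
```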