1 code implementation • 29 Nov 2024 • Julian D. Parker, Anton Smirnov, Jordi Pons, CJ Carr, Zack Zukowski, Zach Evans, Xubo Liu
The tokenization of speech with neural audio codec models is a vital part of modern AI pipelines for the generation or understanding of speech, alone or in a multimodal context.
1 code implementation • 19 Jul 2024 • Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons
Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models.
Ranked #3 on Audio Generation on AudioCaps
1 code implementation • 16 Apr 2024 • Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons
Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure from text prompts.
Ranked #7 on Audio Generation on AudioCaps
2 code implementations • 7 Feb 2024 • Zach Evans, CJ Carr, Josiah Taylor, Scott H. Hawley, Jordi Pons
Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding.
Ranked #1 on Text-to-Music Generation on MusicCaps (KL_passt metric)