9 papers with code • 2 benchmarks • 2 datasets
Libraries: Use these libraries to find Text-to-Music Generation models and implementations
Benefiting from large-scale datasets and pre-trained models, the field of generative models has recently gained significant momentum.
Recent years have seen the rapid development of large generative models for text; however, much less research has explored the connection between text and another "language" of communication -- music.
Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation.
Any audio can be translated into a language of audio (LOA) representation based on AudioMAE, a self-supervised pre-trained representation learning model.
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning
To fill this gap, we present a methodology for generating question-answer pairs from existing audio captioning datasets and introduce the MusicQA Dataset designed for answering open-ended music-related questions.
With recent advancements in text-to-audio and text-to-music generation based on latent diffusion models, the quality of generated content has been reaching new heights.
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models.