1 code implementation • 14 Nov 2023 • Leonard Salewski, Stefan Fauth, A. Sophia Koepke, Zeynep Akata
In particular, our framework uses a pre-trained large language model (LLM) to generate the text, with generation guided by a pre-trained audio-language model so that the resulting captions describe the audio content.
Ranked #1 on Zero-shot Audio Captioning on Clotho
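
To make the idea of audio-guided text generation concrete, here is a minimal sketch and not the authors' implementation. It assumes that guidance means adding a weighted audio-text matching score to the language model's next-word log-probability at each greedy decoding step; the listing above does not specify the actual combination rule. The functions `toy_lm_logprobs` and `toy_audio_text_score`, the `weight` parameter, and the tiny vocabulary are all hypothetical stand-ins for a real LLM and a real audio-language model.

```python
import math

# Hypothetical stand-in for a pre-trained LLM: returns next-word
# log-probabilities for a given prefix (toy lookup table, not a real model).
def toy_lm_logprobs(prefix):
    table = {
        (): {"a": math.log(0.6), "the": math.log(0.4)},
        ("a",): {"dog": math.log(0.4), "car": math.log(0.6)},
        ("a", "dog"): {"barks": math.log(0.5), "sleeps": math.log(0.5)},
    }
    # Unknown prefixes end the caption.
    return table.get(tuple(prefix), {"<eos>": 0.0})

# Hypothetical stand-in for a pre-trained audio-language model: scores how
# well a partial caption matches the audio. Here the "audio" is represented
# by a set of keywords, and the score is the fraction of matching words.
def toy_audio_text_score(words, audio_keywords):
    if not words:
        return 0.0
    return sum(w in audio_keywords for w in words) / len(words)

def guided_greedy_caption(audio_keywords, weight=2.0, max_len=10):
    """Greedy decoding where each candidate word is scored by
    LM log-probability + weight * audio-text match (an assumed rule)."""
    words = []
    for _ in range(max_len):
        candidates = toy_lm_logprobs(words)
        best = max(
            candidates,
            key=lambda w: candidates[w]
            + weight * toy_audio_text_score(words + [w], audio_keywords),
        )
        if best == "<eos>":
            break
        words.append(best)
    return " ".join(words)

if __name__ == "__main__":
    audio = {"dog", "barks"}
    print(guided_greedy_caption(audio))              # a dog barks
    print(guided_greedy_caption(audio, weight=0.0))  # a car
```

With `weight=0.0` the toy language model follows its own preference ("a car"); with guidance switched on, the audio match steers it toward "a dog barks". Neither model is trained or fine-tuned in this sketch, which mirrors the zero-shot setting only in spirit.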