no code implementations • 7 Sep 2023 • Tal Shaharabany, Ariel Shaulov, Lior Wolf
Instead, captioning is performed as an inference-time process that combines three networks, one for each desired quality: (i) a large language model, in our case GPT-2, chosen for convenience; (ii) a model that provides a matching score between an audio file and a text, for which we use the multimodal matching network ImageBind; and (iii) a text classifier, trained on a dataset we collected automatically by prompting GPT-4 to generate both audible and inaudible sentences.
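The three-score inference scheme above can be sketched as follows. This is a minimal illustration, not the paper's implementation: each scorer (`lm_score`, `audio_match_score`, `audibility_score`) is a hypothetical stand-in for GPT-2 likelihood, ImageBind audio-text similarity, and the audibility classifier respectively, and the candidate captions and weights are invented for the example.

```python
# Hedged sketch: pick the candidate caption maximizing a weighted sum of
# three scores. All scorers below are toy stand-ins, NOT the real models.

def lm_score(text):
    # Stand-in for a GPT-2 fluency score (here: shorter is slightly better).
    return -0.1 * len(text.split())

def audio_match_score(audio, text):
    # Stand-in for an ImageBind audio-text matching score.
    return 1.0 if ("sound" in text or "bark" in text) else 0.0

def audibility_score(text):
    # Stand-in for the audibility classifier's probability that the
    # sentence describes something one can hear.
    return 1.0 if any(w in text for w in ("bark", "siren", "music")) else 0.2

def best_caption(audio, candidates, weights=(1.0, 1.0, 1.0)):
    # Combine the three scores and return the highest-scoring candidate.
    def total(text):
        return (weights[0] * lm_score(text)
                + weights[1] * audio_match_score(audio, text)
                + weights[2] * audibility_score(text))
    return max(candidates, key=total)

candidates = [
    "a red door",                      # visual, inaudible
    "a dog barks loudly",              # audible, matches the audio
    "the sound of a distant siren",    # audible, weaker match
]
print(best_caption(None, candidates))  # → a dog barks loudly
```

In the actual system the candidates are not a fixed list but are produced token by token by the language model, with the matching and audibility scores steering generation; the stub above only illustrates how the three signals are fused into a single ranking criterion.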
Ranked #2 on Zero-shot Audio Captioning on AudioCaps