Search Results for author: Ariel Shaulov

Found 1 paper, 0 papers with code

Zero-Shot Audio Captioning via Audibility Guidance

No code implementations | 7 Sep 2023 | Tal Shaharabany, Ariel Shaulov, Lior Wolf

Instead, captioning occurs as an inference process involving three networks, one for each of the three desired qualities: (i) a large language model, in our case GPT-2, chosen for convenience; (ii) a model that provides a matching score between an audio file and a text, for which we use the multimodal matching network ImageBind; and (iii) a text classifier, trained on a dataset we collected automatically by prompting GPT-4 to generate both audible and inaudible sentences.
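The excerpt above names three scoring signals but does not say how they are combined. As a rough illustration only, the sketch below assumes a weighted sum over a fixed set of candidate captions. The function `pick_caption`, the weights `w_match` and `w_audible`, and the three toy scorers are hypothetical stand-ins; they do not call GPT-2, ImageBind, or the authors' classifier, and this is not presented as the paper's actual guidance procedure.

```python
import math
from typing import Callable, Sequence

# A scorer maps a candidate caption to a number.
Scorer = Callable[[str], float]


def pick_caption(
    candidates: Sequence[str],
    lm_logprob: Scorer,        # fluency: log-probability under a language model
    audio_match: Scorer,       # relevance: audio-text matching score
    audibility_prob: Scorer,   # audibility: classifier probability in [0, 1]
    w_match: float = 1.0,      # hypothetical weight, not from the source
    w_audible: float = 1.0,    # hypothetical weight, not from the source
) -> str:
    """Return the candidate with the highest combined score.

    Assumption: the three signals are combined as a weighted sum.
    """
    def combined(text: str) -> float:
        # Clamp the probability so that log() is always defined.
        p = min(max(audibility_prob(text), 1e-9), 1.0)
        return lm_logprob(text) + w_match * audio_match(text) + w_audible * math.log(p)

    return max(candidates, key=combined)


# Toy stand-ins for the three networks (purely illustrative).
def toy_lm_logprob(text: str) -> float:
    return -0.1 * len(text.split())          # shorter text scores higher


def toy_audio_match(text: str) -> float:
    return 2.0 if "dog" in text else 0.0     # pretend the clip contains a dog


def toy_audibility_prob(text: str) -> float:
    return 0.9 if "bark" in text else 0.1    # pretend "bark" signals a sound


candidates = [
    "a dog is barking loudly",
    "a brown dog sits still",
    "a red car is parked",
]
best = pick_caption(candidates, toy_lm_logprob, toy_audio_match, toy_audibility_prob)
print(best)
```

With these toy scorers, the second caption matches the audio but describes nothing audible, so the audibility term pushes it below the first one. In a real system each stand-in would be replaced by a call to the corresponding network.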

Zero-shot Audio Captioning
