1 code implementation • 1 Feb 2024 • Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang
Here, we exploit this capability and introduce PAM, a no-reference metric for assessing audio quality for different audio processing tasks.
no code implementations • 3 Oct 2023 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh
In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs.
no code implementations • 2 Oct 2023 • Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Joseph Konan, Dareen Alharthi, Hazim T Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, Rita Singh
We hypothesize that for attacks to be transferable, it is sufficient that the proxy approximate the target model in the neighborhood of the harmful query.
1 code implementation • 14 Sep 2023 • Soham Deshmukh, Benjamin Elizalde, Dimitra Emmanouilidou, Bhiksha Raj, Rita Singh, Huaming Wang
During inference, the text encoder is replaced with the pretrained CLAP audio encoder.
1 code implementation • NeurIPS 2023 • Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang
We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks.
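The idea of framing every audio task as text generation can be sketched as follows: an audio encoder produces an embedding, a mapping network expands it into a sequence of "prefix" embeddings, and those prefixes condition a text decoder that emits the answer token by token. This is a minimal NumPy sketch under assumed toy dimensions; all layer shapes, names, and the mean-pooled stand-in decoder are illustrative placeholders, not Pengi's actual architecture or configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative placeholder dimensions -- not the real Pengi configuration.
AUDIO_DIM, PREFIX_LEN, EMBED_DIM, VOCAB = 64, 8, 32, 100

W_audio = rng.standard_normal((AUDIO_DIM, EMBED_DIM)) * 0.02   # stand-in audio encoder
W_map = rng.standard_normal((EMBED_DIM, PREFIX_LEN * EMBED_DIM)) * 0.02  # mapping network
tok_embed = rng.standard_normal((VOCAB, EMBED_DIM)) * 0.02     # token embedding table
W_out = rng.standard_normal((EMBED_DIM, VOCAB)) * 0.02         # output head

def audio_prefix(audio_features):
    """Map pooled audio features to PREFIX_LEN prefix token embeddings."""
    a = audio_features @ W_audio                       # (EMBED_DIM,)
    return (a @ W_map).reshape(PREFIX_LEN, EMBED_DIM)  # (PREFIX_LEN, EMBED_DIM)

def decode_step(audio_features, prompt_ids):
    """One text-generation step: [audio prefix; prompt tokens] -> next-token logits."""
    seq = np.concatenate([audio_prefix(audio_features),
                          tok_embed[prompt_ids]])      # (PREFIX_LEN + T, EMBED_DIM)
    # A real model would run a causal Transformer over `seq`; we just mean-pool.
    return seq.mean(axis=0) @ W_out                    # (VOCAB,)

logits = decode_step(rng.standard_normal(AUDIO_DIM), np.array([1, 2, 3]))
```

Because the output is always text, the same model interface serves classification, captioning, and question answering: only the prompt and the decoded string change.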
no code implementations • 20 Feb 2023 • Laurie M. Heller, Benjamin Elizalde, Bhiksha Raj, Soham Deshmukh
Machine Listening, as usually formalized, attempts to perform tasks that are fundamentally human-performable, and that in practice are performed by humans.
no code implementations • 14 Nov 2022 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh
We investigate how the model can learn to associate the audio with the descriptions, resulting in performance improvement of Speech Emotion Recognition and Speech Audio Retrieval.
1 code implementation • 28 Sep 2022 • Soham Deshmukh, Benjamin Elizalde, Huaming Wang
In this work, we propose a new collection of web audio-text pairs and a new framework for retrieval.
no code implementations • 19 Aug 2022 • Soham Deshmukh, Charles Lee
Additionally, the modular nature of the proposed framework allows plug-and-play for any future developments in both pre-trained language and task-oriented dialogue models.
no code implementations • 5 Oct 2021 • Ruijie Zhou, Soham Deshmukh, Jeremiah Greer, Charles Lee
Current research in dialogue systems is focused on conversational assistants working on short conversations in either task-oriented or open domain settings.
1 code implementation • 12 Jun 2021 • Soham Deshmukh, Bhiksha Raj, Rita Singh
To that end, we propose a shared encoder architecture with sound event detection as the primary task and an additional secondary decoder for a self-supervised auxiliary task.
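The shared-encoder setup above can be sketched in a few lines: one encoder feeds both a sound event detection head (primary, supervised) and a decoder that reconstructs the input frames (auxiliary, self-supervised), and training minimizes a weighted sum of the two losses. This is a toy NumPy sketch; the reconstruction objective, the single-layer components, and the weighting `lam` are assumptions for illustration, not the paper's actual choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only: T frames, F features, H hidden units, C event classes.
T, F, H, C = 10, 40, 16, 5

W_enc = rng.standard_normal((F, H)) * 0.1   # shared encoder
W_sed = rng.standard_normal((H, C)) * 0.1   # primary SED head
W_dec = rng.standard_normal((H, F)) * 0.1   # secondary decoder (auxiliary task)

def forward(x):
    h = np.tanh(x @ W_enc)                  # shared representation, (T, H)
    sed = 1 / (1 + np.exp(-(h @ W_sed)))    # per-frame event probabilities, (T, C)
    recon = h @ W_dec                       # reconstructed input frames, (T, F)
    return sed, recon

def total_loss(x, y, lam=0.1):
    """Primary binary cross-entropy + weighted auxiliary reconstruction loss."""
    sed, recon = forward(x)
    eps = 1e-9
    primary = -np.mean(y * np.log(sed + eps) + (1 - y) * np.log(1 - sed + eps))
    auxiliary = np.mean((recon - x) ** 2)   # self-supervised: needs no labels
    return primary + lam * auxiliary

x = rng.standard_normal((T, F))
y = (rng.random((T, C)) > 0.5).astype(float)
loss = total_loss(x, y)
```

The auxiliary loss requires no labels, so it can regularize the shared encoder using unlabelled audio alongside the supervised SED objective.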
no code implementations • 29 Oct 2020 • Soham Deshmukh, Mahmoud Al Ismail, Rita Singh
In the pathogenesis of COVID-19, impairment of respiratory functions is often one of the key symptoms.
no code implementations • 21 Oct 2020 • Mahmoud Al Ismail, Soham Deshmukh, Rita Singh
Phonation, or the vibration of the vocal folds, is the primary source of vocalization in the production of voiced sounds by humans.
1 code implementation • 17 Aug 2020 • Soham Deshmukh, Bhiksha Raj, Rita Singh
Weakly labelled learning has garnered a lot of attention in recent years due to its potential to scale Sound Event Detection (SED), and is formulated as a Multiple Instance Learning (MIL) problem.
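In the MIL framing, each clip is a "bag" of frame-level "instances": the model predicts per-frame probabilities, and a pooling function aggregates them into one clip-level prediction that can be trained against the weak (clip-level) label. The sketch below shows two standard pooling choices, max and linear-softmax; these are common MIL pooling functions in the SED literature, not necessarily the ones this paper adopts.

```python
import numpy as np

# Frame-level (instance) probabilities for one clip (bag): shape (frames, classes).
instance_probs = np.array([[0.9, 0.2],
                           [0.1, 0.3],
                           [0.2, 0.1]])

def mil_pool(p, mode="max"):
    """Aggregate instance probabilities into a single bag-level prediction."""
    if mode == "max":
        # Max pooling: the clip is positive if any frame is positive.
        return p.max(axis=0)
    # Linear-softmax pooling: weights each frame by its own confidence,
    # giving a smoother gradient than a hard max.
    return (p ** 2).sum(axis=0) / p.sum(axis=0)

bag_max = mil_pool(instance_probs, "max")        # -> [0.9, 0.3]
bag_lin = mil_pool(instance_probs, "linsoft")
```

The choice of pooling function controls how gradients flow back to individual frames, which in turn shapes how well the weakly trained model localizes events in time.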
no code implementations • 28 May 2019 • Soham Deshmukh, Rahul Rade, Dr. Faruk Kazi
For modelling we propose a novel semi-supervised algorithm called Fusion Hidden Markov Model (FHMM) which is more robust to noise, requires comparatively less training time, and utilizes the benefits of ensemble learning to better model temporal relationships in data.