Search Results for author: Soham Deshmukh

Found 15 papers, 6 papers with code

Pengi: An Audio Language Model for Audio Tasks

1 code implementation • NeurIPS 2023 • Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang

We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks.

Audio Captioning • Audio Question Answering +6

Audio Retrieval with WavText5K and CLAP Training

1 code implementation • 28 Sep 2022 • Soham Deshmukh, Benjamin Elizalde, Huaming Wang

In this work, we propose a new collection of web audio-text pairs and a new framework for retrieval.

AudioCaps • Audio Captioning +3

PAM: Prompting Audio-Language Models for Audio Quality Assessment

1 code implementation • 1 Feb 2024 • Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

Here, we exploit this capability and introduce PAM, a no-reference metric for assessing audio quality for different audio processing tasks.

Music Generation • Text-to-Music Generation

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

1 code implementation • 17 Aug 2020 • Soham Deshmukh, Bhiksha Raj, Rita Singh

Weakly labelled learning has garnered a lot of attention in recent years due to its potential to scale Sound Event Detection (SED), and is formulated as a Multiple Instance Learning (MIL) problem.

Event Detection • Multiple Instance Learning +3

Improving weakly supervised sound event detection with self-supervised auxiliary tasks

1 code implementation • 12 Jun 2021 • Soham Deshmukh, Bhiksha Raj, Rita Singh

To that extent, we propose a shared encoder architecture with sound event detection as a primary task and an additional secondary decoder for a self-supervised auxiliary task.

Event Detection • Sound Event Detection +2

Attacker Behaviour Profiling using Stochastic Ensemble of Hidden Markov Models

no code implementations • 28 May 2019 • Soham Deshmukh, Rahul Rade, Faruk Kazi

For modelling, we propose a novel semi-supervised algorithm called Fusion Hidden Markov Model (FHMM), which is more robust to noise, requires comparatively less training time, and utilizes the benefits of ensemble learning to better model temporal relationships in data.

Ensemble Learning • Intrusion Detection

Detection of COVID-19 through the analysis of vocal fold oscillations

no code implementations • 21 Oct 2020 • Mahmoud Al Ismail, Soham Deshmukh, Rita Singh

Phonation, or the vibration of the vocal folds, is the primary source of vocalization in the production of voiced sounds by humans.

Interpreting glottal flow dynamics for detecting COVID-19 from voice

no code implementations • 29 Oct 2020 • Soham Deshmukh, Mahmoud Al Ismail, Rita Singh

In the pathogenesis of COVID-19, impairment of respiratory functions is often one of the key symptoms.

NaRLE: Natural Language Models using Reinforcement Learning with Emotion Feedback

no code implementations • 5 Oct 2021 • Ruijie Zhou, Soham Deshmukh, Jeremiah Greer, Charles Lee

Current research in dialogue systems is focused on conversational assistants working on short conversations in either task-oriented or open domain settings.

intent-classification • Intent Classification +3

Adapting Task-Oriented Dialogue Models for Email Conversations

no code implementations • 19 Aug 2022 • Soham Deshmukh, Charles Lee

Additionally, the modular nature of the proposed framework allows plug-and-play for any future developments in both pre-trained language and task-oriented dialogue models.

Intent Detection • Natural Language Understanding +1

Describing emotions with acoustic property prompts for speech emotion recognition

no code implementations • 14 Nov 2022 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

We investigate how the model can learn to associate the audio with the descriptions, resulting in performance improvement of Speech Emotion Recognition and Speech Audio Retrieval.

Retrieval • Speech Emotion Recognition

Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session

no code implementations • 20 Feb 2023 • Laurie M. Heller, Benjamin Elizalde, Bhiksha Raj, Soham Deshmukh

Machine Listening, as usually formalized, attempts to perform a task that is, from our perspective, fundamentally human-performable, and performed by humans.

Scene Recognition

Prompting Audios Using Acoustic Properties For Emotion Representation

no code implementations • 3 Oct 2023 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs.

Contrastive Learning • Retrieval +1
