Search Results for author: Swapnil Bhosale

Found 8 papers, 0 papers with code

Unsupervised Audio-Visual Segmentation with Modality Alignment

no code implementations • 21 Mar 2024 • Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiangkang Deng, Xiatian Zhu

Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound.

Contrastive Learning

Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection

no code implementations • 29 Sep 2023 • Swapnil Bhosale, Abhra Chaudhuri, Alex Lee Robert Williams, Divyank Tiwari, Anjan Dutta, Xiatian Zhu, Pushpak Bhattacharyya, Diptesh Kanojia

The introduction of the MUStARD dataset, and its emotion recognition extension MUStARD++, have identified sarcasm to be a multi-modal phenomenon -- expressed not only in natural language text, but also through manners of speech (like tonality and intonation) and visual cues (facial expression).

Benchmarking, Emotion Recognition, +1

Leveraging Foundation models for Unsupervised Audio-Visual Segmentation

no code implementations • 13 Sep 2023 • Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Xiatian Zhu

Particularly, in situations where existing supervised AVS methods struggle with overlapping foreground objects, our models still excel in accurately segmenting overlapped auditory objects.

Segmentation

Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity

no code implementations • 3 Oct 2022 • Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu

Automatic Audio Captioning (AAC) refers to the task of translating an audio sample into a natural language (NL) text that describes the audio events, source of the events and their relationships.

Audio captioning, Image Captioning, +2

Automatic Audio Captioning using Attention weighted Event based Embeddings

no code implementations • 28 Jan 2022 • Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu

Automatic Audio Captioning (AAC) refers to the task of translating audio into a natural language that describes the audio events, source of the events and their relationships.

Audio captioning, Event Detection, +1

Automatic Speaker Independent Dysarthric Speech Intelligibility Assessment System

no code implementations • 10 Mar 2021 • Ayush Tripathi, Swapnil Bhosale, Sunil Kumar Kopparapu

Dysarthria is a condition which hampers the ability of an individual to control the muscles that play a major role in speech delivery.

Semi Supervised Learning For Few-shot Audio Classification By Episodic Triplet Mining

no code implementations • 16 Feb 2021 • Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu

In this paper, we propose to replace the typical prototypical loss function with an Episodic Triplet Mining (ETM) technique.

Audio Classification, Event Detection, +4
