no code implementations • 21 Mar 2024 • Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu
Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound.
no code implementations • 29 Sep 2023 • Swapnil Bhosale, Abhra Chaudhuri, Alex Lee Robert Williams, Divyank Tiwari, Anjan Dutta, Xiatian Zhu, Pushpak Bhattacharyya, Diptesh Kanojia
The introduction of the MUStARD dataset, and its emotion recognition extension MUStARD++, has identified sarcasm as a multi-modal phenomenon -- expressed not only in natural language text, but also through manners of speech (like tonality and intonation) and visual cues (facial expression).
no code implementations • 13 Sep 2023 • Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Xiatian Zhu
Particularly, in situations where existing supervised AVS methods struggle with overlapping foreground objects, our models still excel in accurately segmenting overlapped auditory objects.
no code implementations • 14 Aug 2023 • Swapnil Bhosale, Sauradip Nag, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu
In this work, we reformulate the SED problem by taking a generative learning perspective.
no code implementations • 3 Oct 2022 • Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu
Automatic Audio Captioning (AAC) refers to the task of translating an audio sample into a natural language (NL) text that describes the audio events, the sources of the events, and their relationships.
no code implementations • 28 Jan 2022 • Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu
Automatic Audio Captioning (AAC) refers to the task of translating audio into natural language text that describes the audio events, the sources of the events, and their relationships.
no code implementations • 10 Mar 2021 • Ayush Tripathi, Swapnil Bhosale, Sunil Kumar Kopparapu
Dysarthria is a condition which hampers the ability of an individual to control the muscles that play a major role in speech delivery.
no code implementations • 16 Feb 2021 • Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu
In this paper, we propose to replace the typical prototypical loss function with an Episodic Triplet Mining (ETM) technique.