Search Results for author: Rita Singh

Found 47 papers, 18 papers with code

$\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

2 code implementations 7 Mar 2024 Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazaki, Hao Chen, Xiaonan Huang, Bhiksha Raj

Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment where intelligent systems perceive.


A General Framework for Learning from Weak Supervision

1 code implementation 2 Feb 2024 Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj

Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering the practical deployment.

Weakly-supervised Learning

PAM: Prompting Audio-Language Models for Audio Quality Assessment

1 code implementation 1 Feb 2024 Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

Here, we exploit this capability and introduce PAM, a no-reference metric for assessing audio quality for different audio processing tasks.

Music Generation Text-to-Music Generation

Token Prediction as Implicit Classification to Identify LLM-Generated Text

1 code implementation 15 Nov 2023 Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj

This paper introduces a novel approach for identifying the possible large language models (LLMs) involved in text generation.

Text Classification +1

Prompting Audios Using Acoustic Properties For Emotion Representation

no code implementations 3 Oct 2023 Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs.

Contrastive Learning Retrieval +1

Completing Visual Objects via Bridging Generation and Segmentation

no code implementations 1 Oct 2023 Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu

This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components.

Image Generation Object +1

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

1 code implementation 1 Oct 2023 Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

In this paper, we propose an evaluation technique involving the training of an ASR model on synthetic speech and assessing its performance on real speech.

Speech Recognition +1

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

3 code implementations CVPR 2024 Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj

We propose a semantic decomposition method based on product quantization, where the multi-source semantics can be decomposed and represented by several disentangled and noise-suppressed single-source semantics.


Importance of negative sampling in weak label learning

no code implementations 23 Sep 2023 Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj

Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known.

Rethinking Voice-Face Correlation: A Geometry View

no code implementations 26 Jul 2023 Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj

Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion.

3D Face Reconstruction Face Generation

BASS: Block-wise Adaptation for Speech Summarization

no code implementations 17 Jul 2023 Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj

End-to-end speech summarization has been shown to improve performance over cascade baselines.

Pengi: An Audio Language Model for Audio Tasks

1 code implementation NeurIPS 2023 Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang

We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks.

Audio captioning Audio Question Answering +6

GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content

2 code implementations 13 May 2023 Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj

This paper presents a novel approach for detecting ChatGPT-generated vs. human-written text using language models.

Text Classification

Describing emotions with acoustic property prompts for speech emotion recognition

no code implementations 14 Nov 2022 Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

We investigate how the model can learn to associate the audio with the descriptions, resulting in performance improvement of Speech Emotion Recognition and Speech Audio Retrieval.

Retrieval Speech Emotion Recognition

Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

no code implementations 25 Jun 2022 Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track.


An Overview of Techniques for Biomarker Discovery in Voice Signal

no code implementations 10 Oct 2021 Rita Singh, Ankit Shah, Hira Dhamyal

This paper reflects on the effect of several categories of medical conditions on human voice, focusing on those that may be hypothesized to have effects on voice, but for which the changes themselves may be subtle enough to have eluded observation in standard analytical examinations of the voice signal.

Self-Supervised 3D Face Reconstruction via Conditional Estimation

no code implementations ICCV 2021 Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh

We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos.

3D Face Reconstruction Disentanglement

SphereFace Revived: Unifying Hyperspherical Face Recognition

1 code implementation 12 Sep 2021 Weiyang Liu, Yandong Wen, Bhiksha Raj, Rita Singh, Adrian Weller

As one of the earliest works in hyperspherical face recognition, SphereFace explicitly proposed to learn face embeddings with large inter-class angular margin.

Face Recognition

SphereFace2: Binary Classification is All You Need for Deep Face Recognition

no code implementations ICLR 2022 Yandong Wen, Weiyang Liu, Adrian Weller, Bhiksha Raj, Rita Singh

In this paper, we start by identifying the discrepancy between training and evaluation in the existing multi-class classification framework and then discuss the potential limitations caused by the "competitive" nature of softmax normalization.

Binary Classification Classification +2

Controlled AutoEncoders to Generate Faces from Voices

no code implementations 16 Jul 2021 Hao Liang, Lulan Yu, Guikang Xu, Bhiksha Raj, Rita Singh

With this in perspective, we propose a framework to morph a target face in response to a given voice such that the facial features are implicitly guided by the learned voice-face correlation.

MORPH Retrieval

Improving weakly supervised sound event detection with self-supervised auxiliary tasks

1 code implementation 12 Jun 2021 Soham Deshmukh, Bhiksha Raj, Rita Singh

To that extent, we propose a shared encoder architecture with sound event detection as a primary task and an additional secondary decoder for a self-supervised auxiliary task.

Decoder Event Detection +3

Interpreting glottal flow dynamics for detecting COVID-19 from voice

no code implementations 29 Oct 2020 Soham Deshmukh, Mahmoud Al Ismail, Rita Singh

In the pathogenesis of COVID-19, impairment of respiratory functions is often one of the key symptoms.

Detection of COVID-19 through the analysis of vocal fold oscillations

no code implementations 21 Oct 2020 Mahmoud Al Ismail, Soham Deshmukh, Rita Singh

Phonation, or the vibration of the vocal folds, is the primary source of vocalization in the production of voiced sounds by humans.

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

1 code implementation 17 Aug 2020 Soham Deshmukh, Bhiksha Raj, Rita Singh

Weakly labelled learning has garnered a lot of attention in recent years due to its potential to scale Sound Event Detection (SED) and is formulated as a Multiple Instance Learning (MIL) problem.

Event Detection Multiple Instance Learning +3

Face Reconstruction from Voice using Generative Adversarial Networks

1 code implementation NeurIPS 2019 Yandong Wen, Bhiksha Raj, Rita Singh

The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set.

Face Reconstruction

The phonetic bases of vocal expressed emotion: natural versus acted

no code implementations 13 Nov 2019 Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Our tests show significant differences in the manner and choice of phonemes in acted and natural speech, concluding moderate to low validity and value in using acted speech databases for emotion classification tasks.

Emotion Classification General Classification +1

Detecting gender differences in perception of emotion in crowdsourced data

no code implementations 24 Oct 2019 Shahan Ali Memon, Hira Dhamyal, Oren Wright, Daniel Justice, Vijaykumar Palat, William Boler, Bhiksha Raj, Rita Singh

While we limit ourselves to a single modality (i.e., speech), our framework is applicable to studies of emotion perception from all such loosely annotated data in general.

Non-Determinism in Neural Networks for Adversarial Robustness

no code implementations 26 May 2019 Daanish Ali Khan, Linhong Li, Ninghao Sha, Zhuoran Liu, Abelino Jimenez, Bhiksha Raj, Rita Singh

Recent breakthroughs in the field of deep learning have led to advancements in a broad spectrum of tasks in computer vision, audio processing, natural language processing and other areas.

Adversarial Robustness

Reconstructing faces from voices

1 code implementation 25 May 2019 Yandong Wen, Rita Singh, Bhiksha Raj

Voice profiling aims at inferring various human parameters from their speech, e.g., gender and age.

Hierarchical Routing Mixture of Experts

no code implementations 18 Mar 2019 Wenbo Zhao, Yang Gao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Addressing these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts.


Hide and Speak: Towards Deep Neural Networks for Speech Steganography

1 code implementation 7 Feb 2019 Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet

Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as the carrier.


Neural Regression Trees

no code implementations 1 Oct 2018 Shahan Ali Memon, Wenbo Zhao, Bhiksha Raj, Rita Singh

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.

Classification General Classification +1

Neural Regression Tree

no code implementations 27 Sep 2018 Wenbo Zhao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.

Classification regression

Disjoint Mapping Network for Cross-modal Matching of Voices and Faces

no code implementations ICLR 2019 Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh

We propose a novel framework, called Disjoint Mapping Network (DIMNet), for cross-modal biometric matching, in particular of voices and faces.

Optimal Strategies for Matching and Retrieval Problems by Comparing Covariates

no code implementations 12 Jul 2018 Yandong Wen, Mahmoud Al Ismail, Bhiksha Raj, Rita Singh

In many retrieval problems, where we must retrieve one or more entries from a gallery in response to a probe, it is common practice to learn to do so by directly comparing the probe and gallery entries to one another.


Voice Impersonation using Generative Adversarial Networks

no code implementations 19 Feb 2018 Yang Gao, Rita Singh, Bhiksha Raj

In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker.

Sound Audio and Speech Processing

Speaker identification from the sound of the human breath

no code implementations 1 Dec 2017 Wenbo Zhao, Yang Gao, Rita Singh

The goal of this paper is to demonstrate that breath sounds are indeed bio-signatures that can be used to identify speakers.

Speaker Identification Speaker Recognition

Content-based Video Indexing and Retrieval Using Corr-LDA

no code implementations 27 Feb 2016 Rahul Radhakrishnan Iyer, Sanjeel Parekh, Vikas Mohandoss, Anush Ramsurat, Bhiksha Raj, Rita Singh

Existing video indexing and retrieval methods on popular web-based multimedia sharing websites are based on user-provided sparse tagging.


Plagiarism Detection in Polyphonic Music using Monaural Signal Separation

no code implementations 27 Feb 2015 Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj

Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright.

General Classification
