no code implementations • 25 Jun 2022 • Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj
This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track.
no code implementations • 23 Jun 2022 • Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso
This poses two important issues: first, knowledge of the speaker embedding extraction model may create security and robustness liabilities for the authentication system, as this knowledge might help attackers in crafting adversarial examples able to mislead the system; second, from the point of view of a service provider the speaker embedding extraction model is arguably one of the most valuable components in the system and, as such, disclosing it would be highly undesirable.
no code implementations • 18 Jun 2022 • Chonghan Chen, Qi Jiang, Chih-Hao Wang, Noel Chen, Haohan Wang, Xiang Li, Bhiksha Raj
With our proposed QCM, the downstream fusion module receives visual features that are more discriminative and focused on the desired object described in the expression, leading to more accurate predictions.
1 code implementation • 15 May 2022 • Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing Xie
Based on this analysis, we propose FreeMatch, which defines and adjusts the confidence threshold in a self-adaptive manner according to the model's learning status.
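The self-adaptive threshold described above can be sketched as follows. This is a simplified illustration, not the FreeMatch reference code. It assumes, as we understand the paper, that a global threshold is tracked as an exponential moving average of the mean maximum confidence over unlabeled batches and is initialized at 1/C for C classes. The class-wise local thresholds are omitted, and the function names and the momentum value in the example are hypothetical.

```python
def update_global_threshold(tau_prev, batch_probs, momentum=0.999):
    """EMA of the batch's mean max-confidence (simplified global threshold, an assumption)."""
    mean_max_conf = sum(max(p) for p in batch_probs) / len(batch_probs)
    return momentum * tau_prev + (1.0 - momentum) * mean_max_conf

def select_pseudo_labels(batch_probs, tau):
    """Return (sample index, argmax class) for predictions whose confidence reaches tau."""
    selected = []
    for i, p in enumerate(batch_probs):
        conf = max(p)
        if conf >= tau:
            selected.append((i, p.index(conf)))
    return selected

num_classes = 3
tau = 1.0 / num_classes  # start at the confidence of a uniform prediction
batch = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.2, 0.7, 0.1]]
tau = update_global_threshold(tau, batch, momentum=0.9)  # small momentum only for the demo
print(round(tau, 4))                      # 0.3667
print(select_pseudo_labels(batch, tau))   # [(0, 0), (1, 0), (2, 1)]
```

Early in training the threshold stays low, so more unlabeled samples contribute pseudo-labels; as confidence grows, the moving average rises and filtering becomes stricter.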
no code implementations • 11 Apr 2022 • Ankit Shah, Hira Dhamyal, Yang Gao, Rita Singh, Bhiksha Raj
Lately, there has been a global effort by multiple research groups to detect COVID-19 from voice.
1 code implementation • 29 Mar 2022 • Raphael Olivier, Bhiksha Raj
Like many other tasks involving neural networks, speech recognition models are vulnerable to adversarial attacks.
no code implementations • 20 Mar 2022 • Shentong Mo, Jingfei Xia, Xiaoqing Tan, Bhiksha Raj
Our Point3D consists of a Point Head for action localization and a 3D Head for action classification.
1 code implementation • 6 Mar 2022 • Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk
The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios.
no code implementations • 4 Mar 2022 • Larry Tang, Po Hao Chou, Yi Yu Zheng, Ziqian Ge, Ankit Shah, Bhiksha Raj
We find that the baseline Siamese network does not perform better when ontology information is incorporated in the weak, multi-label scenario, but that the GCN captures the ontology knowledge better for weak, multi-labeled data.
1 code implementation • EMNLP 2021 • Raphael Olivier, Bhiksha Raj
We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.
no code implementations • ICCV 2021 • Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh
We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos.
no code implementations • 12 Sep 2021 • Weiyang Liu, Yandong Wen, Bhiksha Raj, Rita Singh, Adrian Weller
As one of the earliest works in hyperspherical face recognition, SphereFace explicitly proposed to learn face embeddings with a large inter-class angular margin.
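The angular margin can be illustrated with a small sketch. It assumes the A-Softmax formulation with a multiplicative margin m, in which the target-class term cos(θ) is replaced by the monotonic surrogate ψ(θ) = (-1)^k cos(mθ) - 2k for θ in [kπ/m, (k+1)π/m]. The helper names are hypothetical, and training details such as annealing with the standard softmax are left out.

```python
import math

def psi(theta, m):
    """Monotonically decreasing extension of cos(m * theta) over [0, pi]."""
    k = min(int(theta * m / math.pi), m - 1)  # which of the m angular intervals theta falls in
    return (-1) ** k * math.cos(m * theta) - 2 * k

def a_softmax_logits(x, weights, label, m=4):
    """Logits with a multiplicative angular margin on the target class.

    x: feature vector; weights: one weight vector per class (normalized here).
    """
    x_norm = math.sqrt(sum(v * v for v in x))
    logits = []
    for j, w in enumerate(weights):
        w_norm = math.sqrt(sum(v * v for v in w))
        cos_t = sum(a * b for a, b in zip(x, w)) / (x_norm * w_norm)
        cos_t = max(-1.0, min(1.0, cos_t))  # keep acos inside its domain
        theta = math.acos(cos_t)
        # Only the ground-truth class is penalized by the margin.
        logits.append(x_norm * (psi(theta, m) if j == label else cos_t))
    return logits

print(a_softmax_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], label=0))
```

Because ψ(θ) drops much faster than cos(θ), the target class only wins if its angle is roughly m times smaller than the angles to other classes, which is what enlarges the inter-class angular margin.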
1 code implementation • ICCV 2021 • Thanh-Dat Truong, Chi Nhan Duong, The De Vu, Hoang Anh Pham, Bhiksha Raj, Ngan Le, Khoa Luu
Therefore, this work introduces a new Audio-Visual Transformer approach to the problem of localizing and highlighting the main speaker in both the audio and visual channels of a multi-speaker conversation video in the wild.
no code implementations • ICLR 2022 • Yandong Wen, Weiyang Liu, Adrian Weller, Bhiksha Raj, Rita Singh
In this paper, we start by identifying the discrepancy between training and evaluation in the existing multi-class classification framework and then discuss the potential limitations caused by the "competitive" nature of softmax normalization.
no code implementations • 16 Jul 2021 • Hao Liang, Lulan Yu, Guikang Xu, Bhiksha Raj, Rita Singh
With this in mind, we propose in this paper a framework that morphs a target face in response to a given voice, such that the facial features are implicitly guided by learned voice-face correlations.
1 code implementation • 12 Jun 2021 • Soham Deshmukh, Bhiksha Raj, Rita Singh
To that end, we propose a shared encoder architecture with sound event detection as the primary task and an additional secondary decoder for a self-supervised auxiliary task.
1 code implementation • 19 Mar 2021 • Anxiang Zhang, Ankit Shah, Bhiksha Raj
Thus, this paper introduces a novel semi-weak label learning paradigm as a middle ground to mitigate the problem.
1 code implementation • 15 Mar 2021 • Bronya Roni Chernyak, Bhiksha Raj, Tamir Hazan, Joseph Keshet
This paper proposes an attack-independent (non-adversarial training) technique for improving adversarial robustness of neural network models, with minimal loss of standard accuracy.
no code implementations • ICCV 2021 • Kai Hu, Jie Shao, YuAn Liu, Bhiksha Raj, Marios Savvides, Zhiqiang Shen
To address this, we present a contrast-and-order representation (CORP) framework for learning self-supervised video representations that can automatically capture both the appearance information within each frame and temporal information across different frames.
1 code implementation • NeurIPS 2020 • Jie Shao, Kai Hu, Changhu Wang, Xiangyang Xue, Bhiksha Raj
In this paper, we study what would happen when normalization layers are removed from the network, and show how to train deep neural networks without normalization layers and without performance degradation.
2 code implementations • 17 Nov 2020 • Ali Shahin Shamsabadi, Francisco Sepúlveda Teixeira, Alberto Abad, Bhiksha Raj, Andrea Cavallaro, Isabel Trancoso
Speaker identification models are vulnerable to carefully designed adversarial perturbations of their input signals that induce misclassification.
1 code implementation • 9 Nov 2020 • Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh
We further propose Multinomial Masked Proxy (MMP) loss to leverage the hardness of speaker pairs.
1 code implementation • 17 Aug 2020 • Soham Deshmukh, Bhiksha Raj, Rita Singh
Weakly labelled learning has garnered a lot of attention in recent years due to its potential to scale Sound Event Detection (SED), and it is formulated as a Multiple Instance Learning (MIL) problem.
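Under the MIL formulation, a clip is a bag of frames and only the clip-level (weak) label is known. A minimal sketch, assuming frame-level class probabilities are already available and that max or mean pooling aggregates them into clip-level scores trained with binary cross-entropy; the function names are hypothetical, and this is not the specific architecture of the paper.

```python
import math

def clip_level_scores(frame_probs, pooling="max"):
    """Aggregate frame-level class probabilities (instances) into clip-level scores (bag)."""
    num_classes = len(frame_probs[0])
    if pooling == "max":
        # A class counts as present if at least one frame is confident about it.
        return [max(frame[c] for frame in frame_probs) for c in range(num_classes)]
    if pooling == "mean":
        return [sum(frame[c] for frame in frame_probs) / len(frame_probs) for c in range(num_classes)]
    raise ValueError(f"unknown pooling: {pooling}")

def weak_label_bce(clip_scores, clip_labels, eps=1e-7):
    """Binary cross-entropy against clip-level (weak) labels, averaged over classes."""
    total = 0.0
    for p, y in zip(clip_scores, clip_labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(clip_scores)

frames = [[0.1, 0.8], [0.9, 0.2], [0.3, 0.1]]  # 3 frames x 2 event classes
scores = clip_level_scores(frames, "max")      # [0.9, 0.8]
print(weak_label_bce(scores, [1, 0]))          # clip contains class 0 but not class 1
```

Max pooling lets a single confident frame explain the clip label, while mean pooling spreads the gradient over all frames; which works better is an empirical question.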
no code implementations • 28 May 2020 • Muhammad A. Shah, Raphael Olivier, Bhiksha Raj
Deploying deep learning models, which comprise a non-linear combination of millions or even billions of parameters, is challenging given the memory, power, and compute constraints of the real world.
no code implementations • LREC 2020 • Joana Correia, Isabel Trancoso, Bhiksha Raj
The automation of the diagnosis and monitoring of speech-affecting diseases in real-life situations, such as depression or Parkinson's disease, depends on the existence of rich and large datasets that resemble real-life conditions, such as those collected from in-the-wild multimedia repositories like YouTube.
1 code implementation • NeurIPS 2019 • Yandong Wen, Bhiksha Raj, Rita Singh
The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set.
no code implementations • 13 Nov 2019 • Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh
Our tests show significant differences in the manner and choice of phonemes in acted and natural speech, suggesting moderate to low validity and value in using acted speech databases for emotion classification tasks.
no code implementations • 24 Oct 2019 • Shahan Ali Memon, Hira Dhamyal, Oren Wright, Daniel Justice, Vijaykumar Palat, William Boler, Bhiksha Raj, Rita Singh
While we limit ourselves to a single modality (i.e., speech), our framework is applicable to studies of emotion perception from all such loosely annotated data in general.
no code implementations • 26 May 2019 • Daanish Ali Khan, Linhong Li, Ninghao Sha, Zhuoran Liu, Abelino Jimenez, Bhiksha Raj, Rita Singh
Recent breakthroughs in the field of deep learning have led to advancements in a broad spectrum of tasks in computer vision, audio processing, natural language processing and other areas.
1 code implementation • 25 May 2019 • Yandong Wen, Rita Singh, Bhiksha Raj
Voice profiling aims at inferring various human parameters, e.g., gender and age, from a person's speech.
1 code implementation • 14 May 2019 • Chirag Nagpal, Rohan Sangave, Amit Chahar, Parth Shah, Artur Dubrawski, Bhiksha Raj
Semi-parametric survival analysis methods like the Cox Proportional Hazards (CPH) regression (Cox, 1972) are a popular approach for survival analysis.
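For reference, the classical CPH model is fitted by maximizing the partial likelihood. The sketch below computes the negative log partial likelihood for a linear risk score x·β, using the Breslow convention for tied event times. It illustrates only the baseline the abstract mentions, not the method proposed in the paper, and the function name is hypothetical.

```python
import math

def cox_neg_log_partial_likelihood(beta, X, times, events):
    """Negative log partial likelihood of a linear Cox PH model (Breslow ties).

    beta: coefficients; X: covariate rows; times: observed times;
    events: 1 if the event was observed, 0 if the subject was censored.
    """
    risk = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictors x_i . beta
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through the risk sets
        # Risk set: everyone still under observation at time t_i.
        log_denominator = math.log(sum(math.exp(risk[j]) for j, t_j in enumerate(times) if t_j >= t_i))
        nll -= risk[i] - log_denominator
    return nll

print(cox_neg_log_partial_likelihood([0.5], [[1.0], [0.0], [2.0]], [3.0, 1.0, 2.0], [1, 0, 1]))
```

The baseline hazard cancels out of each ratio, which is what makes the method semi-parametric: only β is estimated from this objective.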
no code implementations • 18 Mar 2019 • Wenbo Zhao, Yang Gao, Shahan Ali Memon, Bhiksha Raj, Rita Singh
Addressing these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts.
1 code implementation • 7 Feb 2019 • Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet
Steganography is the science of hiding a secret message within an ordinary public message, referred to as the carrier.
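As a concrete illustration of hiding a message in a carrier, the sketch below uses classical least-significant-bit (LSB) embedding on integer audio samples. This is a textbook technique chosen for clarity, not the method of the paper, and the function names are hypothetical.

```python
def embed_lsb(carrier, message):
    """Hide message bytes in the least-significant bits of integer carrier samples."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]  # MSB first
    if len(bits) > len(carrier):
        raise ValueError("carrier too short for message")
    stego = list(carrier)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # overwrite only the lowest bit
    return stego

def extract_lsb(stego, num_bytes):
    """Recover num_bytes of hidden data from the sample LSBs."""
    out = bytearray()
    for b in range(num_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (stego[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)

samples = list(range(-50, 50))       # stand-in for 16-bit PCM samples
stego = embed_lsb(samples, b"hi")
print(extract_lsb(stego, 2))         # b'hi'
```

Each sample changes by at most one quantization step, so the carrier sounds essentially unchanged, but the message is easy to destroy or detect with simple statistics.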
1 code implementation • 25 Nov 2018 • Anurag Kumar, Ankit Shah, Alex Hauptmann, Bhiksha Raj
In the last couple of years, weakly labeled learning for sound events has turned out to be an exciting approach for audio event detection.
no code implementations • 19 Nov 2018 • Kai Hu, Bhiksha Raj
Capturing spatiotemporal dynamics is an essential topic in video recognition.
no code implementations • 1 Oct 2018 • Shahan Ali Memon, Wenbo Zhao, Bhiksha Raj, Rita Singh
Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.
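A minimal RvC sketch, assuming equal-width bins over the observed target range and decoding a predicted bin back to its center. The bin scheme and helper names are illustrative assumptions; any classifier could be trained on the resulting labels.

```python
def fit_bins(targets, num_bins):
    """Equal-width bin edges spanning the observed target range."""
    lo, hi = min(targets), max(targets)
    width = (hi - lo) / num_bins
    return [lo + k * width for k in range(num_bins + 1)]

def to_class(y, edges):
    """Map a continuous target to a bin index (the classification label)."""
    for k in range(len(edges) - 2):
        if y < edges[k + 1]:
            return k
    return len(edges) - 2  # last bin is closed on the right

def to_value(k, edges):
    """Map a predicted bin index back to a regression estimate (the bin center)."""
    return (edges[k] + edges[k + 1]) / 2

edges = fit_bins([0.0, 3.7, 10.0], num_bins=5)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
label = to_class(3.7, edges)                     # train a classifier on labels like this
print(label, to_value(label, edges))             # 1 3.0
```

The round trip loses at most half a bin width, so the number of bins trades quantization error against the amount of training data available per class.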
no code implementations • 27 Sep 2018 • Wenbo Zhao, Shahan Ali Memon, Bhiksha Raj, Rita Singh
Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.
no code implementations • ICLR 2019 • Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh
We propose a novel framework, called Disjoint Mapping Network (DIMNet), for cross-modal biometric matching, in particular of voices and faces.
no code implementations • 12 Jul 2018 • Yandong Wen, Mahmoud Al Ismail, Bhiksha Raj, Rita Singh
In many retrieval problems, where we must retrieve one or more entries from a gallery in response to a probe, it is common practice to learn to do so by directly comparing the probe and gallery entries to one another.
1 code implementation • 24 Apr 2018 • Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj
In this work, we first describe a CNN-based approach for weakly supervised training of audio events.
no code implementations • 19 Feb 2018 • Yang Gao, Rita Singh, Bhiksha Raj
In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker.
no code implementations • 2 Nov 2017 • Rohan Badlani, Ankit Shah, Benjamin Elizalde, Anurag Kumar, Bhiksha Raj
The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets.
no code implementations • 13 Jul 2017 • Anders Oland, Aayush Bansal, Roger B. Dannenberg, Bhiksha Raj
To this end, we demonstrate faster convergence and better performance on diverse classification tasks: image classification using CIFAR-10 and ImageNet, and semantic segmentation using PASCAL VOC 2012.
no code implementations • 9 Jul 2017 • Anurag Kumar, Bhiksha Raj
We propose that learning algorithms that can exploit weak labels offer an effective method to learn from web data.
15 code implementations • CVPR 2017 • Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song
This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have a smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space.
no code implementations • 24 Feb 2017 • Haohan Wang, Bhiksha Raj
This paper is a review of the evolutionary history of deep learning models.
no code implementations • 16 Jan 2017 • Aditya Sharma, Nikolas Wolfe, Bhiksha Raj
How much can pruning algorithms teach us about the fundamentals of learning representations in neural networks?
no code implementations • 12 Nov 2016 • Anurag Kumar, Bhiksha Raj
In this paper we propose a novel learning framework called Supervised and Weakly Supervised Learning where the goal is to learn simultaneously from weakly and strongly labeled data.
no code implementations • 23 Sep 2016 • Anurag Kumar, Bhiksha Raj, Ndapandula Nakashole
In this paper we describe approaches for discovering acoustic concepts and relations in text.
no code implementations • 20 Sep 2016 • Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane
The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube.
no code implementations • 19 Jul 2016 • Anurag Kumar, Bhiksha Raj
One of the most important problems in audio event detection research is the absence of benchmark results for comparison with any proposed method.
no code implementations • 13 Jul 2016 • Sebastian Sager, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj, Ian Lane
One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels.
no code implementations • 9 Jul 2016 • Anurag Kumar, Bhiksha Raj
In this paper we propose strategies for estimating performance of a classifier when labels cannot be obtained for the whole test set.
no code implementations • 12 Jun 2016 • Anurag Kumar, Bhiksha Raj
Audio Event Detection is an important task for content analysis of multimedia data.
no code implementations • 9 May 2016 • Anurag Kumar, Bhiksha Raj
This helps obtain a complete description of the recording, which is notable because weakly labeled data carries no temporal information in the first place.
no code implementations • 27 Feb 2016 • Rahul Radhakrishnan Iyer, Sanjeel Parekh, Vikas Mohandoss, Anush Ramsurat, Bhiksha Raj, Rita Singh
Existing video indexing and retrieval methods on popular web-based multimedia sharing websites are based on user-provided sparse tagging.
no code implementations • 11 Jan 2016 • Suyoun Kim, Bhiksha Raj, Ian Lane
We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model.
no code implementations • 16 Nov 2015 • Zhenzhong Lan, Shoou-I Yu, Ming Lin, Bhiksha Raj, Alexander G. Hauptmann
We approach this problem by first showing that local handcrafted features and Convolutional Neural Networks (CNNs) share the same convolution-pooling network structure.
no code implementations • 16 Oct 2015 • Haohan Wang, Bhiksha Raj
Further, we will also look into the development history of modelling time series data with neural networks.
no code implementations • 6 Aug 2015 • Luís Marujo, José Portêlo, Wang Ling, David Martins de Matos, João P. Neto, Anatole Gershman, Jaime Carbonell, Isabel Trancoso, Bhiksha Raj
State-of-the-art extractive multi-document summarization systems are usually designed without any concern about privacy issues, meaning that all documents are open to third parties.
no code implementations • 27 Feb 2015 • Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj
Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright.
no code implementations • 6 Feb 2015 • Anurag Kumar, Bhiksha Raj
We also introduce a novel metric for ranking instances based on an index which depends upon the rank of weighted scores of test points among the weighted scores of training points.
no code implementations • CVPR 2015 • Zhenzhong Lan, Ming Lin, Xuanchong Li, Alexander G. Hauptmann, Bhiksha Raj
MIFS compensates for information lost from using differential operators by recapturing information at coarse scales.
no code implementations • NeurIPS 2012 • Sourish Chaudhuri, Bhiksha Raj
Approaches to audio classification and retrieval tasks largely rely on detection-based discriminative models.
no code implementations • 7 Sep 2012 • Sohail Bahmani, Petros T. Boufounos, Bhiksha Raj
As an example, we elaborate on the application of the main results to estimation in Generalized Linear Models.
no code implementations • NeurIPS 2010 • Manas Pathak, Shantanu Rane, Bhiksha Raj
As increasing amounts of sensitive personal information find their way into data repositories, it is important to develop analysis mechanisms that can derive aggregate information from these repositories without revealing information about individual data instances.
no code implementations • NeurIPS 2009 • Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj
In this paper we present an algorithm for separating mixed sounds from a monophonic recording.
no code implementations • NeurIPS 2007 • Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis
An important problem in many fields is the analysis of count data to extract meaningful latent components.