Search Results for author: Bhiksha Raj

Found 69 papers, 19 papers with code

Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

no code implementations 25 Jun 2022 Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track.

Towards End-to-End Private Automatic Speaker Recognition

no code implementations 23 Jun 2022 Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso

This poses two important issues: first, knowledge of the speaker embedding extraction model may create security and robustness liabilities for the authentication system, as this knowledge might help attackers in crafting adversarial examples able to mislead the system; second, from the point of view of a service provider the speaker embedding extraction model is arguably one of the most valuable components in the system and, as such, disclosing it would be highly undesirable.

Privacy Preserving Speaker Recognition +1

Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution

no code implementations 18 Jun 2022 Chonghan Chen, Qi Jiang, Chih-Hao Wang, Noel Chen, Haohan Wang, Xiang Li, Bhiksha Raj

With our proposed QCM, the downstream fusion module receives visual features that are more discriminative and focused on the desired object described in the expression, leading to more accurate predictions.

Visual Grounding

FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning

1 code implementation 15 May 2022 Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing Xie

Based on the analysis, we hence propose FreeMatch to define and adjust the confidence threshold in a self-adaptive manner according to the model's learning status.


Recent improvements of ASR models in the face of adversarial attacks

1 code implementation 29 Mar 2022 Raphael Olivier, Bhiksha Raj

Like many other tasks involving neural networks, Speech Recognition models are vulnerable to adversarial attacks.

Speech Recognition

Point3D: tracking actions as moving points with 3D CNNs

no code implementations 20 Mar 2022 Shentong Mo, Jingfei Xia, Xiaoqing Tan, Bhiksha Raj

Our Point3D consists of a Point Head for action localization and a 3D Head for action classification.

Action Classification Action Localization +1

Ontological Learning from Weak Labels

no code implementations 4 Mar 2022 Larry Tang, Po Hao Chou, Yi Yu Zheng, Ziqian Ge, Ankit Shah, Bhiksha Raj

We find that the baseline Siamese does not perform better by incorporating ontology information in the weak and multi-label scenario, but that the GCN does capture the ontology knowledge better for weak, multi-labeled data.

Sequential Randomized Smoothing for Adversarially Robust Speech Recognition

1 code implementation EMNLP 2021 Raphael Olivier, Bhiksha Raj

We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.

Automatic Speech Recognition Robust Speech Recognition +1

Self-Supervised 3D Face Reconstruction via Conditional Estimation

no code implementations ICCV 2021 Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh

We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos.

3D Face Reconstruction Disentanglement

SphereFace Revived: Unifying Hyperspherical Face Recognition

no code implementations 12 Sep 2021 Weiyang Liu, Yandong Wen, Bhiksha Raj, Rita Singh, Adrian Weller

As one of the earliest works in hyperspherical face recognition, SphereFace explicitly proposed to learn face embeddings with large inter-class angular margin.

Face Recognition

The Right to Talk: An Audio-Visual Transformer Approach

1 code implementation ICCV 2021 Thanh-Dat Truong, Chi Nhan Duong, The De Vu, Hoang Anh Pham, Bhiksha Raj, Ngan Le, Khoa Luu

Therefore, this work introduces a new Audio-Visual Transformer approach to the problem of localizing and highlighting the main speaker in both audio and visual channels of a multi-speaker conversation video in the wild.

SphereFace2: Binary Classification is All You Need for Deep Face Recognition

no code implementations ICLR 2022 Yandong Wen, Weiyang Liu, Adrian Weller, Bhiksha Raj, Rita Singh

In this paper, we start by identifying the discrepancy between training and evaluation in the existing multi-class classification framework and then discuss the potential limitations caused by the "competitive" nature of softmax normalization.

Classification Face Recognition +1

Controlled AutoEncoders to Generate Faces from Voices

no code implementations 16 Jul 2021 Hao Liang, Lulan Yu, Guikang Xu, Bhiksha Raj, Rita Singh

With this in perspective, this paper proposes a framework to morph a target face in response to a given voice, such that facial features are implicitly guided by the learned voice-face correlation.

Improving weakly supervised sound event detection with self-supervised auxiliary tasks

1 code implementation 12 Jun 2021 Soham Deshmukh, Bhiksha Raj, Rita Singh

To that extent, we propose a shared encoder architecture with sound event detection as a primary task and an additional secondary decoder for a self-supervised auxiliary task.

Event Detection Sound Event Detection +1

Training image classifiers using Semi-Weak Label Data

1 code implementation 19 Mar 2021 Anxiang Zhang, Ankit Shah, Bhiksha Raj

Thus, this paper introduces a novel semi-weak label learning paradigm as a middle ground to mitigate the problem.

Multiple Instance Learning

Constant Random Perturbations Provide Adversarial Robustness with Minimal Effect on Accuracy

1 code implementation 15 Mar 2021 Bronya Roni Chernyak, Bhiksha Raj, Tamir Hazan, Joseph Keshet

This paper proposes an attack-independent (non-adversarial training) technique for improving adversarial robustness of neural network models, with minimal loss of standard accuracy.

Adversarial Robustness

Contrast and Order Representations for Video Self-Supervised Learning

no code implementations ICCV 2021 Kai Hu, Jie Shao, YuAn Liu, Bhiksha Raj, Marios Savvides, Zhiqiang Shen

To address this, we present a contrast-and-order representation (CORP) framework for learning self-supervised video representations that can automatically capture both the appearance information within each frame and temporal information across different frames.

Action Recognition Self-Supervised Learning

Is normalization indispensable for training deep neural network?

1 code implementation NeurIPS 2020 Jie Shao, Kai Hu, Changhu Wang, Xiangyang Xue, Bhiksha Raj

In this paper, we study what would happen when normalization layers are removed from the network, and show how to train deep neural networks without normalization layers and without performance degradation.

General Classification Image Classification +5

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

1 code implementation 17 Aug 2020 Soham Deshmukh, Bhiksha Raj, Rita Singh

Weakly labelled learning has garnered a lot of attention in recent years due to its potential to scale Sound Event Detection (SED) and is formulated as a Multiple Instance Learning (MIL) problem.

Event Detection Multiple Instance Learning +2

Exploiting Non-Linear Redundancy for Neural Model Compression

no code implementations 28 May 2020 Muhammad A. Shah, Raphael Olivier, Bhiksha Raj

Deploying deep learning models, comprising non-linear combinations of millions, even billions, of parameters, is challenging given the memory, power, and compute constraints of the real world.

Model Compression

Automatic In-the-wild Dataset Annotation with Deep Generalized Multiple Instance Learning

no code implementations LREC 2020 Joana Correia, Isabel Trancoso, Bhiksha Raj

The automation of the diagnosis and monitoring of speech-affecting diseases in real-life situations, such as depression or Parkinson's disease, depends on the existence of rich and large datasets that resemble real-life conditions, such as those collected from in-the-wild multimedia repositories like YouTube.

Multiple Instance Learning

Face Reconstruction from Voice using Generative Adversarial Networks

1 code implementation NeurIPS 2019 Yandong Wen, Bhiksha Raj, Rita Singh

The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set.

Face Reconstruction

The phonetic bases of vocal expressed emotion: natural versus acted

no code implementations 13 Nov 2019 Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Our tests show significant differences in the manner and choice of phonemes in acted and natural speech, concluding moderate to low validity and value in using acted speech databases for emotion classification tasks.

Emotion Classification General Classification

Detecting gender differences in perception of emotion in crowdsourced data

no code implementations 24 Oct 2019 Shahan Ali Memon, Hira Dhamyal, Oren Wright, Daniel Justice, Vijaykumar Palat, William Boler, Bhiksha Raj, Rita Singh

While we limit ourselves to a single modality (i.e., speech), our framework is applicable to studies of emotion perception from all such loosely annotated data in general.

Non-Determinism in Neural Networks for Adversarial Robustness

no code implementations 26 May 2019 Daanish Ali Khan, Linhong Li, Ninghao Sha, Zhuoran Liu, Abelino Jimenez, Bhiksha Raj, Rita Singh

Recent breakthroughs in the field of deep learning have led to advancements in a broad spectrum of tasks in computer vision, audio processing, natural language processing and other areas.

Adversarial Robustness Computer Vision +1

Reconstructing faces from voices

1 code implementation 25 May 2019 Yandong Wen, Rita Singh, Bhiksha Raj

Voice profiling aims at inferring various human parameters from their speech, e.g., gender, age, etc.

Nonlinear Semi-Parametric Models for Survival Analysis

1 code implementation 14 May 2019 Chirag Nagpal, Rohan Sangave, Amit Chahar, Parth Shah, Artur Dubrawski, Bhiksha Raj

Semi-parametric survival analysis methods like the Cox Proportional Hazards (CPH) regression (Cox, 1972) are a popular approach for survival analysis.

Survival Analysis

Hierarchical Routing Mixture of Experts

no code implementations 18 Mar 2019 Wenbo Zhao, Yang Gao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Addressing these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts.

Hide and Speak: Towards Deep Neural Networks for Speech Steganography

1 code implementation 7 Feb 2019 Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet

Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as the carrier.

Learning Sound Events From Webly Labeled Data

1 code implementation 25 Nov 2018 Anurag Kumar, Ankit Shah, Alex Hauptmann, Bhiksha Raj

In the last couple of years, weakly labeled learning for sound events has turned out to be an exciting approach for audio event detection.

Event Detection Sound Event Detection +1

Higher-order Network for Action Recognition

no code implementations 19 Nov 2018 Kai Hu, Bhiksha Raj

Capturing spatiotemporal dynamics is an essential topic in video recognition.

Action Recognition General Classification +1

Neural Regression Trees

no code implementations 1 Oct 2018 Shahan Ali Memon, Wenbo Zhao, Bhiksha Raj, Rita Singh

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.

Classification General Classification

Neural Regression Tree

no code implementations 27 Sep 2018 Wenbo Zhao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.


Disjoint Mapping Network for Cross-modal Matching of Voices and Faces

no code implementations ICLR 2019 Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh

We propose a novel framework, called Disjoint Mapping Network (DIMNet), for cross-modal biometric matching, in particular of voices and faces.

Optimal Strategies for Matching and Retrieval Problems by Comparing Covariates

no code implementations 12 Jul 2018 Yandong Wen, Mahmoud Al Ismail, Bhiksha Raj, Rita Singh

In many retrieval problems, where we must retrieve one or more entries from a gallery in response to a probe, it is common practice to learn to do so by directly comparing the probe and gallery entries to one another.

A Closer Look at Weak Label Learning for Audio Events

1 code implementation 24 Apr 2018 Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj

In this work, we first describe a CNN based approach for weakly supervised training of audio events.

Audio Classification Event Detection +1

Voice Impersonation using Generative Adversarial Networks

no code implementations 19 Feb 2018 Yang Gao, Rita Singh, Bhiksha Raj

In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker.

Sound Audio and Speech Processing

Framework for evaluation of sound event detection in web videos

no code implementations 2 Nov 2017 Rohan Badlani, Ankit Shah, Benjamin Elizalde, Anurag Kumar, Bhiksha Raj

The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets.

Event Detection Sound Event Detection

Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting

no code implementations 13 Jul 2017 Anders Oland, Aayush Bansal, Roger B. Dannenberg, Bhiksha Raj

To this end, we demonstrate faster convergence and better performance on diverse classification tasks: image classification using CIFAR-10 and ImageNet, and semantic segmentation using PASCAL VOC 2012.

Classification General Classification +2

Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data

no code implementations 9 Jul 2017 Anurag Kumar, Bhiksha Raj

We propose that learning algorithms that can exploit weak labels offer an effective method to learn from web data.

SphereFace: Deep Hypersphere Embedding for Face Recognition

15 code implementations CVPR 2017 Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song

This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space.

Face Identification Face Recognition +1

On the Origin of Deep Learning

no code implementations 24 Feb 2017 Haohan Wang, Bhiksha Raj

This paper is a review of the evolutionary history of deep learning models.

The Incredible Shrinking Neural Network: New Perspectives on Learning Representations Through The Lens of Pruning

no code implementations 16 Jan 2017 Aditya Sharma, Nikolas Wolfe, Bhiksha Raj

How much can pruning algorithms teach us about the fundamentals of learning representations in neural networks?

Network Pruning

Audio Event and Scene Recognition: A Unified Approach using Strongly and Weakly Labeled Data

no code implementations 12 Nov 2016 Anurag Kumar, Bhiksha Raj

In this paper we propose a novel learning framework called Supervised and Weakly Supervised Learning where the goal is to learn simultaneously from weakly and strongly labeled data.

Scene Recognition

Discovering Sound Concepts and Acoustic Relations In Text

no code implementations 23 Sep 2016 Anurag Kumar, Bhiksha Raj, Ndapandula Nakashole

In this paper we describe approaches for discovering acoustic concepts and relations in text.

Dependency Parsing

An Approach for Self-Training Audio Event Detectors Using Web Data

no code implementations 20 Sep 2016 Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube.

Event Detection

Features and Kernels for Audio Event Recognition

no code implementations 19 Jul 2016 Anurag Kumar, Bhiksha Raj

One of the most important problems in audio event detection research is the absence of benchmark results for comparison with any proposed method.

Sound Multimedia

AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

no code implementations 13 Jul 2016 Sebastian Sager, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj, Ian Lane

One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels.


Classifier Risk Estimation under Limited Labeling Resources

no code implementations 9 Jul 2016 Anurag Kumar, Bhiksha Raj

In this paper we propose strategies for estimating performance of a classifier when labels cannot be obtained for the whole test set.

Weakly Supervised Scalable Audio Content Analysis

no code implementations 12 Jun 2016 Anurag Kumar, Bhiksha Raj

Audio Event Detection is an important task for content analysis of multimedia data.

Event Detection Multiple Instance Learning

Audio Event Detection using Weakly Labeled Data

no code implementations 9 May 2016 Anurag Kumar, Bhiksha Raj

This helps in obtaining a complete description of the recording and is notable since temporal information was never known in the first place in weakly labeled data.

Event Detection Multiple Instance Learning

Content-based Video Indexing and Retrieval Using Corr-LDA

no code implementations 27 Feb 2016 Rahul Radhakrishnan Iyer, Sanjeel Parekh, Vikas Mohandoss, Anush Ramsurat, Bhiksha Raj, Rita Singh

Existing video indexing and retrieval methods on popular web-based multimedia sharing websites are based on user-provided sparse tagging.

Environmental Noise Embeddings for Robust Speech Recognition

no code implementations 11 Jan 2016 Suyoun Kim, Bhiksha Raj, Ian Lane

We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model.

Multi-Task Learning Robust Speech Recognition +1

Handcrafted Local Features are Convolutional Neural Networks

no code implementations 16 Nov 2015 Zhenzhong Lan, Shoou-I Yu, Ming Lin, Bhiksha Raj, Alexander G. Hauptmann

We approach this problem by first showing that local handcrafted features and Convolutional Neural Networks (CNNs) share the same convolution-pooling network structure.

Action Recognition Optical Flow Estimation

Privacy-Preserving Multi-Document Summarization

no code implementations 6 Aug 2015 Luís Marujo, José Portêlo, Wang Ling, David Martins de Matos, João P. Neto, Anatole Gershman, Jaime Carbonell, Isabel Trancoso, Bhiksha Raj

State-of-the-art extractive multi-document summarization systems are usually designed without any concern about privacy issues, meaning that all documents are open to third parties.

Document Summarization Multi-Document Summarization +1

Plagiarism Detection in Polyphonic Music using Monaural Signal Separation

no code implementations 27 Feb 2015 Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj

Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright.

General Classification

Unsupervised Fusion Weight Learning in Multiple Classifier Systems

no code implementations 6 Feb 2015 Anurag Kumar, Bhiksha Raj

We also introduce a novel metric for ranking instances based on an index which depends upon the rank of weighted scores of test points among the weighted scores of training points.

Learning Model-Based Sparsity via Projected Gradient Descent

no code implementations 7 Sep 2012 Sohail Bahmani, Petros T. Boufounos, Bhiksha Raj

As an example, we elaborate on the application of the main results to estimation in Generalized Linear Models.

Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers

no code implementations NeurIPS 2010 Manas Pathak, Shantanu Rane, Bhiksha Raj

As increasing amounts of sensitive personal information find their way into data repositories, it is important to develop analysis mechanisms that can derive aggregate information from these repositories without revealing information about individual data instances.

Privacy Preserving

Sparse Overcomplete Latent Variable Decomposition of Counts Data

no code implementations NeurIPS 2007 Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis

An important problem in many fields is the analysis of counts data to extract meaningful latent components.
