Search Results for author: Bhiksha Raj

Found 127 papers, 44 papers with code

Synergistic Global-space Camera and Human Reconstruction from Videos

no code implementations • 23 May 2024 • Yizhou Zhao, Tuanfeng Y. Wang, Bhiksha Raj, Min Xu, Jimei Yang, Chun-Hao Paul Huang

Specifically, we design Human-aware Metric SLAM to reconstruct metric-scale camera poses and scene point clouds using camera-frame HMR as a strong prior, addressing depth, scale, and dynamic ambiguities.

Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features

no code implementations • 2 May 2024 • Francisco Teixeira, Karla Pizzi, Raphael Olivier, Alberto Abad, Bhiksha Raj, Isabel Trancoso

Membership Inference (MI) poses a substantial privacy threat to the training data of Automatic Speech Recognition (ASR) systems, while also offering an opportunity to audit these models with regard to user data.

Automatic Speech Recognition (ASR) +1

Learning with Noisy Foundation Models

no code implementations • 11 Mar 2024 • Hao Chen, Jindong Wang, Zihan Wang, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, Bhiksha Raj

Foundation models are usually pre-trained on large-scale datasets and then adapted to downstream tasks through tuning.

$\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

2 code implementations • 7 Mar 2024 • Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazaki, Hao Chen, Xiaonan Huang, Bhiksha Raj

Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment where intelligent systems perceive.

Benchmarking

AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition

no code implementations • 18 Feb 2024 • Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj, Huaxiu Yao

Recent advancements in large language models (LLMs) have shown promise in multi-step reasoning tasks, yet their reliance on extensive manual labeling to provide procedural feedback remains a significant impediment.

Evaluating and Improving Continual Learning in Spoken Language Understanding

no code implementations • 16 Feb 2024 • Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj

In this work, we propose an evaluation methodology that provides a unified evaluation on stability, plasticity, and generalizability in continual learning.

Continual Learning Spoken Language Understanding

Customizable Perturbation Synthesis for Robust SLAM Benchmarking

1 code implementation • 12 Feb 2024 • Xiaohao Xu, Tianyi Zhang, Sibo Wang, Xiang Li, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Xiaonan Huang

To this end, we propose a novel, customizable pipeline for noisy data synthesis, aimed at assessing the resilience of multi-modal SLAM models against various perturbations.

Benchmarking Simultaneous Localization and Mapping

A General Framework for Learning from Weak Supervision

1 code implementation • 2 Feb 2024 • Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj

Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering the practical deployment.

Weakly-supervised Learning

On Catastrophic Inheritance of Large Foundation Models

no code implementations • 2 Feb 2024 • Hao Chen, Bhiksha Raj, Xing Xie, Jindong Wang

Large foundation models (LFMs) are claiming incredible performances.

PAM: Prompting Audio-Language Models for Audio Quality Assessment

1 code implementation • 1 Feb 2024 • Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

Here, we exploit this capability and introduce PAM, a no-reference metric for assessing audio quality for different audio processing tasks.

Music Generation Text-to-Music Generation

AugSumm: towards generalizable speech summarization using synthetic labels from large language model

1 code implementation • 10 Jan 2024 • Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, Shinji Watanabe

We tackle this challenge by proposing AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries for training and evaluation.

Language Modelling Large Language Model +1

FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding

no code implementations • 27 Nov 2023 • Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj, Jackson Cothren, Khoa Luu

In particular, we first introduce a new Fairness Contrastive Clustering loss to address the problems of catastrophic forgetting and fairness.

Continual Learning Continual Semantic Segmentation +3

Token Prediction as Implicit Classification to Identify LLM-Generated Text

1 code implementation • 15 Nov 2023 • Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj

This paper introduces a novel approach for identifying the possible large language models (LLMs) involved in text generation.

Text Classification +1

Pairwise Similarity Learning is SimPLE

2 code implementations • ICCV 2023 • Yandong Wen, Weiyang Liu, Yao Feng, Bhiksha Raj, Rita Singh, Adrian Weller, Michael J. Black, Bernhard Schölkopf

In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL).

Face Recognition Image Retrieval +4

Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms

no code implementations • 11 Oct 2023 • Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Shuo Han, Yunyang Zeng, Ankit Shah, Bhiksha Raj

Within the ambit of VoIP (Voice over Internet Protocol) telecommunications, the complexities introduced by acoustic transformations merit rigorous analysis.

Benchmarking Denoising +1

Privacy-oriented manipulation of speaker representations

no code implementations • 10 Oct 2023 • Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso

Speaker embeddings are ubiquitous, with applications ranging from speaker recognition and diarization to speech synthesis and voice anonymisation.

Speaker Recognition Speech Synthesis

Continual Contrastive Spoken Language Understanding

no code implementations • 4 Oct 2023 • Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj

In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning.

Class Incremental Learning Contrastive Learning +3

Prompting Audios Using Acoustic Properties For Emotion Representation

no code implementations • 3 Oct 2023 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs.

Contrastive Learning Retrieval +1

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

no code implementations • 2 Oct 2023 • Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu

Speech enhancement aims to improve speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing speech according to specific user needs.

Denoising Self-Supervised Learning +2

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

no code implementations • 2 Oct 2023 • Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Joseph Konan, Dareen Alharthi, Hazim T Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, Rita Singh

We hypothesize that for attacks to be transferrable, it is sufficient if the proxy can approximate the target model in the neighborhood of the harmful query.

Language Modelling Large Language Model

Completing Visual Objects via Bridging Generation and Segmentation

no code implementations • 1 Oct 2023 • Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu

This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components.

Image Generation Object +1

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

1 code implementation • 1 Oct 2023 • Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

In this paper, we propose an evaluation technique involving the training of an ASR model on synthetic speech and assessing its performance on real speech.

Speech Recognition +1

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

3 code implementations • 29 Sep 2023 • Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj

We propose a semantic decomposition method based on product quantization, where the multi-source semantics can be decomposed and represented by several disentangled and noise-suppressed single-source semantics.

Quantization

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks

no code implementations • 29 Sep 2023 • Hao Chen, Jindong Wang, Ankit Shah, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, Bhiksha Raj

This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.

Importance of negative sampling in weak label learning

no code implementations • 23 Sep 2023 • Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj

Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known.

Training Audio Captioning Models without Audio

1 code implementation • 14 Sep 2023 • Soham Deshmukh, Benjamin Elizalde, Dimitra Emmanouilidou, Bhiksha Raj, Rita Singh, Huaming Wang

During inference, the text encoder is replaced with the pretrained CLAP audio encoder.

Audio captioning Decoder

Fixed Inter-Neuron Covariability Induces Adversarial Robustness

no code implementations • 7 Aug 2023 • Muhammad Ahmed Shah, Bhiksha Raj

The vulnerability to adversarial perturbations is a major flaw of Deep Neural Networks (DNNs) that raises questions about their reliability in real-world scenarios.

Adversarial Robustness

Rethinking Voice-Face Correlation: A Geometry View

no code implementations • 26 Jul 2023 • Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj

Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion.

3D Face Reconstruction Face Generation

The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

1 code implementation • 26 Jul 2023 • Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj

This work unveils the enigmatic link between phonemes and facial features.

BASS: Block-wise Adaptation for Speech Summarization

no code implementations • 17 Jul 2023 • Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj

End-to-end speech summarization has been shown to improve performance over cascade baselines.

UTOPIA: Unconstrained Tracking Objects without Preliminary Examination via Cross-Domain Adaptation

no code implementations • 16 Jun 2023 • Pha Nguyen, Kha Gia Quach, John Gauch, Samee U. Khan, Bhiksha Raj, Khoa Luu

Then, a new cross-domain MOT adaptation from existing datasets is proposed without any pre-defined human knowledge in understanding and modeling objects.

Domain Adaptation Multiple Object Tracking +1

PaintSeg: Training-free Segmentation via Painting

1 code implementation • 30 May 2023 • Xiang Li, Chung-Ching Lin, Yinpeng Chen, Zicheng Liu, Jinglu Wang, Bhiksha Raj

The paper introduces PaintSeg, a new unsupervised method for segmenting objects without any training.

Referring Image Matting (Prompt-based) Segmentation +1

Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations

no code implementations • 22 May 2023 • Hao Chen, Ankit Shah, Jindong Wang, Ran Tao, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj

In this paper, we introduce imprecise label learning (ILL), a framework for the unification of learning with various imprecise label configurations.

Ranked #1 on Learning with noisy labels on mini WebVision 1.0

Learning with noisy labels Partial Label Learning

GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content

2 code implementations • 13 May 2023 • Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj

This paper presents a novel approach for detecting ChatGPT-generated vs. human-written text using language models.

Text Classification

FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding

1 code implementation • CVPR 2023 • Thanh-Dat Truong, Ngan Le, Bhiksha Raj, Jackson Cothren, Khoa Luu

Although Domain Adaptation in Semantic Scene Segmentation has shown impressive improvement in recent years, the fairness concerns in the domain adaptation have yet to be well defined and addressed.

Ranked #6 on Domain Adaptation on SYNTHIA-to-Cityscapes

Autonomous Driving Domain Adaptation +4

Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms

no code implementations • 16 Mar 2023 • Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Hojeong Lee, Ankit Shah, Shuo Han, Yunyang Zeng, Amanda Shu, Haohui Liu, Xuankai Chang, Hamza Khalid, Minseon Gwak, Kawon Lee, Minjeong Kim, Bhiksha Raj

In this paper, we present a method for fine-tuning models trained on the Deep Noise Suppression (DNS) 2020 Challenge to improve their performance on Voice over Internet Protocol (VoIP) applications.

Multi-Task Learning Speech Enhancement +2

Approach to Learning Generalized Audio Representation Through Batch Embedding Covariance Regularization and Constant-Q Transforms

no code implementations • 7 Mar 2023 • Ankit Shah, Shuyi Chen, Kejun Zhou, Yue Chen, Bhiksha Raj

Preliminary results show (1) the proposed BECR can incur a more dispersed embedding on the test set, (2) BECR improves the PaSST model without extra computation complexity, and (3) STFT preprocessing outperforms CQT in all tasks we tested.

Zero-Shot Learning

Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session

no code implementations • 20 Feb 2023 • Laurie M. Heller, Benjamin Elizalde, Bhiksha Raj, Soham Deshmukh

Machine Listening, as usually formalized, attempts to perform a task that is, from our perspective, fundamentally human-performable, and performed by humans.

Scene Recognition

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

2 code implementations • 16 Feb 2023 • Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We propose an objective for perceptual quality based on temporal acoustic parameters.

Speaker Recognition Speech Enhancement

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

2 code implementations • 16 Feb 2023 • Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We can add this criterion as an auxiliary loss to any model that produces speech, to optimize speech outputs to match the values of clean speech in these features.

Speech Enhancement Time Series +1

SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning

4 code implementations • 26 Jan 2023 • Hao Chen, Ran Tao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Bhiksha Raj, Marios Savvides

The critical challenge of Semi-Supervised Learning (SSL) is how to effectively leverage the limited labeled data and massive unlabeled data to improve the model's generalization performance.

imbalanced classification

Understanding Political Polarisation using Language Models: A dataset and method

no code implementations • 2 Jan 2023 • Samiran Gode, Supreeth Bare, Bhiksha Raj, Hyungon Yoo

To understand the polarization, we begin by showing results from some classical language models, namely Word2Vec and Doc2Vec.

Language Modelling

Robust Referring Video Object Segmentation with Cyclic Structural Consensus

no code implementations • ICCV 2023 • Xiang Li, Jinglu Wang, Xiaohao Xu, Xiao Li, Bhiksha Raj, Yan Lu

Our model achieves state-of-the-art performance on R-VOS benchmarks, Ref-DAVIS17 and Ref-Youtube-VOS, and also our RRYTVOS dataset.

Object Referring Video Object Segmentation +2

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

1 code implementation • 28 Nov 2022 • Kashu Yamazaki, Khoa Vo, Sang Truong, Bhiksha Raj, Ngan Le

Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video with several temporal event locations in coherent storytelling.

Ranked #2 on Video Captioning on ActivityNet Captions

Sentence Video Captioning

Panoramic Video Salient Object Detection with Ambisonic Audio Guidance

no code implementations • 26 Nov 2022 • Xiang Li, Haoyuan Cao, Shijie Zhao, Junlin Li, Li Zhang, Bhiksha Raj

In this paper, we aim to tackle the video salient object detection problem for panoramic videos, with their corresponding ambisonic audios.

Object object-detection +2

An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning

no code implementations • 20 Nov 2022 • Hao Chen, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Marios Savvides, Bhiksha Raj

While standard SSL assumes uniform data distribution, we consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.

Pseudo Label

Describing emotions with acoustic property prompts for speech emotion recognition

no code implementations • 14 Nov 2022 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

We investigate how the model can learn to associate the audio with the descriptions, resulting in performance improvement of Speech Emotion Recognition and Speech Audio Retrieval.

Retrieval Speech Emotion Recognition

Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition

no code implementations • 29 Oct 2022 • Roshan Sharma, Hira Dhamyal, Bhiksha Raj, Rita Singh

Accordingly, models that have been proposed for emotion detection use one or the other of these label types.

Multi-Task Learning Speech Emotion Recognition

XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers

no code implementations • 29 Oct 2022 • Roshan Sharma, Bhiksha Raj

Transformers are among the state of the art for many tasks in speech, vision, and natural language processing, among others.

Speech Recognition

There is more than one kind of robustness: Fooling Whisper with adversarial examples

1 code implementation • 26 Oct 2022 • Raphael Olivier, Bhiksha Raj

Whisper is a recent Automatic Speech Recognition (ASR) model displaying impressive robustness to both out-of-distribution inputs and random noise.

Automatic Speech Recognition (ASR) +2

Privacy-preserving Automatic Speaker Diarization

no code implementations • 26 Oct 2022 • Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso

Automatic Speaker Diarization (ASD) is an enabling technology with numerous applications, which deals with recordings of multiple speakers, raising special concerns in terms of privacy.

Privacy Preserving speaker-diarization +1

AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation

1 code implementation • 5 Oct 2022 • Khoa Vo, Sang Truong, Kashu Yamazaki, Bhiksha Raj, Minh-Triet Tran, Ngan Le

The PMR module represents each video snippet by a visual-linguistic feature, in which the main actors and surrounding environment are represented by visual information, whereas relevant objects are depicted by linguistic features through an image-text model.

Ranked #1 on Temporal Action Proposal Generation on ActivityNet-1.3

Action Detection Temporal Action Proposal Generation

Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models

1 code implementation • 17 Sep 2022 • Raphael Olivier, Hadi Abdullah, Bhiksha Raj

To exploit ASR models in real-world, black-box settings, an adversary can leverage the transferability property, i.e., that an adversarial sample produced for a proxy ASR can also fool a different remote ASR.

Adversarial Attack Automatic Speech Recognition +3

USB: A Unified Semi-supervised Learning Benchmark for Classification

5 code implementations • 12 Aug 2022 • Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, RenJie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang

We further provide the pre-trained versions of the state-of-the-art neural models for CV tasks to make the cost affordable for further tuning.

Ranked #2 on Semi-Supervised Image Classification on CIFAR-100, 400 Labels

General Classification Semi-Supervised Image Classification

Online Video Instance Segmentation via Robust Context Fusion

no code implementations • 12 Jul 2022 • Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu

We propose a robust context fusion network to tackle VIS in an online fashion, which predicts instance segmentation frame-by-frame with a few preceding frames.

Instance Segmentation Segmentation +2

How many perturbations break this model? Evaluating robustness beyond adversarial accuracy

1 code implementation • 8 Jul 2022 • Raphael Olivier, Bhiksha Raj

Finally, with sparsity we can measure increases in robustness that do not affect accuracy: we show for example that data augmentation can by itself increase adversarial robustness, without using adversarial training.

Adversarial Attack Adversarial Robustness +1

Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus

1 code implementation • 4 Jul 2022 • Xiang Li, Jinglu Wang, Xiaohao Xu, Xiao Li, Bhiksha Raj, Yan Lu

Referring Video Object Segmentation (R-VOS) is a challenging task that aims to segment an object in a video based on a linguistic expression.

Ranked #11 on Referring Video Object Segmentation on Refer-YouTube-VOS

Referring Expression Segmentation Referring Video Object Segmentation +2

Improving Speech Enhancement through Fine-Grained Speech Characteristics

1 code implementation • 1 Jul 2022 • Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We first identify key acoustic parameters that have been found to correlate well with voice quality (e.g., jitter, shimmer, and spectral flux) and then propose objective functions which are aimed at reducing the difference between clean speech and enhanced speech with respect to these features.

Speech Enhancement

Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

no code implementations • 25 Jun 2022 • Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track.

Decoder

Towards End-to-End Private Automatic Speaker Recognition

no code implementations • 23 Jun 2022 • Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso

This poses two important issues: first, knowledge of the speaker embedding extraction model may create security and robustness liabilities for the authentication system, as this knowledge might help attackers in crafting adversarial examples able to mislead the system; second, from the point of view of a service provider the speaker embedding extraction model is arguably one of the most valuable components in the system and, as such, disclosing it would be highly undesirable.

Privacy Preserving Speaker Recognition +1

Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution

no code implementations • 18 Jun 2022 • Chonghan Chen, Qi Jiang, Chih-Hao Wang, Noel Chen, Haohan Wang, Xiang Li, Bhiksha Raj

With our proposed QCM, the downstream fusion module receives visual features that are more discriminative and focused on the desired object described in the expression, leading to more accurate predictions.

Visual Grounding

FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning

5 code implementations • 15 May 2022 • Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing Xie

Semi-supervised Learning (SSL) has witnessed great success owing to the impressive performances brought by various methods based on pseudo labeling and consistency regularization.

Ranked #1 on Semi-Supervised Image Classification on CIFAR-10, 40 Labels

Fairness Semi-Supervised Image Classification

On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice

no code implementations • 11 Apr 2022 • Ankit Shah, Hira Dhamyal, Yang Gao, Daniel Arancibia, Mario Arancibia, Bhiksha Raj, Rita Singh

Lately, there has been a global effort by multiple research groups to detect COVID-19 from voice.

Recent improvements of ASR models in the face of adversarial attacks

1 code implementation • 29 Mar 2022 • Raphael Olivier, Bhiksha Raj

Like many other tasks involving neural networks, Speech Recognition models are vulnerable to adversarial attacks.

Speech Recognition

Point3D: tracking actions as moving points with 3D CNNs

no code implementations • 20 Mar 2022 • Shentong Mo, Jingfei Xia, Xiaoqing Tan, Bhiksha Raj

Our Point3D consists of a Point Head for action localization and a 3D Head for action classification.

Action Classification Action Localization +1

HEAR: Holistic Evaluation of Audio Representations

3 code implementations • 6 Mar 2022 • Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk

The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios.

Open-Ended Question Answering

Ontological Learning from Weak Labels

no code implementations • 4 Mar 2022 • Larry Tang, Po Hao Chou, Yi Yu Zheng, Ziqian Ge, Ankit Shah, Bhiksha Raj

We find that the baseline Siamese does not perform better by incorporating ontology information in the weak and multi-label scenario, but that the GCN does capture the ontology knowledge better for weak, multi-labeled data.

Sequential Randomized Smoothing for Adversarially Robust Speech Recognition

1 code implementation • EMNLP 2021 • Raphael Olivier, Bhiksha Raj

We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.

Automatic Speech Recognition (ASR) +2

Self-Supervised 3D Face Reconstruction via Conditional Estimation

no code implementations • ICCV 2021 • Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh

We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos.

Ranked #16 on 3D Face Reconstruction on REALY

3D Face Reconstruction Disentanglement

SphereFace Revived: Unifying Hyperspherical Face Recognition

1 code implementation • 12 Sep 2021 • Weiyang Liu, Yandong Wen, Bhiksha Raj, Rita Singh, Adrian Weller

As one of the earliest works in hyperspherical face recognition, SphereFace explicitly proposed to learn face embeddings with large inter-class angular margin.

Face Recognition

The Right to Talk: An Audio-Visual Transformer Approach

1 code implementation • ICCV 2021 • Thanh-Dat Truong, Chi Nhan Duong, The De Vu, Hoang Anh Pham, Bhiksha Raj, Ngan Le, Khoa Luu

Therefore, this work introduces a new Audio-Visual Transformer approach to the problem of localizing and highlighting the main speaker in both audio and visual channels of a multi-speaker conversation video in the wild.

SphereFace2: Binary Classification is All You Need for Deep Face Recognition

no code implementations • ICLR 2022 • Yandong Wen, Weiyang Liu, Adrian Weller, Bhiksha Raj, Rita Singh

In this paper, we start by identifying the discrepancy between training and evaluation in the existing multi-class classification framework and then discuss the potential limitations caused by the "competitive" nature of softmax normalization.

Binary Classification Classification +2

Controlled AutoEncoders to Generate Faces from Voices

no code implementations • 16 Jul 2021 • Hao Liang, Lulan Yu, Guikang Xu, Bhiksha Raj, Rita Singh

With this in perspective, we propose a framework to morph a target face in response to a given voice such that facial features are implicitly guided by the learned voice-face correlation.

MORPH Retrieval

Improving weakly supervised sound event detection with self-supervised auxiliary tasks

1 code implementation • 12 Jun 2021 • Soham Deshmukh, Bhiksha Raj, Rita Singh

To that extent, we propose a shared encoder architecture with sound event detection as a primary task and an additional secondary decoder for a self-supervised auxiliary task.

Decoder Event Detection +3

Training image classifiers using Semi-Weak Label Data

no code implementations • 19 Mar 2021 • Anxiang Zhang, Ankit Shah, Bhiksha Raj

Thus, this paper introduces a novel semi-weak label learning paradigm as a middle ground to mitigate the problem.

Multiple Instance Learning

Paper
Add Code

Constant Random Perturbations Provide Adversarial Robustness with Minimal Effect on Accuracy

1 code implementation • 15 Mar 2021 • Bronya Roni Chernyak, Bhiksha Raj, Tamir Hazan, Joseph Keshet

This paper proposes an attack-independent (non-adversarial training) technique for improving adversarial robustness of neural network models, with minimal loss of standard accuracy.

Adversarial Robustness

Paper
Code

Contrast and Order Representations for Video Self-Supervised Learning

no code implementations • ICCV 2021 • Kai Hu, Jie Shao, YuAn Liu, Bhiksha Raj, Marios Savvides, Zhiqiang Shen

To address this, we present a contrast-and-order representation (CORP) framework for learning self-supervised video representations that can automatically capture both the appearance information within each frame and temporal information across different frames.

Action Recognition Self-Supervised Learning

Paper
Add Code

Is normalization indispensable for training deep neural network?

1 code implementation • NeurIPS 2020 • Jie Shao, Kai Hu, Changhu Wang, Xiangyang Xue, Bhiksha Raj

In this paper, we study what would happen when normalization layers are removed from the network, and show how to train deep neural networks without normalization layers and without performance degradation.

General Classification Image Classification +5

Paper
Code

FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances

2 code implementations • 17 Nov 2020 • Ali Shahin Shamsabadi, Francisco Sepúlveda Teixeira, Alberto Abad, Bhiksha Raj, Andrea Cavallaro, Isabel Trancoso

Speaker identification models are vulnerable to carefully designed adversarial perturbations of their input signals that induce misclassification.

Adversarial Attack Speaker Identification

Paper
Code

Masked Proxy Loss For Text-Independent Speaker Verification

1 code implementation • 9 Nov 2020 • Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh

We further propose Multinomial Masked Proxy (MMP) loss to leverage the hardness of speaker pairs.

Metric Learning Speaker Recognition +2

Paper
Code

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

1 code implementation • 17 Aug 2020 • Soham Deshmukh, Bhiksha Raj, Rita Singh

Weakly labelled learning has garnered a lot of attention in recent years due to its potential to scale Sound Event Detection (SED) and is formulated as a Multiple Instance Learning (MIL) problem.

Event Detection Multiple Instance Learning +3

Paper
Code

Exploiting Non-Linear Redundancy for Neural Model Compression

no code implementations • 28 May 2020 • Muhammad A. Shah, Raphael Olivier, Bhiksha Raj

Deploying deep learning models, comprising non-linear combinations of millions, even billions, of parameters, is challenging given the memory, power and compute constraints of the real world.

Model Compression

Paper
Add Code

Automatic In-the-wild Dataset Annotation with Deep Generalized Multiple Instance Learning

no code implementations • LREC 2020 • Joana Correia, Isabel Trancoso, Bhiksha Raj

The automation of the diagnosis and monitoring of speech-affecting diseases in real life situations, such as Depression or Parkinson's disease, depends on the existence of rich and large datasets that resemble real life conditions, such as those collected from in-the-wild multimedia repositories like YouTube.

Multiple Instance Learning

Paper
Add Code

Face Reconstruction from Voice using Generative Adversarial Networks

1 code implementation • NeurIPS 2019 • Yandong Wen, Bhiksha Raj, Rita Singh

The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set.

Face Reconstruction

Paper
Code

The phonetic bases of vocal expressed emotion: natural versus acted

no code implementations • 13 Nov 2019 • Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Our tests show significant differences in the manner and choice of phonemes between acted and natural speech, suggesting moderate to low validity and value in using acted speech databases for emotion classification tasks.

Emotion Classification General Classification +1

Paper
Add Code

Detecting gender differences in perception of emotion in crowdsourced data

no code implementations • 24 Oct 2019 • Shahan Ali Memon, Hira Dhamyal, Oren Wright, Daniel Justice, Vijaykumar Palat, William Boler, Bhiksha Raj, Rita Singh

While we limit ourselves to a single modality (i.e., speech), our framework is applicable to studies of emotion perception from all such loosely annotated data in general.

Paper
Add Code

Non-Determinism in Neural Networks for Adversarial Robustness

no code implementations • 26 May 2019 • Daanish Ali Khan, Linhong Li, Ninghao Sha, Zhuoran Liu, Abelino Jimenez, Bhiksha Raj, Rita Singh

Recent breakthroughs in the field of deep learning have led to advancements in a broad spectrum of tasks in computer vision, audio processing, natural language processing and other areas.

Adversarial Robustness

Paper
Add Code

Reconstructing faces from voices

1 code implementation • 25 May 2019 • Yandong Wen, Rita Singh, Bhiksha Raj

Voice profiling aims at inferring various human parameters from their speech, e.g., gender, age, etc.

Paper
Code

Nonlinear Semi-Parametric Models for Survival Analysis

1 code implementation • 14 May 2019 • Chirag Nagpal, Rohan Sangave, Amit Chahar, Parth Shah, Artur Dubrawski, Bhiksha Raj

Semi-parametric survival analysis methods like the Cox Proportional Hazards (CPH) regression (Cox, 1972) are a popular approach for survival analysis.

regression Survival Analysis

Paper
Code

Hierarchical Routing Mixture of Experts

no code implementations • 18 Mar 2019 • Wenbo Zhao, Yang Gao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Addressing these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts.

regression

Paper
Add Code

Hide and Speak: Towards Deep Neural Networks for Speech Steganography

1 code implementation • 7 Feb 2019 • Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet

Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as the carrier.

Decoder

Paper
Code

Learning Sound Events From Webly Labeled Data

1 code implementation • 28th International Joint Conference on Artificial Intelligence 2019 • Anurag Kumar, Ankit Shah, Alex Hauptmann, Bhiksha Raj

In the last couple of years, weakly labeled learning for sound events has turned out to be an exciting approach for audio event detection.

Event Detection Sound Event Detection +1

Paper
Code

Higher-order Network for Action Recognition

no code implementations • 19 Nov 2018 • Kai Hu, Bhiksha Raj

Capturing spatiotemporal dynamics is an essential topic in video recognition.

Action Recognition General Classification +2

Paper
Add Code

Neural Regression Trees

no code implementations • 1 Oct 2018 • Shahan Ali Memon, Wenbo Zhao, Bhiksha Raj, Rita Singh

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.

Classification General Classification +1

Paper
Add Code

Neural Regression Tree

no code implementations • 27 Sep 2018 • Wenbo Zhao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.

Classification regression

Paper
Add Code

Optimal Strategies for Matching and Retrieval Problems by Comparing Covariates

no code implementations • 12 Jul 2018 • Yandong Wen, Mahmoud Al Ismail, Bhiksha Raj, Rita Singh

In many retrieval problems, where we must retrieve one or more entries from a gallery in response to a probe, it is common practice to learn to do so by directly comparing the probe and gallery entries to one another.

Retrieval

Paper
Add Code

Disjoint Mapping Network for Cross-modal Matching of Voices and Faces

no code implementations • ICLR 2019 • Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh

We propose a novel framework, called Disjoint Mapping Network (DIMNet), for cross-modal biometric matching, in particular of voices and faces.

Paper
Add Code

A Closer Look at Weak Label Learning for Audio Events

1 code implementation • 24 Apr 2018 • Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj

In this work, we first describe a CNN based approach for weakly supervised training of audio events.

Audio Classification Event Detection +2

Paper
Code

Voice Impersonation using Generative Adversarial Networks

no code implementations • 19 Feb 2018 • Yang Gao, Rita Singh, Bhiksha Raj

In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker.

Sound Audio and Speech Processing

Paper
Add Code

Framework for evaluation of sound event detection in web videos

no code implementations • 2 Nov 2017 • Rohan Badlani, Ankit Shah, Benjamin Elizalde, Anurag Kumar, Bhiksha Raj

The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets.

Event Detection Sound Event Detection

Paper
Add Code

Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting

no code implementations • 13 Jul 2017 • Anders Oland, Aayush Bansal, Roger B. Dannenberg, Bhiksha Raj

To this end, we demonstrate faster convergence and better performance on diverse classification tasks: image classification using CIFAR-10 and ImageNet, and semantic segmentation using PASCAL VOC 2012.

Classification General Classification +2

Paper
Add Code

Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data

no code implementations • 9 Jul 2017 • Anurag Kumar, Bhiksha Raj

We propose that learning algorithms that can exploit weak labels offer an effective method to learn from web data.

Paper
Add Code

SphereFace: Deep Hypersphere Embedding for Face Recognition

22 code implementations • CVPR 2017 • Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song

This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space.

Ranked #1 on Face Verification on CK+

Face Identification Face Recognition +1

Paper
Code

On the Origin of Deep Learning

no code implementations • 24 Feb 2017 • Haohan Wang, Bhiksha Raj

This paper is a review of the evolutionary history of deep learning models.

Paper
Add Code

The Incredible Shrinking Neural Network: New Perspectives on Learning Representations Through The Lens of Pruning

no code implementations • 16 Jan 2017 • Aditya Sharma, Nikolas Wolfe, Bhiksha Raj

How much can pruning algorithms teach us about the fundamentals of learning representations in neural networks?

Network Pruning

Paper
Add Code

Audio Event and Scene Recognition: A Unified Approach using Strongly and Weakly Labeled Data

no code implementations • 12 Nov 2016 • Anurag Kumar, Bhiksha Raj

In this paper we propose a novel learning framework called Supervised and Weakly Supervised Learning where the goal is to learn simultaneously from weakly and strongly labeled data.

Scene Recognition Weakly-supervised Learning

Paper
Add Code

Discovering Sound Concepts and Acoustic Relations In Text

no code implementations • 23 Sep 2016 • Anurag Kumar, Bhiksha Raj, Ndapandula Nakashole

In this paper we describe approaches for discovering acoustic concepts and relations in text.

Dependency Parsing

Paper
Add Code

An Approach for Self-Training Audio Event Detectors Using Web Data

no code implementations • 20 Sep 2016 • Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube.

Event Detection

Paper
Add Code

Features and Kernels for Audio Event Recognition

no code implementations • 19 Jul 2016 • Anurag Kumar, Bhiksha Raj

One of the most important problems in audio event detection research is the absence of benchmark results for comparison with any proposed method.

Sound Multimedia

Paper
Add Code

AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

no code implementations • 13 Jul 2016 • Sebastian Sager, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj, Ian Lane

One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels.

TAG

Paper
Add Code

Classifier Risk Estimation under Limited Labeling Resources

no code implementations • 9 Jul 2016 • Anurag Kumar, Bhiksha Raj

In this paper we propose strategies for estimating performance of a classifier when labels cannot be obtained for the whole test set.

Paper
Add Code

Weakly Supervised Scalable Audio Content Analysis

no code implementations • 12 Jun 2016 • Anurag Kumar, Bhiksha Raj

Audio Event Detection is an important task for content analysis of multimedia data.

Event Detection Multiple Instance Learning +1

Paper
Add Code

Audio Event Detection using Weakly Labeled Data

no code implementations • 9 May 2016 • Anurag Kumar, Bhiksha Raj

This helps in obtaining a complete description of the recording and is notable since temporal information was never known in the first place in weakly labeled data.

Event Detection Multiple Instance Learning

Paper
Add Code

Content-based Video Indexing and Retrieval Using Corr-LDA

no code implementations • 27 Feb 2016 • Rahul Radhakrishnan Iyer, Sanjeel Parekh, Vikas Mohandoss, Anush Ramsurat, Bhiksha Raj, Rita Singh

Existing video indexing and retrieval methods on popular web-based multimedia sharing websites are based on user-provided sparse tagging.

Retrieval

Paper
Add Code

Environmental Noise Embeddings for Robust Speech Recognition

no code implementations • 11 Jan 2016 • Suyoun Kim, Bhiksha Raj, Ian Lane

We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model.

Management Multi-Task Learning +2

Paper
Add Code

Handcrafted Local Features are Convolutional Neural Networks

no code implementations • 16 Nov 2015 • Zhenzhong Lan, Shoou-I Yu, Ming Lin, Bhiksha Raj, Alexander G. Hauptmann

We approach this problem by first showing that local handcrafted features and Convolutional Neural Networks (CNNs) share the same convolution-pooling network structure.

Action Recognition Optical Flow Estimation +2

Paper
Add Code

A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas

no code implementations • 16 Oct 2015 • Haohan Wang, Bhiksha Raj

Further, we will also look into the development history of modelling time series data with neural networks.

Time Series Time Series Analysis

Paper
Add Code

Privacy-Preserving Multi-Document Summarization

no code implementations • 6 Aug 2015 • Luís Marujo, José Portêlo, Wang Ling, David Martins de Matos, João P. Neto, Anatole Gershman, Jaime Carbonell, Isabel Trancoso, Bhiksha Raj

State-of-the-art extractive multi-document summarization systems are usually designed without any concern about privacy issues, meaning that all documents are open to third parties.

Document Summarization Multi-Document Summarization +1

Paper
Add Code

Plagiarism Detection in Polyphonic Music using Monaural Signal Separation

no code implementations • 27 Feb 2015 • Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj

Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright.

General Classification

Paper
Add Code

Unsupervised Fusion Weight Learning in Multiple Classifier Systems

no code implementations • 6 Feb 2015 • Anurag Kumar, Bhiksha Raj

We also introduce a novel metric for ranking instances based on an index which depends upon the rank of weighted scores of test points among the weighted scores of training points.

Paper
Add Code

Beyond Gaussian Pyramid: Multi-skip Feature Stacking for Action Recognition

no code implementations • CVPR 2015 • Zhenzhong Lan, Ming Lin, Xuanchong Li, Alexander G. Hauptmann, Bhiksha Raj

MIFS compensates for information lost from using differential operators by recapturing information at coarse scales.

Action Recognition Event Detection +1

Paper
Add Code

Unsupervised Structure Discovery for Semantic Analysis of Audio

no code implementations • NeurIPS 2012 • Sourish Chaudhuri, Bhiksha Raj

Approaches to audio classification and retrieval tasks largely rely on detection-based discriminative models.

Audio Classification General Classification +1

Paper
Add Code

Learning Model-Based Sparsity via Projected Gradient Descent

no code implementations • 7 Sep 2012 • Sohail Bahmani, Petros T. Boufounos, Bhiksha Raj

As an example, we elaborate on the application of the main results to estimation in Generalized Linear Models.

Paper
Add Code

An Unsupervised Dynamic Bayesian Network Approach to Measuring Speech Style Accommodation

no code implementations • EACL 2012 • Mahaveer Jain, John McDonough, Gahgene Gweon, Bhiksha Raj, Carolyn Penstein Rosé

Paper
Add Code

Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers

no code implementations • NeurIPS 2010 • Manas Pathak, Shantanu Rane, Bhiksha Raj

As increasing amounts of sensitive personal information find their way into data repositories, it is important to develop analysis mechanisms that can derive aggregate information from these repositories without revealing information about individual data instances.

Privacy Preserving

Paper
Add Code

A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

no code implementations • NeurIPS 2009 • Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj

In this paper we present an algorithm for separating mixed sounds from a monophonic recording.

Paper
Add Code

Sparse Overcomplete Latent Variable Decomposition of Counts Data

no code implementations • NeurIPS 2007 • Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis

An important problem in many fields is the analysis of counts data to extract meaningful latent components.

Paper
Add Code
