Search Results for author: Shehzeen Hussain

Found 16 papers, 5 papers with code

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance

no code implementations • 7 Feb 2025 • Shehzeen Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Mikyas T. Desta, Roy Fejgin, Rafael Valle, Jason Li

While autoregressive speech token generation models produce speech with remarkable variety and naturalness, their inherent lack of controllability often results in issues such as hallucinations and undesired vocalizations that do not conform to conditioning inputs.

Automatic Speech Recognition, Decoder, +3

Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference

no code implementations • 18 Sep 2024 • Edresson Casanova, Ryan Langman, Paarth Neekhara, Shehzeen Hussain, Jason Li, Subhankar Ghosh, Ante Jukić, Sang-gil Lee

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modeling techniques to audio data.

Audio Compression, Language Modeling, +3

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment

no code implementations • 25 Jun 2024 • Paarth Neekhara, Shehzeen Hussain, Subhankar Ghosh, Jason Li, Rafael Valle, Rohan Badlani, Boris Ginsburg

Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers.

Decoder, Language Modeling, +4

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

no code implementations • 14 Oct 2023 • Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning (SSL) and speaker verification models.

Self-Supervised Learning, Speaker Verification, +2

FastStamp: Accelerating Neural Steganography and Digital Watermarking of Images on FPGAs

no code implementations • 26 Sep 2022 • Shehzeen Hussain, Nojan Sheybani, Paarth Neekhara, Xinqiao Zhang, Javier Duarte, Farinaz Koushanfar

In this work, we design FastStamp, the first accelerator platform to perform DNN-based steganography and digital watermarking of images on hardware.

Image Steganography

ReFace: Real-time Adversarial Attacks on Face Recognition Systems

no code implementations • 9 Jun 2022 • Shehzeen Hussain, Todd Huster, Chris Mesterharm, Paarth Neekhara, Kevin An, Malhar Jere, Harshvardhan Sikka, Farinaz Koushanfar

We find that the white-box attack success rate of a pure U-Net ATN falls substantially short of gradient-based attacks like PGD on large face recognition datasets.

Face Identification, Face Recognition, +1

FaceSigns: Semi-Fragile Neural Watermarks for Media Authentication and Countering Deepfakes

1 code implementation • 5 Apr 2022 • Paarth Neekhara, Shehzeen Hussain, Xinqiao Zhang, Ke Huang, Julian McAuley, Farinaz Koushanfar

We demonstrate that FaceSigns can embed a 128-bit secret as an imperceptible image watermark that can be recovered with high bit-recovery accuracy at several compression levels, while being non-recoverable when unseen Deepfake manipulations are applied.

Face Swapping, Image Compression, +1

Multi-task Voice Activated Framework using Self-supervised Learning

no code implementations • 3 Oct 2021 • Shehzeen Hussain, Van Nguyen, Shuhua Zhang, Erik Visser

Finally, we extend our framework to perform multi-task learning by jointly optimizing the network parameters on multiple voice activated tasks using a shared transformer backbone.

Emotion Classification, Keyword Spotting, +5

Cross-modal Adversarial Reprogramming

1 code implementation • 15 Feb 2021 • Paarth Neekhara, Shehzeen Hussain, Jinglong Du, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

Recent works on adversarial reprogramming have shown that it is possible to repurpose neural networks for alternate tasks without modifying the network architecture or parameters.

Classification, General Classification, +1

Expressive Neural Voice Cloning

no code implementations • 30 Jan 2021 • Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

In this work, we propose a controllable voice cloning method that allows fine-grained control over various style aspects of the synthesized speech for an unseen speaker.

Speech Synthesis, Style Transfer, +2

FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA

no code implementations • 9 Feb 2020 • Shehzeen Hussain, Mojan Javaheripi, Paarth Neekhara, Ryan Kastner, Farinaz Koushanfar

While WaveNet produces state-of-the-art audio generation results, the naive inference implementation is quite slow; it takes a few minutes to generate just one second of audio on a high-end GPU.

Audio Generation, Audio Synthesis, +4

Universal Adversarial Perturbations for Speech Recognition Systems

no code implementations • 9 May 2019 • Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar

In this work, we demonstrate the existence of universal adversarial audio perturbations that cause mis-transcription of audio signals by automatic speech recognition (ASR) systems.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Adversarial Reprogramming of Text Classification Neural Networks

1 code implementation • IJCNLP 2019 • Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar

Adversarial Reprogramming has demonstrated success in utilizing pre-trained neural network classifiers for alternative classification tasks without modification to the original network.

General Classification, text-classification, +1
