no code implementations • 7 Feb 2025 • Shehzeen Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Mikyas T. Desta, Roy Fejgin, Rafael Valle, Jason Li
While autoregressive speech token generation models produce speech with remarkable variety and naturalness, their inherent lack of controllability often results in issues such as hallucinations and undesired vocalizations that do not conform to conditioning inputs.
no code implementations • 18 Sep 2024 • Edresson Casanova, Ryan Langman, Paarth Neekhara, Shehzeen Hussain, Jason Li, Subhankar Ghosh, Ante Jukić, Sang-gil Lee
Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modeling techniques to audio data.
no code implementations • 25 Jun 2024 • Paarth Neekhara, Shehzeen Hussain, Subhankar Ghosh, Jason Li, Rafael Valle, Rohan Badlani, Boris Ginsburg
Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers.
no code implementations • 14 Oct 2023 • Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley
In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning (SSL) and speaker verification models.
no code implementations • 16 Feb 2023 • Shehzeen Hussain, Paarth Neekhara, Jocelyn Huang, Jason Li, Boris Ginsburg
In this work, we propose a zero-shot voice conversion method using speech representations trained with self-supervised learning.
no code implementations • 26 Sep 2022 • Shehzeen Hussain, Nojan Sheybani, Paarth Neekhara, Xinqiao Zhang, Javier Duarte, Farinaz Koushanfar
In this work, we design the first accelerator platform FastStamp to perform DNN based steganography and digital watermarking of images on hardware.
no code implementations • 9 Jun 2022 • Shehzeen Hussain, Todd Huster, Chris Mesterharm, Paarth Neekhara, Kevin An, Malhar Jere, Harshvardhan Sikka, Farinaz Koushanfar
We find that the white-box attack success rate of a pure U-Net ATN falls substantially short of gradient-based attacks like PGD on large face recognition datasets.
1 code implementation • 5 Apr 2022 • Paarth Neekhara, Shehzeen Hussain, Xinqiao Zhang, Ke Huang, Julian McAuley, Farinaz Koushanfar
We demonstrate that FaceSigns can embed a 128 bit secret as an imperceptible image watermark that can be recovered with a high bit recovery accuracy at several compression levels, while being non-recoverable when unseen Deepfake manipulations are applied.
no code implementations • 3 Oct 2021 • Shehzeen Hussain, Van Nguyen, Shuhua Zhang, Erik Visser
Finally, we extend our framework to perform multi-task learning by jointly optimizing the network parameters on multiple voice activated tasks using a shared transformer backbone.
Ranked #21 on
Speaker Verification
on VoxCeleb
1 code implementation • 4 Mar 2021 • Shehzeen Hussain, Paarth Neekhara, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar
There has been a recent surge in adversarial attacks on deep learning based automatic speech recognition (ASR) systems.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
1 code implementation • 15 Feb 2021 • Paarth Neekhara, Shehzeen Hussain, Jinglong Du, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley
Recent works on adversarial reprogramming have shown that it is possible to repurpose neural networks for alternate tasks without modifying the network architecture or parameters.
no code implementations • 30 Jan 2021 • Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley
In this work, we propose a controllable voice cloning method that allows fine-grained control over various style aspects of the synthesized speech for an unseen speaker.
no code implementations • 9 Feb 2020 • Shehzeen Hussain, Mojan Javaheripi, Paarth Neekhara, Ryan Kastner, Farinaz Koushanfar
While WaveNet produces state-of-the art audio generation results, the naive inference implementation is quite slow; it takes a few minutes to generate just one second of audio on a high-end GPU.
1 code implementation • 9 Feb 2020 • Shehzeen Hussain, Paarth Neekhara, Malhar Jere, Farinaz Koushanfar, Julian McAuley
Recent advances in video manipulation techniques have made the generation of fake videos more accessible than ever before.
no code implementations • 9 May 2019 • Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar
In this work, we demonstrate the existence of universal adversarial audio perturbations that cause mis-transcription of audio signals by automatic speech recognition (ASR) systems.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
1 code implementation • IJCNLP 2019 • Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar
Adversarial Reprogramming has demonstrated success in utilizing pre-trained neural network classifiers for alternative classification tasks without modification to the original network.