Search Results for author: Bac Nguyen

Found 8 papers, 5 papers with code

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

1 code implementation · 24 Apr 2024 · Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville

Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each).

Towards Robust FastSpeech 2 by Modelling Residual Multimodality

1 code implementation · 2 Jun 2023 · Fabian Kögel, Bac Nguyen, Fabien Cardinaux

State-of-the-art non-autoregressive text-to-speech (TTS) models based on FastSpeech 2 can efficiently synthesise high-fidelity and natural speech.

Efficient Training of Deep Equilibrium Models

1 code implementation · 23 Apr 2023 · Bac Nguyen, Lukas Mauch

Deep equilibrium models (DEQs) have proven to be very powerful for learning data representations.
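For readers new to DEQs: instead of stacking many layers, a DEQ defines its output as the equilibrium z* = f(z*, x) of a single weight-tied layer. The sketch below is only a toy illustration of that idea, assuming a hypothetical tanh layer `f(z, x) = tanh(W z + U x + b)` whose weights are rescaled so the map is a contraction. It finds z* by naive fixed-point iteration. Practical DEQs typically use faster root solvers (e.g. Broyden's method or Anderson acceleration) and implicit differentiation for the backward pass, and this sketch does not reproduce the efficient training method of the paper above.

```python
import numpy as np

def deq_forward(x, W, U, b, tol=1e-6, max_iter=500):
    """Solve z* = tanh(W z* + U x + b) by naive fixed-point iteration.

    Returns the approximate equilibrium and the number of iterations used.
    """
    z = np.zeros(W.shape[0])
    for i in range(max_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, i + 1
        z = z_next
    return z, max_iter

rng = np.random.default_rng(0)
d, m = 8, 4
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)  # spectral norm 0.5; tanh is 1-Lipschitz, so the map is a contraction
U = rng.standard_normal((d, m))
b = rng.standard_normal(d)
x = rng.standard_normal(m)

z_star, n_iter = deq_forward(x, W, U, b)
# How far z_star is from being an exact fixed point of the layer.
residual = np.linalg.norm(z_star - np.tanh(W @ z_star + U @ x + b))
print(f"converged in {n_iter} iterations, residual {residual:.1e}")
```

The contraction assumption is what guarantees that this simple iteration converges. Without it, a real DEQ layer may need a more robust solver.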

AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling

no code implementations · 21 Mar 2022 · Bac Nguyen, Fabien Cardinaux, Stefan Uhlich

Using this differentiable duration method, we introduce AutoTTS, a direct text-to-waveform speech synthesis model.

Speech Synthesis, Text-To-Speech Synthesis

Neural Predictor for Black-Box Adversarial Attacks on Speech Recognition

1 code implementation · 18 Mar 2022 · Marie Biolková, Bac Nguyen

Recent works have revealed the vulnerability of automatic speech recognition (ASR) models to adversarial examples (AEs), i.e., small perturbations that cause an error in the transcription of the audio signal.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) (+1 more)
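The notion of an adversarial example in the entry above can be stated as a small constrained search problem. In the sketch below, $\mathbf{x}$ is the clean audio waveform, $y$ its correct transcription, $f$ the ASR model, and $\epsilon$ a small perturbation budget. The choice of the $L_\infty$ norm is an assumption made for illustration only, since audio attacks are also commonly constrained in other ways. In the black-box setting, gradients of $f$ are unavailable and the attacker can only query its outputs.

```latex
\text{find } \boldsymbol{\delta}
\quad \text{such that} \quad
\|\boldsymbol{\delta}\|_\infty \le \epsilon
\quad \text{and} \quad
f(\mathbf{x} + \boldsymbol{\delta}) \neq y
```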

NVC-Net: End-to-End Adversarial Voice Conversion

1 code implementation · 2 Jun 2021 · Bac Nguyen, Fabien Cardinaux

By disentangling the speaker identity from the speech content, NVC-Net is able to perform non-parallel traditional many-to-many voice conversion as well as zero-shot voice conversion from a short utterance of an unseen target speaker.

Speech Synthesis, Voice Conversion

A Simple Approach for Zero-Shot Learning based on Triplet Distribution Embeddings

no code implementations · 29 Mar 2021 · Vivek Chalumuri, Bac Nguyen

Given semantic descriptions of classes, Zero-Shot Learning (ZSL) aims to recognize unseen classes without labeled training data by exploiting semantic information that encodes knowledge shared between seen and unseen classes.

Generalized Zero-Shot Learning
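To make the ZSL setup above concrete, here is a minimal sketch of a generic attribute-embedding baseline. It is not the triplet distribution embedding method of the paper. The sketch learns a linear map from visual features to class semantic vectors using seen classes only (ridge regression). It then labels samples from unseen classes by picking the most cosine-similar class description. All data are synthetic, and the names (`fit_linear_map`, `predict_unseen`, the hidden map `G`) are hypothetical.

```python
import numpy as np

def fit_linear_map(X, S, lam=1e-2):
    """Ridge regression from visual features X (n, d) to semantic vectors S (n, k)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)

def predict_unseen(X, M, class_attrs):
    """Project features into semantic space, return index of the most cosine-similar class."""
    P = X @ M
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    A = class_attrs / np.linalg.norm(class_attrs, axis=1, keepdims=True)
    return np.argmax(P @ A.T, axis=1)

rng = np.random.default_rng(0)
k, d = 5, 10
G = rng.standard_normal((k, d))           # hidden attribute-to-feature map (toy assumption)
seen_attrs = rng.standard_normal((8, k))  # semantic descriptions of 8 seen classes
unseen_attrs = np.eye(k)[:3]              # 3 unseen classes, absent from training

def sample(attrs, n_per_class):
    """Draw noisy features for each class from its semantic vector."""
    labels = np.repeat(np.arange(len(attrs)), n_per_class)
    feats = attrs[labels] @ G + 0.05 * rng.standard_normal((len(labels), d))
    return feats, labels

X_tr, y_tr = sample(seen_attrs, 20)
M = fit_linear_map(X_tr, seen_attrs[y_tr])  # trained on seen classes only
X_te, y_te = sample(unseen_attrs, 20)
acc = np.mean(predict_unseen(X_te, M, unseen_attrs) == y_te)
print(f"unseen-class accuracy: {acc:.2f}")
```

The transfer to unseen classes works here only because features and semantics are linearly related by construction. The triplet distribution embeddings in the paper address the more realistic case where that relation is far less direct.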
