no code implementations • 3 Jul 2024 • Bac Nguyen, Stefan Uhlich, Fabien Cardinaux, Lukas Mauch, Marzieh Edraki, Aaron Courville
While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-shot performance, further adaptation of the model to downstream tasks leads to undesirable performance degradation on out-of-distribution (OOD) data.
no code implementations • 31 Mar 2024 • Yassir Bendou, Giulia Lioi, Bastien Pasdeloup, Lukas Mauch, Ghouthi Boukli Hacene, Fabien Cardinaux, Vincent Gripon
Namely, we propose a realistic benchmark where negative query samples are drawn from the same original dataset as positive ones, including a granularity-controlled version of iNaturalist, where negative samples are at a fixed distance in the taxonomy tree from the positive ones.
1 code implementation • 20 Jan 2024 • Reda Bensaid, Vincent Gripon, François Leduc-Primeau, Lukas Mauch, Ghouthi Boukli Hacene, Fabien Cardinaux
In recent years, the rapid evolution of computer vision has seen the emergence of various foundation models, each tailored to specific data types and tasks.
1 code implementation • 24 Nov 2023 • Yassir Bendou, Vincent Gripon, Bastien Pasdeloup, Giulia Lioi, Lukas Mauch, Fabien Cardinaux, Ghouthi Boukli Hacene
In this paper, we present a novel approach that leverages text-derived statistics to predict the mean and covariance of the visual feature distribution for each class.
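The general recipe behind such distribution-based classifiers can be sketched as follows. This is a toy illustration only, not the paper's method: here the per-class means and covariances are hard-coded placeholders standing in for the text-derived estimates.

```python
import numpy as np

def gaussian_class_scores(x, means, covs, eps=1e-3):
    """Log-likelihood (up to a constant) of feature vector x under one
    Gaussian per class. means: (C, D), covs: (C, D, D); eps regularises
    each covariance so it is safely invertible."""
    scores = []
    for mu, cov in zip(means, covs):
        cov = cov + eps * np.eye(len(mu))
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        mahal = diff @ np.linalg.solve(cov, diff)  # Mahalanobis distance
        scores.append(-0.5 * (logdet + mahal))
    return np.array(scores)  # argmax over classes = prediction

# toy example: two classes with identity covariance
means = np.array([[0.0, 0.0], [5.0, 5.0]])
covs = np.stack([np.eye(2), np.eye(2)])
pred = gaussian_class_scores(np.array([4.8, 5.1]), means, covs).argmax()
```

A query feature is then assigned to the class whose Gaussian gives it the highest log-likelihood.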
no code implementations • 7 Sep 2023 • Pau Mulet Arabi, Alec Flowers, Lukas Mauch, Fabien Cardinaux
Computing gradients of an expectation with respect to the distributional parameters of a discrete distribution is a problem arising in many fields of science and engineering.
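A standard baseline for this problem (not necessarily the estimator studied in this paper) is the score-function (REINFORCE) estimator, which uses the identity ∇θ E[f(x)] = E[f(x) ∇θ log p(x; θ)]. A minimal numpy sketch for a categorical distribution parameterised by logits:

```python
import numpy as np

def score_function_grad(f, logits, n_samples=100_000, rng=None):
    """Monte-Carlo estimate of d/d(logits) of E_{x~Cat(softmax(logits))}[f(x)]
    using the score-function (REINFORCE) estimator."""
    rng = np.random.default_rng(rng)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    k = len(probs)
    x = rng.choice(k, size=n_samples, p=probs)
    fx = f(x)                        # shape (n_samples,)
    # For the softmax parameterisation: d log p(x) / d logits_j = 1{x=j} - p_j
    grad_logp = np.eye(k)[x] - probs  # (n_samples, k)
    return (fx[:, None] * grad_logp).mean(axis=0)

# sanity check target: f(x) = 1{x = 2}, so E[f] = p_2 and the exact
# gradient is p_2 * (1{j=2} - p_j)
logits = np.array([0.5, -1.0, 0.2])
est = score_function_grad(lambda x: (x == 2).astype(float), logits, rng=0)
```

The estimator is unbiased but high-variance, which is why much of the literature (including work like this) focuses on lower-variance alternatives.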
1 code implementation • 2 Jun 2023 • Fabian Kögel, Bac Nguyen, Fabien Cardinaux
State-of-the-art non-autoregressive text-to-speech (TTS) models based on FastSpeech 2 can efficiently synthesise high-fidelity and natural speech.
1 code implementation • 13 Dec 2022 • Yassir Bendou, Vincent Gripon, Bastien Pasdeloup, Lukas Mauch, Stefan Uhlich, Fabien Cardinaux, Ghouthi Boukli Hacene, Javier Alonso Garcia
Such a set is hardly available in few-shot learning scenarios, a shortcoming that remains largely disregarded in the field.
no code implementations • 25 May 2022 • Xiaowen Jiang, Valerio Cambareri, Gianluca Agresti, Cynthia Ifeyinwa Ugwu, Adriano Simonetto, Fabien Cardinaux, Pietro Zanuttigh
We also achieve low memory footprint for weights and activations by means of mixed precision quantization-at-training techniques.
no code implementations • 21 Mar 2022 • Bac Nguyen, Fabien Cardinaux, Stefan Uhlich
Using this differentiable duration method, we introduce AutoTTS, a direct text-to-waveform speech synthesis model.
1 code implementation • 2 Jun 2021 • Bac Nguyen, Fabien Cardinaux
By disentangling the speaker identity from the speech content, NVC-Net is able to perform traditional non-parallel many-to-many voice conversion as well as zero-shot voice conversion from a short utterance of an unseen target speaker.
no code implementations • 24 Mar 2021 • Ghouthi Boukli Hacene, Lukas Mauch, Stefan Uhlich, Fabien Cardinaux
We call this procedure DNN Quantization with Attention (DQA).
1 code implementation • 12 Feb 2021 • Takuya Narihira, Javier Alonso Garcia, Fabien Cardinaux, Akio Hayakawa, Masato Ishii, Kazunori Iwaki, Thomas Kemp, Yoshiyuki Kobayashi, Lukas Mauch, Akira Nakamura, Yukio Obuchi, Andrew Shin, Kenji Suzuki, Stephen Tiedemann, Stefan Uhlich, Takuya Yashima, Kazuki Yoshiyama
While a plethora of deep learning tools and frameworks exist, the fast-growing complexity of the field brings new demands and challenges, such as more flexible network design, fast computation in distributed settings, and compatibility between different tools.
no code implementations • 24 Nov 2020 • Lukas Mauch, Stephen Tiedemann, Javier Alonso Garcia, Bac Nguyen Cong, Kazuki Yoshiyama, Fabien Cardinaux, Thomas Kemp
Usually, we compute the proxy for all DNNs in the network search space and pick those that maximize the proxy as candidates for optimization.
no code implementations • 15 May 2020 • Mohammad Asif Khan, Fabien Cardinaux, Stefan Uhlich, Marc Ferras, Asja Fischer
This procedure bears the problem that the generated magnitude spectrogram may not be consistent, which is required for finding a phase such that the full spectrogram has a natural-sounding speech waveform.
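The canonical approach to recovering a phase for a given magnitude spectrogram is the Griffin-Lim algorithm, which alternates projections between the time domain and the STFT domain. A minimal sketch using scipy (an illustration of the phase-recovery problem the snippet refers to, not this paper's contribution):

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=50, nperseg=256):
    """Estimate a waveform whose STFT magnitude matches `mag` by
    alternating projections (Griffin-Lim), starting from random phase."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    x = None
    for _ in range(n_iter):
        # project onto waveforms: invert the current magnitude/phase guess
        _, x = istft(mag * phase, nperseg=nperseg)
        # project onto consistent spectrograms: keep only the new phase
        _, _, Z = stft(x, nperseg=nperseg)
        phase = np.exp(1j * np.angle(Z[:, :mag.shape[1]]))
    return x

# toy target: magnitude spectrogram of a pure tone
t = np.arange(8192)
ref = np.sin(2 * np.pi * 0.03 * t)
_, _, Zref = stft(ref, nperseg=256)
mag = np.abs(Zref)
rec = griffin_lim(mag)
```

If the generated magnitude spectrogram is inconsistent (no waveform realises it exactly), such iterations can only approximate it, which is the degradation the snippet alludes to.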
no code implementations • NIPS Workshop CDNNRIA 2018 • Fabien Cardinaux, Stefan Uhlich, Kazuki Yoshiyama, Javier Alonso Garcia, Lukas Mauch, Stephen Tiedemann, Thomas Kemp, Akira Nakamura
For each layer, we learn a value dictionary and an assignment matrix to represent the network weights.
2 code implementations • ICLR 2020 • Stefan Uhlich, Lukas Mauch, Fabien Cardinaux, Kazuki Yoshiyama, Javier Alonso Garcia, Stephen Tiedemann, Thomas Kemp, Akira Nakamura
Since choosing the optimal bitwidths is not straightforward, training methods that can learn them are desirable.
no code implementations • 13 Nov 2018 • Fabien Cardinaux, Stefan Uhlich, Kazuki Yoshiyama, Javier Alonso García, Stephen Tiedemann, Thomas Kemp, Akira Nakamura
In this paper we introduce a training method, called look-up table quantization (LUT-Q), which learns a dictionary and assigns each weight to one of the dictionary's values.
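The core representation, a small value dictionary plus a per-weight assignment, can be illustrated with a simple 1-D k-means sketch. This shows the representation only, not the LUT-Q training procedure, which learns the dictionary and assignments jointly during network training:

```python
import numpy as np

def lut_quantize(w, n_values=4, n_iter=25, rng=0):
    """Represent weights w by a dictionary d of shape (n_values,) and an
    integer assignment a of the same shape as w, so that w ≈ d[a].
    Fitted here with plain 1-D k-means on the flattened weights."""
    rng = np.random.default_rng(rng)
    flat = w.ravel()
    d = rng.choice(flat, size=n_values, replace=False)  # init from weights
    for _ in range(n_iter):
        # assign each weight to its nearest dictionary value
        a = np.abs(flat[:, None] - d[None, :]).argmin(axis=1)
        # update each dictionary value to the mean of its cluster
        for k in range(n_values):
            if np.any(a == k):
                d[k] = flat[a == k].mean()
    return d, a.reshape(w.shape)

w = np.random.default_rng(1).normal(size=(8, 8))
d, a = lut_quantize(w)
w_hat = d[a]  # quantised weights: at most n_values distinct entries
```

Storing only `d` and the low-bit assignment matrix `a` is what yields the memory savings: with 4 dictionary values, each weight needs just 2 bits plus a shared table.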
1 code implementation • 7 Jul 2018 • Joachim Muth, Stefan Uhlich, Nathanael Perraudin, Thomas Kemp, Fabien Cardinaux, Yuki Mitsufuji
Music source separation with deep neural networks typically relies only on amplitude features.