1 code implementation • 17 Sep 2024 • Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave, Neil Zeghidour
Our resulting model is the first real-time full-duplex spoken large language model, with a theoretical latency of 160ms, 200ms in practice, and is available at https://github.com/kyutai-labs/moshi.
no code implementations • 4 Jun 2024 • Jean-Marie Lemercier, Simon Rouard, Jade Copet, Yossi Adi, Alexandre Défossez
Modelling the product of the marginal distributions improves the quality of the generated music while producing audio much faster than the joint-distribution model.
1 code implementation • 30 Jan 2024 • Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady Elsahar
In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning.
no code implementations • 9 Jan 2024 • Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.
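As a rough illustration of masked generative sequence modeling of this kind, the sketch below runs iterative parallel decoding over a token stream: start fully masked, predict every position, commit only the most confident fraction each step, and re-mask the rest. The predictor here is a dummy stand-in (a real model would be a transformer over audio codebooks), and the schedule is a simplification, not MAGNeT's exact procedure.

```python
import numpy as np

MASK = -1  # sentinel for masked positions

def dummy_predict(tokens):
    """Stand-in for the token predictor: returns a (predicted token,
    confidence) pair per position."""
    rng = np.random.default_rng(0)
    preds = rng.integers(0, 1024, size=len(tokens))
    conf = rng.random(len(tokens))
    return preds, conf

def iterative_masked_decode(length, steps=4):
    """Begin fully masked; at each step predict all masked positions and
    commit only the most confident ones, re-masking the remainder."""
    tokens = np.full(length, MASK)
    for s in range(1, steps + 1):
        preds, conf = dummy_predict(tokens)
        masked = tokens == MASK
        # unmask a growing share of positions at each step
        n_keep = int(np.ceil(length * s / steps)) - int((~masked).sum())
        order = np.argsort(-conf * masked)  # most confident masked slots first
        commit = order[:max(n_keep, 0)]
        tokens[commit] = preds[commit]
    return tokens
```

With a real model, confidence would come from the predicted token probabilities, and the unmasking schedule controls the quality/speed trade-off.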
2 code implementations • 24 Aug 2023 • Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve
We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks.
Ranked #34 on Code Generation on MBPP
2 code implementations • 14 Aug 2023 • Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, WeiHsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu, Yuki Mitsufuji
We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding.
5 code implementations • NeurIPS 2023 • Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez
We tackle the task of conditional music generation.
Ranked #7 on Text-to-Music Generation on MusicCaps
2 code implementations • 15 Nov 2022 • Simon Rouard, Francisco Massa, Alexandre Défossez
While it performs poorly when trained only on MUSDB, we show that it outperforms Hybrid Demucs (trained on the same data) by 0.45 dB of SDR when using 800 extra training songs.
Ranked #1 on Music Source Separation on MUSDB18 (using extra training data)
4 code implementations • 24 Oct 2022 • Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
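Neural codecs in this family are typically built around residual vector quantization (RVQ): each stage quantizes the residual left by the previous one, so several small codebooks compose into a high-fidelity code. A minimal sketch of the idea (not EnCodec's actual implementation, and with random codebooks for illustration):

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual vector quantization: each codebook quantizes the residual
    left by the previous stage. Returns the code indices and the final
    residual."""
    residual = x.copy()
    codes = []
    for cb in codebooks:  # cb has shape (n_entries, dim)
        dists = ((residual[None, :] - cb) ** 2).sum(axis=1)
        idx = int(np.argmin(dists))  # nearest entry to the current residual
        codes.append(idx)
        residual = residual - cb[idx]
    return codes, residual

def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected entry from every codebook."""
    return sum(cb[i] for i, cb in zip(codes, codebooks))
```

By construction, the decoded vector plus the final residual recovers the input exactly; training the codebooks drives that residual toward zero.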
1 code implementation • 30 Sep 2022 • Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi
Finally, we explore the ability of the proposed method to generate audio continuation conditionally and unconditionally.
Ranked #13 on Audio Generation on AudioCaps
1 code implementation • 25 Aug 2022 • Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, Jean-Rémi King
Overall, this effective decoding of perceived speech from non-invasive recordings delineates a promising path to decode language from brain activity, without putting patients at risk for brain surgery.
1 code implementation • 5 Nov 2021 • Alexandre Défossez
Source separation models work in either the spectrogram or the waveform domain.
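The two input domains are easy to contrast in code: the time domain is the raw waveform itself, while the time-frequency domain is its spectrogram. The helper below is a minimal STFT magnitude, just to illustrate the second view; hybrid models keep a branch for each domain and merge them.

```python
import numpy as np

def stft_mag(wave, n_fft=512, hop=128):
    """Minimal STFT magnitude: slice the waveform into overlapping windowed
    frames and take the magnitude of each frame's real FFT, yielding a
    (frames, frequency bins) spectrogram."""
    win = np.hanning(n_fft)
    frames = [wave[i:i + n_fft] * win
              for i in range(0, len(wave) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))
```

For a pure 440 Hz tone at 16 kHz, the energy concentrates in the bin nearest 440 / (16000 / 512) ≈ 14, which is what a spectrogram-domain model would see in place of the raw samples.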
Ranked #6 on Music Source Separation on MUSDB18
1 code implementation • 31 Aug 2021 • Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter, Alexandre Défossez, Minseok Kim, Woosung Choi, Chin-Yun Yu, Kin-Wai Cheuk
The main differences compared with past challenges are: 1) the competition is designed to make it easier for machine learning practitioners from other disciplines to participate; 2) evaluation is done on a hidden test set created by music professionals exclusively for the challenge, ensuring transparency, i.e., the test set is accessible to no one but the challenge organizers; and 3) the dataset covers a wider range of music genres and involved a greater number of mixing engineers.
1 code implementation • 20 Apr 2021 • Alexandre Défossez, Yossi Adi, Gabriel Synnaeve
DiffQ is differentiable both with respect to the unquantized weights and the number of bits used.
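The core trick behind differentiable quantization of this kind can be sketched in a few lines: during training, quantizing to `b` bits is modeled as adding uniform noise with the quantizer's step size, so the loss stays differentiable with respect to both the weights and the (continuous) bit-width. This is an illustrative simplification, not DiffQ's full method.

```python
import numpy as np

def pseudo_quantize(w, bits, rng):
    """Model b-bit quantization of weights `w` as additive uniform noise
    whose magnitude matches the quantization step. Because `bits` enters
    only through the (smooth) step size, gradients can flow to it."""
    scale = w.max() - w.min()
    step = scale / (2.0 ** bits - 1)  # quantization step for `bits` bits
    noise = rng.uniform(-0.5, 0.5, size=w.shape) * step
    return w + noise
```

The noise magnitude shrinks as `bits` grows, so the training loss directly trades accuracy against model size through the bit-width parameter.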
Ranked #29 on Language Modelling on WikiText-103
no code implementations • 5 Mar 2020 • Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier
We provide a simple proof of convergence covering both the Adam and Adagrad adaptive optimization algorithms when applied to smooth (possibly non-convex) objective functions with bounded gradients.
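For reference, the Adam update whose convergence is analyzed takes the standard form (with step size $\alpha$, moment decays $\beta_1, \beta_2$, and $g_t = \nabla f(x_{t-1})$):

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2,
\qquad
x_t = x_{t-1} - \alpha\, \frac{m_t / (1 - \beta_1^t)}{\sqrt{v_t / (1 - \beta_2^t)} + \epsilon}.
```

Adagrad is the special case where the full history of squared gradients is accumulated in $v_t$ without exponential decay.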
1 code implementation • 27 Nov 2019 • Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach
Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song.
Ranked #3 on Multi-task Audio Source Separation on MTASS
1 code implementation • 3 Sep 2019 • Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach
We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments.
1 code implementation • NeurIPS 2018 • Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach
On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference.
no code implementations • 6 Nov 2017 • Alexandre Défossez, Francis Bach
We study a new aggregation operator for gradients coming from a mini-batch for stochastic gradient (SG) methods that allows a significant speed-up in the case of sparse optimization problems.
no code implementations • 29 Nov 2014 • Alexandre Défossez, Francis Bach
Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term decaying as $O(1/n)$, independently of the step-size $\gamma$, and a bias term decaying as $O(1/(\gamma^2 n^2))$; (c) when allowing non-uniform sampling, the choice of a good sampling density depends on whether the variance or the bias term dominates.