no code implementations • 14 Oct 2024 • Yinghao Aaron Li, Rithesh Kumar, Zeyu Jin
Diffusion models have demonstrated significant potential in speech synthesis tasks, including text-to-speech (TTS) and voice cloning.
1 code implementation • 10 Jul 2023 • Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo
We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation.
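The masking scheme at the core of masked acoustic token modeling can be sketched as follows. This is a hypothetical illustration of the general idea (discrete codec tokens are partially masked and a model learns to fill in the masked positions), not VampNet's actual implementation; the vocabulary size and mask-token id are assumptions.

```python
import numpy as np

# Hedged sketch of masked acoustic token modeling: replace a random
# subset of codec-token positions with a special MASK token; a model
# (not shown) would be trained to predict the original tokens there.

rng = np.random.default_rng(0)
VOCAB_SIZE = 1024        # assumed codebook size
MASK_TOKEN = VOCAB_SIZE  # special id reserved for masked positions

def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of token positions with MASK_TOKEN."""
    tokens = tokens.copy()
    n_mask = int(round(mask_ratio * len(tokens)))
    idx = rng.choice(len(tokens), size=n_mask, replace=False)
    tokens[idx] = MASK_TOKEN
    return tokens, idx

seq = rng.integers(0, VOCAB_SIZE, size=100)  # stand-in for codec tokens
masked, masked_idx = mask_tokens(seq, mask_ratio=0.8, rng=rng)
print((masked == MASK_TOKEN).sum())  # 80 positions masked
```

Varying the mask ratio at inference time is what lets this family of models interpolate between compression-like reconstruction (few masks) and free variation (many masks).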
4 code implementations • NeurIPS 2023 • Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, Kundan Kumar
Language models have been successfully used to model natural signals, such as images, speech, and music.
1 code implementation • ICLR 2022 • Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio
We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression.
no code implementations • 22 Oct 2020 • Rithesh Kumar, Kundan Kumar, Vicki Anand, Yoshua Bengio, Aaron Courville
In this paper, we propose NU-GAN, a new method for resampling audio from lower to higher sampling rates (upsampling).
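For context, the classical (non-learned) baseline for the upsampling task NU-GAN addresses is plain interpolation, sketched below. Interpolation can only place the existing low-rate samples on a denser grid; it cannot recover the missing high-frequency band, which is what a generative approach targets. The factor and sample values here are illustrative.

```python
import numpy as np

# Classical interpolation baseline for audio upsampling: resample a
# 1-D signal onto a denser time grid by linear interpolation.

def upsample_linear(x, factor):
    """Upsample a 1-D signal by an integer factor via linear interpolation."""
    n = len(x)
    old_t = np.arange(n)
    new_t = np.linspace(0, n - 1, num=(n - 1) * factor + 1)
    return np.interp(new_t, old_t, x)

x = np.sin(2 * np.pi * 440 * np.arange(160) / 16000)  # 10 ms of 440 Hz at 16 kHz
y = upsample_linear(x, factor=3)                       # onto a ~48 kHz grid
print(len(y))  # 478
```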
21 code implementations • NeurIPS 2019 • Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville
In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques.
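A hinge-style adversarial objective of the kind used in GAN waveform synthesis can be sketched as below. This is a hedged, numpy-only illustration of the loss shape, not the paper's full recipe, which also involves architectural changes such as multi-scale discriminators.

```python
import numpy as np

# Hinge-style GAN losses (sketch): the discriminator pushes scores on
# real audio above +1 and on generated audio below -1; the generator
# tries to raise the discriminator's score on its outputs.

def d_hinge_loss(real_scores, fake_scores):
    """Discriminator hinge loss over batches of scalar scores."""
    return (np.mean(np.maximum(0.0, 1.0 - real_scores))
            + np.mean(np.maximum(0.0, 1.0 + fake_scores)))

def g_hinge_loss(fake_scores):
    """Generator loss: negative mean discriminator score on fakes."""
    return -np.mean(fake_scores)

real = np.array([1.5, 0.2])   # toy discriminator outputs
fake = np.array([-1.2, 0.5])
print(d_hinge_loss(real, fake), g_hinge_loss(fake))
```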
no code implementations • ICLR 2019 • Rithesh Kumar, Anirudh Goyal, Aaron Courville, Yoshua Bengio
Unsupervised learning is about capturing dependencies between variables, driven by the contrast between probable vs. improbable configurations of those variables: either via a generative model that samples only probable configurations, or via an energy function (an unnormalized log-density) that is low for probable configurations and high for improbable ones.
2 code implementations • 24 Jan 2019 • Rithesh Kumar, Sherjil Ozair, Anirudh Goyal, Aaron Courville, Yoshua Bengio
Maximum likelihood estimation of energy-based models is a challenging problem due to the intractability of the log-likelihood gradient.
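The intractability referred to here is standard: for an energy-based model $p_\theta(x) = e^{-E_\theta(x)} / Z_\theta$, the log-likelihood gradient splits into a tractable positive phase and a negative phase requiring samples from the model itself:

```latex
\nabla_\theta \log p_\theta(x)
  = -\nabla_\theta E_\theta(x)
  + \mathbb{E}_{x' \sim p_\theta}\!\left[\nabla_\theta E_\theta(x')\right]
```

The second term is the gradient of $\log Z_\theta$; since exact sampling from $p_\theta$ is generally intractable, maximum likelihood training must approximate it.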
1 code implementation • 18 Nov 2018 • Kyle Kastner, Rithesh Kumar, Tim Cooijmans, Aaron Courville
We demonstrate a conditional autoregressive pipeline for efficient music recomposition, based on methods presented in van den Oord et al. (2017).
1 code implementation • 6 Dec 2017 • Rithesh Kumar, Jose Sotelo, Kundan Kumar, Alexandre de Brebisson, Yoshua Bengio
We present ObamaNet, the first architecture that generates both audio and synchronized photo-realistic lip-sync videos from any new text.
4 code implementations • 22 Dec 2016 • Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio
In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time.
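Sample-level autoregressive generation can be sketched with a toy loop: each new audio sample is drawn conditioned on the samples produced so far. A fixed AR(2) recurrence stands in for the learned model here; the coefficients and sampling rate are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

# Toy sketch of one-sample-at-a-time autoregressive audio generation:
# a fixed stable AR(2) recurrence plays the role of the learned model.

rng = np.random.default_rng(0)

def generate(n_samples, context=(0.0, 0.0)):
    """Generate n_samples sequentially, each conditioned on the last two."""
    out = list(context)
    for _ in range(n_samples):
        nxt = 1.3 * out[-1] - 0.4 * out[-2] + 0.01 * rng.standard_normal()
        out.append(nxt)
    return np.array(out[2:])

audio = generate(16000)  # one second at a hypothetical 16 kHz rate
print(audio.shape)  # (16000,)
```

The point of the sketch is the sequential dependency structure; the cost of this loop at audio rates is exactly why later work (e.g. the GAN vocoders above) moved to parallel generation.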
Ranked #1 on Speech Synthesis on Blizzard Challenge 2013