no code implementations • 6 Feb 2024 • Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, Zalán Borsos, Brian McWilliams, Victor Ungureanu, Olivier Bachem, Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour, Andrea Agostinelli
MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards.
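The core recipe (finetune an autoregressive token model to maximise a sequence-level reward) can be sketched with a REINFORCE-style loss. This is a generic illustration, not MusicRL's actual training code; the reward, baseline, and log-probabilities here are placeholder inputs.

```python
import numpy as np

def reinforce_loss(token_logprobs: np.ndarray, reward: float, baseline: float = 0.0) -> float:
    """Policy-gradient (REINFORCE) loss for one sampled token sequence.

    token_logprobs: log-probabilities the model assigned to each sampled token.
    reward: scalar sequence-level reward (e.g. a quality score for the clip).
    baseline: variance-reduction baseline (hypothetical, e.g. a running mean).
    Minimising this loss pushes probability mass toward sequences whose
    reward exceeds the baseline.
    """
    advantage = reward - baseline
    # Gradient of -advantage * sum(log pi(token)) increases the likelihood
    # of tokens that led to above-baseline reward.
    return -advantage * float(np.sum(token_logprobs))

# Toy usage: same sampled sequence scored with two different rewards.
logp = np.log(np.array([0.5, 0.4, 0.6]))
loss_high_reward = reinforce_loss(logp, reward=1.0)
loss_low_reward = reinforce_loss(logp, reward=0.2)
```

The loss value itself is not meaningful; only its gradient with respect to the model parameters (through `token_logprobs`) matters during finetuning.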
no code implementations • 21 Aug 2023 • Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey
The model operates on transcripts and audio token sequences and achieves multiple tasks through masking of inputs.
Automatic Speech Recognition (ASR) +3
no code implementations • 22 Jun 2023 • Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Frank
AudioPaLM inherits from AudioLM the capability to preserve paralinguistic information such as speaker identity and intonation, and from text-based large language models such as PaLM-2 the linguistic knowledge present only in text.
1 code implementation • 16 May 2023 • Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
We present SoundStorm, a model for efficient, non-autoregressive audio generation.
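Non-autoregressive generation of this kind is often realised by iterative parallel decoding: start from a fully masked sequence and, at each step, commit the predictions the model is most confident about. The sketch below illustrates that confidence-based schedule with a placeholder `predict` function standing in for a trained model; it is not SoundStorm's actual decoder.

```python
import numpy as np

MASK = -1  # sentinel value for masked positions (illustrative)

def parallel_decode(predict, seq_len, steps=4, rng=None):
    """Confidence-based iterative parallel decoding (sketch).

    `predict(tokens)` is assumed to return per-position probabilities of
    shape (seq_len, vocab). All positions start masked; each step fills in
    the most confident remaining positions, producing the whole sequence
    in `steps` passes instead of `seq_len` autoregressive ones.
    """
    rng = rng or np.random.default_rng(0)
    tokens = np.full(seq_len, MASK)
    for step in range(steps):
        probs = predict(tokens)                 # (seq_len, vocab)
        candidates = probs.argmax(axis=1)       # greedy choice per position
        confidence = probs.max(axis=1)
        confidence[tokens != MASK] = -np.inf    # never revisit committed slots
        # Unmask an equal share of the remaining positions at each step.
        n_remaining = int((tokens == MASK).sum())
        k = max(1, n_remaining // (steps - step))
        for idx in np.argsort(confidence)[::-1][:k]:
            tokens[idx] = candidates[idx]
    return tokens

# Dummy "model": a fixed random distribution per position (stand-in only).
_rng = np.random.default_rng(1)
_table = _rng.random((16, 8))
_table /= _table.sum(axis=1, keepdims=True)
result = parallel_decode(lambda toks: _table, seq_len=16)
```

In practice the unmasking schedule, sampling temperature, and per-level handling of codec tokens all matter; this only shows the decode-in-few-passes structure.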
3 code implementations • 26 Jan 2023 • Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank
We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff".
Ranked #8 on Text-to-Music Generation on MusicCaps
5 code implementations • 7 Sep 2022 • Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.
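A common structure for this family of models is hierarchical, two-stage token generation: sample coarse "semantic" tokens autoregressively, then sample fine acoustic tokens conditioned on them. The sketch below shows only that pipeline shape; both `semantic_lm` and `acoustic_lm` are placeholder callables returning next-token probability vectors, not the real trained networks.

```python
import numpy as np

def generate_audio_tokens(semantic_lm, acoustic_lm, n_steps, rng=None):
    """Two-stage hierarchical generation (sketch).

    Stage 1: autoregressively sample semantic tokens.
    Stage 2: sample acoustic tokens conditioned on the semantic sequence.
    """
    rng = rng or np.random.default_rng(0)
    semantic = []
    for _ in range(n_steps):
        p = semantic_lm(semantic)                      # next-token distribution
        semantic.append(int(rng.choice(len(p), p=p)))
    acoustic = []
    for _ in range(n_steps):
        p = acoustic_lm(semantic, acoustic)            # conditioned on stage 1
        acoustic.append(int(rng.choice(len(p), p=p)))
    return semantic, acoustic

# Toy usage with uniform placeholder "models" over a 4-token vocabulary.
uniform4 = np.full(4, 0.25)
sem, ac = generate_audio_tokens(lambda s: uniform4,
                                lambda s, a: uniform4, n_steps=8)
```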
no code implementations • 29 Mar 2022 • Ahmed Omran, Neil Zeghidour, Zalán Borsos, Félix de Chaumont Quitry, Malcolm Slaney, Marco Tagliasacchi
We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec.
no code implementations • 15 Feb 2022 • Zalán Borsos, Matt Sharifi, Marco Tagliasacchi
We propose SpeechPainter, a model for filling in gaps of up to one second in speech samples by leveraging an auxiliary textual input.
no code implementations • 26 Sep 2021 • Zalán Borsos, Mojmír Mutný, Marco Tagliasacchi, Andreas Krause
We show the effectiveness of our framework for a wide range of models in various settings, including training non-convex models online and batch active learning.
1 code implementation • 19 Oct 2020 • Zalán Borsos, Yunpeng Li, Beat Gfeller, Marco Tagliasacchi
A crucial aspect for the successful deployment of audio-based models "in-the-wild" is the robustness to the transformations introduced by heterogeneous acquisition conditions.
1 code implementation • 19 Oct 2020 • Zalán Borsos, Marco Tagliasacchi, Andreas Krause
Active learning is an effective technique for reducing the labeling cost by improving data efficiency.
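A minimal instance of the active-learning loop is uncertainty sampling: spend the labeling budget on the unlabeled points the current model is least sure about. This generic sketch uses predictive entropy as the score; it is not the specific acquisition strategy of the paper above.

```python
import numpy as np

def uncertainty_sample(probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` unlabeled points with the most uncertain predictions.

    probs: (n_points, n_classes) predicted class probabilities.
    Uses predictive entropy as the uncertainty score; alternatives such as
    margin or least-confidence scores slot in the same way.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[::-1][:budget]  # most uncertain first

probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction -> low labeling priority
    [0.34, 0.33, 0.33],   # near-uniform -> labeled first
    [0.70, 0.20, 0.10],
])
picked = uncertainty_sample(probs, budget=1)
```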
1 code implementation • NeurIPS 2020 • Zalán Borsos, Mojmír Mutný, Andreas Krause
Coresets are small data summaries that are sufficient for model training.
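To make the idea concrete: a coreset is a small *weighted* subset whose weighted statistics (or losses) approximate those of the full dataset. The toy construction below uses equal-frequency bins on 1-D data, weighting each representative by its bin size; real constructions (e.g. via sensitivity sampling or the bilevel optimization of the paper above) are considerably more sophisticated.

```python
import numpy as np

def binned_coreset(x: np.ndarray, k: int):
    """Build a toy coreset: k weighted representatives of a 1-D dataset.

    Representatives are means of equal-frequency bins; weights are bin
    sizes, so weighted statistics of the coreset track the full data.
    Illustrative only, not the paper's algorithm.
    """
    order = np.argsort(x)
    bins = np.array_split(order, k)
    centers = np.array([x[b].mean() for b in bins])
    weights = np.array([len(b) for b in bins], dtype=float)
    return centers, weights

x = np.random.default_rng(0).normal(size=1000)
centers, weights = binned_coreset(x, k=10)
# The weighted coreset mean matches the full-data mean by construction.
approx_mean = np.sum(weights * centers) / np.sum(weights)
```

The point of a coreset is that downstream training touches only the k weighted points instead of all n, which is what makes them useful for continual and streaming settings.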
no code implementations • 19 Jun 2019 • Zalán Borsos, Andrey Khorlin, Andrea Gesmundo
Recent advances in Neural Architecture Search (NAS) have produced state-of-the-art architectures on several tasks.
1 code implementation • 29 Mar 2019 • Zalán Borsos, Sebastian Curi, Kfir Y. Levy, Andreas Krause
Adaptive importance sampling for stochastic optimization is a promising approach that offers improved convergence through variance reduction.
no code implementations • 22 Nov 2018 • Bianca-Cristina Cristescu, Zalán Borsos, John Lygeros, María Rodríguez Martínez, Maria Anna Rapsomaniki
In this work, we explore the idea of manifold learning for 3D chromatin structure inference and present a novel method, REcurrent Autoencoders for CHromatin 3D structure prediction (REACH-3D).
2 code implementations • 13 Feb 2018 • Zalán Borsos, Andreas Krause, Kfir Y. Levy
Modern stochastic optimization methods often rely on uniform sampling which is agnostic to the underlying characteristics of the data.
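The non-uniform alternative is importance sampling: draw each training point with probability proportional to (an estimate of) its gradient norm, and reweight to keep the stochastic gradient unbiased. The sketch below assumes the norms are given; in practice they must be estimated online, which is precisely what variance-reduction schemes like the one above address.

```python
import numpy as np

def importance_sample(grad_norms: np.ndarray, rng=None):
    """Draw one training point non-uniformly, weighted by gradient norm.

    Sampling probability p_i proportional to ||g_i|| reduces the variance of
    the stochastic gradient estimate; the 1/(n * p_i) correction keeps it
    unbiased. The exact norms are assumed known here for illustration.
    """
    rng = rng or np.random.default_rng(0)
    p = grad_norms / grad_norms.sum()
    i = int(rng.choice(len(p), p=p))
    weight = 1.0 / (len(p) * p[i])  # unbiasedness correction
    return i, weight

# One point dominates the gradient norms, so it is sampled most often.
norms = np.array([0.1, 0.1, 5.0, 0.1])
idx, w = importance_sample(norms)
```

Uniform sampling is the special case p_i = 1/n, where the correction weight is always 1.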