no code implementations • 28 Nov 2022 • Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H. Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, Curtis Hawthorne, Rémi Leblond, Will Grathwohl, Jonas Adler
Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement.
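The "iterative refinement" in diffusion models starts from a fixed forward process that gradually corrupts data with Gaussian noise; generation then reverses that corruption step by step. A minimal NumPy sketch of the forward (noising) process under a simple linear beta schedule — the schedule values and names here are illustrative, not taken from the paper, and the learned denoising network is omitted:

```python
import numpy as np

def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# Linear beta schedule (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

rng = np.random.default_rng(0)
x0 = rng.standard_normal(64)          # stand-in for a signal (image/audio patch)
xt, eps = forward_noise(x0, T - 1, alpha_bar, rng)
# At t = T-1, alpha_bar is near zero, so x_t is essentially pure noise;
# sampling runs this corruption in reverse, one denoising step at a time.
```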
1 code implementation • 28 Sep 2022 • Yusong Wu, Josh Gardner, Ethan Manilow, Ian Simon, Curtis Hawthorne, Jesse Engel
We call this system the Chamber Ensemble Generator (CEG), and use it to generate a large dataset of chorales from four different chamber ensembles (CocoChorales).
Ranked #3 on Multi-instrument Music Transcription on Slakh2100
Tasks: Information Retrieval, Multi-instrument Music Transcription, +2
1 code implementation • 11 Jun 2022 • Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel
An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in real time for arbitrary combinations of instruments and notes.
3 code implementations • 31 Mar 2022 • Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen, Kathleen Kenealy, Jonathan H. Clark, Stephan Lee, Dan Garrette, James Lee-Thorp, Colin Raffel, Noam Shazeer, Marvin Ritter, Maarten Bosma, Alexandre Passos, Jeremy Maitin-Shepard, Noah Fiedel, Mark Omernick, Brennan Saeta, Ryan Sepassi, Alexander Spiridonov, Joshua Newlan, Andrea Gesmundo
Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves.
3 code implementations • 15 Feb 2022 • Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, Charlie Nash, Mateusz Malinowski, Sander Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Simon, Hannah Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression.
Ranked #35 on Language Modelling on WikiText-103
3 code implementations • ICLR 2022 • Josh Gardner, Ian Simon, Ethan Manilow, Curtis Hawthorne, Jesse Engel
Automatic Music Transcription (AMT), inferring musical notes from raw audio, is a challenging task at the core of music understanding.
Ranked #2 on Music Transcription on URMP (using extra training data)
Tasks: Automatic Speech Recognition (ASR), +3
2 code implementations • 19 Jul 2021 • Curtis Hawthorne, Ian Simon, Rigel Swavely, Ethan Manilow, Jesse Engel
Automatic Music Transcription has seen significant progress in recent years by training custom deep neural networks on large datasets.
1 code implementation • 30 Mar 2021 • Gautam Mittal, Jesse Engel, Curtis Hawthorne, Ian Simon
Score-based generative models and diffusion probabilistic models have been successful at generating high-quality samples in continuous domains such as images and audio.
1 code implementation • 1 Apr 2020 • Lee Callender, Curtis Hawthorne, Jesse Engel
We introduce the Expanded Groove MIDI dataset (E-GMD), an automatic drum transcription (ADT) dataset that contains 444 hours of audio from 43 drum kits, making it an order of magnitude larger than similar datasets, and the first with human-performed velocity annotations.
no code implementations • ICML 2020 • Kristy Choi, Curtis Hawthorne, Ian Simon, Monica Dinculescu, Jesse Engel
We consider the problem of learning high-level controls over the global structure of generated sequences, particularly in the context of symbolic music generation with complex language models.
no code implementations • 14 Jul 2019 • Cheng-Zhi Anna Huang, Curtis Hawthorne, Adam Roberts, Monica Dinculescu, James Wexler, Leon Hong, Jacob Howcroft
To make music composition more approachable, we designed the first AI-powered Google Doodle, the Bach Doodle, where users can create their own melody and have it harmonized in the style of Bach by the machine learning model Coconet (Huang et al., 2017).
4 code implementations • ICLR 2019 • Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, Douglas Eck
Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales.
12 code implementations • ICLR 2019 • Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck
Relative attention is impractical for long sequences such as musical compositions, since its memory complexity for intermediate relative information is quadratic in the sequence length.
Ranked #3 on Music Modeling on JSB Chorales
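The quadratic intermediate memory comes from materializing a separate relative-position term for every (query, key) pair; the Music Transformer paper reduces it by computing one logit per relative distance and then "skewing" the result into absolute positions with a pad-and-reshape. A NumPy sketch of that skewing step (array names are ours, not the paper's):

```python
import numpy as np

def skew(qe):
    """Rearrange relative logits QE^T into absolute (query, key) positions.

    qe[i, r] holds q_i . e_r, where column r = L-1 corresponds to relative
    distance 0 (the current position) and r = 0 to distance -(L-1).
    """
    L = qe.shape[0]
    padded = np.pad(qe, ((0, 0), (1, 0)))   # dummy column on the left
    reshaped = padded.reshape(L + 1, L)     # deliberately misaligned reshape...
    return reshaped[1:]                     # ...then drop the first row

L, d = 4, 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((L, d))
E = rng.standard_normal((L, d))             # one embedding per relative distance
S_rel = skew(Q @ E.T)
# For j <= i (the causally valid entries), S_rel[i, j] equals the logit for
# relative distance j - i; entries above the diagonal are garbage that the
# causal attention mask removes anyway.
```

Only the (L, d) embedding matrix and an (L, L) logit matrix are ever stored, instead of an (L, L, d) tensor of per-pair relative embeddings.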
1 code implementation • 1 Jun 2018 • Ian Simon, Adam Roberts, Colin Raffel, Jesse Engel, Curtis Hawthorne, Douglas Eck
Discovering and exploring the underlying structure of multi-instrumental music using learning-based approaches remains an open problem.
7 code implementations • ICML 2018 • Adam Roberts, Jesse Engel, Colin Raffel, Curtis Hawthorne, Douglas Eck
The Variational Autoencoder (VAE) has proven to be an effective model for producing semantically meaningful latent representations for natural data.
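The mechanism that lets a VAE learn such latent representations is the reparameterization trick: sampling the latent z as a deterministic function of the encoder outputs plus external noise, so gradients can flow through the sampling step, with a KL term pulling the posterior toward a standard normal. A minimal NumPy sketch (shapes and names are illustrative, not MusicVAE's):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, exp(log_var)) as mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), the VAE regularizer, summed over dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

rng = np.random.default_rng(0)
mu = np.zeros(16)
log_var = np.zeros(16)                      # sigma = 1 everywhere
z = reparameterize(mu, log_var, rng)
# When the posterior already matches the prior (mu = 0, log_var = 0),
# the KL penalty is exactly zero.
```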
1 code implementation • 30 Oct 2017 • Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, Douglas Eck
We advance the state of the art in polyphonic piano music transcription by using a deep convolutional and recurrent neural network which is trained to jointly predict onsets and frames.
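The joint onset/frame prediction pays off at decode time: a frame activation may start a note only where the onset detector also fires, while frames alone merely sustain an already-active note. A sketch of that decoding rule over per-frame probabilities for a single pitch (variable names and the 0.5 threshold are ours, for illustration):

```python
def decode_notes(onsets, frames, threshold=0.5):
    """Turn onset/frame probability sequences for one pitch into
    (start, end) note intervals, in frame indices. A new note begins
    only at a predicted onset; frame activity alone cannot start one.
    """
    notes, start = [], None
    for t, (on, fr) in enumerate(zip(onsets, frames)):
        active = fr >= threshold
        if start is None:
            if active and on >= threshold:   # frame + onset -> note starts
                start = t
        elif not active:                     # frame turns off -> note ends
            notes.append((start, t))
            start = None
    if start is not None:                    # close a note still sounding
        notes.append((start, len(frames)))
    return notes

# Frames are active at t = 1..4, but only t = 2 has an onset, so the note
# starts at 2; the frame-only activation at t = 1 is suppressed.
onsets = [0.0, 0.1, 0.9, 0.2, 0.1, 0.0]
frames = [0.0, 0.8, 0.9, 0.9, 0.8, 0.1]
print(decode_notes(onsets, frames))          # [(2, 5)]
```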