no code implementations • 22 Jan 2024 • Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose framework for controlling pre-trained text-to-music diffusion models at inference time by optimizing initial noise latents.
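A minimal sketch of the general idea of inference-time control by optimizing initial noise latents. The sampler and control loss below are hypothetical placeholders, not the authors' implementation; only the noise latent is optimized while the diffusion model stays frozen.

```python
import torch

# Hypothetical placeholders: a differentiable pretrained text-to-music
# diffusion sampler and a differentiable control target (e.g. matching a
# desired intensity curve or melody). Names are illustrative only.
def sample_from_noise(x_T, text_embedding):
    raise NotImplementedError  # pretrained diffusion sampling loop

def control_loss(audio):
    raise NotImplementedError  # differentiable control objective

def optimize_initial_noise(text_embedding, latent_shape, steps=100, lr=1e-2):
    # The initial noise latent is the only optimization variable.
    x_T = torch.randn(latent_shape, requires_grad=True)
    opt = torch.optim.Adam([x_T], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        audio = sample_from_noise(x_T, text_embedding)  # differentiable sampling
        loss = control_loss(audio)
        loss.backward()  # gradients flow back through sampling into x_T
        opt.step()
    return x_T.detach()
```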
1 code implementation • 26 Nov 2021 • Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, Xavier Serra
Content creators often use music to enhance their stories, as it can be a powerful tool to convey emotion.
1 code implementation • 5 Oct 2021 • Max Morrison, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo
Pitch and time modification of an audio signal are fundamental audio editing operations with applications in speech manipulation, audio-visual synchronization, and singing voice editing and synthesis.
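For reference, a minimal sketch of conventional (non-neural) pitch-shifting and time-stretching with librosa; the paper itself proposes a neural approach to these operations.

```python
import librosa

# Load a bundled example recording (downloads on first use).
audio, sr = librosa.load(librosa.example("trumpet"), sr=22050)

# Stretch to 1.25x duration without changing pitch (rate < 1 slows down).
stretched = librosa.effects.time_stretch(audio, rate=0.8)

# Shift pitch up two semitones without changing duration.
shifted = librosa.effects.pitch_shift(audio, sr=sr, n_steps=2)
```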
no code implementations • 28 Jul 2021 • Haici Yang, Shivani Firodiya, Nicholas J. Bryan, Minje Kim
In this work, we learn to remix music directly by re-purposing Conv-TasNet, a well-known source separation model, into two neural remixing architectures.
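A minimal sketch of the separate-then-remix idea using a hypothetical pretrained separator; the paper instead learns remixing end-to-end by re-purposing the separation architecture itself.

```python
import torch

def remix(mixture, separator, gains):
    """Separate a mixture into stems and recombine with new per-stem gains.

    `separator` is assumed to be a pretrained source-separation model
    (e.g. a Conv-TasNet) mapping (batch, time) -> (batch, n_sources, time).
    """
    with torch.no_grad():
        stems = separator(mixture)                        # (batch, n_sources, time)
    gains = torch.as_tensor(gains, dtype=stems.dtype).view(1, -1, 1)
    return (stems * gains).sum(dim=1)                     # re-weighted mixture
```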
2 code implementations • 11 May 2021 • Marco A. Martínez Ramírez, Oliver Wang, Paris Smaragdis, Nicholas J. Bryan
We present a data-driven approach to automating audio signal processing by incorporating stateful, third-party audio effects as layers within a deep neural network.
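A minimal sketch of how a non-differentiable, black-box effect could sit inside a network: gradients with respect to the effect parameters are estimated by finite differences, a simplified stand-in for the gradient-approximation schemes used in practice (e.g. SPSA). Names are illustrative, not the authors' API.

```python
import torch

class BlackBoxEffect(torch.autograd.Function):
    """Wrap a non-differentiable audio effect as an autograd-compatible layer."""

    @staticmethod
    def forward(ctx, audio, params, effect, eps=1e-3):
        ctx.save_for_backward(audio, params)
        ctx.effect, ctx.eps = effect, eps
        return effect(audio, params)

    @staticmethod
    def backward(ctx, grad_output):
        audio, params = ctx.saved_tensors
        effect, eps = ctx.effect, ctx.eps
        # Central finite differences, one effect parameter at a time.
        grads = torch.zeros_like(params)
        for i in range(params.numel()):
            delta = torch.zeros_like(params)
            delta.view(-1)[i] = eps
            out_hi = effect(audio, params + delta)
            out_lo = effect(audio, params - delta)
            grads.view(-1)[i] = (grad_output * (out_hi - out_lo) / (2 * eps)).sum()
        # No gradient for the audio, the effect callable, or eps.
        return None, grads, None, None

# Usage: wet = BlackBoxEffect.apply(audio, params, my_effect)
```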
no code implementations • 16 Feb 2021 • Max Morrison, Lucas Rencker, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo
Text-based speech editors expedite the process of editing speech recordings by permitting editing via intuitive cut, copy, and paste operations on a speech transcript.
no code implementations • 9 Aug 2020 • Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam
For this task, it is typically necessary to define a similarity metric to compare one recording to another.
For this, we (1) outline past work on the relationship between metric learning and classification, (2) extend this relationship to multi-label data by exploring three different learning approaches and their disentangled versions, and (3) evaluate all models on four tasks (training time, similarity retrieval, auto-tagging, and triplet prediction).
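A minimal sketch of the triplet-based similarity objective contrasted with classification in this line of work; the embedding model and data are placeholders.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull the anchor embedding toward the positive example and push it
    away from the negative one by at least `margin`."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```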
no code implementations • 7 Aug 2020 • Max Morrison, Zeyu Jin, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore
Speech synthesis has recently seen significant improvements in fidelity, driven by the advent of neural vocoders and neural prosody generators.
1 code implementation • 13 Jan 2020 • Pranay Manocha, Adam Finkelstein, Zeyu Jin, Nicholas J. Bryan, Richard Zhang, Gautham J. Mysore
Assessment of many audio processing tasks relies on subjective evaluation, which is time-consuming and expensive.
no code implementations • 14 Nov 2019 • Zhenyu Tang, Nicholas J. Bryan, Dingzeyu Li, Timothy R. Langlois, Dinesh Manocha
We present a new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar sounding sources with virtual models.
Sound • Graphics • Multimedia • Audio and Speech Processing
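A minimal illustration of applying a captured room impulse response to a dry source by convolution, the standard auralization step; this is not the paper's full capture-and-matching pipeline.

```python
import numpy as np
from scipy.signal import fftconvolve

def auralize(dry_source, room_impulse_response):
    """Convolve a dry (anechoic) source with a room impulse response so it
    sounds as if it were played back in that room."""
    wet = fftconvolve(dry_source, room_impulse_response)
    return wet / (np.max(np.abs(wet)) + 1e-9)  # simple peak normalization
```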