1 code implementation • 23 Oct 2024 • Junwon Lee, Modan Tailleur, Laurie M. Heller, Keunwoo Choi, Mathieu Lagrange, Brian McFee, Keisuke Imoto, Yuki Okamoto
Despite significant advancements in neural text-to-audio generation, challenges persist in controllability and evaluation.
1 code implementation • 19 Jan 2024 • Iran R. Roman, Christopher Ick, Sivan Ding, Adrian S. Roman, Brian McFee, Juan P. Bello
Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels.
1 code implementation • 6 Sep 2023 • Christopher Ick, Brian McFee
As deeper and more complex models are developed for the task of sound event localization and detection (SELD), the demand for annotated spatial audio data continues to increase.
1 code implementation • 20 Jul 2023 • Changhong Wang, Gaël Richard, Brian McFee
This approach allows representations derived for one task to be applied to another, and can result in high accuracy with less stringent training data requirements for the downstream task.
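A rough sketch of this transfer setting (hypothetical names; not the paper's actual pipeline): a frozen, pretrained embedding is reused as-is, and only a lightweight classifier is trained on the small downstream labeled set.

```python
# Minimal transfer-learning sketch (illustrative only; not the paper's method).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained embedding (hypothetical): in practice this
# would be a network trained on the upstream task, used here without fine-tuning.
W_frozen = rng.standard_normal((22050, 128))

def embed(audio_batch):
    return audio_batch @ W_frozen

# Small labeled set for the downstream task.
X_audio = rng.standard_normal((200, 22050))   # 200 one-second clips at 22.05 kHz
y = rng.integers(0, 2, size=200)              # binary downstream labels

# Only the lightweight classifier is trained; the representation is reused.
clf = LogisticRegression(max_iter=1000).fit(embed(X_audio), y)
```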
no code implementations • 21 Jul 2022 • Keunwoo Choi, Sangshin Oh, Minsung Kang, Brian McFee
"Foley" refers to sound effects that are added to multimedia during post-production to enhance its perceived acoustic properties, e. g., by simulating the sounds of footsteps, ambient environmental sounds, or visible objects on the screen.
no code implementations • 6 Feb 2021 • Christopher Ick, Brian McFee
Recent literature has demonstrated that per-channel energy normalization (PCEN) yields significant performance improvements over traditional log-scaled mel-frequency spectrograms for acoustic sound event detection (SED) in a multi-class setting with overlapping events.
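For context, librosa exposes PCEN directly; a minimal sketch comparing the two front-ends (parameter values are illustrative defaults, not the paper's configuration):

```python
# Sketch: log-mel vs. PCEN front-ends via librosa (defaults for illustration).
import librosa
import numpy as np

y, sr = librosa.load(librosa.ex("trumpet"))
S = librosa.feature.melspectrogram(y=y, sr=sr, power=1)  # magnitude mel spectrogram

log_mel = librosa.amplitude_to_db(S, ref=np.max)  # traditional log scaling
# PCEN: an AGC-like smoother followed by root compression; the 2**31 scaling
# follows the librosa documentation's convention for floating-point input.
pcen = librosa.pcen(S * (2**31), sr=sr)
```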
no code implementations • 5 Feb 2021 • Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang
Deep learning is very data-hungry, and supervised learning in particular requires massive amounts of labeled data to work well.
1 code implementation • 9 Sep 2020 • Helena Cuesta, Brian McFee, Emilia Gómez
This paper addresses the extraction of multiple F0 values from polyphonic and a cappella vocal performances using convolutional neural networks (CNNs).
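As a hedged illustration of the decoding step, multiple f0 values can be read off per frame from a CNN's pitch salience map by peak picking with a confidence threshold (a common decoding strategy; the paper's exact procedure may differ):

```python
# Sketch: multiple f0s per frame from a salience map via thresholded peak
# picking (illustrative decoding; not necessarily the paper's exact method).
import numpy as np
from scipy.signal import find_peaks

n_bins, n_frames = 360, 100
freqs = 32.7 * 2 ** (np.arange(n_bins) / 60)  # 60 bins per octave, starting at C1
salience = np.random.default_rng(0).random((n_bins, n_frames))  # stand-in for CNN output

f0s_per_frame = []
for t in range(n_frames):
    peaks, _ = find_peaks(salience[:, t], height=0.9)  # keep confident peaks only
    f0s_per_frame.append(freqs[peaks])                 # possibly several f0s per frame
```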
1 code implementation • 22 Oct 2019 • Vincent Lostanlen, Sripathi Sridhar, Brian McFee, Andrew Farnsworth, Juan Pablo Bello
To explain the consonance of octaves, music psychologists represent pitch as a helix where azimuth and axial coordinate correspond to pitch class and pitch height respectively.
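A small sketch of that helical coordinate mapping, under the common convention of one full turn per octave (radius and height scaling are arbitrary choices here):

```python
# Pitch helix sketch: one turn per octave, so pitches an octave apart share an
# azimuth (pitch class) and differ only in height (pitch height).
import numpy as np

def helix_coords(midi_pitch, radius=1.0):
    theta = 2 * np.pi * (midi_pitch % 12) / 12     # azimuth encodes pitch class
    x, y = radius * np.cos(theta), radius * np.sin(theta)
    z = midi_pitch / 12.0                          # axial coordinate encodes pitch height
    return x, y, z

print(helix_coords(60))   # middle C
print(helix_coords(72))   # C one octave up: same (x, y), greater z
```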
1 code implementation • 2 Sep 2018 • Rachel M. Bittner, Brian McFee, Juan P. Bello
Fundamental frequency (f0) estimation from polyphonic music includes the tasks of multiple-f0, melody, vocal, and bass line estimation.
2 code implementations • 26 Apr 2018 • Brian McFee, Justin Salamon, Juan Pablo Bello
In this work, we treat SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality.
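To make the MIL framing concrete, frame-level event probabilities must be pooled into one clip-level prediction. The softmax-weighted pooling below interpolates between mean and max pooling; the paper studies trainable, adaptive pooling operators, so the fixed alpha here is purely illustrative:

```python
# MIL pooling sketch: aggregate per-frame probabilities into a clip-level score.
# alpha=0 recovers mean pooling; large alpha approaches max pooling.
import numpy as np

def softmax_pool(frame_probs, alpha=1.0):
    w = np.exp(alpha * frame_probs)
    w /= w.sum()
    return float(np.sum(w * frame_probs))

p = np.array([0.05, 0.1, 0.9, 0.2])   # per-frame probabilities for one event class
print(softmax_pool(p, alpha=0.0))     # == mean pooling
print(softmax_pool(p, alpha=50.0))    # ~= max pooling
```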
no code implementations • 17 Aug 2016 • Keunwoo Choi, George Fazekas, Brian McFee, Kyunghyun Cho, Mark Sandler
Descriptions are often provided along with recommendations to help users discover music.
no code implementations • 19 Dec 2013 • Yonatan Vaizman, Brian McFee, Gert Lanckriet
Automated recommendation systems are essential for users to discover music they love and for artists to reach the appropriate audience.