no code implementations • 12 Oct 2023 • Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Evuru, S. Ramaneswaran, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in audio-language models (ALMs).
no code implementations • 17 Aug 2023 • Julia Wilkins, Justin Salamon, Magdalena Fuentes, Juan Pablo Bello, Oriol Nieto
We show that our system, trained using our automatic data curation pipeline, significantly outperforms baselines trained on in-the-wild data on the task of high-quality sound effects (HQ SFX) retrieval for video.
no code implementations • 2 Jun 2023 • Oriol Nieto, Zeyu Jin, Franck Dernoncourt, Justin Salamon
Spoken language recognition (SLR) is the task of automatically identifying the language present in a speech signal.
no code implementations • CVPR 2023 • Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko
We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.
no code implementations • 28 Apr 2022 • Nikhil Kandpal, Oriol Nieto, Zeyu Jin
Consumer-grade music recordings such as those captured by mobile devices typically contain distortions in the form of background noise, reverb, and microphone-induced EQ.
1 code implementation • 30 Oct 2020 • Minz Won, Sergio Oramas, Oriol Nieto, Fabien Gouyon, Xavier Serra
In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings.
1 code implementation • 22 Oct 2020 • Filip Korzeniowski, Oriol Nieto, Matthew McCallum, Minz Won, Sergio Oramas, Erik Schmidt
The mood of a song is a highly relevant feature for exploration and recommendation in large collections of music.
no code implementations • 9 Feb 2018 • Samaneh Ebrahimi, Hossein Vahabi, Matthew Prockup, Oriol Nieto
On these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high-quality ads ensures a better user experience and results in longer user engagement.
4 code implementations • 7 Nov 2017 • Jordi Pons, Oriol Nieto, Matthew Prockup, Erik Schmidt, Andreas Ehmann, Xavier Serra
The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms.
Sound • Audio and Speech Processing
1 code implementation • 29 Jun 2017 • Sergio Oramas, Oriol Nieto, Mohamed Sordo, Xavier Serra
Second, track embeddings are learned from the audio signal and available feedback data.