Search Results for author: Oriol Nieto

Found 10 papers, 4 papers with code

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

no code implementations • 12 Oct 2023 • Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Evuru, S. Ramaneswaran, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha

In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs.

Attribute • Audio Classification • +1 more


Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries

no code implementations • 17 Aug 2023 • Julia Wilkins, Justin Salamon, Magdalena Fuentes, Juan Pablo Bello, Oriol Nieto

We show that our system, trained using our automatic data curation pipeline, significantly outperforms baselines trained on in-the-wild data on the task of HQ SFX retrieval for video.

Contrastive Learning • Retrieval


Efficient Spoken Language Recognition via Multilabel Classification

no code implementations • 2 Jun 2023 • Oriol Nieto, Zeyu Jin, Franck Dernoncourt, Justin Salamon

Spoken language recognition (SLR) is the task of automatically identifying the language present in a speech signal.

Classification


Language-Guided Audio-Visual Source Separation via Trimodal Consistency

no code implementations • CVPR 2023 • Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko

We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.

Audio Source Separation • Natural Language Queries


Music Enhancement via Image Translation and Vocoding

no code implementations • 28 Apr 2022 • Nikhil Kandpal, Oriol Nieto, Zeyu Jin

Consumer-grade music recordings such as those captured by mobile devices typically contain distortions in the form of background noise, reverb, and microphone-induced EQ.

Image-to-Image Translation • Translation


Multimodal Metric Learning for Tag-based Music Retrieval

1 code implementation • 30 Oct 2020 • Minz Won, Sergio Oramas, Oriol Nieto, Fabien Gouyon, Xavier Serra

In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings.

Cross-Modal Retrieval • Metric Learning • +4 more


Mood Classification Using Listening Data

1 code implementation • 22 Oct 2020 • Filip Korzeniowski, Oriol Nieto, Matthew McCallum, Minz Won, Sergio Oramas, Erik Schmidt

The mood of a song is a highly relevant feature for exploration and recommendation in large collections of music.

Classification • General Classification


Predicting Audio Advertisement Quality

no code implementations • 9 Feb 2018 • Samaneh Ebrahimi, Hossein Vahabi, Matthew Prockup, Oriol Nieto

In these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high quality ads ensures a better user experience and results in longer user engagement.


End-to-end learning for music audio tagging at scale

4 code implementations • 7 Nov 2017 • Jordi Pons, Oriol Nieto, Matthew Prockup, Erik Schmidt, Andreas Ehmann, Xavier Serra

The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms.

Sound • Audio and Speech Processing



A Deep Multimodal Approach for Cold-start Music Recommendation

1 code implementation • 29 Jun 2017 • Sergio Oramas, Oriol Nieto, Mohamed Sordo, Xavier Serra

Second, track embeddings are learned from the audio signal and available feedback data.

Music Recommendation


