Search Results for author: Emmanouil Benetos

Found 55 papers, 30 papers with code

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

1 code implementation • 2 Aug 2024 • Benno Weck, Ilaria Manco, Emmanouil Benetos, Elio Quinton, George Fazekas, Dmitry Bogdanov

Motivated by this, we introduce MuChoMusic, a benchmark for evaluating music understanding in multimodal language models focused on audio.

Multiple-choice

Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

no code implementations • 31 Jul 2024 • Ziya Zhou, Yuhang Wu, Zhiyue Wu, Xinyue Zhang, Ruibin Yuan, Yinghao Ma, Lu Wang, Emmanouil Benetos, Wei Xue, Yike Guo

Yet scant research explores the details of how these LLMs perform on advanced music understanding and conditioned generation, especially from the multi-step reasoning perspective, which is a critical aspect in the conditioned, editable, and interactive human-computer co-creation process.

Explaining models relating objects and privacy

1 code implementation • 2 May 2024 • Alessio Xompero, Myriam Bontonou, Jean-Michel Arbona, Emmanouil Benetos, Andrea Cavallaro

To explain the decision of these models, we use feature-attribution to identify and quantify which objects (and which of their features) are more relevant to privacy classification with respect to a reference input (i.e., no objects localised in an image) predicted as public.

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

1 code implementation • 28 Apr 2024 • Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints.

In-Context Learning Music Generation

Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection

1 code implementation • 27 Mar 2024 • Jinhua Liang, Ines Nolasco, Burooj Ghani, Huy Phan, Emmanouil Benetos, Dan Stowell

A recent development in the field is the introduction of the task known as few-shot bioacoustic sound event detection, which aims to train a versatile animal sound detector using only a small set of audio samples.

Data Augmentation Domain Adaptation +3

Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models

1 code implementation • 18 Mar 2024 • Emilian Postolache, Giorgio Mariani, Luca Cosmo, Emmanouil Benetos, Emanuele Rodolà

Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks: generating a set of coherent sources, creating accompaniments, and performing source separation.

WavCraft: Audio Editing and Generation with Large Language Models

1 code implementation • 14 Mar 2024 • Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

A Data-Driven Analysis of Robust Automatic Piano Transcription

no code implementations • 2 Feb 2024 • Drew Edwards, Simon Dixon, Emmanouil Benetos, Akira Maezawa, Yuta Kusaka

Algorithms for automatic piano transcription have improved dramatically in recent years due to new datasets and modeling techniques.

Ranked #3 on Music Transcription on MAPS (using extra training data)

Data Augmentation Music Transcription

The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

1 code implementation • 16 Nov 2023 • Ilaria Manco, Benno Weck, Seungheon Doh, Minz Won, Yixiao Zhang, Dmitry Bogdanov, Yusong Wu, Ke Chen, Philip Tovstogan, Emmanouil Benetos, Elio Quinton, György Fazekas, Juhan Nam

We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models.

Music Captioning Music Generation +2

ATGNN: Audio Tagging Graph Neural Network

no code implementations • 2 Nov 2023 • Shubhr Singh, Christian J. Steinmetz, Emmanouil Benetos, Huy Phan, Dan Stowell

Deep learning models such as CNNs and Transformers have achieved impressive performance for end-to-end audio tagging.

Audio Tagging Graph Neural Network

MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

1 code implementation • 15 Oct 2023 • Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li

Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks.

Instrument Playing Technique Detection Onset Detection +1

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

1 code implementation • 15 Sep 2023 • Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos

Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains not well-explored.

Caption Generation Language Modelling +1

From West to East: Who can understand the music of the others better?

1 code implementation • 19 Jul 2023 • Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos

This leads to research questions on whether these models can be used to learn representations for different music cultures and styles, or whether we can build similar music audio embedding models trained on data from different cultures or styles.

Transfer Learning

On the Effectiveness of Speech Self-supervised Learning for Music

no code implementations • 11 Jul 2023 • Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu

Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech.

Information Retrieval Music Information Retrieval +2

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

1 code implementation • 29 Jun 2023 • Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi Li, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wei Xue, Yike Guo

We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal.

Automatic Lyrics Transcription Language Modelling +3

Few-shot Class-incremental Audio Classification Using Dynamically Expanded Classifier with Self-attention Modified Prototypes

1 code implementation • 31 May 2023 • Yanxiong Li, Wenchang Cao, Wei Xie, Jialong Li, Emmanouil Benetos

Labeled support samples and unlabeled query samples are used to train the prototype adaptation network and update the classifier, since they are informative for audio classification.

Audio Classification

Adapting Language-Audio Models as Few-Shot Audio Learners

no code implementations • 28 May 2023 • Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang

We presented the Treff adapter, a training-efficient adapter for CLAP, to boost zero-shot classification performance by making use of a small set of labelled data.

Audio Classification Few-Shot Learning +1

Learning Music Representations with wav2vec 2.0

no code implementations • 27 Oct 2022 • Alessandro Ragano, Emmanouil Benetos, Andrew Hines

In addition, the results are superior to the pre-trained model on speech embeddings, demonstrating that wav2vec 2.0 pre-trained on music data can be a promising music representation model.

Music Classification

Contrastive Audio-Language Learning for Music

1 code implementation • 25 Aug 2022 • Ilaria Manco, Emmanouil Benetos, Elio Quinton, György Fazekas

In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain.

Audio to Text Retrieval Descriptive +4

Deep Conditional Representation Learning for Drum Sample Retrieval by Vocalisation

1 code implementation • 10 Apr 2022 • Alejandro Delgado, Charalampos Saitis, Emmanouil Benetos, Mark Sandler

Imitating musical instruments with the human voice is an efficient way of communicating ideas between music producers, from sketching melody lines to clarifying desired sonorities.

Representation Learning Retrieval

Exploring Transformer's potential on automatic piano transcription

no code implementations • 8 Apr 2022 • Longshen Ou, Ziyi Guo, Emmanouil Benetos, Jiqing Han, Ye Wang

Most recent research about automatic music transcription (AMT) uses convolutional neural networks and recurrent neural networks to model the mapping from music signals to symbolic notation.

Music Transcription

A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality

no code implementations • 5 Apr 2022 • Alessandro Ragano, Emmanouil Benetos, Michael Chinen, Helard B. Martinez, Chandan K. A. Reddy, Jan Skoglund, Andrew Hines

In this paper, we evaluate several MOS predictors based on wav2vec 2.0 and the NISQA speech quality prediction model to explore the role of the training data, the influence of the system type, and the role of cross-domain features in SSL models.

Benchmarking Self-Supervised Learning +1

Learning music audio representations via weak language supervision

1 code implementation • 8 Dec 2021 • Ilaria Manco, Emmanouil Benetos, Elio Quinton, Gyorgy Fazekas

To address this question, we design a multimodal architecture for music and language pre-training (MuLaP) optimised via a set of proxy tasks.

Audio Classification Information Retrieval +2

Joint Scattering for Automatic Chick Call Recognition

no code implementations • 8 Oct 2021 • Changhong Wang, Emmanouil Benetos, Shuge Wang, Elisabetta Versace

Animal vocalisations contain important information about health, emotional state, and behaviour, thus can be potentially used for animal welfare monitoring.

More for Less: Non-Intrusive Speech Quality Assessment with Limited Annotations

no code implementations • 19 Aug 2021 • Alessandro Ragano, Emmanouil Benetos, Andrew Hines

This paper indicates that multi-task learning combined with feature representations from unlabelled data is a promising approach to deal with the lack of large MOS annotated datasets.

Clustering Deep Clustering +1

Pitch-Informed Instrument Assignment Using a Deep Convolutional Network with Multiple Kernel Shapes

no code implementations • 28 Jul 2021 • Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlbäck

We also include ablation studies investigating the effects of the use of multiple kernel shapes and comparing different input representations for the audio and the note-related information.

MusCaps: Generating Captions for Music Audio

1 code implementation • 24 Apr 2021 • Ilaria Manco, Emmanouil Benetos, Elio Quinton, Gyorgy Fazekas

Content-based music information retrieval has seen rapid progress with the adoption of deep learning.

Audio captioning Classification +4

The Effect of Spectrogram Reconstruction on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy

2 code implementations • 20 Oct 2020 • Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Dorien Herremans

We attempt to use only the pitch labels (together with spectrogram reconstruction loss) and explore how far this model can go without introducing supervised sub-tasks.

Music Transcription

Reliable Local Explanations for Machine Listening

1 code implementation • 15 May 2020 • Saumitra Mishra, Emmanouil Benetos, Bob L. Sturm, Simon Dixon

One way to analyse the behaviour of machine learning models is through local explanations that highlight input features that maximally influence model predictions.

Memory Controlled Sequential Self Attention for Sound Recognition

1 code implementation • 13 May 2020 • Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian, Emmanouil Benetos

In this paper we investigate the importance of the extent of memory in sequential self attention for sound recognition.

Event Detection Sound Event Detection

Musical Features for Automatic Music Transcription Evaluation

no code implementations • 15 Apr 2020 • Adrien Ycart, Lele Liu, Emmanouil Benetos, Marcus T. Pearce

This technical report gives a detailed, formal description of the features introduced in the paper: Adrien Ycart, Lele Liu, Emmanouil Benetos and Marcus T. Pearce.

Information Retrieval Music Information Retrieval +2

Audio Impairment Recognition Using a Correlation-Based Feature Representation

no code implementations • 22 Mar 2020 • Alessandro Ragano, Emmanouil Benetos, Andrew Hines

Audio impairment recognition is based on finding noise in audio files and categorising the impairment type.

Modeling plate and spring reverberation using a DSP-informed deep neural network

1 code implementation • 22 Oct 2019 • Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss

Plate and spring reverberators are electromechanical systems first used and researched as means to substitute real room reverberation.

A general-purpose deep learning approach to model time-varying audio effects

no code implementations • 15 May 2019 • Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss

Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects.

GAN-based Generation and Automatic Selection of Explanations for Neural Networks

no code implementations • 21 Apr 2019 • Saumitra Mishra, Daniel Stoller, Emmanouil Benetos, Bob L. Sturm, Simon Dixon

However, this requires a careful selection of hyper-parameters to generate interpretable examples for each neuron of interest, and current methods rely on a manual, qualitative evaluation of each setting, which is prohibitively slow.

Ensemble Models for Spoofing Detection in Automatic Speaker Verification

1 code implementation • 9 Apr 2019 • Bhusan Chettri, Daniel Stoller, Veronica Morfi, Marco A. Martínez Ramírez, Emmanouil Benetos, Bob L. Sturm

Our ensemble model outperforms all our single models and the baselines from the challenge for both attack types.

Audio and Speech Processing Sound

Optimal Neural Network Feature Selection for Spatial-Temporal Forecasting

no code implementations • 30 Apr 2018 • Eurico Covas, Emmanouil Benetos

In this paper, we show empirical evidence on how to construct the optimal feature selection or input representation used by the input layer of a feedforward neural network for the purpose of forecasting spatial-temporal signals.

feature selection

Sound Event Detection in Synthetic Audio: Analysis of the DCASE 2016 Task Results

no code implementations • 15 Nov 2017 • Grégoire Lafay, Emmanouil Benetos, Mathieu Lagrange

As part of the 2016 public evaluation challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2016), the second task focused on evaluating sound event detection systems using synthetic mixtures of office sounds.

Event Detection General Classification +1

An End-to-End Neural Network for Polyphonic Piano Music Transcription

1 code implementation • 7 Aug 2015 • Siddharth Sigtia, Emmanouil Benetos, Simon Dixon

We compare performance of the neural network based acoustic models with two popular unsupervised acoustic models.

Language Modelling Music Transcription +2

An evaluation framework for event detection using a morphological model of acoustic scenes

no code implementations • 31 Jan 2015 • Mathieu Lagrange, Grégoire Lafay, Mathias Rossignol, Emmanouil Benetos, Axel Roebel

This paper introduces a model of environmental acoustic scenes which adopts a morphological approach by abstracting temporal structures of acoustic scenes.

Event Detection

A Hybrid Recurrent Neural Network For Music Transcription

no code implementations • 6 Nov 2014 • Siddharth Sigtia, Emmanouil Benetos, Nicolas Boulanger-Lewandowski, Tillman Weyde, Artur S. d'Avila Garcez, Simon Dixon

We investigate the problem of incorporating higher-level symbolic score-like information into Automatic Music Transcription (AMT) systems to improve their performance.

Music Transcription
