Search Results for author: Emmanouil Benetos

Found 47 papers, 23 papers with code

Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection

1 code implementation • 27 Mar 2024 • Jinhua Liang, Ines Nolasco, Burooj Ghani, Huy Phan, Emmanouil Benetos, Dan Stowell

A recent development in the field is the introduction of the task known as few-shot bioacoustic sound event detection, which aims to train a versatile animal sound detector using only a small set of audio samples.

Data Augmentation • Domain Adaptation +3
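Few-shot detection of this kind is commonly tackled with prototypical classifiers: average the embeddings of the handful of labelled support clips per class, then assign queries to the nearest prototype by cosine similarity. A minimal sketch under that assumption — the toy 2-D "embeddings" and class names are illustrative, not from the paper:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def prototype(embeddings):
    """Class prototype: the mean of the normalised support embeddings."""
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return l2_normalize(mean)

def classify(query, prototypes):
    """Assign the query to the class whose prototype is most similar."""
    q = l2_normalize(query)
    scores = {label: sum(a * b for a, b in zip(q, p)) for label, p in prototypes.items()}
    return max(scores, key=scores.get)

# A handful of labelled support clips per class, as toy 2-D embeddings.
support = {
    "animal_call": [[0.9, 0.1], [0.8, 0.2]],
    "background":  [[0.1, 0.9], [0.2, 0.8]],
}
protos = {label: prototype([l2_normalize(e) for e in embs]) for label, embs in support.items()}
print(classify([0.85, 0.15], protos))  # → animal_call
```

The same nearest-prototype rule scales to real embedding extractors; only the feature dimensionality changes.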

Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models

1 code implementation • 18 Mar 2024 • Emilian Postolache, Giorgio Mariani, Luca Cosmo, Emmanouil Benetos, Emanuele Rodolà

Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks: generating a set of coherent sources, creating accompaniments, and performing source separation.

WavCraft: Audio Editing and Generation with Natural Language Prompts

1 code implementation • 14 Mar 2024 • Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

A Data-Driven Analysis of Robust Automatic Piano Transcription

no code implementations • 2 Feb 2024 • Drew Edwards, Simon Dixon, Emmanouil Benetos, Akira Maezawa, Yuta Kusaka

Algorithms for automatic piano transcription have improved dramatically in recent years due to new datasets and modeling techniques.

Data Augmentation

The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

1 code implementation • 16 Nov 2023 • Ilaria Manco, Benno Weck, Seungheon Doh, Minz Won, Yixiao Zhang, Dmitry Bogdanov, Yusong Wu, Ke Chen, Philip Tovstogan, Emmanouil Benetos, Elio Quinton, György Fazekas, Juhan Nam

We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models.

Music Captioning • Music Generation +2

ATGNN: Audio Tagging Graph Neural Network

no code implementations • 2 Nov 2023 • Shubhr Singh, Christian J. Steinmetz, Emmanouil Benetos, Huy Phan, Dan Stowell

Deep learning models such as CNNs and Transformers have achieved impressive performance for end-to-end audio tagging.

Audio Tagging

MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

1 code implementation • 15 Oct 2023 • Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li

Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks.

Instrument Playing Technique Detection • Self-Supervised Learning
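Multi-task finetuning of this kind typically minimises the main loss plus weighted auxiliary losses. A minimal sketch, assuming a simple weighted sum — the weight values and loss values here are illustrative, not the paper's:

```python
def multitask_loss(ipt_loss, pitch_loss, onset_loss, w_pitch=0.5, w_onset=0.5):
    """Joint objective: the main IPT-detection loss plus weighted auxiliary
    pitch- and onset-detection losses (weights are illustrative defaults)."""
    return ipt_loss + w_pitch * pitch_loss + w_onset * onset_loss

# A finetuning step would backpropagate through this combined scalar.
print(multitask_loss(1.0, 0.4, 0.2))  # → 1.3
```

Tuning the auxiliary weights trades off how strongly the shared encoder is pulled toward pitch and onset cues versus the main IPT task.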

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

1 code implementation • 15 Sep 2023 • Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos

Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of the textual and musical domains remains underexplored.

Caption Generation • Language Modelling +1

From West to East: Who can understand the music of the others better?

1 code implementation • 19 Jul 2023 • Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos

This leads to research questions on whether these models can be used to learn representations for different music cultures and styles, or whether we can build similar music audio embedding models trained on data from different cultures or styles.

Transfer Learning

On the Effectiveness of Speech Self-supervised Learning for Music

no code implementations • 11 Jul 2023 • Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu

Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech.

Information Retrieval • Music Information Retrieval +2

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

1 code implementation • 29 Jun 2023 • Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi Li, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenhu Chen, Wei Xue, Yike Guo

We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal.

Automatic Lyrics Transcription • Language Modelling +3

Few-shot Class-incremental Audio Classification Using Dynamically Expanded Classifier with Self-attention Modified Prototypes

1 code implementation • 31 May 2023 • Yanxiong Li, Wenchang Cao, Wei Xie, Jialong Li, Emmanouil Benetos

Labeled support samples and unlabeled query samples are used to train the prototype adaptation network and update the classifier, since they are informative for audio classification.

Audio Classification

Adapting Language-Audio Models as Few-Shot Audio Learners

no code implementations • 28 May 2023 • Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang

We presented the Treff adapter, a training-efficient adapter for CLAP, to boost zero-shot classification performance by making use of a small set of labelled data.

Audio Classification • Few-Shot Learning +1
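An adapter of this kind can be read as blending two similarity signals in CLAP's shared audio–text embedding space: the zero-shot similarity between a query clip and a class's text prompt, and the mean similarity to the few labelled support clips of that class. A hedged sketch under that reading — the mixing weight `alpha`, the class names, and the toy embeddings are assumptions, not the paper's design:

```python
import math

def cos(a, b):
    """Cosine similarity between two vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def few_shot_score(query, text_emb, support_embs, alpha=0.5):
    """Blend zero-shot similarity (query vs. class text prompt) with the mean
    similarity to the labelled support clips (alpha is an illustrative weight)."""
    zero_shot = cos(query, text_emb)
    support = sum(cos(query, s) for s in support_embs) / len(support_embs)
    return alpha * zero_shot + (1 - alpha) * support

# Toy 2-D embeddings standing in for CLAP text and audio embeddings.
classes = {
    "dog_bark": {"text": [1.0, 0.0], "support": [[0.9, 0.3]]},
    "siren":    {"text": [0.0, 1.0], "support": [[0.2, 0.9]]},
}
query = [0.8, 0.4]
pred = max(classes, key=lambda c: few_shot_score(query, classes[c]["text"], classes[c]["support"]))
print(pred)  # → dog_bark
```

With `alpha=1.0` the rule reduces to plain zero-shot classification; lowering it lets the labelled support set correct the text-only decision.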

Learning Music Representations with wav2vec 2.0

no code implementations • 27 Oct 2022 • Alessandro Ragano, Emmanouil Benetos, Andrew Hines

In addition, the results are superior to those of the model pre-trained on speech, demonstrating that wav2vec 2.0 pre-trained on music data can be a promising music representation model.

Music Classification

Contrastive Audio-Language Learning for Music

1 code implementation • 25 Aug 2022 • Ilaria Manco, Emmanouil Benetos, Elio Quinton, György Fazekas

In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain.

Audio to Text Retrieval • Descriptive +5

Deep Conditional Representation Learning for Drum Sample Retrieval by Vocalisation

1 code implementation • 10 Apr 2022 • Alejandro Delgado, Charalampos Saitis, Emmanouil Benetos, Mark Sandler

Imitating musical instruments with the human voice is an efficient way of communicating ideas between music producers, from sketching melody lines to clarifying desired sonorities.

Representation Learning • Retrieval

Exploring Transformer's potential on automatic piano transcription

no code implementations • 8 Apr 2022 • Longshen Ou, Ziyi Guo, Emmanouil Benetos, Jiqing Han, Ye Wang

Most recent research about automatic music transcription (AMT) uses convolutional neural networks and recurrent neural networks to model the mapping from music signals to symbolic notation.

Music Transcription

A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality

no code implementations • 5 Apr 2022 • Alessandro Ragano, Emmanouil Benetos, Michael Chinen, Helard B. Martinez, Chandan K. A. Reddy, Jan Skoglund, Andrew Hines

In this paper, we evaluate several MOS predictors based on wav2vec 2.0 and the NISQA speech quality prediction model to explore the role of the training data, the influence of the system type, and the role of cross-domain features in SSL models.

Benchmarking • Self-Supervised Learning +1

Learning music audio representations via weak language supervision

1 code implementation • 8 Dec 2021 • Ilaria Manco, Emmanouil Benetos, Elio Quinton, György Fazekas

To address this question, we design a multimodal architecture for music and language pre-training (MuLaP) optimised via a set of proxy tasks.

Audio Classification • Information Retrieval +2

Joint Scattering for Automatic Chick Call Recognition

no code implementations • 8 Oct 2021 • Changhong Wang, Emmanouil Benetos, Shuge Wang, Elisabetta Versace

Animal vocalisations contain important information about health, emotional state, and behaviour, thus can be potentially used for animal welfare monitoring.

More for Less: Non-Intrusive Speech Quality Assessment with Limited Annotations

no code implementations • 19 Aug 2021 • Alessandro Ragano, Emmanouil Benetos, Andrew Hines

This paper indicates that multi-task learning combined with feature representations from unlabelled data is a promising approach to deal with the lack of large MOS annotated datasets.

Clustering • Deep Clustering +1

Pitch-Informed Instrument Assignment Using a Deep Convolutional Network with Multiple Kernel Shapes

no code implementations • 28 Jul 2021 • Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlbäck

We also include ablation studies investigating the effects of the use of multiple kernel shapes and comparing different input representations for the audio and the note-related information.

MusCaps: Generating Captions for Music Audio

1 code implementation • 24 Apr 2021 • Ilaria Manco, Emmanouil Benetos, Elio Quinton, György Fazekas

Content-based music information retrieval has seen rapid progress with the adoption of deep learning.

Audio captioning • Classification +3

The Effect of Spectrogram Reconstruction on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy

2 code implementations • 20 Oct 2020 • Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Dorien Herremans

We attempt to use only the pitch labels (together with spectrogram reconstruction loss) and explore how far this model can go without introducing supervised sub-tasks.

Music Transcription

Reliable Local Explanations for Machine Listening

1 code implementation • 15 May 2020 • Saumitra Mishra, Emmanouil Benetos, Bob L. Sturm, Simon Dixon

One way to analyse the behaviour of machine learning models is through local explanations that highlight input features that maximally influence model predictions.
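One common family of such local explanations is occlusion analysis: replace one input feature at a time with a baseline value and record how much the model's prediction drops. A generic sketch of that idea — this is an illustration of the general technique, not the specific analysis in the paper:

```python
def occlusion_importance(predict, x, baseline=0.0):
    """Score each feature by the drop in model output when that feature
    is replaced with a baseline value; larger drops mean more influence."""
    base_score = predict(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline
        scores.append(base_score - predict(occluded))
    return scores

# Toy "model": a linear scorer that depends mostly on feature 0.
predict = lambda x: 3.0 * x[0] + 0.5 * x[1]
print(occlusion_importance(predict, [1.0, 1.0]))  # → [3.0, 0.5]
```

For audio models, the same probe is usually applied over time-frequency patches of a spectrogram rather than individual scalar features.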

Memory Controlled Sequential Self Attention for Sound Recognition

1 code implementation • 13 May 2020 • Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian, Emmanouil Benetos

In this paper we investigate the importance of the extent of memory in sequential self attention for sound recognition.

Event Detection • Sound Event Detection

Musical Features for Automatic Music Transcription Evaluation

no code implementations • 15 Apr 2020 • Adrien Ycart, Lele Liu, Emmanouil Benetos, Marcus T. Pearce

This technical report gives a detailed, formal description of the features introduced in the paper: Adrien Ycart, Lele Liu, Emmanouil Benetos and Marcus T. Pearce.

Information Retrieval • Music Information Retrieval +2

Audio Impairment Recognition Using a Correlation-Based Feature Representation

no code implementations • 22 Mar 2020 • Alessandro Ragano, Emmanouil Benetos, Andrew Hines

Audio impairment recognition is based on finding noise in audio files and categorising the impairment type.

Modeling plate and spring reverberation using a DSP-informed deep neural network

1 code implementation • 22 Oct 2019 • Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss

Plate and spring reverberators are electromechanical systems first used and researched as means to substitute real room reverberation.

A general-purpose deep learning approach to model time-varying audio effects

no code implementations • 15 May 2019 • Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss

Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects.

GAN-based Generation and Automatic Selection of Explanations for Neural Networks

no code implementations • 21 Apr 2019 • Saumitra Mishra, Daniel Stoller, Emmanouil Benetos, Bob L. Sturm, Simon Dixon

However, this requires a careful selection of hyper-parameters to generate interpretable examples for each neuron of interest, and current methods rely on a manual, qualitative evaluation of each setting, which is prohibitively slow.

Ensemble Models for Spoofing Detection in Automatic Speaker Verification

1 code implementation • 9 Apr 2019 • Bhusan Chettri, Daniel Stoller, Veronica Morfi, Marco A. Martínez Ramírez, Emmanouil Benetos, Bob L. Sturm

Our ensemble model outperforms all our single models and the baselines from the challenge for both attack types.

Audio and Speech Processing • Sound

Optimal Neural Network Feature Selection for Spatial-Temporal Forecasting

no code implementations • 30 Apr 2018 • Eurico Covas, Emmanouil Benetos

In this paper, we show empirical evidence on how to construct the optimal feature selection or input representation used by the input layer of a feedforward neural network for the purpose of forecasting spatial-temporal signals.

feature selection

Sound Event Detection in Synthetic Audio: Analysis of the DCASE 2016 Task Results

no code implementations • 15 Nov 2017 • Grégoire Lafay, Emmanouil Benetos, Mathieu Lagrange

As part of the 2016 public evaluation challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2016), the second task focused on evaluating sound event detection systems using synthetic mixtures of office sounds.

Event Detection • General Classification +1

An End-to-End Neural Network for Polyphonic Piano Music Transcription

1 code implementation • 7 Aug 2015 • Siddharth Sigtia, Emmanouil Benetos, Simon Dixon

We compare performance of the neural network based acoustic models with two popular unsupervised acoustic models.

Language Modelling • Music Transcription +2

An evaluation framework for event detection using a morphological model of acoustic scenes

no code implementations • 31 Jan 2015 • Mathieu Lagrange, Grégoire Lafay, Mathias Rossignol, Emmanouil Benetos, Axel Roebel

This paper introduces a model of environmental acoustic scenes which adopts a morphological approach by abstracting temporal structures of acoustic scenes.

Event Detection

A Hybrid Recurrent Neural Network For Music Transcription

no code implementations • 6 Nov 2014 • Siddharth Sigtia, Emmanouil Benetos, Nicolas Boulanger-Lewandowski, Tillman Weyde, Artur S. d'Avila Garcez, Simon Dixon

We investigate the problem of incorporating higher-level symbolic score-like information into Automatic Music Transcription (AMT) systems to improve their performance.

Music Transcription
