1 code implementation • 26 Aug 2024 • Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, Shangda Wu, Shih-Lun Wu, Shuqi Dai, Shun Lei, Shiyin Kang, Simon Dixon, Wenhu Chen, Wenhao Huang, Xingjian Du, Xingwei Qu, Xu Tan, Yizhi Li, Zeyue Tian, Zhiyong Wu, Zhizheng Wu, Ziyang Ma, Ziyu Wang
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music.
1 code implementation • 2 Aug 2024 • Benno Weck, Ilaria Manco, Emmanouil Benetos, Elio Quinton, George Fazekas, Dmitry Bogdanov
Motivated by this, we introduce MuChoMusic, a benchmark for evaluating music understanding in multimodal language models focused on audio.
no code implementations • 31 Jul 2024 • Ziya Zhou, Yuhang Wu, Zhiyue Wu, Xinyue Zhang, Ruibin Yuan, Yinghao Ma, Lu Wang, Emmanouil Benetos, Wei Xue, Yike Guo
Yet scant research explores how these LLMs perform on advanced music understanding and conditioned generation, especially from a multi-step reasoning perspective, which is critical to the conditioned, editable, and interactive human-computer co-creation process.
1 code implementation • 5 Jul 2024 • Sungkyun Chang, Emmanouil Benetos, Holger Kirchhoff, Simon Dixon
Further testing on pop music recordings highlights the limitations of current models.
Ranked #1 on Multi-instrument Music Transcription on URMP (using extra training data)
1 code implementation • 25 Jun 2024 • Jiawen Huang, Emmanouil Benetos
Furthermore, we demonstrate that incorporating language information significantly enhances performance.
Automatic Lyrics Transcription • Automatic Speech Recognition • +1
1 code implementation • 29 May 2024 • Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu, Noah Wang, Quehry Que, Ruibo Liu, Sine Liu, Shawn Guo, Soren Gao, Wangchunshu Zhou, Xinyue Zhang, Yizhi Zhou, YuBo Wang, Yuelin Bai, Yuhan Zhang, Yuxiang Zhang, Zenith Wang, Zhenzhu Yang, Zijian Zhao, Jiajun Zhang, Wanli Ouyang, Wenhao Huang, Wenhu Chen
To improve the transparency of LLMs, the research community has moved to open-source truly open LLMs (e.g., Pythia, Amber, OLMo), where more details (e.g., pre-training corpus and training code) are provided.
1 code implementation • 2 May 2024 • Alessio Xompero, Myriam Bontonou, Jean-Michel Arbona, Emmanouil Benetos, Andrea Cavallaro
To explain the decisions of these models, we use feature attribution to identify and quantify which objects (and which of their features) are most relevant to privacy classification with respect to a reference input (i.e., no objects localised in an image) predicted as public.
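The attribution-against-a-reference idea can be illustrated with a simple occlusion scheme: compare the model's score for the full input against its score with each localised object removed, taking the empty image (no objects) as the reference. The `privacy_score` function below is a hypothetical stand-in for a trained classifier, not the authors' model.

```python
def occlusion_attribution(objects, score_fn):
    """Attribute a model's output to each detected object by occlusion.

    objects:  dict mapping object name -> a scalar feature (e.g. area).
    score_fn: callable taking such a dict and returning a scalar score
              (higher = more private); score_fn({}) is the reference
              prediction with no objects localised.
    """
    full_score = score_fn(objects)
    attributions = {}
    for name in objects:
        occluded = {k: v for k, v in objects.items() if k != name}
        # Contribution of this object = drop in score when it is removed.
        attributions[name] = full_score - score_fn(occluded)
    return attributions

# Hypothetical scorer: privacy score grows with the summed object features.
def privacy_score(objects):
    return sum(objects.values())

attr = occlusion_attribution({"person": 0.6, "laptop": 0.3}, privacy_score)
```

For an additive scorer like this one, each object's attribution equals its own contribution; a real classifier would expose interactions between objects.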
1 code implementation • 28 Apr 2024 • Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo
Music composition represents the creative side of humanity, and is itself a complex task that requires the ability to understand and generate information with long-range dependencies and harmonic constraints.
no code implementations • 9 Apr 2024 • Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, Stephen W. Huang, Jie Fu, Ge Zhang
In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music.
1 code implementation • 27 Mar 2024 • Jinhua Liang, Ines Nolasco, Burooj Ghani, Huy Phan, Emmanouil Benetos, Dan Stowell
A recent development in the field is the introduction of the task known as few-shot bioacoustic sound event detection, which aims to train a versatile animal sound detector using only a small set of audio samples.
1 code implementation • 18 Mar 2024 • Emilian Postolache, Giorgio Mariani, Luca Cosmo, Emmanouil Benetos, Emanuele Rodolà
Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks: generating a set of coherent sources, creating accompaniments, and performing source separation.
1 code implementation • 14 Mar 2024 • Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.
1 code implementation • 25 Feb 2024 • Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo
It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language.
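Because ABC notation is plain text, a tune can be handled by an LLM's text pipeline directly. The toy tune below is illustrative (not from the paper's corpus), and the header/body split is a minimal sketch of how ABC's text structure separates metadata fields from the music itself.

```python
# Illustrative ABC tune (plain text), not taken from the paper's data.
abc_tune = """X:1
T:Example Scale
M:4/4
K:C
C D E F | G A B c |]"""

# ABC header fields look like "X:1", "K:C" (a letter, then a colon);
# everything else is the tune body. Both are ordinary text, so they can
# be fed to an LLM tokenizer without any special music encoding.
lines = abc_tune.splitlines()
header = [ln for ln in lines if len(ln) > 1 and ln[1] == ":"]
body = [ln for ln in lines if not (len(ln) > 1 and ln[1] == ":")]
```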
no code implementations • 2 Feb 2024 • Drew Edwards, Simon Dixon, Emmanouil Benetos, Akira Maezawa, Yuta Kusaka
Algorithms for automatic piano transcription have improved dramatically in recent years due to new datasets and modeling techniques.
Ranked #3 on Music Transcription on MAPS (using extra training data)
1 code implementation • 30 Nov 2023 • Jinhua Liang, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos
Moreover, we improve the audio language model framework by using interleaved audio-text embeddings as the input sequence.
1 code implementation • 16 Nov 2023 • Ilaria Manco, Benno Weck, Seungheon Doh, Minz Won, Yixiao Zhang, Dmitry Bogdanov, Yusong Wu, Ke Chen, Philip Tovstogan, Emmanouil Benetos, Elio Quinton, György Fazekas, Juhan Nam
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models.
no code implementations • 2 Nov 2023 • Shubhr Singh, Christian J. Steinmetz, Emmanouil Benetos, Huy Phan, Dan Stowell
Deep learning models such as CNNs and Transformers have achieved impressive performance for end-to-end audio tagging.
1 code implementation • 15 Oct 2023 • Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li
Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks.
1 code implementation • 15 Sep 2023 • Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos
Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains not well-explored.
1 code implementation • 19 Jul 2023 • Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos
This leads to research questions on whether these models can learn representations for different music cultures and styles, or whether similar music audio embedding models can be trained on data from different cultures or styles.
no code implementations • 11 Jul 2023 • Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu
Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech.
1 code implementation • 29 Jun 2023 • Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi Li, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wei Xue, Yike Guo
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal.
1 code implementation • NeurIPS 2023 • Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu
This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark.
1 code implementation • 31 May 2023 • Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Xingran Chen, Hanzhi Yin, Chenghao Xiao, Chenghua Lin, Anton Ragni, Emmanouil Benetos, Norbert Gyenge, Roger Dannenberg, Ruibo Liu, Wenhu Chen, Gus Xia, Yemin Shi, Wenhao Huang, Zili Wang, Yike Guo, Jie Fu
Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored.
1 code implementation • 31 May 2023 • Yanxiong Li, Wenchang Cao, Wei Xie, Jialong Li, Emmanouil Benetos
Labeled support samples and unlabeled query samples are used to train the prototype adaptation network and update the classifier, since they are informative for audio classification.
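The role of labelled support samples can be sketched with a generic prototypical-network scheme (a common few-shot baseline, not the paper's adaptation network): each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype.

```python
import math

def prototypes(support):
    """support: dict class_label -> list of embedding vectors."""
    protos = {}
    for label, embs in support.items():
        dim = len(embs[0])
        # Prototype = element-wise mean of the support embeddings.
        protos[label] = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
    return protos

def classify(query, protos):
    """Assign the query to the class with the nearest (Euclidean) prototype."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(protos, key=lambda label: dist(query, protos[label]))

# Toy 2-D embeddings for two hypothetical audio classes.
support = {
    "dog_bark": [[1.0, 0.0], [0.8, 0.2]],
    "siren":    [[0.0, 1.0], [0.1, 0.9]],
}
protos = prototypes(support)
label = classify([0.9, 0.1], protos)
```

Unlabelled query samples can then refine the prototypes (e.g. by re-averaging over confidently classified queries), which is the kind of adaptation step the entry above alludes to.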
no code implementations • 28 May 2023 • Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang
We present the Treff adapter, a training-efficient adapter for CLAP that boosts zero-shot classification performance by making use of a small set of labelled data.
no code implementations • 5 Dec 2022 • Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Chenghua Lin, Xingran Chen, Anton Ragni, Hanzhi Yin, Zhijie Hu, Haoyu He, Emmanouil Benetos, Norbert Gyenge, Ruibo Liu, Jie Fu
The deep learning community has witnessed an exponentially growing interest in self-supervised learning (SSL).
no code implementations • 27 Oct 2022 • Alessandro Ragano, Emmanouil Benetos, Andrew Hines
In addition, the results are superior to those of the model pre-trained on speech embeddings, demonstrating that wav2vec 2.0 pre-trained on music data can be a promising music representation model.
1 code implementation • 25 Aug 2022 • Ilaria Manco, Emmanouil Benetos, Elio Quinton, György Fazekas
In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain.
no code implementations • 15 Jul 2022 • Vinod Subramanian, Siddharth Gururani, Emmanouil Benetos, Mark Sandler
Loss-gradients are used to interpret the decision making process of deep learning models.
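A minimal illustration of loss-gradient interpretation (a toy sketch, not the paper's setup): the saliency of each input feature is taken as the magnitude of the loss gradient with respect to that feature, here estimated by central finite differences on a small squared-error loss.

```python
def loss(x, w, target):
    # Toy model: linear score followed by squared error.
    pred = sum(xi * wi for xi, wi in zip(x, w))
    return (pred - target) ** 2

def loss_gradient(x, w, target, eps=1e-6):
    """Central finite-difference gradient of the loss w.r.t. the input x."""
    grads = []
    for i in range(len(x)):
        x_hi = list(x); x_hi[i] += eps
        x_lo = list(x); x_lo[i] -= eps
        grads.append((loss(x_hi, w, target) - loss(x_lo, w, target)) / (2 * eps))
    return grads

# Saliency = |dL/dx_i|: inputs with larger weights receive larger gradients,
# so the gradient highlights which features most influence the prediction.
g = loss_gradient([1.0, 1.0], w=[2.0, 0.5], target=0.0)
saliency = [abs(v) for v in g]
```

In practice the gradient would come from automatic differentiation (e.g. backpropagation through a deep network) rather than finite differences.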
1 code implementation • 10 Apr 2022 • Alejandro Delgado, Charalampos Saitis, Emmanouil Benetos, Mark Sandler
Imitating musical instruments with the human voice is an efficient way of communicating ideas between music producers, from sketching melody lines to clarifying desired sonorities.
no code implementations • 8 Apr 2022 • Longshen Ou, Ziyi Guo, Emmanouil Benetos, Jiqing Han, Ye Wang
Most recent research on automatic music transcription (AMT) uses convolutional and recurrent neural networks to model the mapping from music signals to symbolic notation.
no code implementations • 5 Apr 2022 • Alessandro Ragano, Emmanouil Benetos, Michael Chinen, Helard B. Martinez, Chandan K. A. Reddy, Jan Skoglund, Andrew Hines
In this paper, we evaluate several MOS predictors based on wav2vec 2.0 and the NISQA speech quality prediction model to explore the role of the training data, the influence of the system type, and the role of cross-domain features in SSL models.
1 code implementation • 8 Dec 2021 • Ilaria Manco, Emmanouil Benetos, Elio Quinton, Gyorgy Fazekas
To address this question, we design a multimodal architecture for music and language pre-training (MuLaP) optimised via a set of proxy tasks.
no code implementations • 9 Oct 2021 • Helen L. Bear, Veronica Morfi, Emmanouil Benetos
Sound scene geotagging is a new topic of research which has evolved from acoustic scene classification.
no code implementations • 8 Oct 2021 • Changhong Wang, Emmanouil Benetos, Shuge Wang, Elisabetta Versace
Animal vocalisations contain important information about health, emotional state, and behaviour, and can thus potentially be used for animal welfare monitoring.
no code implementations • 19 Aug 2021 • Alessandro Ragano, Emmanouil Benetos, Andrew Hines
This paper indicates that multi-task learning combined with feature representations from unlabelled data is a promising approach to deal with the lack of large MOS annotated datasets.
no code implementations • 28 Jul 2021 • Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlbäck
We also include ablation studies investigating the effects of the use of multiple kernel shapes and comparing different input representations for the audio and the note-related information.
1 code implementation • 24 Apr 2021 • Ilaria Manco, Emmanouil Benetos, Elio Quinton, Gyorgy Fazekas
Content-based music information retrieval has seen rapid progress with the adoption of deep learning.
no code implementations • 3 Jan 2021 • Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlbäck, Patrik Ohlsson
This paper addresses the problem of domain adaptation for the task of music source separation.
2 code implementations • 20 Oct 2020 • Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Dorien Herremans
We attempt to use only the pitch labels (together with spectrogram reconstruction loss) and explore how far this model can go without introducing supervised sub-tasks.
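The objective described above can be sketched as a weighted sum of a supervised pitch-label loss and an unsupervised spectrogram reconstruction loss. The weighting `alpha` and the specific loss forms (binary cross-entropy, MSE) are illustrative assumptions, not the paper's exact choices.

```python
import math

def bce(pred, label, eps=1e-9):
    """Binary cross-entropy for one pitch activation in [0, 1]."""
    return -(label * math.log(pred + eps) + (1 - label) * math.log(1 - pred + eps))

def total_loss(pitch_preds, pitch_labels, spec_recon, spec_true, alpha=1.0):
    """Supervised pitch loss + alpha * spectrogram reconstruction loss (MSE)."""
    pitch_loss = sum(bce(p, y) for p, y in zip(pitch_preds, pitch_labels)) / len(pitch_labels)
    recon_loss = sum((a - b) ** 2 for a, b in zip(spec_recon, spec_true)) / len(spec_true)
    return pitch_loss + alpha * recon_loss
```

The reconstruction term needs no labels, which is what lets training lean on the pitch annotations alone without extra supervised sub-tasks.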
1 code implementation • 15 May 2020 • Saumitra Mishra, Emmanouil Benetos, Bob L. Sturm, Simon Dixon
One way to analyse the behaviour of machine learning models is through local explanations that highlight input features that maximally influence model predictions.
1 code implementation • 13 May 2020 • Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian, Emmanouil Benetos
In this paper we investigate the importance of the extent of memory in sequential self attention for sound recognition.
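The "extent of memory" question can be illustrated by restricting each step's self-attention to a fixed-length window of past frames; `window` below is the memory extent being varied. This is an illustrative sketch of windowed attention weights, not the paper's model.

```python
import math

def windowed_attention_weights(scores, t, window):
    """Softmax attention over only the last `window` positions up to time t.

    scores: unnormalised attention scores for positions 0..len(scores)-1.
    Returns a weight per position; positions outside the window get 0.
    """
    start = max(0, t - window + 1)
    visible = scores[start:t + 1]
    m = max(visible)  # subtract the max for numerical stability
    exp = [math.exp(s - m) for s in visible]
    z = sum(exp)
    return [0.0] * start + [e / z for e in exp]

# With window=2 at t=3, only the two most recent frames receive weight.
w = windowed_attention_weights([1.0, 2.0, 3.0, 4.0], t=3, window=2)
```

Sweeping `window` from 1 up to the full sequence length is one simple way to probe how much temporal context a sound-recognition model actually needs.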
no code implementations • 15 Apr 2020 • Adrien Ycart, Lele Liu, Emmanouil Benetos, Marcus T. Pearce
This technical report gives a detailed, formal description of the features introduced in the paper: Adrien Ycart, Lele Liu, Emmanouil Benetos and Marcus T. Pearce.
no code implementations • 22 Mar 2020 • Alessandro Ragano, Emmanouil Benetos, Andrew Hines
Audio impairment recognition is based on finding noise in audio files and categorising the impairment type.
1 code implementation • 22 Oct 2019 • Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss
Plate and spring reverberators are electromechanical systems first used and researched as a means of substituting for real room reverberation.
no code implementations • 4 Jul 2019 • Vinod Subramanian, Emmanouil Benetos, Ning Xu, SKoT McDonald, Mark Sandler
In addition, we show that the adversarial attacks are very effective across the different models.
no code implementations • 15 May 2019 • Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss
Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects.
no code implementations • 21 Apr 2019 • Saumitra Mishra, Daniel Stoller, Emmanouil Benetos, Bob L. Sturm, Simon Dixon
However, this requires a careful selection of hyper-parameters to generate interpretable examples for each neuron of interest, and current methods rely on a manual, qualitative evaluation of each setting, which is prohibitively slow.
1 code implementation • 9 Apr 2019 • Bhusan Chettri, Daniel Stoller, Veronica Morfi, Marco A. Martínez Ramírez, Emmanouil Benetos, Bob L. Sturm
Our ensemble model outperforms all our single models and the baselines from the challenge for both attack types.
Audio and Speech Processing • Sound
no code implementations • 30 Apr 2018 • Eurico Covas, Emmanouil Benetos
In this paper, we show empirical evidence on how to construct the optimal feature selection or input representation used by the input layer of a feedforward neural network for the purpose of forecasting spatio-temporal signals.
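A common way to build such an input representation is a sliding window over the series: each training example feeds the last `k` observations to the network's input layer to predict the next value. The sketch below only constructs the windows; the window length `k` is the design choice studied empirically, and the series here is illustrative.

```python
def sliding_windows(series, k):
    """Turn a 1-D series into (input_window, target) pairs for a
    feedforward forecaster: each input is the k previous values and
    the target is the value that immediately follows them."""
    pairs = []
    for t in range(k, len(series)):
        pairs.append((series[t - k:t], series[t]))
    return pairs

pairs = sliding_windows([1, 2, 3, 4, 5], k=3)
```

For spatio-temporal signals the same construction applies per spatial location, with the windows of neighbouring locations concatenated into one input vector.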
no code implementations • 15 Nov 2017 • Grégoire Lafay, Emmanouil Benetos, Mathieu Lagrange
As part of the 2016 public evaluation challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2016), the second task focused on evaluating sound event detection systems using synthetic mixtures of office sounds.
1 code implementation • 7 Aug 2015 • Siddharth Sigtia, Emmanouil Benetos, Simon Dixon
We compare the performance of the neural-network-based acoustic models with two popular unsupervised acoustic models.
no code implementations • 31 Jan 2015 • Mathieu Lagrange, Grégoire Lafay, Mathias Rossignol, Emmanouil Benetos, Axel Roebel
This paper introduces a model of environmental acoustic scenes which adopts a morphological approach by abstracting temporal structures of acoustic scenes.
no code implementations • 6 Nov 2014 • Siddharth Sigtia, Emmanouil Benetos, Nicolas Boulanger-Lewandowski, Tillman Weyde, Artur S. d'Avila Garcez, Simon Dixon
We investigate the problem of incorporating higher-level symbolic score-like information into Automatic Music Transcription (AMT) systems to improve their performance.