1 code implementation • 11 Feb 2025 • Abhinaba Roy, Renhang Liu, Tongyu Lu, Dorien Herremans
We introduce JamendoMaxCaps, a large-scale music-caption dataset featuring over 200,000 freely licensed instrumental tracks from the renowned Jamendo platform.
1 code implementation • 6 Feb 2025 • Jaeyong Kang, Dorien Herremans
Moreover, knowledge distillation is employed to transfer the knowledge of teacher models trained on individual datasets to a student model, enhancing its ability to generalize across multiple tasks.
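A minimal sketch of the kind of teacher-to-student knowledge-distillation objective described above (generic recipe; the temperature, weighting, and model interfaces are illustrative, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Blend a hard-label task loss with a temperature-scaled soft-label KL term."""
    # Soft targets from the teacher, softened by temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Ordinary supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1.0 - alpha) * ce
```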
1 code implementation • 6 Feb 2025 • Keshav Bhandari, Sungkyun Chang, Tongyu Lu, Fareza R. Enus, Louis B. Bradshaw, Dorien Herremans, Simon Colton
However, in the realm of symbolic music, generating controllable and expressive performance-level style transfers for complete musical works remains challenging due to limited datasets, especially for genres such as jazz, and the lack of unified models that can handle multiple music generation tasks.
1 code implementation • 21 Dec 2024 • Keshav Bhandari, Abhinaba Roy, Kyra Wang, Geeta Puri, Simon Colton, Dorien Herremans
This paper introduces text2midi, an end-to-end model to generate MIDI files from textual descriptions.
1 code implementation • 1 Nov 2024 • Anuradha Chopra, Abhinaba Roy, Dorien Herremans
This paper introduces an extendable modular system that compiles a range of music feature extraction models to aid music information retrieval research.
no code implementations • 17 Oct 2024 • Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans
Recent advancements in Text-to-Speech (TTS) systems have enabled the generation of natural and expressive speech from textual input.
1 code implementation • 15 Oct 2024 • Renhang Liu, Abhinaba Roy, Dorien Herremans
In this work, we present a novel method for music emotion recognition that leverages Large Language Model (LLM) embeddings for label alignment across multiple datasets and zero-shot prediction on novel categories.
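A hedged sketch of how LLM (or sentence-embedding) label alignment and zero-shot prediction can work: labels from different datasets are embedded into a shared text space and matched by cosine similarity. The embedding model, label names, and track embedding below are illustrative placeholders, not the paper's setup.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumption: any text embedder works here

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice of embedding model

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Emotion labels from two hypothetical datasets with different taxonomies.
dataset_a_labels = ["happy", "sad", "angry"]
dataset_b_labels = ["joyful", "melancholic", "furious"]
emb_a = encoder.encode(dataset_a_labels)
emb_b = encoder.encode(dataset_b_labels)

# Align labels across datasets: nearest neighbour in embedding space.
for i, label in enumerate(dataset_a_labels):
    j = int(np.argmax([cosine(emb_a[i], e) for e in emb_b]))
    print(label, "->", dataset_b_labels[j])

# Zero-shot prediction on a novel category: compare a track-level embedding
# (here just a stand-in vector) against the embedding of an unseen label.
novel_label_emb = encoder.encode(["nostalgic"])[0]
track_emb = emb_a[1]  # placeholder for a model-produced track embedding
print("similarity to 'nostalgic':", cosine(track_emb, novel_label_emb))
```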
no code implementations • 14 Sep 2024 • Megan Wei, Mateusz Modrzejewski, Aswin Sivaraman, Dorien Herremans
In tandem with the recent advancements in foundation model research, there has been a surge of generative music AI applications within the past few years.
1 code implementation • 13 Aug 2024 • Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans
We attain 25. 3% hanzi CER and 13. 0% pinyin CER with the JETS model.
no code implementations • 15 Jul 2024 • Jing Luo, Xinyu Yang, Dorien Herremans
Subsequently, we release BandControlNet, a conditional model based on parallel Transformers, to handle multiple music sequences and generate high-quality music samples conditioned on the given spatiotemporal control features.
no code implementations • 13 Jun 2024 • Kyra Wang, Dorien Herremans
Laughing, sighing, stuttering, and other forms of paralanguage do not contribute any direct lexical meaning to speech, but they provide crucial propositional context that aids semantic and pragmatic processes such as the interpretation of irony.
1 code implementation • 13 Jun 2024 • Joel Ong, Dorien Herremans
This paper introduces DeepUnifiedMom, a deep learning framework that enhances portfolio management through a multi-task learning approach and a multi-gate mixture of experts.
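A minimal PyTorch sketch of a multi-gate mixture-of-experts layer for multi-task learning, the general mechanism named above; the dimensions, number of experts, and heads are illustrative, not DeepUnifiedMom's actual configuration.

```python
import torch
import torch.nn as nn

class MMoE(nn.Module):
    """Multi-gate mixture of experts: shared experts, one softmax gate per task."""
    def __init__(self, in_dim, expert_dim, n_experts=4, n_tasks=3):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU()) for _ in range(n_experts)]
        )
        self.gates = nn.ModuleList([nn.Linear(in_dim, n_experts) for _ in range(n_tasks)])
        self.heads = nn.ModuleList([nn.Linear(expert_dim, 1) for _ in range(n_tasks)])

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        outputs = []
        for gate, head in zip(self.gates, self.heads):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)           # (B, E, 1)
            mixed = (w * expert_out).sum(dim=1)                        # (B, D)
            outputs.append(head(mixed))                                # task-specific head
        return outputs

x = torch.randn(8, 16)          # batch of 8 feature vectors
print([o.shape for o in MMoE(16, 32)(x)])
```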
1 code implementation • 13 Jun 2024 • Jaeyong Kang, Dorien Herremans
Deep learning models for music have advanced drastically in recent years, but how good are machine learning models at capturing emotion, and what challenges are researchers facing?
2 code implementations • 4 Jun 2024 • Jan Melechovsky, Abhinaba Roy, Dorien Herremans
This work aims to enable research that combines LLMs with symbolic music by presenting the first openly available large-scale MIDI dataset with text captions.
no code implementations • 3 Jun 2024 • Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans
With rapid globalization, the need to build inclusive and representative speech technology cannot be overstated.
1 code implementation • 27 Feb 2024 • Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller, Dorien Herremans
Music has frequently been compared to language, as the two share several similarities, including the sequential nature of their representations.
4 code implementations • 14 Nov 2023 • Jan Melechovsky, Zixun Guo, Deepanway Ghosal, Navonil Majumder, Dorien Herremans, Soujanya Poria
Through extensive experiments, we show that the quality of the music generated by Mustango is state-of-the-art, and the controllability through music-specific text prompts greatly outperforms other models such as MusicGen and AudioLDM2.
Ranked #1 on Text-to-Music Generation on MusicBench
1 code implementation • 2 Nov 2023 • Jaeyong Kang, Soujanya Poria, Dorien Herremans
These distinct features are then employed as guiding input to our music generation model.
no code implementations • 8 Jun 2023 • Joel Ong, Dorien Herremans
The performance of existing TSMOM strategies, however, relies not only on the quality of the momentum signal but also on the efficacy of the volatility estimator.
no code implementations • 1 Feb 2023 • Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans
Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results.
no code implementations • 14 Nov 2022 • Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans
Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary.
1 code implementation • 7 Nov 2022 • Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans
Accent plays a significant role in speech communication, influencing one's capability to understand as well as conveying a person's identity.
no code implementations • 11 Oct 2022 • Kin Wai Cheuk, Ryosuke Sawata, Toshimitsu Uesaka, Naoki Murata, Naoya Takahashi, Shusuke Takahashi, Dorien Herremans, Yuki Mitsufuji
In this paper we propose a novel generative approach, DiffRoll, to tackle automatic music transcription (AMT).
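A hedged sketch of a generic denoising-diffusion training step on piano-roll tensors conditioned on a spectrogram, in the spirit of treating transcription generatively; this is the standard DDPM noise-prediction recipe, not DiffRoll's exact parameterization, and `model` is a hypothetical denoiser.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(model, piano_roll, spectrogram, alphas_cumprod):
    """One DDPM-style step: noise the piano roll, ask the model to predict the noise."""
    b = piano_roll.size(0)
    t = torch.randint(0, alphas_cumprod.size(0), (b,), device=piano_roll.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1)            # broadcast over (time, pitch)
    noise = torch.randn_like(piano_roll)
    noisy_roll = a_bar.sqrt() * piano_roll + (1 - a_bar).sqrt() * noise
    # The (hypothetical) denoiser sees the noisy roll, the timestep, and the audio condition.
    pred_noise = model(noisy_roll, t, spectrogram)
    return F.mse_loss(pred_noise, noise)
```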
1 code implementation • 6 Oct 2022 • Dorien Herremans, Kah Wee Low
Our results show that the model outperforms existing state-of-the-art models when forecasting extreme volatility spikes for Bitcoin using CryptoQuant data as well as whale-alert tweets.
no code implementations • 22 Jun 2022 • Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Amy Hung, Ju-Chiang Wang, Dorien Herremans
However, its novelty necessitates a new perspective on how to evaluate such a model.
Ranked #4 on Music Transcription on Slakh2100
1 code implementation • 30 May 2022 • Yanzhao Zou, Dorien Herremans
In our hybrid model, we use sentence-level FinBERT embeddings, pretrained on financial lexicons, to capture the full contents of the tweets and feed them to the model in an understandable way.
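A hedged sketch of producing sentence-level FinBERT embeddings for tweets with Hugging Face transformers; the checkpoint name and the mean-pooling choice are assumptions, not necessarily the paper's exact setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint; any BERT-style model pretrained on financial text would do.
tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModel.from_pretrained("ProsusAI/finbert")

def tweet_embedding(texts):
    """Mean-pooled token embeddings as a sentence-level representation of each tweet."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state              # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()       # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)                # (B, H)

emb = tweet_embedding(["Whale alert: 5,000 BTC moved to exchange"])
print(emb.shape)   # these vectors would then be fed to the price model
```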
3 code implementations • 6 Mar 2022 • Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk
The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios.
1 code implementation • 19 Feb 2022 • Phoebe Chua, Dimos Makris, Dorien Herremans, Gemma Roig, Kat Agres
In this paper we present MusicVideos (MuVi), a novel dataset for affective multimedia content analysis to study how the auditory and visual modalities contribute to the perceived emotion of media.
no code implementations • 11 Feb 2022 • Rui Guo, Ivor Simpson, Chris Kiefer, Thor Magnusson, Dorien Herremans
We present a novel music generation framework for music infilling, with a user-friendly interface.
1 code implementation • 9 Feb 2022 • Dimos Makris, Guo Zixun, Maximos Kaliakatsos-Papakostas, Dorien Herremans
The field of automatic music composition has seen great progress in recent years, specifically with the invention of transformer-based architectures.
1 code implementation • Sensors 2021 • Ha Thi Phuong Thao, B T Balamurali, Gemma Roig, Dorien Herremans
The models that use all visual, audio, and text features simultaneously as their inputs performed better than those using features extracted from each modality separately.
no code implementations • 11 Jul 2021 • Kin Wai Cheuk, Dorien Herremans, Li Su
Most of the current supervised automatic music transcription (AMT) models lack the ability to generalize.
no code implementations • 25 Jun 2021 • Dorien Herremans
This provides a unique and integrated approach that guides managers and lead developers through the various challenges in the implementation process.
no code implementations • 23 Jun 2021 • Balamurali B T, Hwan Ing Hee, Saumitra Kapoor, Oon Hoe Teoh, Sung Shin Teng, Khai Pin Lee, Dorien Herremans, Jer Ming Chen
When trained to classify coughs as healthy or pathological (in general or belonging to a specific respiratory pathology), the resulting model reaches an accuracy exceeding 84% when classifying coughs against the label provided by the physicians' diagnosis.
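A hedged sketch of a classic audio-classification pipeline for this kind of task (MFCC summary features plus a scikit-learn classifier); the original work's exact features and model may well differ, and `files`/`labels` are assumed inputs.

```python
import librosa
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def cough_features(path):
    """Summarise a cough recording as the mean and std of its MFCCs."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def evaluate(files, labels):
    """Cross-validated accuracy; labels: 0 = healthy, 1 = pathological."""
    X = np.stack([cough_features(f) for f in files])
    clf = SVC(kernel="rbf", C=1.0)
    return cross_val_score(clf, X, labels, cv=5).mean()
```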
1 code implementation • 27 Apr 2021 • Dimos Makris, Kat R. Agres, Dorien Herremans
In this paper, we present a novel approach for calculating the valence (the positivity or negativity of the perceived emotion) of a chord progression within a lead sheet, using pre-defined mood tags proposed by music experts.
no code implementations • 26 Feb 2021 • Abigail Lee-Leon, Chau Yuen, Dorien Herremans
Our proposed receiver system comprises DBN-based denoising and classification of the received signal.
1 code implementation • 21 Oct 2020 • Ha Thi Phuong Thao, Balamurali B. T., Dorien Herremans, Gemma Roig
In this work, we propose different variants of the self-attention based network for emotion prediction from movies, which we call AttendAffectNet.
2 code implementations • 20 Oct 2020 • Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Dorien Herremans
We attempt to use only the pitch labels (together with spectrogram reconstruction loss) and explore how far this model can go without introducing supervised sub-tasks.
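A hedged sketch of the kind of joint objective implied above: a supervised loss on the predicted piano roll combined with an unsupervised spectrogram-reconstruction loss. The model interface and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def transcription_loss(model, spectrogram, pitch_roll, recon_weight=1.0):
    """
    Joint objective: binary cross-entropy on the predicted piano roll plus a
    reconstruction loss that maps the roll back to a spectrogram.
    `model` is assumed to return (predicted_roll_logits, reconstructed_spectrogram).
    """
    pred_roll, recon_spec = model(spectrogram)
    pitch_loss = F.binary_cross_entropy_with_logits(pred_roll, pitch_roll)
    recon_loss = F.mse_loss(recon_spec, spectrogram)
    return pitch_loss + recon_weight * recon_loss
```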
no code implementations • 16 Oct 2020 • Dorien Herremans, Tom Bergmans
Billions of USD are invested in new artists and songs by the music industry every year.
1 code implementation • 13 Oct 2020 • Rui Guo, Ivor Simpson, Thor Magnusson, Chris Kiefer, Dorien Herremans
Many of the music generation systems based on neural networks are fully autonomous and do not offer control over the generation process.
Sound • Symbolic Computation • Audio and Speech Processing
no code implementations • 9 Sep 2020 • Fajilatun Nahar, Kat Agres, Balamurali BT, Dorien Herremans
We use this new dataset to train different classification models to distinguish the origin of the music in terms of these ethnic groups.
1 code implementation • 29 Jul 2020 • Hao Hao Tan, Dorien Herremans
Using arousal as an example of a high-level feature, we show that the "faders" of our model are disentangled and change linearly with respect to the modeled feature.
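One common way to make a single latent dimension ("fader") track a high-level attribute such as arousal is an attribute-regularization term in the style of AR-VAE; the sketch below shows that generic regularizer, not necessarily the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def attribute_regularization(z_dim, attribute, delta=1.0):
    """
    Encourage one latent dimension to be monotonically related to an attribute
    value (e.g. arousal): the sign of pairwise latent differences should match
    the sign of pairwise attribute differences across the batch.
    """
    dz = z_dim.unsqueeze(0) - z_dim.unsqueeze(1)          # (B, B) pairwise latent diffs
    da = attribute.unsqueeze(0) - attribute.unsqueeze(1)  # (B, B) pairwise attribute diffs
    return F.l1_loss(torch.tanh(delta * dz), torch.sign(da))
```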
no code implementations • 2 Jul 2020 • Kanish Garg, Ajeet Kumar Singh, Dorien Herremans, Brejesh lall
This initial image is then improved by conditioning on the text.
1 code implementation • 16 Jun 2020 • Hao Hao Tan, Yin-Jyun Luo, Dorien Herremans
We present a controllable neural audio synthesizer based on Gaussian Mixture Variational Autoencoders (GM-VAE), which can generate realistic piano performances in the audio domain that closely follow the temporal conditions of two essential style features of piano performance: articulation and dynamics.
no code implementations • 16 Jun 2020 • Balamurali B. T, Edwin Jonathan Aslim, Yun Shu Lynn Ng, Tricia Li, Chuen Kuo, Jacob Shihang Chen, Dorien Herremans, Lay Guat Ng, Jer-Ming Chen
Information on liquid jet stream flow is crucial in many real world applications.
1 code implementation • 25 Jan 2020 • Kin Wai Cheuk, Kat Agres, Dorien Herremans
This paper thoroughly analyses the effect of different input representations on polyphonic multi-instrument music transcription.
Sound • Audio and Speech Processing
1 code implementation • 27 Dec 2019 • Kin Wai Cheuk, Hans Anderson, Kat Agres, Dorien Herremans
First, it takes a lot of hard disk space to store different frequency domain representations.
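A hedged sketch of the underlying idea: compute spectrograms on the fly on whatever device the audio lives on instead of caching them to disk. This uses plain torch.stft; the library introduced in the paper provides dedicated differentiable layers for this, whose API is not reproduced here.

```python
import torch

def magnitude_spectrogram(waveform, n_fft=2048, hop_length=512):
    """Compute a magnitude spectrogram on the fly, on CPU or GPU."""
    window = torch.hann_window(n_fft, device=waveform.device)
    stft = torch.stft(waveform, n_fft=n_fft, hop_length=hop_length,
                      window=window, return_complex=True)
    return stft.abs()

audio = torch.randn(4, 16000 * 4, device="cuda" if torch.cuda.is_available() else "cpu")
spec = magnitude_spectrogram(audio)   # (batch, freq_bins, frames), no disk cache needed
print(spec.shape)
```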
no code implementations • 3 Dec 2019 • Yin-Jyun Luo, Chin-Chen Hsu, Kat Agres, Dorien Herremans
We propose a flexible framework that handles both singer conversion and conversion of singers' vocal techniques.
1 code implementation • 3 Oct 2019 • Rui Guo, Dorien Herremans, Thor Magnusson
We present a Python library, called Midi Miner, that can calculate tonal tension and classify different tracks.
1 code implementation • 1 Oct 2019 • Kin Wai Cheuk, Balamurali B. T., Gemma Roig, Dorien Herremans
When reducing the training data to only using the train set, our method results in 309 confusions for the Multi-target speaker identification task, which is 46% better than the baseline model.
1 code implementation • 16 Sep 2019 • Ha Thi Phuong Thao, Dorien Herremans, Gemma Roig
Interestingly, we also observe that the optical flow is more informative than the RGB in videos, and overall, models using audio features are more accurate than those based on video features when making the final prediction of evoked emotions.
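A hedged sketch of extracting a dense optical-flow motion descriptor from consecutive video frames with OpenCV, the kind of feature contrasted with raw RGB above; the Farneback parameter values are illustrative defaults.

```python
import cv2
import numpy as np

def flow_magnitude_features(video_path):
    """Mean optical-flow magnitude per frame pair (a simple motion descriptor)."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return np.array([])
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    feats = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        feats.append(mag.mean())
        prev_gray = gray
    cap.release()
    return np.array(feats)
```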
no code implementations • 5 Sep 2019 • Abigail Lee-Leon, Chau Yuen, Dorien Herremans
The proposed method comprises an ML-based feature extraction method and a classification technique.
no code implementations • 19 Jun 2019 • Yin-Jyun Luo, Kat Agres, Dorien Herremans
Specifically, we use two separate encoders to learn distinct latent spaces for timbre and pitch, which form Gaussian mixture components representing instrument identity and pitch, respectively.
no code implementations • 28 May 2019 • Balamurali BT, Kin Wah Edward Lin, Simon Lui, Jer-Ming Chen, Dorien Herremans
Finally, we evaluate the performance of our robust replay speaker detection system with a wide variety and different combinations of both extracted and machine learned audio features on the `out in the wild' ASVspoof 2017 dataset.
no code implementations • 17 May 2019 • Dorien Herremans, David Martens, Kenneth Sörensen
Record companies invest billions of dollars in new talent around the globe each year.
1 code implementation • 12 Dec 2018 • Dorien Herremans, Elaine Chew
MorpheuS' novel framework has the ability to generate polyphonic pieces with a given tension profile and long- and short-term repeated pattern structures.
Sound • Audio and Speech Processing
no code implementations • 11 Dec 2018 • Dorien Herremans, Ching-Hua Chuan, Elaine Chew
Digital advances have transformed the face of automatic music generation since its beginnings at the dawn of computing.
2 code implementations • 4 Dec 2018 • Kin Wah Edward Lin, Balamurali B. T., Enyan Koh, Simon Lui, Dorien Herremans
We present a unique neural network approach inspired by a technique that has revolutionized the field of vision: pixel-wise image classification, which we combine with cross entropy loss and pretraining of the CNN as an autoencoder on singing voice spectrograms.
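A hedged sketch of the pixel-wise classification idea for separation: a small CNN predicts a per-bin vocal probability mask over the input spectrogram and is trained with cross-entropy against an ideal binary mask. Architecture, dimensions, and the placeholder target are illustrative, not the paper's network.

```python
import torch
import torch.nn as nn

class MaskCNN(nn.Module):
    """Predict a per-bin vocal/non-vocal logit over the input spectrogram."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),            # one logit per time-frequency bin
        )

    def forward(self, spec):                # spec: (B, 1, freq, time)
        return self.net(spec)

model = MaskCNN()
spec = torch.rand(2, 1, 512, 128)
target_mask = (torch.rand(2, 1, 512, 128) > 0.5).float()   # placeholder ideal binary mask
loss = nn.BCEWithLogitsLoss()(model(spec), target_mask)
# At inference, sigmoid(model(spec)) * spec gives the estimated vocal spectrogram.
```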
no code implementations • 29 Nov 2018 • Ching-Hua Chuan, Kat Agres, Dorien Herremans
In this newly learned vector space, a metric based on cosine distance is able to distinguish between functional chord relationships, as well as harmonic associations in the music.
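A hedged sketch of the general recipe: train skip-gram embeddings on chord sequences and compare chords by cosine similarity in the learned space. The chord vocabulary and training corpus below are toy placeholders, and the embedding model is a generic word2vec, not necessarily the paper's exact method.

```python
from gensim.models import Word2Vec

# Toy chord progressions standing in for a real corpus of chord sequences.
progressions = [
    ["C", "F", "G", "C"],
    ["C", "Am", "F", "G"],
    ["G", "C", "G", "D"],
    ["Am", "F", "C", "G"],
] * 50

model = Word2Vec(sentences=progressions, vector_size=16, window=2,
                 min_count=1, sg=1, epochs=50)

# Cosine similarity in the learned space reflects how chords co-occur functionally.
print(model.wv.similarity("C", "G"))        # e.g. tonic-dominant relationship
print(model.wv.most_similar("C", topn=3))
```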
no code implementations • 28 Jun 2017 • Dorien Herremans, Ching-Hua Chuan
A visualization of the reduced vector space using t-distributed stochastic neighbor embedding shows that the resulting embedded vector space captures tonal relationships, even without any explicit information about the musical contents of the slices.
no code implementations • 27 Jun 2017 • Dorien Herremans, Ching-Hua Chuan
Proceedings of the First International Workshop on Deep Learning and Music, joint with IJCNN, Anchorage, US, May 17-18, 2017