Search Results for author: Dorien Herremans

Found 61 papers, 33 papers with code

JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata

1 code implementation • 11 Feb 2025 • Abhinaba Roy, Renhang Liu, Tongyu Lu, Dorien Herremans

We introduce JamendoMaxCaps, a large-scale music-caption dataset featuring over 200,000 freely licensed instrumental tracks from the renowned Jamendo platform.

Language Modeling, Language Modelling, +3

Towards Unified Music Emotion Recognition across Dimensional and Categorical Models

1 code implementation • 6 Feb 2025 • Jaeyong Kang, Dorien Herremans

Moreover, knowledge distillation is employed to transfer the knowledge of teacher models trained on individual datasets to a student model, enhancing its ability to generalize across multiple tasks.

Emotion Recognition, Knowledge Distillation, +1

ImprovNet: Generating Controllable Musical Improvisations with Iterative Corruption Refinement

1 code implementation • 6 Feb 2025 • Keshav Bhandari, Sungkyun Chang, Tongyu Lu, Fareza R. Enus, Louis B. Bradshaw, Dorien Herremans, Simon Colton

However, in the realm of symbolic music, generating controllable and expressive performance-level style transfers for complete musical works remains challenging due to limited datasets, especially for genres such as jazz, and the lack of unified models that can handle multiple music generation tasks.

Music Generation, Style Transfer

Text2midi: Generating Symbolic Music from Captions

1 code implementation • 21 Dec 2024 • Keshav Bhandari, Abhinaba Roy, Kyra Wang, Geeta Puri, Simon Colton, Dorien Herremans

This paper introduces text2midi, an end-to-end model to generate MIDI files from textual descriptions.

Decoder

MIRFLEX: Music Information Retrieval Feature Library for Extraction

1 code implementation • 1 Nov 2024 • Anuradha Chopra, Abhinaba Roy, Dorien Herremans

This paper introduces an extendable modular system that compiles a range of music feature extraction models to aid music information retrieval research.

Benchmarking, Information Retrieval, +4

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech

no code implementations • 17 Oct 2024 • Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans

Recent advancements in Text-to-Speech (TTS) systems have enabled the generation of natural and expressive speech from textual input.

Disentanglement, Quantization, +2

Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction

1 code implementation • 15 Oct 2024 • Renhang Liu, Abhinaba Roy, Dorien Herremans

In this work, we present a novel method for music emotion recognition that leverages Large Language Model (LLM) embeddings for label alignment across multiple datasets and zero-shot prediction on novel categories.

Emotion Recognition, Language Modeling, +3

Prevailing Research Areas for Music AI in the Era of Foundation Models

no code implementations • 14 Sep 2024 • Megan Wei, Mateusz Modrzejewski, Aswin Sivaraman, Dorien Herremans

In tandem with the recent advancements in foundation model research, there has been a surge of generative music AI applications within the past few years.

Survey

BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features

no code implementations • 15 Jul 2024 • Jing Luo, Xinyu Yang, Dorien Herremans

Subsequently, we release BandControlNet, a conditional model based on parallel Transformers, to tackle the multiple music sequences and generate high-quality music samples that are conditioned to the given spatiotemporal control features.

Music Generation

DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage

no code implementations • 13 Jun 2024 • Kyra Wang, Dorien Herremans

Laughing, sighing, stuttering, and other forms of paralanguage do not contribute any direct lexical meaning to speech, but they provide crucial propositional context that aids semantic and pragmatic processes such as irony.

Sentence, Text to Speech

DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts

1 code implementation • 13 Jun 2024 • Joel Ong, Dorien Herremans

This paper introduces DeepUnifiedMom, a deep learning framework that enhances portfolio management through a multi-task learning approach and a multi-gate mixture of experts.

Management, Multi-Task Learning, +1

Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges

1 code implementation • 13 Jun 2024 • Jaeyong Kang, Dorien Herremans

Deep learning models for music have advanced drastically in recent years, but how good are machine learning models at capturing emotion, and what challenges are researchers facing?

MidiCaps: A large-scale MIDI dataset with text captions

2 code implementations • 4 Jun 2024 • Jan Melechovsky, Abhinaba Roy, Dorien Herremans

This work aims to enable research that combines LLMs with symbolic music by presenting MidiCaps, the first openly available large-scale MIDI dataset with text captions.

Information Retrieval, Music Information Retrieval

Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey

1 code implementation • 27 Feb 2024 • Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller, Dorien Herremans

Music has been frequently compared to language, as they share several similarities, including sequential representations of text and music.

Information Retrieval, Music Generation, +2

Mustango: Toward Controllable Text-to-Music Generation

4 code implementations • 14 Nov 2023 • Jan Melechovsky, Zixun Guo, Deepanway Ghosal, Navonil Majumder, Dorien Herremans, Soujanya Poria

Through extensive experiments, we show that the quality of the music generated by Mustango is state-of-the-art, and the controllability through music-specific text prompts greatly outperforms other models such as MusicGen and AudioLDM2.

Data Augmentation, Denoising, +4

Constructing Time-Series Momentum Portfolios with Deep Multi-Task Learning

no code implementations • 8 Jun 2023 • Joel Ong, Dorien Herremans

The performance of existing TSMOM strategies, however, relies not only on the quality of the momentum signal but also on the efficacy of the volatility estimator.

Multi-Task Learning, Time Series

Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training

no code implementations • 1 Feb 2023 • Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans

Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results.

Chord Recognition, Instrument Recognition, +1

SNIPER Training: Single-Shot Sparse Training for Text-to-Speech

no code implementations • 14 Nov 2022 • Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans

Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary.

Text to Speech

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder

1 code implementation • 7 Nov 2022 • Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans

Accent plays a significant role in speech communication, influencing one's capability to understand as well as conveying a person's identity.

Speech Synthesis, Text to Speech, +1

Forecasting Bitcoin volatility spikes from whale transactions and CryptoQuant data using Synthesizer Transformer models

1 code implementation • 6 Oct 2022 • Dorien Herremans, Kah Wee Low

Our results show that the model outperforms existing state-of-the-art models when forecasting extreme volatility spikes for Bitcoin using CryptoQuant data as well as whale-alert tweets.

Explainable Artificial Intelligence (XAI), Management

PreBit -- A multimodal model with Twitter FinBERT embeddings for extreme price movement prediction of Bitcoin

1 code implementation • 30 May 2022 • Yanzhao Zou, Dorien Herremans

In our hybrid model, we use sentence-level FinBERT embeddings, pretrained on financial lexicons, so as to capture the full contents of the tweets and feed it to the model in an understandable way.

Sentence

Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses

1 code implementation • 19 Feb 2022 • Phoebe Chua, Dimos Makris, Dorien Herremans, Gemma Roig, Kat Agres

In this paper we present MusicVideos (MuVi), a novel dataset for affective multimedia content analysis to study how the auditory and visual modalities contribute to the perceived emotion of media.

Descriptive, Feature Importance, +2

Conditional Drums Generation using Compound Word Representations

1 code implementation • 9 Feb 2022 • Dimos Makris, Guo Zixun, Maximos Kaliakatsos-Papakostas, Dorien Herremans

The field of automatic music composition has seen great progress in recent years, specifically with the invention of transformer-based architectures.

Decoder

AttendAffectNet–Emotion Prediction of Movie Viewers Using Multimodal Fusion with Self-Attention

1 code implementation • Sensors 2021 • Ha Thi Phuong Thao, B T Balamurali, Gemma Roig, Dorien Herremans

The models that use all visual, audio, and text features simultaneously as their inputs performed better than those using features extracted from each modality separately.

Representation Learning

aiSTROM -- A roadmap for developing a successful AI strategy

no code implementations • 25 Jun 2021 • Dorien Herremans

This provides a unique and integrated approach that guides managers and lead developers through the various challenges in the implementation process.

Cultural Vocal Bursts Intensity Prediction

Deep Neural Network Based Respiratory Pathology Classification Using Cough Sounds

no code implementations • 23 Jun 2021 • Balamurali B T, Hwan Ing Hee, Saumitra Kapoor, Oon Hoe Teoh, Sung Shin Teng, Khai Pin Lee, Dorien Herremans, Jer Ming Chen

When trained to classify two classes of coughs, healthy or pathological (in general or belonging to a specific respiratory pathology), the resulting model reaches an accuracy exceeding 84% when classifying coughs against the label provided by the physicians' diagnosis.

Classification, Sound Classification

Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework

1 code implementation • 27 Apr 2021 • Dimos Makris, Kat R. Agres, Dorien Herremans

In this paper, we present a novel approach for calculating the valence (the positivity or negativity of the perceived emotion) of a chord progression within a lead sheet, using pre-defined mood tags proposed by music experts.

Machine Translation, Music Generation, +1

Underwater Acoustic Communication Receiver Using Deep Belief Network

no code implementations • 26 Feb 2021 • Abigail Lee-Leon, Chau Yuen, Dorien Herremans

Our proposed receiver system comprises DBN-based de-noising and classification of the received signal.

AttendAffectNet: Self-Attention based Networks for Predicting Affective Responses from Movies

1 code implementation • 21 Oct 2020 • Ha Thi Phuong Thao, Balamurali B. T., Dorien Herremans, Gemma Roig

In this work, we propose different variants of the self-attention based network for emotion prediction from movies, which we call AttendAffectNet.

Prediction, Relation

The Effect of Spectrogram Reconstruction on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy

2 code implementations • 20 Oct 2020 • Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Dorien Herremans

We attempt to use only the pitch labels (together with spectrogram reconstruction loss) and explore how far this model can go without introducing supervised sub-tasks.

Music Transcription

Hit Song Prediction Based on Early Adopter Data and Audio Features

no code implementations • 16 Oct 2020 • Dorien Herremans, Tom Bergmans

Billions of USD are invested in new artists and songs by the music industry every year.

A variational autoencoder for music generation controlled by tonal tension

1 code implementation • 13 Oct 2020 • Rui Guo, Ivor Simpson, Thor Magnusson, Chris Kiefer, Dorien Herremans

Many of the music generation systems based on neural networks are fully autonomous and do not offer control over the generation process.

Sound, Symbolic Computation, Audio and Speech Processing

A dataset and classification model for Malay, Hindi, Tamil and Chinese music

no code implementations • 9 Sep 2020 • Fajilatun Nahar, Kat Agres, Balamurali BT, Dorien Herremans

We use this new dataset to train different classification models to distinguish the origin of the music in terms of these ethnic groups.

Classification, General Classification

Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling

1 code implementation • 29 Jul 2020 • Hao Hao Tan, Dorien Herremans

Using arousal as an example of a high-level feature, we show that the "faders" of our model are disentangled and change linearly w.r.t.

Clustering, Disentanglement, +3

Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance

1 code implementation • 16 Jun 2020 • Hao Hao Tan, Yin-Jyun Luo, Dorien Herremans

We present a controllable neural audio synthesizer based on Gaussian Mixture Variational Autoencoders (GM-VAE), which can generate realistic piano performances in the audio domain that closely follow temporal conditions of two essential style features for piano performances: articulation and dynamics.

Audio Synthesis

The impact of Audio input representations on neural network based music transcription

1 code implementation • 25 Jan 2020 • Kin Wai Cheuk, Kat Agres, Dorien Herremans

This paper thoroughly analyses the effect of different input representations on polyphonic multi-instrument music transcription.

Sound, Audio and Speech Processing

nnAudio: An on-the-fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolution Neural Networks

1 code implementation • 27 Dec 2019 • Kin Wai Cheuk, Hans Anderson, Kat Agres, Dorien Herremans

First, it takes a lot of hard disk space to store different frequency domain representations.

Midi Miner -- A Python library for tonal tension and track classification

1 code implementation • 3 Oct 2019 • Rui Guo, Dorien Herremans, Thor Magnusson

We present a Python library, called Midi Miner, that can calculate tonal tension and classify different tracks.

General Classification, Music Generation

Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks

1 code implementation • 1 Oct 2019 • Kin Wai Cheuk, Balamurali B. T., Gemma Roig, Dorien Herremans

When reducing the training data to only using the train set, our method results in 309 confusions for the Multi-target speaker identification task, which is 46% better than the baseline model.

Speaker Identification, Speaker Recognition, +1

Multimodal Deep Models for Predicting Affective Responses Evoked by Movies

1 code implementation • 16 Sep 2019 • Ha Thi Phuong Thao, Dorien Herremans, Gemma Roig

Interestingly, we also observe that the optical flow is more informative than the RGB in videos, and overall, models using audio features are more accurate than those based on video features when making the final prediction of evoked emotions.

Optical Flow Estimation

Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders

no code implementations • 19 Jun 2019 • Yin-Jyun Luo, Kat Agres, Dorien Herremans

Specifically, we use two separate encoders to learn distinct latent spaces for timbre and pitch, which form Gaussian mixture components representing instrument identity and pitch, respectively.

Decoder

Towards robust audio spoofing detection: a detailed comparison of traditional and learned features

no code implementations • 28 May 2019 • Balamurali BT, Kin Wah Edward Lin, Simon Lui, Jer-Ming Chen, Dorien Herremans

Finally, we evaluate the performance of our robust replay speaker detection system with a wide variety and different combinations of both extracted and machine learned audio features on the 'out in the wild' ASVspoof 2017 dataset.

Speaker Verification

Dance Hit Song Prediction

no code implementations • 17 May 2019 • Dorien Herremans, David Martens, Kenneth Sörensen

Record companies invest billions of dollars in new talent around the globe each year.

General Classification, Position, +1

MorpheuS: generating structured music with constrained patterns and tension

1 code implementation • 12 Dec 2018 • Dorien Herremans, Elaine Chew

MorpheuS' novel framework has the ability to generate polyphonic pieces with a given tension profile and long- and short-term repeated pattern structures.

Sound, Audio and Speech Processing

A Functional Taxonomy of Music Generation Systems

no code implementations • 11 Dec 2018 • Dorien Herremans, Ching-Hua Chuan, Elaine Chew

Digital advances have transformed the face of automatic music generation since its beginnings at the dawn of computing.

Music Generation

Singing Voice Separation Using a Deep Convolutional Neural Network Trained by Ideal Binary Mask and Cross Entropy

2 code implementations • 4 Dec 2018 • Kin Wah Edward Lin, Balamurali B. T., Enyan Koh, Simon Lui, Dorien Herremans

We present a unique neural network approach inspired by a technique that has revolutionized the field of vision: pixel-wise image classification, which we combine with cross entropy loss and pretraining of the CNN as an autoencoder on singing voice spectrograms.

Data Augmentation, General Classification, +4

From Context to Concept: Exploring Semantic Relationships in Music with Word2Vec

no code implementations • 29 Nov 2018 • Ching-Hua Chuan, Kat Agres, Dorien Herremans

In this newly learned vector space, a metric based on cosine distance is able to distinguish between functional chord relationships, as well as harmonic associations in the music.

Music Generation

Modeling Musical Context with Word2vec

no code implementations • 28 Jun 2017 • Dorien Herremans, Ching-Hua Chuan

A visualization of the reduced vector space using t-distributed stochastic neighbor embedding shows that the resulting embedded vector space captures tonal relationships, even without any explicit information about the musical contents of the slices.

Proceedings of the First International Workshop on Deep Learning and Music

no code implementations • 27 Jun 2017 • Dorien Herremans, Ching-Hua Chuan

Proceedings of the First International Workshop on Deep Learning and Music, joint with IJCNN, Anchorage, US, May 17-18, 2017

Deep Learning
