1 code implementation • 11 Oct 2023 • Josh Gardner, Simon Durand, Daniel Stoller, Rachel M. Bittner
Music has a complex structure that is challenging for both expert humans and existing AI systems to understand, and it presents unique challenges relative to other forms of audio.
1 code implementation • 13 Jun 2023 • Simon Durand, Daniel Stoller, Sebastian Ewert
This way, we obtain a novel system that is simple to train end-to-end, can make use of weakly annotated training data, jointly learns a powerful text model, and is tailored to alignment.
1 code implementation • 14 Nov 2019 • Daniel Stoller, Mi Tian, Sebastian Ewert, Simon Dixon
In comparison to TCN and Wavenet, our network consistently saves memory and computation time, with speed-ups for training and inference of over 4x in the audio generation experiment in particular, while achieving a comparable performance in all tasks.
Ranked #2 on Music Modeling on Nottingham
1 code implementation • ICLR 2020 • Daniel Stoller, Sebastian Ewert, Simon Dixon
We apply our method to image generation, image segmentation and audio source separation, and obtain improved performance over a standard GAN when additional incomplete training examples are available.
no code implementations • 21 Apr 2019 • Saumitra Mishra, Daniel Stoller, Emmanouil Benetos, Bob L. Sturm, Simon Dixon
However, this requires a careful selection of hyper-parameters to generate interpretable examples for each neuron of interest, and current methods rely on a manual, qualitative evaluation of each setting, which is prohibitively slow.
1 code implementation • 9 Apr 2019 • Bhusan Chettri, Daniel Stoller, Veronica Morfi, Marco A. Martínez Ramírez, Emmanouil Benetos, Bob L. Sturm
Our ensemble model outperforms all our single models and the baselines from the challenge for both attack types.
Audio and Speech Processing • Sound
2 code implementations • 18 Feb 2019 • Daniel Stoller, Simon Durand, Sebastian Ewert
Time-aligned lyrics can enrich the music listening experience by enabling karaoke, text-based song retrieval, intra-song navigation, and other applications.
9 code implementations • 8 Jun 2018 • Daniel Stoller, Sebastian Ewert, Simon Dixon
Models for audio source separation usually operate on the magnitude spectrum, which ignores phase information and makes separation performance dependent on hyper-parameters for the spectral front-end.
Ranked #27 on Music Source Separation on MUSDB18
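The phase problem described above can be made concrete with a minimal NumPy sketch (not the paper's code): even given an oracle magnitude estimate of a source, a spectrogram-domain model must borrow phase from the mixture at synthesis time, which leaves a residual error that a waveform-domain model like Wave-U-Net avoids by construction. All signals and names below are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)                  # stand-in "source"
mixture = clean + 0.3 * rng.standard_normal(t.size)  # stand-in "mixture"

spec_clean = np.fft.rfft(clean)
spec_mix = np.fft.rfft(mixture)

# Oracle magnitude of the source, but phase borrowed from the mixture,
# as a magnitude-only separation model would do at synthesis time:
est = np.fft.irfft(np.abs(spec_clean) * np.exp(1j * np.angle(spec_mix)))

# Perfect magnitude + correct phase reconstructs the source exactly;
# perfect magnitude + mixture phase does not:
err_oracle = np.mean((np.fft.irfft(spec_clean) - clean) ** 2)
err_phase = np.mean((est - clean) ** 2)
print(err_oracle, err_phase)
```

Running this shows `err_oracle` at numerical-precision level while `err_phase` stays clearly above it, which is the motivation for separating directly in the time domain.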
no code implementations • 5 Apr 2018 • Daniel Stoller, Sebastian Ewert, Simon Dixon
A main challenge in applying deep learning to music processing is the availability of training data.
3 code implementations • 31 Oct 2017 • Daniel Stoller, Sebastian Ewert, Simon Dixon
Based on this idea, we drive the separator towards outputs deemed realistic by discriminator networks that are trained to tell real samples apart from separator outputs.
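The adversarial objective described here can be sketched in a few lines of NumPy; this is a hypothetical illustration, not the paper's implementation. A discriminator `D` scores samples, the discriminator loss pushes it to classify real source samples as 1 and separator outputs as 0, and the separator's adversarial loss rewards outputs that `D` scores as real. The toy `D` and the stand-in sample arrays are assumptions made for the sketch.

```python
import numpy as np

def D(x):
    # Toy fixed discriminator: logistic score based on sample energy.
    # A real system would use a trained neural network here.
    return 1.0 / (1.0 + np.exp(-(1.0 - np.mean(x ** 2))))

rng = np.random.default_rng(0)
real = 0.5 * rng.standard_normal(256)  # stand-in real source samples
fake = 2.0 * rng.standard_normal(256)  # stand-in separator outputs

# Discriminator loss: classify real samples as 1, separator outputs as 0.
d_loss = -np.log(D(real)) - np.log(1.0 - D(fake))

# Separator (generator) loss: make D score its outputs as real.
g_loss = -np.log(D(fake))

print(d_loss, g_loss)
```

In training, gradients of `g_loss` with respect to the separator's parameters flow through `D`, which is what "drives the separator towards outputs deemed realistic" means in practice.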