no code implementations • 27 Sep 2023 • Frank Cwitkowitz, Kin Wai Cheuk, Woosung Choi, Marco A. Martínez-Ramírez, Keisuke Toyama, Wei-Hsiang Liao, Yuki Mitsufuji
Several works have explored multi-instrument transcription as a means to bolster the performance of models on low-resource tasks, but these methods face the same data availability issues.
no code implementations • 1 Feb 2023 • Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans
Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results.
no code implementations • 11 Oct 2022 • Kin Wai Cheuk, Ryosuke Sawata, Toshimitsu Uesaka, Naoki Murata, Naoya Takahashi, Shusuke Takahashi, Dorien Herremans, Yuki Mitsufuji
In this paper we propose a novel generative approach, DiffRoll, to tackle automatic music transcription (AMT).
no code implementations • 22 Jun 2022 • Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Amy Hung, Ju-Chiang Wang, Dorien Herremans
However, its novelty necessitates a new perspective on how to evaluate such a model.
no code implementations • 11 Jul 2021 • Kin Wai Cheuk, Dorien Herremans, Li Su
Most of the current supervised automatic music transcription (AMT) models lack the ability to generalize.
2 code implementations • 20 Oct 2020 • Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Dorien Herremans
We attempt to use only the pitch labels (together with spectrogram reconstruction loss) and explore how far this model can go without introducing supervised sub-tasks.
1 code implementation • 25 Jan 2020 • Kin Wai Cheuk, Kat Agres, Dorien Herremans
This paper thoroughly analyses the effect of different input representations on polyphonic multi-instrument music transcription.
Sound • Audio and Speech Processing
1 code implementation • 27 Dec 2019 • Kin Wai Cheuk, Hans Anderson, Kat Agres, Dorien Herremans
First, it takes a lot of hard disk space to store different frequency domain representations.
1 code implementation • 1 Oct 2019 • Kin Wai Cheuk, Balamurali B. T., Gemma Roig, Dorien Herremans
When the training data is reduced to only the train set, our method results in 309 confusions for the multi-target speaker identification task, which is 46% better than the baseline model.