1 code implementation • 26 Aug 2024 • Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, Shangda Wu, Shih-Lun Wu, Shuqi Dai, Shun Lei, Shiyin Kang, Simon Dixon, Wenhu Chen, Wenhao Huang, Xingjian Du, Xingwei Qu, Xu Tan, Yizhi Li, Zeyue Tian, Zhiyong Wu, Zhizheng Wu, Ziyang Ma, Ziyu Wang
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music.
1 code implementation • 23 Jul 2024 • Fang-Duo Tsai, Shih-Lun Wu, Haven Kim, Bo-Yu Chen, Hao-Chung Cheng, Yi-Hsuan Yang
Text-to-music models allow users to generate near-realistic musical audio from textual prompts.
1 code implementation • 16 Jun 2023 • Shih-Lun Wu, Yi-Hui Chou, Liangze Li
PhotoBook is a collaborative dialogue game where two players receive private, partially-overlapping sets of images and resolve which images they have in common.
no code implementations • 2 Jun 2023 • Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe
We reduce the model size by applying tensor decomposition to the Conformer and E-Branchformer architectures used in our E2E SLU models.
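The compression idea can be sketched independently of the paper: factorize a large weight matrix inside the encoder into two low-rank factors. Below is a minimal, hedged illustration using truncated SVD on a single feed-forward layer; the layer sizes and rank are illustrative assumptions, not the paper's configuration, and the actual work decomposes Conformer/E-Branchformer blocks rather than a lone linear layer.

```python
import torch
import torch.nn as nn

def low_rank_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Replace one d_out x d_in linear layer with two smaller ones via
    truncated SVD, shrinking parameters from d_out*d_in to rank*(d_in + d_out)."""
    W = linear.weight.data                       # (d_out, d_in)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                 # (d_out, rank), absorbs singular values
    V_r = Vh[:rank, :]                           # (rank, d_in)
    first = nn.Linear(W.shape[1], rank, bias=False)
    second = nn.Linear(rank, W.shape[0], bias=linear.bias is not None)
    first.weight.data = V_r
    second.weight.data = U_r
    if linear.bias is not None:
        second.bias.data = linear.bias.data
    return nn.Sequential(first, second)

# Illustrative feed-forward layer of a Conformer-sized model (sizes assumed).
ffn = nn.Linear(512, 2048)
compressed = low_rank_factorize(ffn, rank=64)
x = torch.randn(4, 512)
err = (ffn(x) - compressed(x)).norm() / ffn(x).norm()
print(f"relative error at rank 64: {err:.3f}")   # nonzero: random weights compress poorly
```

In practice the factor rank trades accuracy against size; trained weights are typically far more compressible than the random matrix used here.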
no code implementations • 2 May 2023 • Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe
Recently there have been efforts to introduce new benchmark tasks for spoken language understanding (SLU), like semantic parsing.
Automatic Speech Recognition (ASR) +3
no code implementations • 2 May 2023 • Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe
For this track, we adopt a pipeline approach that cascades automatic speech recognition (ASR) and natural language understanding (NLU), as sketched below.
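A minimal cascade looks like the following sketch: transcribe first, then run a text model on the transcript. The checkpoints and the audio path below are illustrative stand-ins (a public Whisper model and the default sentiment classifier), not the challenge-track models described in the paper.

```python
from transformers import pipeline

# Stage 1: ASR turns audio into a transcript.
# Stage 2: NLU maps the transcript to a label (e.g., an intent).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny.en")
nlu = pipeline("text-classification")  # placeholder text model, not the paper's NLU

transcript = asr("utterance.wav")["text"]   # "utterance.wav" is a placeholder path
prediction = nlu(transcript)[0]
print(transcript, prediction["label"], prediction["score"])
```

The appeal of the pipeline design is modularity: either stage can be swapped or fine-tuned independently, at the cost of ASR errors propagating into the NLU stage.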
1 code implementation • 17 Sep 2022 • Shih-Lun Wu, Yi-Hsuan Yang
Even with strong sequence models like Transformers, generating expressive piano performances with long-range musical structures remains challenging.
1 code implementation • 7 Nov 2021 • Yi-Jen Shih, Shih-Lun Wu, Frank Zalkow, Meinard Müller, Yi-Hsuan Yang
To condition the generation process of such a model with a user-specified sequence, a popular approach is to take that conditioning sequence as a priming sequence and ask a Transformer decoder to generate a continuation.
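The priming approach amounts to treating the user's sequence as a fixed prefix and sampling the rest autoregressively. Here is a hedged sketch of that loop; `model` is any causal language model returning `(batch, time, vocab)` logits, and the sampling hyperparameters are assumptions for illustration.

```python
import torch

@torch.no_grad()
def continue_from_prime(model, prime_ids, n_new, temperature=1.0):
    """Feed the conditioning sequence as a prefix, then sample one token
    at a time; each sampled token is appended and fed back in."""
    ids = prime_ids.clone()                        # (1, T_prime) token ids
    for _ in range(n_new):
        logits = model(ids)[:, -1, :]              # logits for the next token
        probs = torch.softmax(logits / temperature, dim=-1)
        nxt = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, nxt], dim=1)
    return ids[:, prime_ids.shape[1]:]             # the continuation only
```

Nothing here enforces that the continuation stays musically related to the prime beyond what the model learned, which is precisely the limitation this line of work addresses.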
Music Generation, Representation Learning, Sound, Multimedia, Audio and Speech Processing
1 code implementation • 18 May 2021 • Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, Gaël Richard
Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity.
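The core trick behind such linear-complexity attention is to replace softmax(QK^T)V with a feature-map factorization so the N x N attention matrix is never formed. The sketch below uses the elu+1 feature map from Katharopoulos et al.'s linear Transformers, which this paper builds on; the paper's own contribution (stochastic positional encoding) is not shown, and the tensor shapes are illustrative.

```python
import torch

def linear_attention(Q, K, V, eps=1e-6):
    """O(N) attention: phi(Q) @ (phi(K)^T V), computed without the
    N x N attention matrix. phi = elu + 1 keeps features positive."""
    phi_q = torch.nn.functional.elu(Q) + 1         # (B, N, d)
    phi_k = torch.nn.functional.elu(K) + 1
    kv = torch.einsum("bnd,bne->bde", phi_k, V)    # (B, d, e): summed over N once
    z = 1.0 / (torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + eps)
    return torch.einsum("bnd,bde,bn->bne", phi_q, kv, z)

Q = K = V = torch.randn(2, 1024, 64)
print(linear_attention(Q, K, V).shape)             # torch.Size([2, 1024, 64])
```

Because the sequence dimension is contracted once into `kv`, both memory and time scale linearly in N rather than quadratically.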
1 code implementation • 10 May 2021 • Shih-Lun Wu, Yi-Hsuan Yang
Transformers and variational autoencoders (VAEs) have been extensively employed for symbolic (e.g., MIDI) domain music generation.
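A minimal sketch of combining the two ideas: encode an event-token sequence into a latent via the VAE reparameterization trick, then condition an autoregressive decoder on that latent. Everything below is an illustrative toy (a GRU stands in for the Transformer layers to keep it short; vocabulary and dimensions are assumptions), not the paper's architecture.

```python
import torch
import torch.nn as nn

class TinyMusicVAE(nn.Module):
    """Toy VAE over event tokens: encode -> reparameterize -> decode."""
    def __init__(self, vocab=400, d=128, z_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.to_mu, self.to_logvar = nn.Linear(d, z_dim), nn.Linear(d, z_dim)
        self.z_proj = nn.Linear(z_dim, d)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, tokens):                     # (B, T) event ids
        h = self.encoder(self.emb(tokens))[1][-1]  # final encoder hidden state
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        # Condition the decoder by adding the latent to every input step;
        # training would compare logits[:, :-1] against tokens[:, 1:].
        dec_in = self.emb(tokens) + self.z_proj(z).unsqueeze(1)
        logits = self.out(self.decoder(dec_in)[0])
        return logits, mu, logvar
```

The latent gives a handle for attribute control and interpolation that a plain autoregressive Transformer lacks, which is the motivation for marrying the two model families.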
no code implementations • 23 Nov 2020 • Shih-Lun Wu, Hsiao-Yen Tung, Yu-Lun Hsu
The quality grading of mangoes is a crucial task for mango growers, as it substantially affects their profit.
2 code implementations • 4 Aug 2020 • Shih-Lun Wu, Yi-Hsuan Yang
This paper presents the Jazz Transformer, a generative model that utilizes a neural sequence model called the Transformer-XL for modeling lead sheets of Jazz music.
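The general recipe can be sketched as follows: serialize lead sheets into a stream of event tokens (chords, notes, durations, bar markers) and train a next-token language model on them. In this hedged sketch, a vanilla causal Transformer stands in for Transformer-XL (which additionally uses segment-level recurrence and relative positional encodings), and the vocabulary size and model dimensions are assumptions.

```python
import torch
import torch.nn as nn

VOCAB, D_MODEL, N_HEAD = 512, 256, 4               # illustrative sizes

emb = nn.Embedding(VOCAB, D_MODEL)
layer = nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
lm = nn.TransformerEncoder(layer, num_layers=4)
head = nn.Linear(D_MODEL, VOCAB)

tokens = torch.randint(0, VOCAB, (8, 128))         # a batch of event sequences
causal = torch.triu(torch.full((127, 127), float("-inf")), diagonal=1)
hidden = lm(emb(tokens[:, :-1]), mask=causal)      # each step predicts the next event
loss = nn.functional.cross_entropy(
    head(hidden).reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
print(loss.item())
```

Transformer-XL's recurrence matters here because lead sheets span many bars; the segment memory lets the model attend beyond a single training window.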