Search Results for author: Shiyin Kang

Found 17 papers, 5 papers with code

SCNet: Sparse Compression Network for Music Source Separation

no code implementations24 Jan 2024 Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng

We use a higher compression ratio on subbands with less information to improve the information density and focus on modeling subbands with more information.

Music Source Separation
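The snippet above describes variable compression across subbands. A minimal sketch of that idea (not SCNet's actual architecture; band edges and ratios below are illustrative) splits a spectrogram into frequency subbands and downsamples each by a band-specific ratio, compressing low-information bands more aggressively:

```python
import numpy as np

def compress_subbands(spec, band_edges, ratios):
    """Split a (freq, time) spectrogram into subbands and downsample each
    along the frequency axis by a band-specific compression ratio.
    Bands judged less informative get a larger ratio (stronger compression)."""
    out = []
    for (lo, hi), r in zip(band_edges, ratios):
        band = spec[lo:hi]                      # (band_freq, time)
        n = band.shape[0] // r                  # compressed bin count
        # average groups of r adjacent bins -> higher information density
        out.append(band[: n * r].reshape(n, r, -1).mean(axis=1))
    return out

# toy example: 64 frequency bins, 10 frames; keep the low band intact,
# compress the mid band 2x and the high band 4x
spec = np.random.rand(64, 10)
bands = compress_subbands(spec, [(0, 16), (16, 32), (32, 64)], [1, 2, 4])
```

With ratio 1 the first subband passes through unchanged, while the 32-bin high band collapses to 8 compressed bins.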

Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation

no code implementations15 Jan 2024 Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng

Variational Autoencoders (VAEs) constitute a crucial component of neural symbolic music generation, among which some works have yielded outstanding results and attracted considerable attention.

Music Generation

AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation

no code implementations11 Oct 2023 Liyang Chen, Weihong Bao, Shun Lei, Boshi Tang, Zhiyong Wu, Shiyin Kang, HaoZhi Huang

Existing works mostly neglect the person-specific talking style in generation, including facial expression and head pose styles.

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

no code implementations31 Aug 2023 Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng

This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses bidirectional encoder representation from Transformers (BERT) derived semantic embeddings to improve the expressiveness of the synthesized singing voice.

Singing Voice Synthesis
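The SVS snippet above conditions synthesis on BERT-derived semantic embeddings of the lyrics. A hedged sketch of that conditioning (a random lookup table stands in for the pretrained BERT encoder; the vocabulary, dimensions, and `condition_frames` helper are all hypothetical):

```python
import numpy as np

# Hypothetical stand-in for a BERT encoder: the real system feeds lyric
# tokens to pretrained BERT and uses its hidden states as semantic
# embeddings; here a fixed random lookup table plays that role.
rng = np.random.default_rng(0)
vocab = {"shine": 0, "bright": 1, "tonight": 2}
bert_like = rng.normal(size=(len(vocab), 8))       # (vocab, semantic_dim)

def condition_frames(lyric_tokens, phoneme_feats):
    """Concatenate one semantic vector per lyric token onto the
    corresponding phoneme-level features used by the synthesizer."""
    sem = bert_like[[vocab[t] for t in lyric_tokens]]      # (T, 8)
    return np.concatenate([phoneme_feats, sem], axis=-1)   # (T, feat + 8)

phoneme_feats = rng.normal(size=(3, 4))   # one feature row per token here
cond = condition_frames(["shine", "bright", "tonight"], phoneme_feats)
```

The synthesizer then sees both acoustic-side features and lyric semantics in one conditioning vector.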

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

no code implementations31 Aug 2023 Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech.

Multi-Task Learning

CB-Conformer: Contextual biasing Conformer for biased word recognition

1 code implementation19 Apr 2023 Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng

In this work, we propose CB-Conformer to improve biased word recognition by introducing the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer.

Automatic Speech Recognition · Language Modelling +2
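The snippet above introduces a Contextual Biasing Module. One common realization of contextual biasing (a sketch, not CB-Conformer's exact module) lets each encoder frame attend over embeddings of the bias words and adds the attended context back:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def contextual_bias(enc, bias_embs):
    """Each encoder frame attends over the bias-word embeddings;
    the attended context is added residually to the frame, nudging
    the recognizer toward the biased vocabulary."""
    scores = enc @ bias_embs.T              # (T, n_bias) attention logits
    ctx = softmax(scores) @ bias_embs       # (T, d) attended context
    return enc + ctx

rng = np.random.default_rng(1)
enc = rng.normal(size=(5, 16))              # 5 encoder frames, dim 16
bias = rng.normal(size=(3, 16))             # embeddings of 3 bias words
out = contextual_bias(enc, bias)
```

The residual form keeps the vanilla Conformer path intact when no bias word matches strongly.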

FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement

2 code implementations23 Mar 2022 Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, Helen Meng

Previously proposed FullSubNet has achieved outstanding performance in Deep Noise Suppression (DNS) Challenge and attracted much attention.

Speech Enhancement
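FullSubNet+ adds channel attention over complex-spectrogram inputs. A squeeze-and-excitation style sketch of channel attention (a common realization of the idea; the paper's exact module may differ, and the weight shapes here are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feats, w1, w2):
    """Global-average-pool each channel (squeeze), pass the pooled vector
    through a small bottleneck (excitation), and rescale the channels by
    the resulting gates."""
    pooled = feats.mean(axis=(1, 2))            # (C,) per-channel summary
    gate = sigmoid(w2 @ np.tanh(w1 @ pooled))   # (C,) gates in (0, 1)
    return feats * gate[:, None, None]

rng = np.random.default_rng(2)
# 3 channels: magnitude, real, and imaginary parts of the complex spectrogram
feats = rng.normal(size=(3, 32, 10))
w1, w2 = rng.normal(size=(2, 3)), rng.normal(size=(3, 2))
out = channel_attention(feats, w1, w2)
```

The gates let the network weigh magnitude against the real/imaginary channels per utterance.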

Adversarially learning disentangled speech representations for robust multi-factor voice conversion

no code implementations30 Jan 2021 Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Shiyin Kang, Helen Meng

To increase the robustness of highly controllable style transfer on multiple factors in VC, we propose a disentangled speech representation learning framework based on adversarial learning.

Representation Learning · Style Transfer +1
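Adversarial disentanglement of this kind is usually built on a gradient reversal layer. A minimal sketch of that standard trick (illustrative of the general technique, not this paper's specific framework):

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer: identity in the forward pass, negated
    (scaled) gradient in the backward pass.  An adversarial classifier
    trained through it pushes the encoder to *remove* whatever factor
    (e.g. speaker identity) the classifier can still predict."""
    def __init__(self, lam=1.0):
        self.lam = lam
    def forward(self, x):
        return x                       # activations pass through unchanged
    def backward(self, grad_out):
        return -self.lam * grad_out    # gradient flips sign toward encoder

grl = GradReverse(lam=0.5)
x = np.array([1.0, -2.0])
y = grl.forward(x)                          # identical to x
g = grl.backward(np.array([0.3, 0.3]))      # scaled, sign-flipped gradient
```

Stacking one such layer per factor (speaker, emotion, content) is what makes multi-factor conversion controllable.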

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

no code implementations20 Jun 2020 Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng

Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input.

Talking Head Generation

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

no code implementations6 Jan 2020 Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.

Audio-Visual Speech Recognition · Automatic Speech Recognition (ASR) +4
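The "absolute WER reduction" above subtracts the two error rates directly (e.g. 60% to 30% is a 30-point absolute reduction). WER itself is the word-level edit distance over the reference length; a self-contained sketch:

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance (substitutions,
    insertions, deletions) divided by the number of reference words."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # all deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / len(r)

print(wer("the cat sat", "the hat sat"))   # 1 substitution over 3 words
```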

DurIAN: Duration Informed Attention Network For Multimodal Synthesis

4 code implementations4 Sep 2019 Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu

In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously.

Speech Synthesis
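DurIAN's "duration informed" design replaces learned attention alignment with explicit phoneme durations. A length-regulator sketch in that spirit (illustrative only; the phonemes and durations below are made up):

```python
def expand_by_duration(states, durations):
    """Repeat each phoneme-level encoder state for its predicted number
    of acoustic frames, giving a hard, monotonic alignment instead of
    one learned by attention."""
    frames = []
    for s, d in zip(states, durations):
        frames.extend([s] * d)
    return frames

# phonemes held for 2, 3, and 1 frames respectively
print(expand_by_duration(["D", "UH", "R"], [2, 3, 1]))
# -> ['D', 'D', 'UH', 'UH', 'UH', 'R']
```

The hard alignment is what makes the synthesis robust enough to drive speech and facial expression frames in lockstep.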

Maximizing Mutual Information for Tacotron

2 code implementations30 Aug 2019 Peng Liu, Xixin Wu, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu

End-to-end speech synthesis methods already achieve close-to-human quality performance.

Attribute · Speech Synthesis

Neural Network Language Modeling with Letter-based Features and Importance Sampling

no code implementations ICASSP 2018 Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey, Sanjeev Khudanpur

In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks.

Ranked #36 on Speech Recognition on LibriSpeech test-other (using extra training data)

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2
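The importance sampling mentioned in the title avoids normalizing the softmax over the full vocabulary at training time. A toy sketch of the general technique (not Kaldi's implementation; here a uniform proposal over a small sampled set corrects each sampled logit by its log proposal probability):

```python
import numpy as np

def sampled_softmax_loss(logits, target, sample_ids, log_q):
    """Importance-sampled softmax: normalize over the target word plus a
    small set of sampled words instead of the full vocabulary, subtracting
    each sample's log proposal probability to keep the estimate unbiased."""
    ids = np.concatenate([[target], sample_ids])
    corrected = logits[ids] - np.concatenate([[0.0], log_q])
    # cross-entropy with the target in slot 0 of the reduced softmax
    m = corrected.max()
    return -(corrected[0] - m - np.log(np.exp(corrected - m).sum()))

rng = np.random.default_rng(3)
vocab_size, n_samples, target = 1000, 32, 7
logits = rng.normal(size=vocab_size)         # full-vocab logits (toy)
pool = np.setdiff1d(np.arange(vocab_size), [target])
samples = rng.choice(pool, size=n_samples, replace=False)
log_q = np.full(n_samples, np.log(n_samples / vocab_size))  # uniform proposal
loss = sampled_softmax_loss(logits, target, samples, log_q)
```

Only 33 logits are exponentiated per example instead of 1000, which is the entire point of the sampling.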
