Search Results for author: Yi-Cheng Lin

Found 14 papers, 6 papers with code

BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights

no code implementations 29 Jan 2025 Chan-Jan Hsu, Yi-Cheng Lin, Chia-Chun Lin, Wei-Chih Chen, Ho Lam Chung, Chen-An Li, Yi-Chang Chen, Chien-Yu Yu, Ming-Ji Lee, Chien-Cheng Chen, Ru-Heng Huang, Hung-Yi Lee, Da-Shan Shiu

We present BreezyVoice, a Text-to-Speech (TTS) system specifically adapted for Taiwanese Mandarin, highlighting phonetic control abilities to address the unique challenges of polyphone disambiguation in the language.

Language Modeling

Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

1 code implementation 11 Nov 2024 Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu, Tzu-Quan Lin, Hsiu-Hsuan Wang, En-Pei Hu, Chan-Jan Hsu, Liang-Hsuan Tseng, I-Hsiang Chiu, Ulin Sanga, Xuanjun Chen, Po-chun Hsu, Shu-wen Yang, Hung-Yi Lee

This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations.

Decoder, Language Modeling

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

1 code implementation 21 Sep 2024 Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, KaiWei Chang, Jiawei Du, Ke-Han Lu, Alexander H. Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James Glass, Shinji Watanabe, Hung-Yi Lee

Neural audio codec models are becoming increasingly important as they serve as tokenizers for audio, enabling efficient transmission or facilitating speech language modeling.

Language Modeling

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement

no code implementations 16 Sep 2024 Wenze Ren, Haibin Wu, Yi-Cheng Lin, Xuanjun Chen, Rong Chao, Kuo-Hsuan Hung, You-Jin Li, Wen-Yuan Ting, Hsin-Min Wang, Yu Tsao

In multichannel speech enhancement, effectively capturing spatial and spectral information across different microphones is crucial for noise reduction.

Mamba, Speech Enhancement

Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models

1 code implementation 9 Jul 2024 Yi-Cheng Lin, Tzu-Quan Lin, Chih-Kai Yang, Ke-Han Lu, Wei-Chih Chen, Chun-Yi Kuan, Hung-Yi Lee

Speech Integrated Large Language Models (SILLMs) combine large language models with speech perception to perform diverse tasks, ranging from emotion recognition to speaker verification, demonstrating universal audio understanding capability.

Coreference Resolution, Emotion Recognition

On the social bias of speech self-supervised models

no code implementations 7 Jun 2024 Yi-Cheng Lin, Tzu-Quan Lin, Hsi-Che Lin, Andy T. Liu, Hung-Yi Lee

We probe how various factors, such as model architecture, size, and training methodologies, influence the propagation of social bias within speech self-supervised models.

Model Compression, Self-Supervised Learning

Emo-bias: A Large Scale Evaluation of Social Bias on Speech Emotion Recognition

no code implementations 7 Jun 2024 Yi-Cheng Lin, Haibin Wu, Huang-Cheng Chou, Chi-Chun Lee, Hung-Yi Lee

Speech Emotion Recognition (SER) has grown rapidly and has diverse global applications, from improving human-computer interactions to aiding mental health diagnostics.

Self-Supervised Learning, Speech Emotion Recognition

Towards audio language modeling - an overview

no code implementations 20 Feb 2024 Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-Yi Lee

Neural audio codecs were initially introduced to compress audio data into compact codes to reduce transmission latency.

Language Modeling

Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

1 code implementation 20 Feb 2024 Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-Yi Lee

The sound codec's dual roles in minimizing data transmission latency and serving as tokenizers underscore its critical importance.
