Search Results for author: Xuanjun Chen

Found 12 papers, 5 papers with code

A Preliminary Exploration with GPT-4o Voice Mode

No code implementations · 14 Feb 2025 · Yu-Xiang Lin, Chih-Kai Yang, Wei-Chih Chen, Chen-An Li, Chien-yu Huang, Xuanjun Chen, Hung-Yi Lee

Additionally, GPT-4o's safety mechanisms cause it to decline tasks such as speaker identification, age classification, MOS prediction, and audio deepfake detection.

Tasks: Age Classification, Audio Deepfake Detection (+7 more)

Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

1 code implementation · 11 Nov 2024 · Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu, Tzu-Quan Lin, Hsiu-Hsuan Wang, En-Pei Hu, Chan-Jan Hsu, Liang-Hsuan Tseng, I-Hsiang Chiu, Ulin Sanga, Xuanjun Chen, Po-chun Hsu, Shu-wen Yang, Hung-Yi Lee

This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations.

Tasks: Decoder, Language Modeling (+2 more)

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

1 code implementation · 21 Sep 2024 · Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, KaiWei Chang, Jiawei Du, Ke-Han Lu, Alexander H. Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James Glass, Shinji Watanabe, Hung-Yi Lee

Neural audio codec models are becoming increasingly important as they serve as tokenizers for audio, enabling efficient transmission or facilitating speech language modeling.

Tasks: Language Modeling

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement

No code implementations · 16 Sep 2024 · Wenze Ren, Haibin Wu, Yi-Cheng Lin, Xuanjun Chen, Rong Chao, Kuo-Hsuan Hung, You-Jin Li, Wen-Yuan Ting, Hsin-Min Wang, Yu Tsao

In multichannel speech enhancement, effectively capturing spatial and spectral information across different microphones is crucial for noise reduction.

Tasks: Mamba, Speech Enhancement

Singing Voice Graph Modeling for SingFake Detection

1 code implementation · 5 Jun 2024 · Xuanjun Chen, Haibin Wu, Jyh-Shing Roger Jang, Hung-Yi Lee

Detecting singing voice deepfakes, or SingFake, involves determining the authenticity and copyright of a singing voice.

Tasks: DeepFake Detection, Face Swapping (+1 more)

Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

1 code implementation · 20 Feb 2024 · Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-Yi Lee

The sound codec's dual roles in minimizing data transmission latency and serving as tokenizers underscore its critical importance.

Towards audio language modeling - an overview

No code implementations · 20 Feb 2024 · Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-Yi Lee

Neural audio codecs were initially introduced to compress audio data into compact codes and thereby reduce transmission latency.

Tasks: Language Modeling

Multimodal Transformer Distillation for Audio-Visual Synchronization

2 code implementations · 27 Oct 2022 · Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-Yi Lee, Jyh-Shing Roger Jang

This paper proposes the MTDVocaLiST model, which is trained with a newly proposed multimodal Transformer distillation (MTD) loss.

Tasks: Audio-Visual Synchronization

Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection

No code implementations · 3 Oct 2022 · Xuanjun Chen, Haibin Wu, Helen Meng, Hung-Yi Lee, Jyh-Shing Roger Jang

Audio-visual active speaker detection (AVASD) is well developed and is now an indispensable front end for several multimodal applications.

Tasks: Active Speaker Detection, Adversarial Robustness (+1 more)

Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification

No code implementations · 31 Mar 2022 · Yen-Lun Liao, Xuanjun Chen, Chung-Che Wang, Jyh-Shing Roger Jang

The countermeasure (CM) model is developed to protect automatic speaker verification (ASV) systems from spoofing attacks and to prevent the resulting leakage of personal information.

Tasks: Knowledge Distillation, Speaker Verification
