Search Results for author: Jyh-Shing Roger Jang

Found 25 papers, 7 papers with code

Towards Generalized Source Tracing for Codec-Based Deepfake Speech

no code implementations 8 Jun 2025 Xuanjun Chen, I-Ming Lin, Lin Zhang, Haibin Wu, Hung-Yi Lee, Jyh-Shing Roger Jang

Recent attempts at source tracing for codec-based deepfake speech (CodecFake), generated by neural audio codec-based speech generation (CoSG) models, have exhibited suboptimal performance.

Face Swapping

Knowledge Retrieval Based on Generative AI

no code implementations 8 Jan 2025 Te-Lun Yang, Jyi-Shane Liu, Yuen-Hsien Tseng, Jyh-Shing Roger Jang

Using TTQA and TMMLU+ as evaluation datasets, the system employs BGE-M3 for dense vector retrieval to obtain highly relevant search results and BGE-reranker to reorder these results based on query relevance.

Multiple-choice Question Answering +3
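
The retrieve-then-rerank flow described in this entry can be illustrated with a minimal sketch. It does not call BGE-M3 or BGE-reranker: the toy vectors stand in for dense embeddings, and `overlap_score` is a hypothetical placeholder for a learned reranker. All function names are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(query_vec, doc_vecs, top_k):
    """Stage 1: rank documents by embedding similarity and keep the top_k ids."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:top_k]

def rerank(query, candidates, texts, score_fn):
    """Stage 2: reorder the retrieved candidates with a query-passage relevance score."""
    return sorted(candidates, key=lambda doc_id: score_fn(query, texts[doc_id]), reverse=True)

def overlap_score(query, passage):
    """Hypothetical stand-in for a reranker: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    return len(query_words & set(passage.lower().split())) / len(query_words)

# Toy data: the vectors below are made up, not real embeddings.
doc_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
texts = {
    "a": "weather in Taipei",
    "b": "capital of Taiwan is Taipei",
    "c": "recipe for beef noodles",
}
candidates = retrieve([1.0, 0.0], doc_vecs, top_k=2)
reranked = rerank("capital of Taiwan", candidates, texts, overlap_score)
```

The two stages are kept separate because the first is cheap and runs over the whole collection, while the second is typically more expensive and only sees the short candidate list.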

BLAPose: Enhancing 3D Human Pose Estimation with Bone Length Adjustment

1 code implementation 28 Oct 2024 Chih-Hsiang Hsu, Jyh-Shing Roger Jang

Our results demonstrate that existing 3D human pose estimation models can be significantly enhanced through this adjustment process.

3D Human Pose Estimation

Improving Real-Time Music Accompaniment Separation with MMDenseNet

no code implementations 30 Jun 2024 Chun-Hsiang Wang, Chung-Che Wang, Jun-You Wang, Jyh-Shing Roger Jang, Yen-Hsun Chu

Source-to-distortion ratio, real-time factor, and optimal latency are employed to evaluate the performance.

Music Source Separation
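
Two of the metrics named in this entry can be sketched as follows. This assumes the simplest textbook form of source-to-distortion ratio (signal energy over error energy, in dB); the paper may use a different variant, such as the BSS Eval definition. The function names are illustrative.

```python
import math

def sdr_db(reference, estimate):
    """Basic source-to-distortion ratio in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # a perfect estimate has no distortion
    return 10.0 * math.log10(signal_energy / error_energy)

def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; values below 1.0 are faster than real time."""
    return processing_seconds / audio_seconds

# Toy example: the estimate is the reference scaled by 0.9.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9, 0.0, -0.9, 0.0]
sdr = sdr_db(reference, estimate)
rtf = real_time_factor(0.5, 2.0)
```

A higher SDR means a cleaner separation, and a real-time factor below 1.0 is a necessary condition for streaming use.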

Singing Voice Graph Modeling for SingFake Detection

1 code implementation 5 Jun 2024 Xuanjun Chen, Haibin Wu, Jyh-Shing Roger Jang, Hung-Yi Lee

Detecting singing voice deepfakes, or SingFake, involves determining the authenticity and copyright of a singing voice.

DeepFake Detection Face Swapping +1

Novel Preprocessing Technique for Data Embedding in Engineering Code Generation Using Large Language Model

no code implementations 27 Nov 2023 Yu-Chen Lin, Akhilesh Kumar, Norman Chang, Wenliang Zhang, Muhammad Zakir, Rucha Apte, Haiyang He, Chao Wang, Jyh-Shing Roger Jang

We present four main contributions to enhance the performance of Large Language Models (LLMs) in generating domain-specific code: (i) utilizing LLM-based data splitting and data renovation techniques to improve the semantic representation of the embedding space; (ii) introducing the Chain of Density for Renovation Credibility (CoDRC), driven by LLMs, and the Adaptive Text Renovation (ATR) algorithm for assessing data renovation reliability; (iii) developing the Implicit Knowledge Expansion and Contemplation (IKEC) Prompt technique; and (iv) effectively refactoring existing scripts to generate new and high-quality scripts with LLMs.

Code Generation Language Modeling +4

Adapting pretrained speech model for Mandarin lyrics transcription and alignment

1 code implementation 21 Nov 2023 Jun-You Wang, Chon-In Leong, Yu-Chen Lin, Li Su, Jyh-Shing Roger Jang

With the use of data augmentation and a source separation model, results show that the proposed method achieves a character error rate of less than 18% on a Mandarin polyphonic dataset for lyrics transcription, and a mean absolute error of 0.071 seconds for lyrics alignment.

Automatic Lyrics Transcription Data Augmentation
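
The two metrics reported in this entry can be computed as in the sketch below. It assumes the usual definitions: character error rate as character-level edit distance divided by reference length, and mean absolute error over paired timestamps. The paper's exact evaluation protocol may differ, and the sample strings and times are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def character_error_rate(ref, hyp):
    """Character-level edit distance normalized by the reference length."""
    return edit_distance(ref, hyp) / len(ref)

def mean_absolute_error(ref_times, pred_times):
    """Average absolute difference between reference and predicted timestamps (seconds)."""
    return sum(abs(r - p) for r, p in zip(ref_times, pred_times)) / len(ref_times)

# Toy example: one deleted character and one substituted character.
cer = character_error_rate("我愛你台灣", "我愛台北")
mae = mean_absolute_error([1.0, 2.0], [1.5, 1.5])
```

Working at the character level suits Mandarin, where word boundaries are not marked in the text.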

WC-SBERT: Zero-Shot Text Classification via SBERT with Self-Training for Wikipedia Categories

1 code implementation 28 Jul 2023 Te-Yu Chi, Yu-Meng Tang, Chia-Wen Lu, Qiu-Xia Zhang, Jyh-Shing Roger Jang

To achieve this objective, we propose a novel self-training strategy that uses labels rather than text for training, significantly reducing the model's training time.

Text Classification +1

Multimodal Transformer Distillation for Audio-Visual Synchronization

2 code implementations 27 Oct 2022 Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-Yi Lee, Jyh-Shing Roger Jang

This paper proposes the MTDVocaLiST model, which is trained with our proposed multimodal Transformer distillation (MTD) loss.

Audio-Visual Synchronization

Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection

no code implementations 3 Oct 2022 Xuanjun Chen, Haibin Wu, Helen Meng, Hung-Yi Lee, Jyh-Shing Roger Jang

Audio-visual active speaker detection (AVASD) is well developed and is now an indispensable front-end for several multi-modal applications.

Active Speaker Detection Adversarial Robustness +1

Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification

no code implementations 31 Mar 2022 Yen-Lun Liao, Xuanjun Chen, Chung-Che Wang, Jyh-Shing Roger Jang

The countermeasure (CM) model is developed to protect Automatic Speaker Verification (ASV) systems from spoofing attacks and to prevent the resulting leakage of personal information.

Knowledge Distillation Speaker Verification

Learning to match transient sound events using attentional similarity for few-shot sound recognition

1 code implementation 4 Dec 2018 Szu-Yu Chou, Kai-Hsiang Cheng, Jyh-Shing Roger Jang, Yi-Hsuan Yang

In this paper, we introduce a novel attentional similarity module for the problem of few-shot sound recognition.

Sound Audio and Speech Processing
