Search Results for author: Haibin Wu

Found 35 papers, 13 papers with code

CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems

no code implementations · 11 Jun 2024 · Haibin Wu, Yuan Tseng, Hung-Yi Lee

Additionally, we verify that anti-spoofing models trained on commonly used datasets cannot detect synthesized speech from current codec-based speech generation systems.

Audio Synthesis, Face Swapping, +1

Emo-bias: A Large Scale Evaluation of Social Bias on Speech Emotion Recognition

no code implementations · 7 Jun 2024 · Yi-Cheng Lin, Haibin Wu, Huang-Cheng Chou, Chi-Chun Lee, Hung-Yi Lee

Speech Emotion Recognition (SER) is growing rapidly and has diverse global applications, from improving human-computer interaction to aiding mental health diagnostics.

Self-Supervised Learning, Speech Emotion Recognition

Singing Voice Graph Modeling for SingFake Detection

1 code implementation · 5 Jun 2024 · Xuanjun Chen, Haibin Wu, Jyh-Shing Roger Jang, Hung-Yi Lee

Detecting singing voice deepfakes, or SingFake, involves determining the authenticity and copyright of a singing voice.

DeepFake Detection, Face Swapping

Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

1 code implementation · 20 Feb 2024 · Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-Yi Lee

The sound codec's dual roles of minimizing data transmission latency and serving as a tokenizer underscore its critical importance.

Towards audio language modeling - an overview

no code implementations · 20 Feb 2024 · Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-Yi Lee

Neural audio codecs were initially introduced to compress audio data into compact codes to reduce transmission latency.

Language Modelling

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

1 code implementation · 18 Sep 2023 · Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee

To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark.

SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts

no code implementations · 3 Jun 2023 · Haibin Wu, Kai-Wei Chang, Yuan-Kuei Wu, Hung-Yi Lee

In this paper, we present pioneering research that explores the application of prompt tuning to stimulate speech LMs for various generation tasks, within a unified framework called SpeechGen, with around 10M trainable parameters.

Open-Ended Question Answering

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

no code implementations · 25 May 2023 · Lingwei Meng, Jiawen Kang, Mingyu Cui, Haibin Wu, Xixin Wu, Helen Meng

Building on this, we incorporate a diarization branch into the Sidecar, allowing unified modeling of both ASR and diarization with a negligible overhead of only 768 parameters.

Automatic Speech Recognition (ASR), +1

The defender's perspective on automatic speaker verification: An overview

no code implementations · 22 May 2023 · Haibin Wu, Jiawen Kang, Lingwei Meng, Helen Meng, Hung-Yi Lee

Automatic speaker verification (ASV) plays a critical role in security-sensitive environments.

Speaker Verification

Rethinking complex-valued deep neural networks for monaural speech enhancement

no code implementations · 11 Jan 2023 · Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong

By comparing complex- and real-valued versions of fundamental building blocks in the recently developed gated convolutional recurrent network (GCRN), we show how the different mechanisms of these basic blocks affect performance.

Open-Ended Question Answering, Speech Enhancement

Multimodal Transformer Distillation for Audio-Visual Synchronization

2 code implementations · 27 Oct 2022 · Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-Yi Lee, Jyh-Shing Roger Jang

This paper proposes the MTDVocaLiST model, which is trained with our proposed multimodal Transformer distillation (MTD) loss.

Audio-Visual Synchronization

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

no code implementations · 18 Jun 2022 · Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-Yi Lee, Helen Meng

However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models focus solely on standalone anti-spoofing tasks, ignoring the subsequent speaker verification process.

Open-Ended Question Answering, Speaker Verification

Human-AI Shared Control via Policy Dissection

1 code implementation · 31 May 2022 · Quanyi Li, Zhenghao Peng, Haibin Wu, Lan Feng, Bolei Zhou

Inspired by the neuroscience approach to investigating the motor cortex in primates, we develop a simple yet effective frequency-based approach called Policy Dissection to align the intermediate representation of the learned neural controller with the kinematic attributes of the agent's behavior.

Autonomous Driving, Reinforcement Learning (RL)

Spoofing-Aware Speaker Verification by Multi-Level Fusion

no code implementations · 29 Mar 2022 · Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

In the second-level fusion, the CM score and the scores directly from the ASV systems are concatenated into a prediction block for the final decision.

Speaker Verification

The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

no code implementations · 4 Feb 2022 · Naijun Zheng, Na Li, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su, Helen Meng

This paper describes our speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription (M2MeT) challenge, where Mandarin meeting data were recorded in multi-channel format for diarization and automatic speech recognition (ASR) tasks.

Action Detection, Activity Detection, +6

Characterizing the adversarial vulnerability of speech self-supervised learning

no code implementations · 8 Nov 2021 · Haibin Wu, Bo Zheng, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

As the paradigm of a self-supervised learning upstream model followed by downstream tasks attracts growing attention in the speech community, characterizing the adversarial robustness of this paradigm is a high priority.

Adversarial Robustness, Benchmarking, +2

Revisiting the Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning

no code implementations · 29 Sep 2021 · Jian Hu, Siyang Jiang, Seth Austin Harding, Haibin Wu, Shih-wei Liao

QMIX, a popular MARL algorithm based on the monotonicity constraint, has been used as a baseline in benchmark environments such as the StarCraft Multi-Agent Challenge (SMAC) and Predator-Prey (PP).

Reinforcement Learning (RL), +2

Voting for the right answer: Adversarial defense for speaker verification

1 code implementation · 15 Jun 2021 · Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-Yi Lee

Automatic speaker verification (ASV) is a well-developed technology for biometric identification and has been ubiquitously implemented in security-critical applications, such as banking and access control.

Adversarial Defense, Speaker Verification

Tackling Variabilities in Autonomous Driving

no code implementations · 21 Apr 2021 · Yuqiong Qi, Yang Hu, Haibin Wu, Shen Li, Haiyu Mao, Xiaochun Ye, Dongrui Fan, Ninghui Sun

In this work, we extensively explore the above system design challenges, which motivate us to propose a comprehensive framework that synergistically handles heterogeneous hardware accelerator design principles, system design criteria, and the task scheduling mechanism.

Autonomous Driving, Reinforcement Learning (RL), +1

Rethinking the Implementation Matters in Cooperative Multi-Agent Reinforcement Learning

2 code implementations · 6 Feb 2021 · Jian Hu, Siyang Jiang, Seth Austin Harding, Haibin Wu, Shih-wei Liao

Multi-Agent Reinforcement Learning (MARL) has seen revolutionary breakthroughs with its successful application to multi-agent cooperative tasks such as computer games and robot swarms.

Reinforcement Learning (RL), +3

QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

no code implementations · 9 Sep 2020 · Jian Hu, Seth Austin Harding, Haibin Wu, Siyue Hu, Shih-wei Liao

Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the value of long-term returns as a scalar that does not capture information about their randomness.

Reinforcement Learning (RL), +2

Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning

5 code implementations · 5 Jun 2020 · Haibin Wu, Andy T. Liu, Hung-Yi Lee

To explore this issue, we propose employing Mockingjay, a self-supervised learning based model, to protect anti-spoofing models against adversarial attacks in the black-box scenario.

Self-Supervised Learning, Speaker Verification, +1

Defense against adversarial attacks on spoofing countermeasures of ASV

no code implementations · 6 Mar 2020 · Haibin Wu, Songxiang Liu, Helen Meng, Hung-Yi Lee

Various forefront countermeasure methods for automatic speaker verification (ASV) with considerable anti-spoofing performance were proposed in the ASVspoof 2019 challenge.

Speaker Verification

Adversarial Attacks on Spoofing Countermeasures of automatic speaker verification

1 code implementation · 19 Oct 2019 · Songxiang Liu, Haibin Wu, Hung-Yi Lee, Helen Meng

High-performance spoofing countermeasure systems for automatic speaker verification (ASV) have been proposed in the ASVspoof 2019 challenge.

Speaker Verification
