Search Results for author: Haibin Wu

Found 31 papers, 11 papers with code

A Large-Scale Evaluation of Speech Foundation Models

no code implementations • 15 Apr 2024 • Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-Yi Lee

In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech.

Benchmarking

EMO-SUPERB: An In-depth Look at Speech Emotion Recognition

1 code implementation • 20 Feb 2024 • Haibin Wu, Huang-Cheng Chou, Kai-Wei Chang, Lucas Goncalves, Jiawei Du, Jyh-Shing Roger Jang, Chi-Chun Lee, Hung-Yi Lee

Speech emotion recognition (SER) is a pivotal technology for human-computer interaction systems.

Self-Supervised Learning Speech Emotion Recognition

Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

1 code implementation • 20 Feb 2024 • Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-Yi Lee

The sound codec's dual roles in minimizing data transmission latency and serving as tokenizers underscore its critical importance.

Towards audio language modeling - an overview

no code implementations • 20 Feb 2024 • Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-Yi Lee

Neural audio codecs were initially introduced to compress audio data into compact codes to reduce transmission latency.

Language Modelling

Scalable Ensemble-based Detection Method against Adversarial Attacks for speaker verification

no code implementations • 14 Dec 2023 • Haibin Wu, Heng-Cheng Kuo, Yu Tsao, Hung-Yi Lee

Automatic speaker verification (ASV) is highly susceptible to adversarial attacks.

Speaker Verification

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

1 code implementation • 19 Sep 2023 • Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-Yi Lee

Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information.

audio-visual learning Representation Learning

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

1 code implementation • 18 Sep 2023 • Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee

To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark.

SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts

no code implementations • 3 Jun 2023 • Haibin Wu, Kai-Wei Chang, Yuan-Kuei Wu, Hung-Yi Lee

In this paper, we present pioneering research that explores the application of prompt tuning to stimulate speech LMs for various generation tasks, within a unified framework called SpeechGen, with around 10M trainable parameters.

Open-Ended Question Answering

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

no code implementations • 25 May 2023 • Lingwei Meng, Jiawen Kang, Mingyu Cui, Haibin Wu, Xixin Wu, Helen Meng

Extending on this, we incorporate a diarization branch into the Sidecar, allowing for unified modeling of both ASR and diarization with a negligible overhead of only 768 parameters.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

The defender's perspective on automatic speaker verification: An overview

no code implementations • 22 May 2023 • Haibin Wu, Jiawen Kang, Lingwei Meng, Helen Meng, Hung-Yi Lee

Automatic speaker verification (ASV) plays a critical role in security-sensitive environments.

Speaker Verification

Rethinking complex-valued deep neural networks for monaural speech enhancement

no code implementations • 11 Jan 2023 • Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong

By comparing complex- and real-valued versions of fundamental building blocks in the recently developed gated convolutional recurrent network (GCRN), we show how different mechanisms for basic blocks affect the performance.

Open-Ended Question Answering Speech Enhancement

Multimodal Transformer Distillation for Audio-Visual Synchronization

2 code implementations • 27 Oct 2022 • Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-Yi Lee, Jyh-Shing Roger Jang

This paper proposed an MTDVocaLiST model, which is trained by our proposed multimodal Transformer distillation (MTD) loss.

Audio-Visual Synchronization

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

no code implementations • 16 Oct 2022 • Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-Yi Lee

We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised speech representation for better performance, generalization, and efficiency.

Audio Generation Representation Learning +2

Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection

no code implementations • 3 Oct 2022 • Xuanjun Chen, Haibin Wu, Helen Meng, Hung-Yi Lee, Jyh-Shing Roger Jang

Audio-visual active speaker detection (AVASD) is well-developed, and now is an indispensable front-end for several multi-modal applications.

Adversarial Robustness Audio-Visual Active Speaker Detection

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

no code implementations • 18 Jun 2022 • Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-Yi Lee, Helen Meng

However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models focus solely on the standalone anti-spoofing task and ignore the subsequent speaker verification process.

Open-Ended Question Answering Speaker Verification

Human-AI Shared Control via Policy Dissection

1 code implementation • 31 May 2022 • Quanyi Li, Zhenghao Peng, Haibin Wu, Lan Feng, Bolei Zhou

Inspired by the neuroscience approach to investigate the motor cortex in primates, we develop a simple yet effective frequency-based approach called Policy Dissection to align the intermediate representation of the learned neural controller with the kinematic attributes of the agent behavior.

Autonomous Driving Reinforcement Learning (RL)

Spoofing-Aware Speaker Verification by Multi-Level Fusion

no code implementations • 29 Mar 2022 • Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

In the second-level fusion, the CM score and ASV scores directly from ASV systems will be concatenated into a prediction block for the final decision.

Speaker Verification

Partially Fake Audio Detection by Self-attention-based Fake Span Discovery

no code implementations • 14 Feb 2022 • Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao, Hsin-Min Wang, Helen Meng

Also ADD 2022 is the first challenge to propose the partially fake audio detection task.

Open-Ended Question Answering Speech Synthesis +1

The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

no code implementations • 4 Feb 2022 • Naijun Zheng, Na Li, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su, Helen Meng

This paper describes our speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription (M2MeT) challenge, where Mandarin meeting data were recorded in multi-channel format for diarization and automatic speech recognition (ASR) tasks.

Action Detection Activity Detection +6

Characterizing the adversarial vulnerability of speech self-supervised learning

no code implementations • 8 Nov 2021 • Haibin Wu, Bo Zheng, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

As the paradigm of the self-supervised learning upstream model followed by downstream tasks arouses more attention in the speech community, characterizing the adversarial robustness of such paradigm is of high priority.

Adversarial Robustness Benchmarking +2

Revisiting the Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning

no code implementations • 29 Sep 2021 • Jian Hu, Siyang Jiang, Seth Austin Harding, Haibin Wu, Shih-wei Liao

QMIX, a popular MARL algorithm based on the monotonicity constraint, has been used as a baseline for benchmark environments such as the StarCraft Multi-Agent Challenge (SMAC) and Predator-Prey (PP).

reinforcement-learning Reinforcement Learning (RL) +2

Adversarial Sample Detection for Speaker Verification by Neural Vocoders

1 code implementation • 1 Jul 2021 • Haibin Wu, Po-chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee

We also show that the neural vocoder adopted in the detection framework is dataset-independent.

Speaker Verification

Voting for the right answer: Adversarial defense for speaker verification

1 code implementation • 15 Jun 2021 • Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-Yi Lee

Automatic speaker verification (ASV) is a well-developed technology for biometric identification, and has been ubiquitously implemented in security-critical applications, such as banking and access control.

Adversarial Defense Speaker Verification

Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning

no code implementations • 1 Jun 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee

This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms.

Adversarial Defense Adversarial Robustness +2

Tackling Variabilities in Autonomous Driving

no code implementations • 21 Apr 2021 • Yuqiong Qi, Yang Hu, Haibin Wu, Shen Li, Haiyu Mao, Xiaochun Ye, Dongrui Fan, Ninghui Sun

In this work, we aim to extensively explore the above system design challenges and these challenges motivate us to propose a comprehensive framework that synergistically handles the heterogeneous hardware accelerator design principles, system design criteria, and task scheduling mechanism.

Autonomous Driving Reinforcement Learning (RL) +1

Adversarial defense for automatic speaker verification by cascaded self-supervised learning models

no code implementations • 14 Feb 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee

Automatic speaker verification (ASV) is one of the core technologies in biometric identification.

Adversarial Defense Open-Ended Question Answering +2

Rethinking the Implementation Matters in Cooperative Multi-Agent Reinforcement Learning

2 code implementations • 6 Feb 2021 • Jian Hu, Siyang Jiang, Seth Austin Harding, Haibin Wu, Shih-wei Liao

Multi-Agent Reinforcement Learning (MARL) has seen revolutionary breakthroughs with its successful application to multi-agent cooperative tasks such as computer games and robot swarms.

reinforcement-learning Reinforcement Learning (RL) +3

QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

no code implementations • 9 Sep 2020 • Jian Hu, Seth Austin Harding, Haibin Wu, Siyue Hu, Shih-wei Liao

Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the value of long-term returns as a scalar that does not contain the information of randomness.

reinforcement-learning Reinforcement Learning (RL) +2

Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning

5 code implementations • 5 Jun 2020 • Haibin Wu, Andy T. Liu, Hung-Yi Lee

To explore this issue, we proposed to employ Mockingjay, a self-supervised learning based model, to protect anti-spoofing models against adversarial attacks in the black-box scenario.

Self-Supervised Learning Speaker Verification +1

Defense against adversarial attacks on spoofing countermeasures of ASV

no code implementations • 6 Mar 2020 • Haibin Wu, Songxiang Liu, Helen Meng, Hung-Yi Lee

Various forefront countermeasure methods for automatic speaker verification (ASV) with considerable performance in anti-spoofing are proposed in the ASVspoof 2019 challenge.

Speaker Verification

Adversarial Attacks on Spoofing Countermeasures of automatic speaker verification

1 code implementation • 19 Oct 2019 • Songxiang Liu, Haibin Wu, Hung-Yi Lee, Helen Meng

High-performance spoofing countermeasure systems for automatic speaker verification (ASV) have been proposed in the ASVspoof 2019 challenge.

Speaker Verification
