Search Results for author: Szu-Wei Fu

Found 33 papers, 16 papers with code

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

1 code implementation • 26 Feb 2024 • Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang

To improve the robustness of the encoder for speech enhancement (SE), a novel self-distillation mechanism combined with adversarial training is introduced.

Quantization, Speech Enhancement
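
As a rough illustration of the snippet above, the sketch below combines a distillation term (a student encoder on noisy speech matching an EMA teacher on clean speech) with an adversarial term. The module names, EMA decay, and loss weight are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch only: a self-distillation objective plus an adversarial
# term for a speech-enhancement encoder. Names and weights are assumptions.
import torch
import torch.nn.functional as F

def ema_update(teacher, student, decay=0.999):
    # The teacher tracks an exponential moving average of the student weights.
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(decay).add_(s, alpha=1.0 - decay)

def encoder_loss(student, teacher, discriminator, noisy, clean):
    z_noisy = student(noisy)                      # embedding of the noisy input
    with torch.no_grad():
        z_clean = teacher(clean)                  # target embedding from clean input
    distill = F.mse_loss(z_noisy, z_clean)        # pull noisy embeddings toward clean ones
    adversarial = -discriminator(z_noisy).mean()  # fool a clean/noisy discriminator
    return distill + 0.1 * adversarial            # 0.1 is an illustrative weight
```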

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

no code implementations • 15 Nov 2023 • Hsin-Tien Chiang, Szu-Wei Fu, Hsin-Min Wang, Yu Tsao, John H. L. Hansen

Furthermore, we demonstrated that incorporating SSL models resulted in greater transferability to out-of-domain (OOD) datasets.

A Study on Incorporating Whisper for Robust Speech Assessment

1 code implementation • 22 Sep 2023 • Ryandhimas E. Zezario, Yu-Wen Chen, Szu-Wei Fu, Yu Tsao, Hsin-Min Wang, Chiou-Shann Fuh

The first part of this study investigates how well the embedding features of Whisper and two self-supervised learning (SSL) models correlate with subjective quality and intelligibility scores.

Self-Supervised Learning

Study on the Correlation between Objective Evaluations and Subjective Speech Quality and Intelligibility

no code implementations • 10 Jul 2023 • Hsin-Tien Chiang, Kuo-Hsuan Hung, Szu-Wei Fu, Heng-Cheng Kuo, Ming-Hsueh Tsai, Yu Tsao

Moreover, new objective measures are proposed that combine current objective measures using deep learning techniques to predict subjective quality and intelligibility.
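
To make the fusion idea concrete, a minimal model (an assumption for illustration, not the paper's architecture) might take a vector of existing objective scores and regress a subjective rating:

```python
# Minimal sketch: fuse existing objective measures (e.g., PESQ, STOI, SI-SDR)
# with a small network that predicts a subjective MOS-like score.
# Architecture and layer sizes are illustrative assumptions.
import torch.nn as nn

class MeasureFusion(nn.Module):
    def __init__(self, n_measures=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_measures, 32), nn.ReLU(),
            nn.Linear(32, 1))

    def forward(self, scores):           # scores: (batch, n_measures)
        return self.net(scores).squeeze(-1)

# Training would minimize, e.g., F.mse_loss(model(scores), listener_mos).
```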

QuAVF: Quality-aware Audio-Visual Fusion for Ego4D Talking to Me Challenge

1 code implementation • 30 Jun 2023 • Hsi-Che Lin, Chien-Yi Wang, Min-Hung Chen, Szu-Wei Fu, Yu-Chiang Frank Wang

This technical report describes our QuAVF@NTU-NVIDIA submission to the Ego4D Talking to Me (TTM) Challenge 2023.

Improving Meeting Inclusiveness using Speech Interruption Analysis

no code implementations • 2 Apr 2023 • Szu-Wei Fu, Yaran Fan, Yasaman Hosseinkashi, Jayant Gupchup, Ross Cutler

To drive adoption and improve meeting inclusiveness (and participation), we present a machine learning-based system that predicts when a meeting participant attempts to obtain the floor but fails to interrupt (termed a 'failed interruption').

Real-time Speech Interruption Analysis: From Cloud to Client Deployment

no code implementations • 24 Oct 2022 • Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu, Zhuo Chen, Jayant Gupchup, Ross Cutler

Meetings are an essential form of communication for all types of organizations, and remote collaboration systems have been much more widely used since the COVID-19 pandemic.

Boosting Self-Supervised Embeddings for Speech Enhancement

1 code implementation • 7 Apr 2022 • Kuo-Hsuan Hung, Szu-Wei Fu, Huan-Hsin Tseng, Hsin-Tien Chiang, Yu Tsao, Chii-Wann Lin

We further study the relationship between the noise robustness of SSL representations, measured via a clean-noisy distance (CN distance), and layer importance for SE.

Self-Supervised Learning, Speech Enhancement
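
A clean-noisy distance of the kind mentioned can be sketched as the layer-wise gap between hidden states for a clean utterance and its noisy version; the hidden-state interface below (HuggingFace-style `output_hidden_states`) is an assumption, not necessarily the paper's tooling.

```python
# Illustrative only: per-layer "clean-noisy distance" for an SSL encoder.
# Assumes a wav2vec2-style model that can return all hidden states.
import torch

def clean_noisy_distance(ssl_model, clean_wav, noisy_wav):
    with torch.no_grad():
        h_c = ssl_model(clean_wav, output_hidden_states=True).hidden_states
        h_n = ssl_model(noisy_wav, output_hidden_states=True).hidden_states
    # One RMS distance per layer; smaller suggests a more noise-robust layer.
    return [((c - n) ** 2).mean().sqrt().item() for c, n in zip(h_c, h_n)]
```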

Perceptual Contrast Stretching on Target Feature for Speech Enhancement

1 code implementation • 31 Mar 2022 • Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Specifically, the contrast of target features is stretched based on perceptual importance, thereby improving the overall SE performance.

Speech Enhancement
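
As a toy version of the stretching idea (the paper derives its weights from perceptual band importance; the weights below are placeholders):

```python
# Toy sketch of perceptual contrast stretching: amplify the dynamic range of
# a log-magnitude target in perceptually important bands. The band weights
# are illustrative placeholders, not the paper's values.
import numpy as np

def contrast_stretch(log_mag, band_weights):
    # log_mag: (freq, time); band_weights: (freq,), >= 1 in important bands.
    return log_mag * band_weights[:, None]

n_freq = 257
weights = np.ones(n_freq)
weights[20:120] = 1.4          # e.g., emphasize mid-frequency bands
stretched_target = contrast_stretch(np.random.rand(n_freq, 100), weights)
```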

OSSEM: one-shot speaker adaptive speech enhancement using meta learning

no code implementations • 10 Nov 2021 • Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli

Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers.

Meta-Learning, Speech Enhancement

SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points

no code implementations • 8 Nov 2021 • Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

In this paper, a novel sign-exponent-only floating-point network (SEOFP-NET) technique is proposed to compress the model size and accelerate the inference time for speech enhancement, a regression task of speech signal processing.

Model Compression, regression, +1
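
The core bit-level trick can be sketched in a few lines: drop the 23 mantissa bits of each IEEE-754 float32 weight, keeping only the sign and exponent. This simple version truncates toward zero; the paper's scheme also handles rounding.

```python
# Sketch of sign-exponent-only floats: zero out the 23 mantissa bits of each
# IEEE-754 float32 weight, keeping the sign bit and the 8 exponent bits.
import numpy as np

def sign_exponent_only(weights):
    bits = np.asarray(weights, dtype=np.float32).copy().view(np.uint32)
    bits &= np.uint32(0xFF800000)      # sign (1 bit) + exponent (8 bits)
    return bits.view(np.float32)

w = np.array([0.7431, -1.618, 0.0153], dtype=np.float32)
print(sign_exponent_only(w))           # [ 0.5  -1.   0.0078125]
```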

Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

1 code implementation • 3 Nov 2021 • Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

In this study, we propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously.

Speech Enhancement

MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech

2 code implementations • 12 Oct 2021 • Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao

Most of the deep learning-based speech enhancement models are learned in a supervised manner, which implies that pairs of noisy and clean speech are required during training.

Speech Enhancement

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

3 code implementations • 8 Apr 2021 • Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.

Speech Enhancement

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

1 code implementation • 9 Nov 2020 • Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, Yu Tsao, Hsin-Min Wang

To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net.

Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement

1 code implementation • 28 Oct 2020 • Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both related to a smooth transition in speech segments that may carry linguistic information, e.g., phones and syllables.

Speech Enhancement

CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application

1 code implementation • 21 Aug 2020 • Yu-Wen Chen, Kuo-Hsuan Hung, You-Jin Li, Alexander Chao-Fu Kang, Ya-Hsin Lai, Kai-Chun Liu, Szu-Wei Fu, Syu-Siang Wang, Yu Tsao

CITISEN provides three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC), allowing it to serve as a platform for utilizing and evaluating SE models and for flexibly extending them to various noise environments and users.

Acoustic Scene Classification, Data Augmentation, +2

Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

no code implementations • 18 Jun 2020 • Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-Jin Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao

The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications.

Speech Enhancement

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

1 code implementation • Interspeech 2020 • Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi

The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments.

Audio and Speech Processing, Sound

Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

no code implementations • 22 Nov 2019 • Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, Szu-Wei Fu, Yu Tsao, Jeih-weih Hung

Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE).

Ensemble Learning, Speech Enhancement

Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement

no code implementations • 26 Sep 2019 • Natalie Yu-Hsien Wang, Hsiao-Lan Sharon Wang, Tao-Wei Wang, Szu-Wei Fu, Xugang Lu, Yu Tsao, Hsin-Min Wang

Recently, a time-domain speech enhancement algorithm based on fully convolutional neural networks (FCN) with a short-time objective intelligibility (STOI)-based objective function (termed FCN(S) for short) has received increasing attention due to its simple structure and effectiveness in restoring clean speech signals from noisy counterparts.

Denoising, Speech Enhancement, +1, Sound, Audio and Speech Processing

Increasing Compactness Of Deep Learning Based Speech Enhancement Models With Parameter Pruning And Quantization Techniques

no code implementations • 31 May 2019 • Jyun-Yi Wu, Cheng Yu, Szu-Wei Fu, Chih-Ting Liu, Shao-Yi Chien, Yu Tsao

In addition, a parameter quantization (PQ) technique was applied to reduce the size of a neural network by representing weights with fewer cluster centroids.

Denoising, Quantization, +1
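
A toy version of cluster-centroid weight sharing as described (scikit-learn k-means is a convenience choice here, not necessarily what the authors used):

```python
# Toy parameter quantization: represent all weights in a matrix by k shared
# cluster centroids found with k-means, storing one small index per weight.
import numpy as np
from sklearn.cluster import KMeans

def quantize_weights(w, k=16):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(w.reshape(-1, 1))
    centroids = km.cluster_centers_.ravel()
    return centroids[km.labels_].reshape(w.shape)

w = np.random.randn(64, 64).astype(np.float32)
w_q = quantize_weights(w)   # every entry is now one of 16 shared values
```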

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement

5 code implementations • 13 May 2019 • Szu-Wei Fu, Chien-Feng Liao, Yu Tsao, Shou-De Lin

Adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize evaluation metrics of a target task, and thus, may not always guide the generator in a GAN to generate data with improved metric scores.

Generative Adversarial Network, Speech Enhancement
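
The MetricGAN objective is easy to summarize in code: the discriminator D regresses the score of a normalized black-box metric Q (e.g., PESQ scaled to [0, 1]), and the generator is pushed toward the maximum score of 1. The sketch below simplifies details such as the input features.

```python
# Simplified MetricGAN losses: D learns to mimic a normalized black-box metric
# Q; G is trained toward Q's maximum score of 1, with gradients flowing
# through D (a differentiable surrogate for the metric).
import torch
import torch.nn.functional as F

def discriminator_loss(D, enhanced, clean, q_score):
    # q_score = Q(enhanced, clean); Q(clean, clean) = 1 for a normalized metric.
    ones = torch.ones_like(q_score)
    return (F.mse_loss(D(clean, clean), ones) +
            F.mse_loss(D(enhanced, clean), q_score))

def generator_loss(D, enhanced, clean):
    s = D(enhanced, clean)
    return F.mse_loss(s, torch.ones_like(s))
```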

Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality

1 code implementation • 6 May 2019 • Szu-Wei Fu, Chien-Feng Liao, Yu Tsao

Utilizing a human-perception-related objective function to train a speech enhancement model has become a popular topic recently.

Speech Enhancement

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

6 code implementations • 17 Apr 2019 • Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech.

Voice Conversion

A study on speech enhancement using exponent-only floating point quantized neural network (EOFP-QNN)

no code implementations • 17 Aug 2018 • Yi-Te Hsu, Yu-Chen Lin, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

We evaluated the proposed EOFP quantization technique on two types of neural networks, namely, bidirectional long short-term memory (BLSTM) and fully convolutional neural network (FCN), on a speech enhancement task.

Quantization, regression, +1

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

no code implementations • 12 Sep 2017 • Szu-Wei Fu, Tao-Wei Wang, Yu Tsao, Xugang Lu, Hisashi Kawai

For example, in measuring speech intelligibility, most evaluation metrics are based on the short-time objective intelligibility (STOI) measure, while the frame-based minimum mean square error (MMSE) between the estimated and clean speech is widely used for optimizing the model.

Automatic Speech Recognition (ASR), +3
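
The mismatch described above can be illustrated directly: a frame-wise MMSE loss versus an utterance-level, STOI-flavored objective based on envelope correlation (a strong simplification of actual STOI; both functions assume 1-D waveform tensors).

```python
# Illustration of the training/evaluation mismatch: frame-wise MMSE versus an
# utterance-level objective (a crude stand-in for real STOI, not the paper's).
import torch

def frame_mmse(est, clean):
    return torch.mean((est - clean) ** 2)

def neg_envelope_correlation(est, clean, frame=256):
    e = est.unfold(0, frame, frame).abs().mean(dim=1)    # crude envelope
    c = clean.unfold(0, frame, frame).abs().mean(dim=1)
    e, c = e - e.mean(), c - c.mean()
    return -(e * c).sum() / (e.norm() * c.norm() + 1e-8)  # maximize correlation
```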

Complex spectrogram enhancement by convolutional neural network with multi-metrics learning

no code implementations • 27 Apr 2017 • Szu-Wei Fu, Ting-yao Hu, Yu Tsao, Xugang Lu

This paper aims to address two issues in current speech enhancement methods: 1) the difficulty of phase estimation; 2) the inability of a single objective function to consider multiple metrics simultaneously.

Speech Enhancement
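
One way to read "multi-metrics learning" in code (the choice of terms and the weighting are illustrative assumptions): train a single model against a complex-spectrogram error, which carries phase information, plus a log-magnitude error.

```python
# Illustrative multi-metrics objective: a complex-spectrogram term (covers
# phase) plus a log-magnitude term, weighted by an assumed factor alpha.
import torch

def multi_metric_loss(est, ref, alpha=0.5):
    # est, ref: complex STFT tensors of the estimated and reference speech.
    complex_term = torch.mean(torch.abs(est - ref) ** 2)
    mag_term = torch.mean((torch.log1p(est.abs()) - torch.log1p(ref.abs())) ** 2)
    return alpha * complex_term + (1.0 - alpha) * mag_term
```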

Raw Waveform-based Speech Enhancement by Fully Convolutional Networks

no code implementations • 7 Mar 2017 • Szu-Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai

Because fully connected layers, as used in deep neural networks (DNN) and convolutional neural networks (CNN), may not accurately characterize the local information of speech signals, particularly high-frequency components, we employed fully convolutional layers to model the waveform.

Denoising, Speech Enhancement
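
A minimal fully convolutional waveform model in the spirit of the snippet above (layer count, channel width, and kernel size are illustrative, not the paper's configuration):

```python
# Minimal fully convolutional waveform enhancer: no fully connected layers,
# so local waveform structure is modeled directly. Sizes are illustrative.
import torch
import torch.nn as nn

fcn = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=55, padding=27), nn.LeakyReLU(),
    nn.Conv1d(32, 32, kernel_size=55, padding=27), nn.LeakyReLU(),
    nn.Conv1d(32, 1, kernel_size=55, padding=27), nn.Tanh(),
)

noisy = torch.randn(1, 1, 16000)   # (batch, channel, samples), 1 s at 16 kHz
enhanced = fcn(noisy)              # same length as the input
```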
