Search Results for author: Hsin-Min Wang

Found 94 papers, 30 papers with code

A Flexible and Extensible Framework for Multiple Answer Modes Question Answering

no code implementations • ROCLING 2021 • Cheng-Chung Fan, Chia-Chih Kuo, Shang-Bao Luo, Pei-Jun Liao, Kuang-Yu Chang, Chiao-Wei Hsu, Meng-Tse Wu, Shih-Hong Tsai, Tzu-Man Wu, Aleksandra Smolka, Chao-Chun Liang, Hsin-Min Wang, Kuan-Yu Chen, Yu Tsao, Keh-Yih Su

Only a few of them adopt several answer generation modules for providing different mechanisms; however, they either lack an aggregation mechanism to merge the answers from various modules, or are too complicated to be implemented with neural networks.

Answer Generation Question Answering

Is Character Trigram Overlapping Ratio Still the Best Similarity Measure for Aligning Sentences in a Paraphrased Corpus?

no code implementations • ROCLING 2022 • Aleksandra Smolka, Hsin-Min Wang, Jason S. Chang, Keh-Yih Su

This paper studies whether the character trigram is still a suitable similarity measure for the task of aligning sentences in a paragraph paraphrasing corpus.

Sentence Text Simplification

Chinese Movie Dialogue Question Answering Dataset

no code implementations • ROCLING 2022 • Shang-Bao Luo, Cheng-Chung Fan, Kuan-Yu Chen, Yu Tsao, Hsin-Min Wang, Keh-Yih Su

This paper also provides a baseline system and shows its performance on this dataset.

Information Retrieval Question Answering +1

Mining Commonsense and Domain Knowledge from Math Word Problems

no code implementations • ROCLING 2021 • Shih-hung Tsai, Chao-Chun Liang, Hsin-Min Wang, Keh-Yih Su

We construct two math datasets and show that our algorithms can effectively retrieve the knowledge required for problem solving.

Math

Answering Chinese Elementary School Social Studies Multiple Choice Questions

no code implementations • IJCLCLP 2021 • Chao-Chun Liang, Daniel Lee, Meng-Tse Wu, Hsin-Min Wang, Keh-Yih Su

Multiple-choice

Influences of Prosodic Feature Replacement on the Perceived Singing Voice Identity

no code implementations • ROCLING 2019 • Kuan-Yi Kang, Yi-Wen Liu, Hsin-Min Wang

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

no code implementations • 7 May 2024 • Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

Can we humans correctly perceive the authenticity of the content of the videos we watch?

DeepFake Detection Face Swapping

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

1 code implementation • 10 Feb 2024 • Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-Yi Lee, Hsin-Min Wang, David Harwath

Second, we propose a new hybrid architecture that merges the cascaded and parallel architectures of SpeechCLIP into a multi-task learning framework.

Keyword Extraction Multi-Task Learning +2

HAAQI-Net: A non-intrusive neural music quality assessment model for hearing aids

1 code implementation • 2 Jan 2024 • Dyah A. M. G. Wisnu, Epri W. Pratiwi, Stefano Rini, Ryandhimas E. Zezario, Hsin-Min Wang, Yu Tsao

This paper introduces HAAQI-Net, a non-intrusive deep learning model for music quality assessment tailored to hearing aid users.

Music Quality Assessment

D4AM: A General Denoising Framework for Downstream Acoustic Models

1 code implementation • 28 Nov 2023 • Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen

To our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models

no code implementations • 28 Nov 2023 • Chi-Chang Lee, Hong-Wei Chen, Chu-Song Chen, Hsin-Min Wang, Tsung-Te Liu, Yu Tsao

The performance of speaker verification (SV) models may drop dramatically in noisy environments.

Denoising Speaker Verification +1

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

no code implementations • 15 Nov 2023 • Hsin-Tien Chiang, Szu-Wei Fu, Hsin-Min Wang, Yu Tsao, John H. L. Hansen

Furthermore, we demonstrated that incorporating SSL models resulted in greater transferability to out-of-distribution (OOD) datasets.

AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection

no code implementations • 5 Nov 2023 • Sahibzada Adil Shahzad, Ammarah Hashmi, Yan-Tsung Peng, Yu Tsao, Hsin-Min Wang

This study proposes a new method based on a multi-modal self-supervised-learning (SSL) feature extractor to exploit inconsistency between audio and visual modalities for multi-modal video forgery detection.

DeepFake Detection Face Swapping +2

AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection

no code implementations • 19 Oct 2023 • Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

For a detailed analysis, we evaluate AVTENet, its variants, and several existing methods on multiple test sets of the FakeAVCeleb dataset.

DeepFake Detection Face Swapping

The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains

no code implementations • 4 Oct 2023 • Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi

We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech.

Speech Synthesis Text-To-Speech Synthesis

A Study on Incorporating Whisper for Robust Speech Assessment

1 code implementation • 22 Sep 2023 • Ryandhimas E. Zezario, Yu-Wen Chen, Szu-Wei Fu, Yu Tsao, Hsin-Min Wang, Chiou-Shann Fuh

This research introduces MOSA-Net+, an enhanced version of the multi-objective speech assessment model that leverages acoustic features from Whisper, a large-scale weakly supervised model.

Self-Supervised Learning

Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement

no code implementations • 20 Sep 2023 • Shafique Ahmed, Chia-Wei Chen, Wenze Ren, Chin-Jou Li, Ernie Chu, Jun-Cheng Chen, Amir Hussain, Hsin-Min Wang, Yu Tsao, Jen-Cheng Hou

Recent studies have increasingly acknowledged the advantages of incorporating visual data into speech enhancement (SE) systems.

Decoder Speech Enhancement

Utilizing Whisper to Enhance Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

no code implementations • 18 Sep 2023 • Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Automated assessment of speech intelligibility in hearing aid (HA) devices is of great importance.

Multi-Task Learning Self-Supervised Learning

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model

no code implementations • 18 Aug 2023 • Ryandhimas E. Zezario, Bo-Ren Brian Bai, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net.

Multi-Task Learning Pseudo Label

BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm

1 code implementation • 11 Dec 2022 • Yu-Wen Chen, Hsin-Min Wang, Yu Tsao

We converted the script into a speech corpus using two text-to-speech systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Multimodal Forgery Detection Using Ensemble Learning

1 code implementation • Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2022 • Ammarah Hashmi, Sahibzada Adil Shahzad, Wasim Ahmad, Chia Wen Lin, Yu Tsao, Hsin-Min Wang

The recent rapid revolution in Artificial Intelligence (AI) technology has enabled the creation of hyper-realistic deepfakes, and detecting deepfake videos (also known as AI-synthesized videos) has become a critical task.

Ranked #1 on Multimodal Forgery Detection on FakeAVCeleb (using extra training data)

Ensemble Learning Face Swapping +1

Lip Sync Matters: A Novel Multimodal Forgery Detector

1 code implementation • APSIPA ASC 2022 • Sahibzada Adil Shahzad, Ammarah Hashmi, Sarwar Khan, Yan-Tsung Peng, Yu Tsao, Hsin-Min Wang

Deepfake technology has advanced considerably, but it is a double-edged sword for the community.

Ranked #1 on DeepFake Detection on FakeAVCeleb (Accuracy (%) metric)

DeepFake Detection Face Swapping +1

A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech

2 code implementations • 27 Oct 2022 • Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

The lack of clean speech is a practical challenge to the development of speech enhancement systems, which means that there is an inevitable mismatch between their training criterion and evaluation metric.

Speech Enhancement

CasNet: Investigating Channel Robustness for Speech Separation

1 code implementation • 27 Oct 2022 • Fan-Lin Wang, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

In this study, building on our previously constructed TAT-2mix corpus, we address the channel mismatch problem by proposing a channel-aware audio separation network (CasNet), a deep learning framework for end-to-end time-domain speech separation.

Speech Separation

Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN

no code implementations • 21 Sep 2022 • Yin-Ping Cho, Yu Tsao, Hsin-Min Wang, Yi-Wen Liu

Singing voice synthesis (SVS) is the computer production of a human-like singing voice from given musical scores.

Denoising Generative Adversarial Network +1

NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling

no code implementations • 18 Jun 2022 • Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao

NASTAR uses a feedback mechanism to simulate adaptive training data via a noise extractor and a retrieval model.

Retrieval Speech Enhancement

A Study of Using Cepstrogram for Countermeasure Against Replay Attacks

1 code implementation • 9 Apr 2022 • Shih-kuang Lee, Yu Tsao, Hsin-Min Wang

This study investigated the cepstrogram properties and demonstrated their effectiveness as powerful countermeasures against replay attacks.

MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

no code implementations • 7 Apr 2022 • Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

In this study, we propose a multi-branched speech intelligibility prediction model (MBI-Net) for predicting the subjective intelligibility scores of HA users.

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

no code implementations • 7 Apr 2022 • Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention.

Multi-Task Learning Self-Supervised Learning

Filter-based Discriminative Autoencoders for Children Speech Recognition

no code implementations • 1 Apr 2022 • Chiang-Lin Tai, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

Children's speech recognition is indispensable but challenging due to the diversity of children's speech.

Decoder Domain Adaptation +2

Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks

1 code implementation • 30 Mar 2022 • Fan-Lin Wang, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

However, domain mismatch between training and test situations, caused by factors such as speaker, content, channel, and environment, remains a severe problem for speech separation.

Speech Separation

Generation of Speaker Representations Using Heterogeneous Training Batch Assembly

no code implementations • 30 Mar 2022 • Yu-Huai Peng, Hung-Shin Lee, Pin-Tuan Huang, Hsin-Min Wang

In traditional speaker diarization systems, a well-trained speaker model is a key component to extract representations from consecutive and partially overlapping segments in a long speech session.

speaker-diarization Speaker Diarization

Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition

no code implementations • 28 Mar 2022 • Hung-Shin Lee, Yu Tsao, Shyh-Kang Jeng, Hsin-Min Wang

Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution or phone events.

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition

1 code implementation • 25 Mar 2022 • Hung-Shin Lee, Pin-Yuan Chen, Yao-Fei Cheng, Yu Tsao, Hsin-Min Wang

In this paper, a noise-aware training framework based on two cascaded neural structures is proposed to jointly optimize speech enhancement and speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Chain-based Discriminative Autoencoders for Speech Recognition

no code implementations • 25 Mar 2022 • Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng, Hsin-Min Wang

For application to robust speech recognition, we further extend c-DcAE to hierarchical and parallel structures, resulting in hc-DcAE and pc-DcAE.

Decoder Robust Speech Recognition +1

Partially Fake Audio Detection by Self-attention-based Fake Span Discovery

no code implementations • 14 Feb 2022 • Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao, Hsin-Min Wang, Helen Meng

ADD 2022 is also the first challenge to propose the partially fake audio detection task.

Open-Ended Question Answering Speech Synthesis +1

EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement

no code implementations • 14 Feb 2022 • Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Yu Tsao

Multimodal learning has been proven to be an effective method to improve speech enhancement (SE) performance, especially in challenging situations such as low signal-to-noise ratios, speech noise, or unseen noise types.

Electromyography (EMG) Speech Enhancement

HASA-net: A non-intrusive hearing-aid speech assessment network

no code implementations • 10 Nov 2021 • Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao

Because they do not require a clean reference, non-intrusive speech assessment methods have attracted great attention for objective evaluation.

Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

1 code implementation • 3 Nov 2021 • Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

In this study, we propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously.

Speech Enhancement

Speech Enhancement-assisted Voice Conversion in Noisy Environments

no code implementations • 19 Oct 2021 • Yun-Ju Chan, Chiang-Jen Peng, Syu-Siang Wang, Hsin-Min Wang, Yu Tsao, Tai-Shih Chi

Numerous voice conversion (VC) techniques have been proposed for the conversion of voices among different speakers.

Speech Enhancement Voice Conversion

Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion

no code implementations • 8 Sep 2021 • Yi-Syuan Liou, Wen-Chin Huang, Ming-Chi Yen, Shu-Wei Tsai, Yu-Huai Peng, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Voice conversion (VC) is an effective approach to electrolaryngeal (EL) speech enhancement, a task that aims to improve the quality of the artificial voice from an electrolarynx device.

Dynamic Time Warping Speech Enhancement +1

SVSNet: An End-to-end Speaker Voice Similarity Assessment Model

no code implementations • 20 Jul 2021 • Cheng-Hung Hu, Yu-Huai Peng, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

Neural evaluation metrics derived for numerous speech generation tasks have recently attracted great attention.

Voice Conversion Voice Similarity

Dual-Path Filter Network: Speaker-Aware Modeling for Speech Separation

no code implementations • 14 Jun 2021 • Fan-Lin Wang, Yu-Huai Peng, Hung-Shin Lee, Hsin-Min Wang

DPFN is composed of two parts: the speaker module and the separation module.

Speech Separation

Relational Data Selection for Data Augmentation of Speaker-dependent Multi-band MelGAN Vocoder

no code implementations • 10 Jun 2021 • Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda

Nowadays, neural vocoders can generate very high-fidelity speech when a large amount of training data is available.

Data Augmentation Speaker Verification

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

no code implementations • 2 Jun 2021 • Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda

First, a powerful parallel sequence-to-sequence model converts the input dysarthric speech into normal speech of a reference speaker as an intermediate product. A nonparallel, frame-wise VC model realized with a variational autoencoder then converts the speaker identity of the reference speech back to that of the patient, and is assumed to be capable of preserving the enhanced quality.

Voice Conversion

Sequence to General Tree: Knowledge-Guided Geometry Word Problem Solving

1 code implementation • ACL 2021 • Shih-hung Tsai, Chao-Chun Liang, Hsin-Min Wang, Keh-Yih Su

With the recent advancements in deep learning, neural solvers have gained promising results in solving math word problems.

Math

AlloST: Low-resource Speech Translation without Source Transcription

1 code implementation • 1 May 2021 • Yao-Fei Cheng, Hung-Shin Lee, Hsin-Min Wang

In this study, we survey methods to improve ST performance without using source transcription, and propose a learning framework that utilizes a language-independent universal phone recognizer.

Decoder Translation

The AS-NU System for the M2VoC Challenge

no code implementations • 7 Apr 2021 • Cheng-Hung Hu, Yi-Chiao Wu, Wen-Chin Huang, Yu-Huai Peng, Yu-Wen Chen, Pin-Jui Ku, Tomoki Toda, Yu Tsao, Hsin-Min Wang

The first track focuses on using a small number of 100 target utterances for voice cloning, while the second track focuses on using only 5 target utterances for voice cloning.

Voice Cloning

Speech Recognition by Simply Fine-tuning BERT

no code implementations • 30 Jan 2021 • Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda

We propose a simple method for automatic speech recognition (ASR) by fine-tuning BERT, which is a language model (LM) trained on large-scale unlabeled text data and can generate rich contextual representations.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Speech Enhancement with Zero-Shot Model Selection

1 code implementation • 17 Dec 2020 • Ryandhimas E. Zezario, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Experimental results confirmed that the proposed ZMOS approach can achieve better performance in both seen and unseen noise types compared to the baseline systems and other model selection systems, which indicates the effectiveness of the proposed approach in providing robust SE performance.

Ensemble Learning Model Selection +2

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

1 code implementation • 9 Nov 2020 • Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, Yu Tsao, Hsin-Min Wang

To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net.

The Academia Sinica Systems of Voice Conversion for VCC2020

no code implementations • 6 Oct 2020 • Yu-Huai Peng, Cheng-Hung Hu, Alexander Kang, Hung-Shin Lee, Pin-Yuan Chen, Yu Tsao, Hsin-Min Wang

This paper describes the Academia Sinica systems for the two tasks of Voice Conversion Challenge 2020, namely voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2).

Task 2 Voice Conversion

Improved Lite Audio-Visual Speech Enhancement

1 code implementation • 30 Aug 2020 • Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao

Experimental results confirm that compared to conventional AVSE systems, iLAVSE can effectively overcome the aforementioned three practical issues and can improve enhancement performance.

Speech Enhancement

SERIL: Noise Adaptive Speech Enhancement using Regularization-based Incremental Learning

1 code implementation • 24 May 2020 • Chi-Chang Lee, Yu-Chen Lin, Hsuan-Tien Lin, Hsin-Min Wang, Yu Tsao

The results verify that the SERIL model can effectively adjust itself to new noise environments while overcoming the catastrophic forgetting issue.

Incremental Learning Speech Enhancement

Lite Audio-Visual Speech Enhancement

1 code implementation • 24 May 2020 • Shang-Yi Chuang, Yu Tsao, Chen-Chou Lo, Hsin-Min Wang

Previous studies have confirmed the effectiveness of incorporating visual information into speech enhancement (SE) systems.

Data Compression Denoising +1

WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement

1 code implementation • 6 Apr 2020 • Tsun-An Hsieh, Hsin-Min Wang, Xugang Lu, Yu Tsao

In WaveCRN, the speech locality feature is captured by a convolutional neural network (CNN), while the temporal sequential property of the locality feature is modeled by stacked simple recurrent units (SRU).

Denoising Speech Denoising +2

Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion

1 code implementation • 22 Jan 2020 • Wen-Chin Huang, Hao Luo, Hsin-Te Hwang, Chen-Chou Lo, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

In this paper, we extend the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement, thereby improving the quality and similarity of converted speech.

Disentanglement Voice Conversion

Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders

1 code implementation • 6 Jan 2020 • Cheng Yu, Ryandhimas E. Zezario, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao

The DSDT is built on prior knowledge of speech and noisy conditions (the speaker, environment, and signal factors are considered in this paper), where each component of the multi-branched encoder performs a particular mapping from noisy to clean speech along the branch in the DSDT.

Decoder Denoising +1

Distributed Microphone Speech Enhancement based on Deep Learning

no code implementations • 19 Nov 2019 • Syu-Siang Wang, Yu-You Liang, Jeih-weih Hung, Yu Tsao, Hsin-Min Wang, Shih-Hau Fang

Speech-related applications deliver inferior performance in complex noise environments.

Speech Enhancement

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

no code implementations • 5 Nov 2019 • Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling

Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques.

Person Recognition Speaker Verification +2

Multichannel Speech Enhancement by Raw Waveform-mapping using Fully Convolutional Networks

no code implementations • 26 Sep 2019 • Chang-Le Liu, Sze-Wei Fu, You-Jin Li, Jen-Wei Huang, Hsin-Min Wang, Yu Tsao

We also propose an extended version of SDFCN, called the residual SDFCN (termed rSDFCN).

Denoising Speech Enhancement

Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement

no code implementations • 26 Sep 2019 • Natalie Yu-Hsien Wang, Hsiao-Lan Sharon Wang, Tao-Wei Wang, Szu-Wei Fu, Xugan Lu, Yu Tsao, Hsin-Min Wang

Recently, a time-domain speech enhancement algorithm based on fully convolutional neural networks (FCN) with a short-time objective intelligibility (STOI)-based objective function (termed FCN(S) for short) has received increasing attention due to its simple structure and its effectiveness in restoring clean speech signals from noisy counterparts.

Denoising Speech Enhancement +1 Sound Audio and Speech Processing

Improving Automatic Jazz Melody Generation by Transfer Learning Techniques

1 code implementation • 26 Aug 2019 • Hsiao-Tzu Hung, Chung-Yang Wang, Yi-Hsuan Yang, Hsin-Min Wang

In this paper, we tackle the problem of transfer learning for automatic Jazz generation.

Music Generation Transfer Learning

Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion

1 code implementation • 2 May 2019 • Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Such a hypothesis implies that, during the conversion phase, the latent codes and the converted features in VAE-based VC are in fact dependent on the source F0.

Decoder Disentanglement +1

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

6 code implementations • 17 Apr 2019 • Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech.

Voice Conversion

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

no code implementations • 27 Nov 2018 • Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Conventional WaveNet vocoders are trained with natural acoustic features but conditioned on the converted features in the conversion stage for VC, and such a mismatch often causes significant quality and similarity degradation.

Voice Conversion

WaveNet 聲碼器及其於語音轉換之應用 (WaveNet Vocoder and its Applications in Voice Conversion) [In Chinese]

no code implementations • ROCLING/IJCLCLP 2018 • Wen-Chin Huang, Chen-Chou Lo, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang

Voice Conversion

Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders

1 code implementation • 29 Aug 2018 • Wen-Chin Huang, Hsin-Te Hwang, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational auto encoders (VAEs), to model the latent structure of speech in an unsupervised manner.

Voice Conversion

Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM

no code implementations • 16 Aug 2018 • Szu-Wei Fu, Yu Tsao, Hsin-Te Hwang, Hsin-Min Wang

The evaluation of utterance-level quality in Quality-Net is based on the frame-level assessment.

Speech Enhancement

Noise Adaptive Speech Enhancement using Domain Adversarial Training

1 code implementation • 19 Jul 2018 • Chien-Feng Liao, Yu Tsao, Hung-Yi Lee, Hsin-Min Wang

The proposed noise adaptive SE system contains an encoder-decoder-based enhancement model and a domain discriminator model.

Sound Audio and Speech Processing

語音文件檢索使用類神經網路技術 (On the Use of Neural Network Modeling Techniques for Spoken Document Retrieval) [In Chinese]

no code implementations • ROCLING/IJCLCLP 2017 • Tien-Hong Lo, Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen

Retrieval

基於鑑別式自編碼解碼器之錄音回放攻擊偵測系統 (A Replay Spoofing Detection System Based on Discriminative Autoencoders) [In Chinese]

no code implementations • ROCLING/IJCLCLP 2017 • Yu-Ding Lu, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

Speaker Verification

基於i-vector與PLDA並使用GMM-HMM強制對位之自動語者分段標記系統 (Speaker Diarization based on I-vector PLDA Scoring and using GMM-HMM Forced Alignment) [In Chinese]

no code implementations • ROCLING/IJCLCLP 2017 • Cheng-Jo Ray Chang, Hung-Shin Lee, Hsin-Min Wang, Jyh-Shing Roger Jang

speaker-diarization Speaker Diarization

使用查詢意向探索與類神經網路於語音文件檢索之研究 (Exploring Query Intent and Neural Network modeling Techniques for Spoken Document Retrieval) [In Chinese]

no code implementations • ROCLING/IJCLCLP 2017 • Tien-Hong Lo, Ying-Wen Chen, Berlin Chen, Kuan-Yu Chen, Hsin-Min Wang

Retrieval

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

no code implementations • 1 Sep 2017 • Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang

Specifically, the proposed AVDCNN model is structured as an audio-visual encoder-decoder network, in which audio and visual data are first processed using individual CNNs, and then fused into a joint network to generate enhanced speech (the primary task) and reconstructed images (the secondary task) at the output layer.

Decoder Multi-Task Learning +1

當代非監督式方法之比較於節錄式語音摘要 (An Empirical Comparison of Contemporary Unsupervised Approaches for Extractive Speech Summarization) [In Chinese]

no code implementations • ROCLING/IJCLCLP 2017 • Shih-Hung Liu, Kuan-Yu Chen, Kai-Wun Shih, Berlin Chen, Hsin-Min Wang, Wen-Lian Hsu

Information Retrieval Language Modelling

Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks

1 code implementation • 4 Apr 2017 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios.

Generative Adversarial Network Voice Conversion

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

no code implementations • 30 Mar 2017 • Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang

Decoder Multi-Task Learning +1

Learning to Distill: The Essence Vector Modeling Framework

no code implementations • COLING 2016 • Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang

The D-EV model not only inherits the advantages of the EV model but also can infer a more robust representation for a given spoken paragraph against imperfect speech recognition.

Denoising Document Embedding +6

Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network

no code implementations • 13 Oct 2016 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

In this paper, we propose a dictionary update method for Nonnegative Matrix Factorization (NMF) with high dimensional data in a spectral conversion (SC) task.

Decoder Speech Enhancement +2

Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

4 code implementations • 13 Oct 2016 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora.

Decoder Voice Conversion

運用序列到序列生成架構於重寫式自動摘要 (Exploiting Sequence-to-Sequence Generation Framework for Automatic Abstractive Summarization) [In Chinese]

no code implementations • ROCLING/IJCLCLP 2016 • Yu-Lun Hsieh, Shih-Hung Liu, Kuan-Yu Chen, Hsin-Min Wang, Wen-Lian Hsu, Berlin Chen

Abstractive Text Summarization

Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization

no code implementations • 22 Jul 2016 • Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Hsin-Hsi Chen

Word embedding methods revolve around learning continuous distributed vector representations of words with neural networks, which can capture semantic and/or syntactic cues, and in turn be used to induce similarity measures among words, sentences and documents in context.

Language Modelling Representation Learning +1

Improved Spoken Document Summarization with Coverage Modeling Techniques

no code implementations • 20 Jan 2016 • Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang

Apart from MMR, little research, as far as we are aware, has concentrated on reducing redundancy or increasing diversity for the spoken document summarization task.

Document Summarization Extractive Summarization +1

節錄式語音文件摘要使用表示法學習技術 (Extractive Spoken Document Summarization with Representation Learning Techniques) [In Chinese]

no code implementations • ROCLING/IJCLCLP 2015 • Kai-Wun Shih, Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen

Document Summarization Representation Learning

調變頻譜分解技術於強健語音辨識之研究 (Investigating Modulation Spectrum Factorization Techniques for Robust Speech Recognition) [In Chinese]

no code implementations • ROCLING/IJCLCLP 2015 • Ting-Hao Chang, Hsiao-Tsung Hung, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen

Robust Speech Recognition speech-recognition

表示法學習技術於節錄式語音文件摘要之研究 (A Study on Representation Learning Techniques for Extractive Spoken Document Summarization) [In Chinese]

no code implementations • ROCLING/IJCLCLP 2015 • Kai-Wun Shih, Berlin Chen, Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang

Document Summarization Representation Learning +1

調變頻譜分解之改良於強健性語音辨識 (Several Refinements of Modulation Spectrum Factorization for Robust Speech Recognition) [In Chinese]

no code implementations • ROCLING/IJCLCLP 2015 • Ting-Hao Chang, Hsiao-Tsung Hung, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen

Robust Speech Recognition speech-recognition

Leveraging Word Embeddings for Spoken Document Summarization

no code implementations • 14 Jun 2015 • Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen

Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express the most important theme of the document, has been an active area of research and experimentation.

Document Summarization Sentence +1