Search Results for author: Hsin-Min Wang

Found 73 papers, 20 papers with code

Mining Commonsense and Domain Knowledge from Math Word Problems

no code implementations ROCLING 2021 Shih-hung Tsai, Chao-Chun Liang, Hsin-Min Wang, Keh-Yih Su

We construct two math datasets and show that our algorithms can effectively retrieve the knowledge required for problem-solving.

A Flexible and Extensible Framework for Multiple Answer Modes Question Answering

no code implementations ROCLING 2021 Cheng-Chung Fan, Chia-Chih Kuo, Shang-Bao Luo, Pei-Jun Liao, Kuang-Yu Chang, Chiao-Wei Hsu, Meng-Tse Wu, Shih-Hong Tsai, Tzu-Man Wu, Aleksandra Smolka, Chao-Chun Liang, Hsin-Min Wang, Kuan-Yu Chen, Yu Tsao, Keh-Yih Su

Only a few of them adopt several answer generation modules to provide different answering mechanisms; however, they either lack an aggregation mechanism to merge the answers from the various modules, or are too complicated to be implemented with neural networks.

Answer Generation · Question Answering

NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling

no code implementations18 Jun 2022 Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao

NASTAR uses a feedback mechanism to simulate adaptive training data via a noise extractor and a retrieval model.

Speech Enhancement

A Study of Using Cepstrogram for Countermeasure Against Replay Attacks

1 code implementation9 Apr 2022 Shih-kuang Lee, Yu Tsao, Hsin-Min Wang

Our LCNN-based single and fusion systems with the cepstrogram feature outperform the corresponding LCNN-based systems without using the cepstrogram feature and several state-of-the-art (SOTA) single and fusion systems in the literature.
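The cepstrogram used here is the frame-wise cepstrum of the signal: for each overlapping frame, the inverse DFT of its log magnitude spectrum. The paper's exact front-end settings are not given in this snippet, so the following is a minimal stdlib-only sketch with assumed frame length, hop size, and Hann window:

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2), fine for a sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(x):
    """Inverse DFT, keeping the real part."""
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def cepstrogram(signal, frame_len=64, hop=32, eps=1e-10):
    """Cepstrum of each overlapping frame: IDFT of the log magnitude spectrum."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    cep = []
    for frame in frames:
        # Hann window to reduce spectral leakage
        windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * t / (frame_len - 1)))
                    for t, s in enumerate(frame)]
        spectrum = dft(windowed)
        log_mag = [math.log(abs(c) + eps) for c in spectrum]
        cep.append(idft_real(log_mag))
    return cep

# A 200-sample toy "signal": a sinusoid
sig = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
C = cepstrogram(sig)
print(len(C), len(C[0]))  # frames x cepstral coefficients per frame
```

Unlike a log spectrogram, the cepstrum tends to separate source and channel effects, which is one intuition for why it can expose the channel artifacts introduced by replaying recorded speech.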

MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

no code implementations7 Apr 2022 Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

In this study, we propose a multi-branched speech intelligibility prediction model (MBI-Net) for predicting the subjective intelligibility scores of HA users.

Generation of Speaker Representations Using Heterogeneous Training Batch Assembly

no code implementations30 Mar 2022 Yu-Huai Peng, Hung-Shin Lee, Pin-Tuan Huang, Hsin-Min Wang

In traditional speaker diarization systems, a well-trained speaker model is a key component to extract representations from consecutive and partially overlapping segments in a long speech session.

Speaker Diarization
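The snippet above refers to extracting representations from consecutive, partially overlapping segments of a long session. The segmentation step itself can be sketched as follows; the segment length and hop are illustrative assumptions, and the trained speaker model that maps each segment to an embedding is not shown:

```python
def overlapping_segments(n_samples, seg_len, hop):
    """Start/end sample indices of consecutive, partially overlapping segments.

    With hop < seg_len, each pair of neighbouring segments overlaps by
    seg_len - hop samples, as in typical diarization front-ends.
    """
    return [(s, min(s + seg_len, n_samples))
            for s in range(0, max(n_samples - seg_len, 0) + 1, hop)]

# 10 s of 16 kHz audio, 2 s segments, 1 s hop (assumed values)
segs = overlapping_segments(n_samples=16000 * 10, seg_len=16000 * 2, hop=16000)
print(len(segs), segs[0], segs[1])
```

Each segment would then be fed to the speaker model to obtain one embedding per window, and the embeddings clustered into speakers.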

Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks

1 code implementation30 Mar 2022 Fan-Lin Wang, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

However, domain mismatch between training and test conditions due to factors such as speaker, content, channel, and environment remains a severe problem for speech separation.

Speech Separation

Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition

no code implementations28 Mar 2022 Hung-Shin Lee, Yu Tsao, Shyh-Kang Jeng, Hsin-Min Wang

Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution or phone events.
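Representing an utterance as a multinomial distribution over phone events can be sketched as below. The phone symbols and the use of bigrams are illustrative assumptions, not the paper's exact recipe:

```python
from collections import Counter

def phonotactic_vector(phones, n=2):
    """Multinomial distribution over phone n-grams for one utterance.

    Languages differ in which phone sequences they permit, so the
    relative frequencies of n-grams carry language identity.
    """
    grams = [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

# Hypothetical phone-recognizer output for one utterance
v = phonotactic_vector(["s", "p", "iy", "ch", "s", "p"])
print(v[("s", "p")])  # relative frequency of the bigram (s, p)
```

A language recognizer can then be trained on these distribution vectors, or on subspace representations derived from them.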

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition

1 code implementation25 Mar 2022 Hung-Shin Lee, Pin-Yuan Chen, Yu Tsao, Hsin-Min Wang

In this paper, a noise-aware training framework based on two cascaded neural structures is proposed to jointly optimize speech enhancement and speech recognition.

Automatic Speech Recognition · Robust Speech Recognition +1

Chain-based Discriminative Autoencoders for Speech Recognition

no code implementations25 Mar 2022 Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng, Hsin-Min Wang

For application to robust speech recognition, we further extend c-DcAE to hierarchical and parallel structures, resulting in hc-DcAE and pc-DcAE.

Robust Speech Recognition

EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement

no code implementations14 Feb 2022 Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Yu Tsao

Multimodal learning has been proven to be an effective method to improve speech enhancement (SE) performance, especially in challenging situations such as low signal-to-noise ratios, speech noise, or unseen noise types.

Electromyography (EMG) · Speech Enhancement

HASA-net: A non-intrusive hearing-aid speech assessment network

no code implementations10 Nov 2021 Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao

Without the need for a clean reference, non-intrusive speech assessment methods have attracted great attention for objective evaluations.

Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

1 code implementation3 Nov 2021 Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

In this study, we propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously.

Speech Enhancement

Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion

no code implementations8 Sep 2021 Yi-Syuan Liou, Wen-Chin Huang, Ming-Chi Yen, Shu-Wei Tsai, Yu-Huai Peng, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Voice conversion (VC) is an effective approach to electrolaryngeal (EL) speech enhancement, a task that aims to improve the quality of the artificial voice from an electrolarynx device.

Dynamic Time Warping · Speech Enhancement +1

SVSNet: An End-to-end Speaker Voice Similarity Assessment Model

no code implementations20 Jul 2021 Cheng-Hung Hu, Yu-Huai Peng, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

Neural evaluation metrics derived for numerous speech generation tasks have recently attracted great attention.

Voice Conversion

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

no code implementations2 Jun 2021 Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda

First, a powerful parallel sequence-to-sequence model converts the input dysarthric speech into normal speech of a reference speaker as an intermediate product. Then, a nonparallel, frame-wise VC model realized with a variational autoencoder converts the speaker identity of the reference speech back to that of the patient, while being assumed capable of preserving the enhanced quality.

Voice Conversion

Sequence to General Tree: Knowledge-Guided Geometry Word Problem Solving

1 code implementation ACL 2021 Shih-hung Tsai, Chao-Chun Liang, Hsin-Min Wang, Keh-Yih Su

With the recent advancements in deep learning, neural solvers have gained promising results in solving math word problems.

AlloST: Low-resource Speech Translation without Source Transcription

1 code implementation1 May 2021 Yao-Fei Cheng, Hung-Shin Lee, Hsin-Min Wang

In this study, we survey methods to improve ST performance without using source transcription, and propose a learning framework that utilizes a language-independent universal phone recognizer.

Translation

The AS-NU System for the M2VoC Challenge

no code implementations7 Apr 2021 Cheng-Hung Hu, Yi-Chiao Wu, Wen-Chin Huang, Yu-Huai Peng, Yu-Wen Chen, Pin-Jui Ku, Tomoki Toda, Yu Tsao, Hsin-Min Wang

The first track focuses on voice cloning with a small set of 100 target utterances, while the second track focuses on voice cloning with only 5 target utterances.

Speech Recognition by Simply Fine-tuning BERT

no code implementations30 Jan 2021 Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda

We propose a simple method for automatic speech recognition (ASR) by fine-tuning BERT, which is a language model (LM) trained on large-scale unlabeled text data and can generate rich contextual representations.

Automatic Speech Recognition

Speech Enhancement with Zero-Shot Model Selection

1 code implementation17 Dec 2020 Ryandhimas E. Zezario, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Experimental results confirmed that the proposed ZMOS approach can achieve better performance in both seen and unseen noise types compared to the baseline systems and other model selection systems, which indicates the effectiveness of the proposed approach in providing robust SE performance.

Ensemble Learning · Model Selection +2

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

1 code implementation9 Nov 2020 Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, Yu Tsao, Hsin-Min Wang

To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net.

The Academia Sinica Systems of Voice Conversion for VCC2020

no code implementations6 Oct 2020 Yu-Huai Peng, Cheng-Hung Hu, Alexander Kang, Hung-Shin Lee, Pin-Yuan Chen, Yu Tsao, Hsin-Min Wang

This paper describes the Academia Sinica systems for the two tasks of Voice Conversion Challenge 2020, namely voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2).

Voice Conversion

Improved Lite Audio-Visual Speech Enhancement

1 code implementation30 Aug 2020 Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao

Experimental results confirm that compared to conventional AVSE systems, iLAVSE can effectively overcome the aforementioned three practical issues and can improve enhancement performance.

Speech Enhancement

SERIL: Noise Adaptive Speech Enhancement using Regularization-based Incremental Learning

1 code implementation24 May 2020 Chi-Chang Lee, Yu-Chen Lin, Hsuan-Tien Lin, Hsin-Min Wang, Yu Tsao

The results verify that the SERIL model can effectively adjust itself to new noise environments while overcoming the catastrophic forgetting issue.

Incremental Learning · Speech Enhancement

Lite Audio-Visual Speech Enhancement

1 code implementation24 May 2020 Shang-Yi Chuang, Yu Tsao, Chen-Chou Lo, Hsin-Min Wang

Previous studies have confirmed the effectiveness of incorporating visual information into speech enhancement (SE) systems.

Data Compression · Denoising +1

WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement

1 code implementation6 Apr 2020 Tsun-An Hsieh, Hsin-Min Wang, Xugang Lu, Yu Tsao

In WaveCRN, the speech locality feature is captured by a convolutional neural network (CNN), while the temporal sequential property of the locality feature is modeled by stacked simple recurrent units (SRU).

Denoising · Speech Denoising +2
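As a rough illustration of the CNN-plus-SRU pipeline described above, here is a scalar, stdlib-only toy: a 1-D convolution extracts local features, and a simple recurrent unit, whose gates depend only on the current input so that only the cell state is sequential, models the temporal structure. All weights are fixed illustrative values; the real model learns multi-channel parameters end to end:

```python
import math

def conv1d(x, kernel):
    """Valid 1-D convolution: captures local (neighbourhood) structure."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sru(x, wf=0.5, wr=0.5):
    """Scalar simple recurrent unit.

    Gates f and r depend only on the current input, so the only sequential
    dependency is the cell state c -- the property that makes SRUs fast.
    """
    c, h = 0.0, []
    for xt in x:
        f = sigmoid(wf * xt)          # forget gate
        r = sigmoid(wr * xt)          # reset/highway gate
        c = f * c + (1 - f) * xt      # cell state update
        h.append(r * math.tanh(c) + (1 - r) * xt)
    return h

sig = [math.sin(0.1 * t) for t in range(100)]
feats = conv1d(sig, kernel=[0.25, 0.5, 0.25])  # CNN stage: local features
out = sru(feats)                               # SRU stage: temporal modelling
print(len(feats), len(out))
```

In the actual model, the output of the recurrent stage would be mapped back to a waveform-domain mask or estimate; that stage is omitted here.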

Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion

1 code implementation22 Jan 2020 Wen-Chin Huang, Hao Luo, Hsin-Te Hwang, Chen-Chou Lo, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

In this paper, we extend the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement, thereby improving the quality and similarity of converted speech.

Disentanglement · Voice Conversion

Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders

no code implementations6 Jan 2020 Cheng Yu, Ryandhimas E. Zezario, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao

The DSDT is built based on a prior knowledge of speech and noisy conditions (the speaker, environment, and signal factors are considered in this paper), where each component of the multi-branched encoder performs a particular mapping from noisy to clean speech along the branch in the DSDT.

Denoising · Speech Enhancement

Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement

no code implementations26 Sep 2019 Natalie Yu-Hsien Wang, Hsiao-Lan Sharon Wang, Tao-Wei Wang, Szu-Wei Fu, Xugang Lu, Yu Tsao, Hsin-Min Wang

Recently, a time-domain speech enhancement algorithm based on fully convolutional neural networks (FCNs) with a short-time objective intelligibility (STOI)-based objective function (termed FCN(S) for short) has received increasing attention due to its simple structure and effectiveness in restoring clean speech signals from noisy counterparts.

Denoising · Speech Enhancement +1 · Sound · Audio and Speech Processing

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

6 code implementations17 Apr 2019 Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech.

Voice Conversion

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

no code implementations27 Nov 2018 Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Conventional WaveNet vocoders are trained with natural acoustic features but conditioned on the converted features in the conversion stage for VC, and such a mismatch often causes significant quality and similarity degradation.

Voice Conversion

Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders

1 code implementation29 Aug 2018 Wen-Chin Huang, Hsin-Te Hwang, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational autoencoders (VAEs), to model the latent structure of speech in an unsupervised manner.

Voice Conversion
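The core mechanism shared by these VAE-based VC systems is the reparameterization trick, which keeps latent sampling differentiable so the encoder and decoder can be trained jointly. The toy encoder and decoder below are placeholders invented for illustration; only the z = mu + sigma * eps step reflects the actual technique:

```python
import math
import random

random.seed(0)  # reproducible sampling for the sketch

def encode(x):
    """Toy 'encoder': maps an input frame to mean and log-variance of q(z|x)."""
    mu = sum(x) / len(x)
    logvar = math.log(1.0 + max(x) - min(x))
    return mu, logvar

def reparameterize(mu, logvar):
    """z = mu + sigma * eps with eps ~ N(0, 1): sampling stays differentiable
    with respect to mu and logvar, which is the point of the trick."""
    eps = random.gauss(0.0, 1.0)
    return mu + math.exp(0.5 * logvar) * eps

def decode(z, dim):
    """Toy 'decoder': reconstructs a frame from the latent code."""
    return [z] * dim

frame = [0.1, 0.4, 0.2, 0.3]      # a hypothetical spectral frame
mu, logvar = encode(frame)
z = reparameterize(mu, logvar)
recon = decode(z, len(frame))
print(len(recon), isinstance(z, float))
```

In the VC setting, the decoder is additionally conditioned on a speaker code, so that the same speaker-independent latent z can be decoded into different target voices.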

Noise Adaptive Speech Enhancement using Domain Adversarial Training

1 code implementation19 Jul 2018 Chien-Feng Liao, Yu Tsao, Hung-Yi Lee, Hsin-Min Wang

The proposed noise adaptive SE system contains an encoder-decoder-based enhancement model and a domain discriminator model.

Sound · Audio and Speech Processing

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

no code implementations1 Sep 2017 Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang

More precisely, the proposed AVDCNN model is structured as an audio-visual encoder-decoder network, in which audio and visual data are first processed using individual CNNs, and then fused into a joint network to generate enhanced speech (the primary task) and reconstructed images (the secondary task) at the output layer.

Multi-Task Learning · Speech Enhancement

Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks

1 code implementation4 Apr 2017 Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios.

Voice Conversion

Learning to Distill: The Essence Vector Modeling Framework

no code implementations COLING 2016 Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang

The D-EV model not only inherits the advantages of the EV model but also can infer a more robust representation for a given spoken paragraph against imperfect speech recognition.

Denoising · Document Embedding +4

Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

4 code implementations13 Oct 2016 Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora.

Voice Conversion

Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network

no code implementations13 Oct 2016 Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

In this paper, we propose a dictionary update method for Nonnegative Matrix Factorization (NMF) with high dimensional data in a spectral conversion (SC) task.

Speech Enhancement · Speech Synthesis +1

Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization

no code implementations22 Jul 2016 Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Hsin-Hsi Chen

Word embedding methods revolve around learning continuous distributed vector representations of words with neural networks, which can capture semantic and/or syntactic cues, and in turn be used to induce similarity measures among words, sentences and documents in context.

Language Modelling · Representation Learning +1
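The similarity measures induced from word embeddings are typically cosine similarities, and a sentence or document can be represented by averaging its word vectors. A small sketch with made-up 3-dimensional embeddings (real embeddings are learned and have hundreds of dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity: the standard similarity between embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def sentence_embedding(word_vectors):
    """Average of word vectors: a simple sentence/document representation."""
    dim = len(word_vectors[0])
    return [sum(v[i] for v in word_vectors) / len(word_vectors)
            for i in range(dim)]

# Hypothetical embeddings, for illustration only
emb = {"speech": [0.9, 0.1, 0.0],
       "audio":  [0.8, 0.2, 0.1],
       "tree":   [0.0, 0.1, 0.9]}

sim_related = cosine(emb["speech"], emb["audio"])
sim_unrelated = cosine(emb["speech"], emb["tree"])
print(sim_related > sim_unrelated)  # semantically related words score higher
```

In extractive summarization, such similarities between sentence and document representations are one way to score candidate sentences for selection.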

Improved Spoken Document Summarization with Coverage Modeling Techniques

no code implementations20 Jan 2016 Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang

Beyond MMR, there is, as far as we are aware, a dearth of research concentrating on reducing redundancy or increasing diversity for the spoken document summarization task.

Document Summarization · Extractive Summarization

Leveraging Word Embeddings for Spoken Document Summarization

no code implementations14 Jun 2015 Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen

Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express the most important theme of the document, has been an active area of research and experimentation.

Document Summarization · Natural Language Processing +1
