Search Results for author: Hung-Yi Lee

Found 133 papers, 57 papers with code

Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models’ Transferability

no code implementations Findings (EMNLP) 2021 Wei-Tsung Kao, Hung-Yi Lee

This paper investigates whether the power of the models pre-trained on text data, such as BERT, can be transferred to general token sequence classification applications.

Text Classification

Membership Inference Attacks Against Self-supervised Speech Models

1 code implementation 9 Nov 2021 Wei-Cheng Tseng, Wei-Tsung Kao, Hung-Yi Lee

Recently, adapting the idea of self-supervised learning (SSL) to continuous speech has started gaining attention.

Self-Supervised Learning

Characterizing the adversarial vulnerability of speech self-supervised learning

no code implementations 8 Nov 2021 Haibin Wu, Bo Zheng, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

As the paradigm of a self-supervised learning upstream model followed by downstream tasks attracts more attention in the speech community, characterizing the adversarial robustness of such a paradigm is of high priority.

Adversarial Robustness Representation Learning +1

Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

1 code implementation 7 Nov 2021 Sung-Feng Huang, Chyi-Jiunn Lin, Hung-Yi Lee

On the one hand, speaker adaptation methods fine-tune a trained multi-speaker text-to-speech (TTS) model with few enrolled samples.

Meta-Learning Speech Synthesis

Don't speak too fast: The impact of data bias on self-supervised speech models

no code implementations 15 Oct 2021 Yen Meng, Yi-Hui Chou, Andy T. Liu, Hung-Yi Lee

Self-supervised Speech Models (S3Ms) have been proven successful in many downstream speech tasks, such as ASR.

Toward Degradation-Robust Voice Conversion

no code implementations 14 Oct 2021 Chien-yu Huang, Kai-Wei Chang, Hung-Yi Lee

However, in real-world scenarios, it is difficult to collect clean utterances of a speaker, as they are usually degraded by noise or reverberation.

Denoising Speech Enhancement +1

S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations

1 code implementation 12 Oct 2021 Wen-Chin Huang, Shu-wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda

In this work, we provide a series of in-depth analyses by benchmarking on the two tasks in VCC2020, namely intra-/cross-lingual any-to-one (A2O) VC, as well as an any-to-any (A2A) setting.

Voice Conversion

CheerBots: Chatbots toward Empathy and Emotion using Reinforcement Learning

no code implementations 8 Oct 2021 Jiun-Hao Jhan, Chao-Peng Liu, Shyh-Kang Jeng, Hung-Yi Lee

Apart from the coherence and fluency of responses, an empathetic chatbot places more emphasis on people's feelings.


Analyzing the Robustness of Unsupervised Speech Recognition

no code implementations 7 Oct 2021 Guan-Ting Lin, Chan-Jan Hsu, Da-Rong Liu, Hung-Yi Lee, Yu Tsao

In this work, we further analyze the training robustness of unsupervised ASR on the domain mismatch scenarios in which the domains of unpaired speech and text are different.

Speech Recognition Unsupervised Speech Recognition

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

1 code implementation 5 Oct 2021 Heng-Jui Chang, Shu-wen Yang, Hung-Yi Lee

Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks.

Multi-Task Learning Representation Learning

On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets

no code implementations 8 Sep 2021 Cheng-Han Chiang, Hung-Yi Lee

In this work, we study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.

Parallelized Reverse Curriculum Generation

no code implementations 4 Aug 2021 Zih-Yun Chiu, Yi-Lin Tuan, Hung-Yi Lee, Li-Chen Fu

In reinforcement learning (RL), sparse rewards make it challenging for an agent to master a task that requires a specific series of actions.

Spotting adversarial samples for speaker verification by neural vocoders

1 code implementation 1 Jul 2021 Haibin Wu, Po-chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee

This effort is, to the best of our knowledge, among the first to pursue such a technical direction for detecting adversarial samples for ASV, and hence there is a lack of established baselines for comparison.

Speaker Verification

Voting for the right answer: Adversarial defense for speaker verification

1 code implementation 15 Jun 2021 Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-Yi Lee

Automatic speaker verification (ASV) is a well-developed technology for biometric identification and has been ubiquitously implemented in security-critical applications, such as banking and access control.

Adversarial Defense Speaker Verification

Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation

1 code implementation Findings (ACL) 2021 Shun-Po Chuang, Yung-Sung Chuang, Chih-Chiang Chang, Hung-Yi Lee

We study the possibilities of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC), and use CTC-based automatic speech recognition as an auxiliary task to improve the performance.

Automatic Speech Recognition Speech-to-Text Translation +1
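
A CTC-based non-autoregressive model emits one label per input frame in parallel, and the frame-level output is then collapsed into the final sequence. The following is a minimal sketch of standard greedy (best-path) CTC decoding, assuming the per-frame labels have already been chosen by argmax and that the blank symbol is the hypothetical token "<b>"; it illustrates CTC in general, not the paper's translation system:

```python
from itertools import groupby

BLANK = "<b>"  # hypothetical blank token name, chosen for illustration


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Best-path CTC decoding of per-frame argmax labels.

    Step 1: merge runs of identical consecutive labels.
    Step 2: drop blank labels.
    """
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


frames = ["h", "h", "<b>", "e", "l", "l", "<b>", "l", "o"]
print(ctc_greedy_decode(frames))
```

The blank between the two "l" runs is what lets CTC output a genuine double letter; without it, the repeated frames would merge into a single "l".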

Improving Cross-Lingual Reading Comprehension with Self-Training

no code implementations 8 May 2021 Wei-Cheng Huang, Chien-yu Huang, Hung-Yi Lee

Substantial improvements have been made in machine reading comprehension, where the machine answers questions based on a given context.

Machine Reading Comprehension

S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations

3 code implementations 7 Apr 2021 Jheng-Hao Lin, Yist Y. Lin, Chung-Ming Chien, Hung-Yi Lee

AUTOVC used d-vectors to extract speaker information, and FragmentVC uses self-supervised learning (SSL) features such as wav2vec 2.0 to extract the phonetic content information.

Self-Supervised Learning Voice Conversion

Non-autoregressive Mandarin-English Code-switching Speech Recognition

no code implementations 6 Apr 2021 Shun-Po Chuang, Heng-Jui Chang, Sung-Feng Huang, Hung-Yi Lee

Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people.

Speech Recognition

Towards Lifelong Learning of End-to-end ASR

no code implementations 4 Apr 2021 Heng-Jui Chang, Hung-Yi Lee, Lin-shan Lee

We can collect new data describing the new environment and fine-tune the system, but this naturally leads to higher error rates for the earlier datasets, referred to as catastrophic forgetting.

Automatic Speech Recognition

Auto-KWS 2021 Challenge: Task, Datasets, and Baselines

1 code implementation 31 Mar 2021 Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-Yi Lee, Lei Xie

The Auto-KWS 2021 challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to a customized keyword spotting task.

AutoML Keyword Spotting

Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability

no code implementations 12 Mar 2021 Wei-Tsung Kao, Hung-Yi Lee

This paper investigates whether the power of the models pre-trained on text data, such as BERT, can be transferred to general token sequence classification applications.

General Classification Text Classification

Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech

1 code implementation 6 Mar 2021 Chung-Ming Chien, Jheng-Hao Lin, Chien-yu Huang, Po-chun Hsu, Hung-Yi Lee

The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances with voice and speaking style similar to a reference speaker given only a few reference samples.

Voice Conversion

Pre-Training a Language Model Without Human Language

no code implementations 22 Dec 2020 Cheng-Han Chiang, Hung-Yi Lee

In this paper, we study how the intrinsic nature of pre-training data contributes to the fine-tuned downstream performance.

Language Modelling

TaylorGAN: Neighbor-Augmented Policy Update Towards Sample-Efficient Natural Language Generation

1 code implementation NeurIPS 2020 Chun-Hsing Lin, Siang-Ruei Wu, Hung-Yi Lee, Yun-Nung Chen

Score function-based natural language generation (NLG) approaches such as REINFORCE, in general, suffer from low sample efficiency and training instability problems.

Text Generation
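
To make "score function-based" concrete, here is a minimal sketch of the REINFORCE (score-function) gradient estimate for a single categorical token choice with softmax logits. The function names, the single-step setting, and the constant baseline are illustrative assumptions; this is the plain estimator the abstract criticizes, not TaylorGAN's neighbor-augmented update:

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad(logits, reward_fn, num_samples=1, baseline=0.0, rng=random):
    """Monte Carlo estimate of d E[R] / d logits for one sampled token.

    Uses the score-function identity
        grad E[R] = E[(R - b) * grad log p(token)],
    where for a softmax policy d log p(token) / d logits[j] = 1[j == token] - p_j.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(num_samples):
        token = rng.choices(range(len(logits)), weights=probs)[0]
        advantage = reward_fn(token) - baseline
        for j, p in enumerate(probs):
            indicator = 1.0 if j == token else 0.0
            grad[j] += advantage * (indicator - p) / num_samples
    return grad


reward = lambda token: 1.0 if token == 0 else 0.0  # hypothetical reward: token 0 is "good"
print(reinforce_grad([0.0, 0.0], reward, num_samples=10000, rng=random.Random(0)))
```

With logits [0.0, 0.0] and this reward, the exact gradient is [0.25, -0.25], but a single-sample estimate is either [0.5, -0.5] or [0.0, 0.0], so many samples are needed to approach it. That variance is one way to read the low sample efficiency the abstract mentions.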

TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural Language Generation

1 code implementation 27 Nov 2020 Chun-Hsing Lin, Siang-Ruei Wu, Hung-Yi Lee, Yun-Nung Chen

Score function-based natural language generation (NLG) approaches such as REINFORCE, in general, suffer from low sample efficiency and training instability problems.

Text Generation

How Far Are We from Robust Voice Conversion: A Survey

no code implementations 24 Nov 2020 Tzu-Hsien Huang, Jheng-Hao Lin, Chien-yu Huang, Hung-Yi Lee

Voice conversion technologies have been greatly improved in recent years with the help of deep learning, but their capabilities of producing natural sounding utterances in different conditions remain unclear.

Speaker Identification Voice Conversion

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

1 code implementation 12 Nov 2020 Chung-Ming Chien, Hung-Yi Lee

Prosody modeling is an essential component in modern text-to-speech (TTS) frameworks.

Speech Synthesis

AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization

1 code implementation 31 Oct 2020 Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, Hung-Yi Lee

With a proper activation as an information bottleneck on content embeddings, the trade-off between the synthesis quality and the speaker similarity of the converted speech is improved drastically.

Audio and Speech Processing Sound

Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training

1 code implementation 29 Oct 2020 Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-Yi Lee

Speech separation has been well developed with the very successful permutation invariant training (PIT) approach, although the frequent label assignment switching that happens during PIT training remains a problem when better convergence speed and achievable performance are desired.

 Ranked #1 on Speech Separation on Libri2Mix (using extra training data)

Speaker Separation Speech Enhancement +1
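
The permutation invariant training (PIT) idea can be sketched as follows: because the order of the separated outputs is arbitrary, the loss is computed under every assignment of estimates to reference speakers and the minimum is kept. This minimal sketch assumes utterance-level PIT with a plain mean-squared-error loss on toy lists; the winning permutation is the "label assignment" whose switching between training steps the abstract describes. It is not the paper's self-supervised pre-training method:

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, targets):
    """Utterance-level PIT.

    Tries every mapping estimate i -> target perm[i] and returns
    (minimum average loss, best permutation).
    """
    n = len(targets)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], targets[p]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# The model's outputs come out in the "wrong" order relative to the references.
estimates = [[0.0, 1.0], [1.0, 0.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]
print(pit_loss(estimates, targets))
```

Enumerating all permutations costs N! loss evaluations for N speakers, which is manageable for small N such as two or three speakers.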

FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention

1 code implementation 27 Oct 2020 Yist Y. Lin, Chung-Ming Chien, Jheng-Hao Lin, Hung-Yi Lee, Lin-shan Lee

Any-to-any voice conversion aims to convert voices from and to any speakers, even those unseen during training; it is much more challenging than one-to-one or many-to-many tasks, but much more attractive in real-world scenarios.

Speaker Verification Voice Conversion

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining

1 code implementation 26 Oct 2020 Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass

Much recent work on Spoken Language Understanding (SLU) is limited in at least one of three ways: models were trained on oracle text input and neglected ASR errors, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.

Language Modelling Spoken Language Understanding

What makes multilingual BERT multilingual?

no code implementations 20 Oct 2020 Chi-Liang Liu, Tsung-Yuan Hsu, Yung-Sung Chuang, Hung-Yi Lee

Multilingual BERT has recently been shown to work remarkably well on cross-lingual transfer tasks, outperforming static non-contextualized word embeddings.

Cross-Lingual Transfer Word Embeddings

Pretrained Language Model Embryology: The Birth of ALBERT

1 code implementation EMNLP 2020 Cheng-Han Chiang, Sung-Feng Huang, Hung-Yi Lee

These findings suggest that knowledge of a pretrained model varies during pretraining, and having more pretrain steps does not necessarily provide a model with more comprehensive knowledge.

Language Modelling POS

Investigation of Sentiment Controllable Chatbot

no code implementations 11 Jul 2020 Hung-Yi Lee, Cheng-Hao Ho, Chien-Fu Lin, Chiung-Chih Chang, Chih-Wei Lee, Yau-Shian Wang, Tsung-Yuan Hsu, Kuan-Yu Chen

Conventional seq2seq chatbot models attempt only to find sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences.


VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net architecture

1 code implementation 7 Jun 2020 Da-Yi Wu, Yen-Hao Chen, Hung-Yi Lee

Voice conversion (VC) is a task that transforms the source speaker's timbre, accent, and tones in audio into those of another speaker while preserving the linguistic content.

Quantization Voice Conversion

Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning

4 code implementations 5 Jun 2020 Haibin Wu, Andy T. Liu, Hung-Yi Lee

To explore this issue, we proposed to employ Mockingjay, a self-supervised learning based model, to protect anti-spoofing models against adversarial attacks in the black-box scenario.

Self-Supervised Learning Speaker Verification +1

Understanding Self-Attention of Self-Supervised Audio Transformers

1 code implementation 5 Jun 2020 Shu-wen Yang, Andy T. Liu, Hung-Yi Lee

Self-supervised Audio Transformers (SAT) have achieved great success in many downstream speech applications like ASR, but how they work has not yet been widely explored.

Defending Your Voice: Adversarial Attack on Voice Conversion

1 code implementation 18 May 2020 Chien-yu Huang, Yist Y. Lin, Hung-Yi Lee, Lin-shan Lee

We introduce human imperceptible noise into the utterances of a speaker whose voice is to be defended.

Adversarial Attack Voice Conversion

Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation

no code implementations 16 May 2020 Tao Tu, Yuan-Jui Chen, Alexander H. Liu, Hung-Yi Lee

The experimental results demonstrate that with only an hour of paired speech data, whether the paired data come from multiple speakers or a single speaker, the proposed model can generate intelligible speech in different voices.

Speech Synthesis Text-To-Speech Synthesis

WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

1 code implementation 15 May 2020 Po-chun Hsu, Hung-Yi Lee

As we design a heavily compressed flow-based model, the proposed model requires far fewer computational resources than other waveform generation models during both training and inference; even though the model is highly compressed, the post-filter maintains the quality of the generated waveform.

Speech Synthesis Text-To-Speech Synthesis Audio and Speech Processing Sound

DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation

no code implementations 13 May 2020 Yi-Chen Chen, Jui-Yang Hsu, Cheng-Kuang Lee, Hung-Yi Lee

In order to examine the generalizability of DARTS-ASR, we apply our approach not only to monolingual ASR in many languages but also to a multilingual ASR setting.

Speech Recognition

End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training

no code implementations 5 May 2020 Heng-Jui Chang, Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee

Whispering is an important mode of human speech, but no end-to-end recognition results for it had been reported yet, probably due to the scarcity of available whispered speech data.

Speech Recognition Transfer Learning

A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT

no code implementations 20 Apr 2020 Chi-Liang Liu, Tsung-Yuan Hsu, Yung-Sung Chuang, Hung-Yi Lee

Multilingual BERT has recently been shown to work remarkably well on cross-lingual transfer tasks, outperforming static non-contextualized word embeddings.

Cross-Lingual Transfer Translation +1

Defense against adversarial attacks on spoofing countermeasures of ASV

no code implementations 6 Mar 2020 Haibin Wu, Songxiang Liu, Helen Meng, Hung-Yi Lee

Various state-of-the-art countermeasure methods for automatic speaker verification (ASV) with considerable anti-spoofing performance were proposed in the ASVspoof 2019 challenge.

Speaker Verification

BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT

no code implementations 25 Jan 2020 Wei-Tsung Kao, Tsung-Han Wu, Po-Han Chi, Chun-Cheng Hsieh, Hung-Yi Lee

Although Bidirectional Encoder Representations from Transformers (BERT) has achieved tremendous success in many natural language processing (NLP) tasks, it remains a black box.

MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing

no code implementations 9 Dec 2019 Chao-I Tuan, Yuan-Kuei Wu, Hung-Yi Lee, Yu Tsao

Our experimental results first confirmed the robustness of our MiTAS on two types of perturbations in mixed audio.

Speech Separation

Towards Robust Neural Vocoding for Speech Generation: A Survey

no code implementations 5 Dec 2019 Po-chun Hsu, Chun-hsuan Wang, Andy T. Liu, Hung-Yi Lee

We found that speaker variety is much more important than language for achieving a universal vocoder.

Speech Synthesis Voice Conversion

J-Net: Randomly weighted U-Net for audio source separation

1 code implementation 29 Nov 2019 Bo-Wen Chen, Yen-Min Hsu, Hung-Yi Lee

According to these discoveries, we pose two questions: what is the value of randomly weighted networks in difficult generative audio tasks such as audio source separation, and does such a positive correlation still exist when it comes to large random networks and their trained counterparts?

Audio Source Separation

Training a code-switching language model with monolingual data

no code implementations 14 Nov 2019 Shun-Po Chuang, Tzu-Wei Sung, Hung-Yi Lee

A lack of code-switching data complicates the training of code-switching (CS) language models.

Language Modelling Translation

Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning

no code implementations 28 Oct 2019 Alexander H. Liu, Tao Tu, Hung-Yi Lee, Lin-shan Lee

In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) to learn from primarily unpaired audio data and produce sequences of representations very close to phoneme sequences of speech utterances.

Quantization Representation Learning +2
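
The quantization step in this kind of autoencoder can be illustrated with generic nearest-codeword vector quantization: each continuous frame representation is replaced by the index of the closest entry in a codebook. This is a minimal sketch assuming a fixed toy codebook and squared Euclidean distance; how SeqRQ-AE learns its codebook and relates the codes to phoneme sequences is not shown:

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(vector, code)) for code in codebook]
    return min(range(len(codebook)), key=distances.__getitem__)


def quantize(frames, codebook):
    """Map every frame representation to a discrete codebook index."""
    return [nearest_codeword(frame, codebook) for frame in frames]


codebook = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical 2-entry codebook
frames = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
print(quantize(frames, codebook))
```

Replacing continuous vectors with codebook indices is what turns an utterance into a discrete token sequence that can then be compared with, or mapped onto, symbol sequences such as phonemes.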

Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding

1 code implementation 28 Oct 2019 Alexander H. Liu, Tzu-Wei Sung, Shun-Po Chuang, Hung-Yi Lee, Lin-shan Lee

This allows the decoder to consider the semantic consistency during decoding by absorbing the information carried by the transformed decoder feature, which is learned to be close to the target word embedding.

Automatic Speech Recognition

Interrupted and cascaded permutation invariant training for speech separation

1 code implementation 28 Oct 2019 Gene-Ping Yang, Szu-Lin Wu, Yao-Wen Mao, Hung-Yi Lee, Lin-shan Lee

Permutation Invariant Training (PIT) has long been a stepping-stone method for training speech separation models to handle the label ambiguity problem.

Speech Separation

Meta Learning for End-to-End Low-Resource Speech Recognition

no code implementations 26 Oct 2019 Jui-Yang Hsu, Yuan-Jui Chen, Hung-Yi Lee

In this paper, we propose applying a meta-learning approach to low-resource automatic speech recognition (ASR).

Automatic Speech Recognition Meta-Learning

SpeechBERT: An Audio-and-text Jointly Learned Language Model for End-to-end Spoken Question Answering

no code implementations 25 Oct 2019 Yung-Sung Chuang, Chi-Liang Liu, Hung-Yi Lee, Lin-shan Lee

In addition to its potential for end-to-end SQA, SpeechBERT can also be considered for many other spoken language understanding tasks, just as BERT is for many text processing tasks.

Question Answering Speech Recognition +1

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders

6 code implementations 25 Oct 2019 Andy T. Liu, Shu-wen Yang, Po-Han Chi, Po-chun Hsu, Hung-Yi Lee

We present Mockingjay as a new speech representation learning approach, where bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech.

General Classification Representation Learning +2

Adversarial Attacks on Spoofing Countermeasures of automatic speaker verification

1 code implementation 19 Oct 2019 Songxiang Liu, Haibin Wu, Hung-Yi Lee, Helen Meng

High-performance spoofing countermeasure systems for automatic speaker verification (ASV) have been proposed in the ASVspoof 2019 challenge.

Speaker Verification

DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs

1 code implementation IJCNLP 2019 Yi-Lin Tuan, Yun-Nung Chen, Hung-Yi Lee

This paper proposes a new task on how to apply dynamic knowledge graphs in neural conversation models and presents a novel TV series conversation corpus (DyKgChat) for the task.

Dialogue Generation Knowledge Graphs

Tree Transformer: Integrating Tree Structures into Self-Attention

2 code implementations IJCNLP 2019 Yau-Shian Wang, Hung-Yi Lee, Yun-Nung Chen

This paper proposes Tree Transformer, which adds an extra constraint to attention heads of the bidirectional Transformer encoder in order to encourage the attention heads to follow tree structures.

Language Modelling

Order-free Learning Alleviating Exposure Bias in Multi-label Classification

1 code implementation 8 Sep 2019 Che-Ping Tsai, Hung-Yi Lee

In this paper, we propose a new framework for multi-label classification (MLC) that does not rely on a predefined label order and thus alleviates exposure bias.

General Classification Multi-Label Classification

LAMOL: LAnguage MOdeling for Lifelong Language Learning

1 code implementation ICLR 2020 Fan-Keng Sun, Cheng-Hao Ho, Hung-Yi Lee

We present LAMOL, a simple yet effective method for lifelong language learning (LLL) based on language modeling.

Language Modelling

Cross-Lingual Transfer Learning for Question Answering

no code implementations 13 Jul 2019 Chia-Hsuan Lee, Hung-Yi Lee

In this paper, we explore the problem of cross-lingual transfer learning for QA, where a source language task with plentiful annotations is utilized to improve the performance of a QA model on a target language task with limited available annotations.

Cross-Lingual Transfer Machine Translation +3

Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion

1 code implementation 28 May 2019 Andy T. Liu, Po-chun Hsu, Hung-Yi Lee

We found that the proposed encoding method offers automatic extraction of speech content from speaker style, and is sufficient to cover full linguistic content in a given language.

Voice Conversion

Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering

1 code implementation 16 Apr 2019 Gene-Ping Yang, Chao-I Tuan, Hung-Yi Lee, Lin-shan Lee

Substantial effort has been reported based on approaches over the spectrogram, which is well known as the standard time-and-frequency cross-domain representation for speech signals.

Speech Separation

End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning

no code implementations 13 Apr 2019 Tao Tu, Yuan-Jui Chen, Cheng-chieh Yeh, Hung-Yi Lee

In this paper, we aim to build TTS systems for such low-resource (target) languages where only very limited paired data are available.

Cross-Lingual Transfer Transfer Learning

One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization

9 code implementations 10 Apr 2019 Ju-chieh Chou, Cheng-chieh Yeh, Hung-Yi Lee

Recently, voice conversion (VC) without parallel data has been successfully adapted to the multi-target scenario, in which a single model is trained to convert the input voice to many different speakers.

Voice Conversion
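
Instance normalization, named in the title above, normalizes each feature channel of a single utterance over time, removing that utterance's per-channel mean and variance; the intuition is that such global statistics carry much of the speaker-dependent information. Below is a minimal sketch on a channels-by-frames list of lists, assuming a small epsilon of 1e-5 for numerical stability; it shows the normalization operation only, not the paper's encoder-decoder architecture:

```python
import math


def instance_norm(features, eps=1e-5):
    """Instance normalization for one utterance.

    `features` is a list of channels, each a list of per-frame values.
    Every channel is shifted to zero mean and scaled to (approximately)
    unit variance using statistics computed over that channel's frames only.
    """
    normalized = []
    for channel in features:
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        scale = 1.0 / math.sqrt(var + eps)
        normalized.append([(x - mean) * scale for x in channel])
    return normalized


utterance = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]  # 2 channels x 3 frames
for channel in instance_norm(utterance):
    print([round(x, 3) for x in channel])
```

A constant channel (the second one) maps to all zeros instead of triggering a division by zero, which is what `eps` guards against.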

From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings

no code implementations 10 Apr 2019 Yi-Chen Chen, Sung-Feng Huang, Hung-Yi Lee, Lin-shan Lee

However, we note that human babies start to learn a language from the sounds (or phonetic structures) of a small number of exemplar words, and "generalize" such knowledge to other words without hearing a large amount of data.

Speech Recognition Unsupervised Speech Recognition

Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models

no code implementations 8 Apr 2019 Kuan-Yu Chen, Che-Ping Tsai, Da-Rong Liu, Hung-Yi Lee, Lin-shan Lee

Producing a large annotated speech corpus for training ASR systems remains difficult for the more than 95% of the world's languages that are low-resourced, but collecting a relatively large unlabeled data set for such languages is more achievable.

Speech Recognition Unsupervised Speech Recognition

Improved Audio Embeddings by Adjacency-Based Clustering with Applications in Spoken Term Detection

no code implementations 7 Nov 2018 Sung-Feng Huang, Yi-Chen Chen, Hung-Yi Lee, Lin-shan Lee

Embedding audio signal segments into vectors with fixed dimensionality is attractive because all subsequent processing, for example modeling, classifying, or indexing, becomes easier and more efficient.

Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation

2 code implementations 6 Nov 2018 Ching-Ting Chang, Shun-Po Chuang, Hung-Yi Lee

To mitigate the issue without expensive human annotation, we proposed an unsupervised method for code-switching data augmentation.

Data Augmentation

Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model

no code implementations 2 Nov 2018 Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee

In this paper we proposed a novel Adversarial Training (AT) approach for end-to-end speech recognition using a Criticizing Language Model (CLM).

Automatic Speech Recognition

Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech

1 code implementation 30 Oct 2018 Li-Wei Chen, Hung-Yi Lee, Yu Tsao

This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed.

Voice Conversion

Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks

1 code implementation EMNLP 2018 Yau-Shian Wang, Hung-Yi Lee

The generator encodes the input text into a shorter word sequence, and the reconstructor recovers the generator input from the generator output.

Abstractive Text Summarization

TopicGAN: Unsupervised Text Generation from Explainable Latent Topics

no code implementations 27 Sep 2018 Yau-Shian Wang, Yun-Nung Chen, Hung-Yi Lee

Learning discrete representations of data and then generating data from the discovered representations has been increasingly studied because the obtained discrete representations can benefit unsupervised learning.

Image Generation Text Generation

Temporal Pattern Attention for Multivariate Time Series Forecasting

5 code implementations 12 Sep 2018 Shun-Yao Shih, Fan-Keng Sun, Hung-Yi Lee

To obtain accurate predictions, it is crucial to model long-term dependency in time series data, which can be achieved to a good extent by a recurrent neural network (RNN) with an attention mechanism.

Multivariate Time Series Forecasting Time Series +1

Proximal Policy Optimization and its Dynamic Version for Sequence Generation

no code implementations 24 Aug 2018 Yi-Lin Tuan, Jinzhi Zhang, Yujia Li, Hung-Yi Lee

In sequence generation tasks, many works use policy gradient for model optimization to tackle the intractable backpropagation issue when maximizing non-differentiable evaluation metrics or fooling the discriminator in adversarial learning.


Improving Conditional Sequence Generative Adversarial Networks by Stepwise Evaluation

1 code implementation 16 Aug 2018 Yi-Lin Tuan, Hung-Yi Lee

To stabilize the training of SeqGAN, Monte Carlo tree search (MCTS) or reward at every generation step (REGS) is used to evaluate the goodness of a generated subsequence.

Dialogue Generation

Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

1 code implementation 9 Aug 2018 Cheng-chieh Yeh, Po-chun Hsu, Ju-chieh Chou, Hung-Yi Lee, Lin-shan Lee

In this way, the length constraint mentioned above is removed to offer rhythm-flexible voice conversion without requiring parallel data.

Sound Audio and Speech Processing

ODSQA: Open-domain Spoken Question Answering Dataset

1 code implementation 7 Aug 2018 Chia-Hsuan Lee, Shang-Ming Wang, Huan-Cheng Chang, Hung-Yi Lee

Reading comprehension by machine has been widely studied, but machine comprehension of spoken content is still a less investigated problem.

Data Augmentation Question Answering +1

Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection

no code implementations 7 Aug 2018 Yu-Hsuan Wang, Hung-Yi Lee, Lin-shan Lee

In this paper, we extend audio Word2Vec from word-level to utterance-level by proposing a new segmental audio Word2Vec, in which unsupervised spoken word boundary segmentation and audio Word2Vec are jointly learned and mutually enhanced, so an utterance can be directly represented as a sequence of vectors carrying phonetic structure information.

Noise Adaptive Speech Enhancement using Domain Adversarial Training

1 code implementation 19 Jul 2018 Chien-Feng Liao, Yu Tsao, Hung-Yi Lee, Hsin-Min Wang

The proposed noise adaptive SE system contains an encoder-decoder-based enhancement model and a domain discriminator model.

Sound Audio and Speech Processing

Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations

4 code implementations 9 Apr 2018 Ju-chieh Chou, Cheng-chieh Yeh, Hung-Yi Lee, Lin-shan Lee

The decoder then takes the speaker-independent latent representation and the target speaker embedding as the input to generate the voice of the target speaker with the linguistic content of the source utterance.

Voice Conversion

Scalable Sentiment for Sequence-to-sequence Chatbot Response with Performance Analysis

no code implementations 7 Apr 2018 Chih-Wei Lee, Yau-Shian Wang, Tsung-Yuan Hsu, Kuan-Yu Chen, Hung-Yi Lee, Lin-shan Lee

Conventional seq2seq chatbot models only try to find the sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences.


Joint Learning of Interactive Spoken Content Retrieval and Trainable User Simulator

no code implementations 1 Apr 2018 Pei-Hung Chung, Kuan Tung, Ching-Lun Tai, Hung-Yi Lee

User-machine interaction is crucial for information retrieval, especially for spoken content retrieval, because spoken content is difficult to browse, and speech recognition has a high degree of uncertainty.

Information Retrieval Q-Learning +1

Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings

no code implementations 1 Apr 2018 Da-Rong Liu, Kuan-Yu Chen, Hung-Yi Lee, Lin-shan Lee

Unsupervised discovery of acoustic tokens from audio corpora without annotation and learning vector representations for these tokens have been widely studied.

Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only

no code implementations 29 Mar 2018 Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-Yi Lee

In this work, we propose a framework to achieve unsupervised ASR on a read English speech dataset, where audio and text are unaligned.

Automatic Speech Recognition

Supervised and Unsupervised Transfer Learning for Question Answering

no code implementations NAACL 2018 Yu-An Chung, Hung-Yi Lee, James Glass

Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well studied.

Question Answering Speech Recognition

Personalized word representations Carrying Personalized Semantics Learned from Social Network Posts

no code implementations 29 Oct 2017 Zih-Wei Lin, Tzu-Wei Sung, Hung-Yi Lee, Lin-shan Lee

In this framework, universal background word vectors are first learned from the background corpora, and then adapted by the personalized corpus for each individual user to learn the personalized word vectors.

Sentence Completion

Mitigating the Impact of Speech Recognition Errors on Chatbot using Sequence-to-Sequence Model

no code implementations 22 Sep 2017 Pin-Jung Chen, I-Hung Hsu, Yi-Yao Huang, Hung-Yi Lee

We apply a sequence-to-sequence model to mitigate the impact of speech recognition errors on open-domain end-to-end dialog generation.

Chatbot Domain Adaptation

Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification

no code implementations 16 Sep 2017 Bo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-Yi Lee, Lin-shan Lee

Connectionist temporal classification (CTC) is a powerful approach for sequence-to-sequence learning and has been widely used in speech recognition.

Abstractive Text Summarization General Classification
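As background for the CTC snippet above, here is a minimal sketch of greedy (best-path) CTC decoding. It takes the most probable label at each frame, merges consecutive repeats, and then drops the blank symbol. The sketch shows generic CTC output collapsing only. It is not the summarization model from the paper, and reserving index 0 for the blank is an assumption.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. Pick the most probable label at each frame.
    best_path = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    decoded: List[int] = []
    prev: Optional[int] = None
    for label in best_path:
        # 2. Emit a label only when it differs from the previous frame's label,
        # 3. and never emit the blank symbol itself.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


if __name__ == "__main__":
    # Five frames over the vocabulary {0: blank, 1: "a", 2: "b"}.
    probs = [
        [0.1, 0.8, 0.1],    # a
        [0.1, 0.7, 0.2],    # a (repeat, merged)
        [0.9, 0.05, 0.05],  # blank
        [0.2, 0.7, 0.1],    # a (new token, because a blank separates it)
        [0.1, 0.1, 0.8],    # b
    ]
    print(ctc_greedy_decode(probs))  # [1, 1, 2]
```

The blank is what lets CTC output the same token twice in a row. Without the blank frame, the two "a" runs would collapse into one.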

Query-based Attention CNN for Text Similarity Map

2 code implementations 15 Sep 2017 Tzu-Chien Liu, Yu-Hsueh Wu, Hung-Yi Lee

This network is composed of a compare mechanism, a two-stage CNN architecture with an attention mechanism, and a prediction layer.

Question Answering

Query-by-example Spoken Term Detection using Attention-based Multi-hop Networks

no code implementations 1 Sep 2017 Chia-Wei Ao, Hung-Yi Lee

Retrieving spoken content with spoken queries, or query-by-example spoken term detection (STD), is attractive because it makes it possible to match signals directly at the acoustic level without transcribing them into text.
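Classical query-by-example STD systems match a spoken query against audio directly at the acoustic level, often by applying dynamic time warping (DTW) to frame-level features. The sketch below computes a plain DTW alignment cost between two feature sequences as background. It is a swapped-in classical baseline, not the attention-based multi-hop network this paper proposes. The Euclidean frame distance and the toy one-dimensional features are assumptions.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    # Assumption: Euclidean distance between acoustic feature frames (e.g. MFCCs).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_cost(query: Sequence[Frame], doc: Sequence[Frame]) -> float:
    """Minimum cumulative frame distance when aligning query to doc end to end."""
    n, m = len(query), len(doc)
    inf = float("inf")
    # cost[i][j] holds the best alignment cost of query[:i] with doc[:j].
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], doc[j - 1])
            # Allowed steps: diagonal match, vertical, or horizontal move.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    query = [[0.0], [1.0], [2.0]]
    slow = [[0.0], [0.0], [1.0], [1.0], [2.0]]  # same pattern, spoken more slowly
    other = [[5.0], [5.0], [5.0]]
    print(dtw_cost(query, slow))   # 0.0
    print(dtw_cost(query, other))  # 12.0
```

This version aligns the full query with the full document. To detect a term inside a longer utterance, a subsequence variant is typically used instead, one that lets the alignment start and end anywhere in the document.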

Learning Chinese Word Representations From Glyphs Of Characters

1 code implementation EMNLP 2017 Tzu-Ray Su, Hung-Yi Lee

The character glyph features are directly learned from the bitmaps of characters by a convolutional auto-encoder (convAE), and the glyph features improve Chinese word representations which are already enhanced by character embeddings.

Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data

no code implementations 19 Jul 2017 Chia-Hao Shen, Janet Y. Sung, Hung-Yi Lee

We train the sequence-to-sequence autoencoder (SA) on one language (the source language) and use it to extract vector representations of audio segments in another language (the target language).

Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries

1 code implementation 22 Mar 2017 Yu-Hsuan Wang, Cheng-Tao Chung, Hung-Yi Lee

In this paper, we analyze the gate activation signals inside gated recurrent neural networks and find that the temporal structure of such signals is highly correlated with phoneme boundaries.
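One simple way to relate a gate activation signal to phoneme boundaries is to treat local peaks in its frame-to-frame change as boundary candidates. The sketch below is a hypothetical illustration, not the paper's analysis procedure. It assumes the input is a 1-D sequence of per-frame gate values (for example, a GRU's update gate averaged over units), and the 0.1 threshold is arbitrary.

```python
from typing import List, Sequence


def boundary_candidates(gate_signal: Sequence[float], threshold: float = 0.1) -> List[int]:
    """Frame indices where the frame-to-frame change in gate activation peaks."""
    # delta[t] is the absolute change between frame t and frame t + 1.
    delta = [abs(gate_signal[t + 1] - gate_signal[t]) for t in range(len(gate_signal) - 1)]
    peaks: List[int] = []
    for t in range(len(delta)):
        left = delta[t - 1] if t > 0 else float("-inf")
        right = delta[t + 1] if t + 1 < len(delta) else float("-inf")
        # A local maximum above the (assumed) threshold marks a candidate boundary.
        if delta[t] > left and delta[t] >= right and delta[t] > threshold:
            peaks.append(t + 1)  # the boundary falls just before frame t + 1
    return peaks


if __name__ == "__main__":
    # Toy signal: the gate jumps at frame 2 and drops at frame 5.
    signal = [0.1, 0.1, 0.9, 0.9, 0.9, 0.2, 0.2]
    print(boundary_candidates(signal))  # [2, 5]
```

The resulting candidate frames could then be scored against reference phoneme boundaries with a tolerance window. That evaluation step is not shown here.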

Abstractive Headline Generation for Spoken Content by Attentive Recurrent Neural Networks with ASR Error Modeling

no code implementations 26 Dec 2016 Lang-Chi Yu, Hung-Yi Lee, Lin-shan Lee

In this way, the model for abstractive headline generation for spoken content can be learned from abundant text data and the ASR data for some recognizers.

Abstractive Text Summarization Document Summarization

Interactive Spoken Content Retrieval by Deep Reinforcement Learning

no code implementations 16 Sep 2016 Yen-chen Wu, Tzu-Hsiang Lin, Yang-De Chen, Hung-Yi Lee, Lin-shan Lee

In our previous work, some hand-crafted states estimated from the present retrieval results were used to determine the proper actions.

Q-Learning Speech Recognition

Hierarchical Attention Model for Improved Machine Comprehension of Spoken Content

no code implementations 28 Aug 2016 Wei Fang, Jui-Yang Hsu, Hung-Yi Lee, Lin-shan Lee

Multimedia or spoken content presents more attractive information than plain text content, but the former is more difficult to display on a screen and for a user to select.

Reading Comprehension

Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine

no code implementations 23 Aug 2016 Bo-Hsiang Tseng, Sheng-syun Shen, Hung-Yi Lee, Lin-shan Lee

Multimedia or spoken content presents more attractive information than plain text content, but it is more difficult to display on a screen and for a user to select.

Reading Comprehension

Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder

no code implementations 3 Mar 2016 Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-Yi Lee, Lin-shan Lee

The vector representations of fixed dimensionality for words (in text) offered by Word2Vec have been shown to be very useful in many application scenarios, in particular due to the semantic information they carry.

Denoising Dynamic Time Warping

An Iterative Deep Learning Framework for Unsupervised Discovery of Speech Features and Linguistic Units with Applications on Spoken Term Detection

no code implementations 1 Feb 2016 Cheng-Tao Chung, Cheng-Yu Tsai, Hsiang-Hung Lu, Chia-Hsiang Liu, Hung-Yi Lee, Lin-shan Lee

The multiple sets of token labels are then used as the targets of a Multi-target Deep Neural Network (MDNN) trained on low-level acoustic features.
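The multi-target idea can be illustrated through its training objective. One shared network feeds several softmax output heads, one per set of token labels, and the per-head cross-entropy losses are combined. The sketch below computes only that combined loss from precomputed logits. Summing the heads with equal weight is an assumption, not a detail taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_target_loss(head_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Sum of cross-entropy losses, one per token-label set (one output head each)."""
    if len(head_logits) != len(targets):
        raise ValueError("expected exactly one target label per output head")
    loss = 0.0
    for logits, target in zip(head_logits, targets):
        probs = softmax(logits)
        loss += -math.log(probs[target])  # cross-entropy for this head
    return loss


if __name__ == "__main__":
    # Two hypothetical token sets of sizes 2 and 4, with uniform predictions:
    # the loss is ln 2 + ln 4 = ln 8.
    print(multi_target_loss([[0.0, 0.0], [0.0, 0.0, 0.0, 0.0]], [0, 3]))
```

Because every head shares the lower layers, gradients from all label sets shape the same hidden representation. This shared representation is what such a network would use as learned speech features.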

Towards Structured Deep Neural Network for Automatic Speech Recognition

no code implementations 8 Nov 2015 Yi-Hsiu Liao, Hung-Yi Lee, Lin-shan Lee

In this paper we propose the Structured Deep Neural Network (structured DNN) as a structured and deep learning framework.

Automatic Speech Recognition

Towards Structured Deep Neural Network for Automatic Speech Recognition

no code implementations 3 Jun 2015 Yi-Hsiu Liao, Hung-Yi Lee, Lin-shan Lee

In this paper, we propose the Structured Deep Neural Network (Structured DNN) as a structured and deep learning algorithm, learning to find the best structured object (such as a label sequence) given a structured input (such as a vector sequence) by globally considering the mapping relationships between the structures rather than item by item.

Automatic Speech Recognition
