Search Results for author: Hung-Yi Lee

Found 229 papers, 91 papers with code

EURO: ESPnet Unsupervised ASR Open-source Toolkit

1 code implementation • 30 Nov 2022 • Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-Yi Lee, Shinji Watanabe, Sanjeev Khudanpur

This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR).

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders

7 code implementations • 25 Oct 2019 • Andy T. Liu, Shu-wen Yang, Po-Han Chi, Po-chun Hsu, Hung-Yi Lee

We present Mockingjay as a new speech representation learning approach, where bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech.

General Classification • Representation Learning +3
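
As a rough illustration of the pre-training recipe described above, the sketch below masks random frames of a mel-spectrogram, encodes them with a bidirectional Transformer encoder, and reconstructs the masked frames. It is a minimal sketch in the spirit of Mockingjay, not the released implementation; the model sizes, the 15% masking rate, and the L1 loss are illustrative assumptions.

```python
# Minimal masked-reconstruction pre-training sketch (assumptions noted above).
import torch
import torch.nn as nn

class MaskedSpeechEncoder(nn.Module):
    def __init__(self, n_mels=80, d_model=256, n_layers=3, n_heads=4):
        super().__init__()
        self.in_proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out_proj = nn.Linear(d_model, n_mels)   # reconstruction head

    def forward(self, mel):                          # mel: (batch, time, n_mels)
        mask = torch.rand(mel.shape[:2], device=mel.device) < 0.15
        masked = mel.masked_fill(mask.unsqueeze(-1), 0.0)  # zero out chosen frames
        recon = self.out_proj(self.encoder(self.in_proj(masked)))
        return (recon - mel).abs()[mask].mean()      # L1 loss on masked frames only

model = MaskedSpeechEncoder()
loss = model(torch.randn(4, 200, 80))                # toy batch of mel-spectrograms
loss.backward()
```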

Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning

5 code implementations • 5 Jun 2020 • Haibin Wu, Andy T. Liu, Hung-Yi Lee

To explore this issue, we propose to employ Mockingjay, a self-supervised learning based model, to protect anti-spoofing models against adversarial attacks in the black-box scenario.

Self-Supervised Learning • Speaker Verification +1

S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations

3 code implementations • 7 Apr 2021 • Jheng-Hao Lin, Yist Y. Lin, Chung-Ming Chien, Hung-Yi Lee

AUTOVC uses d-vectors to extract speaker information, while FragmentVC uses self-supervised learning (SSL) features like wav2vec 2.0 to extract the phonetic content information.

Self-Supervised Learning • Voice Conversion

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

1 code implementation • 5 Oct 2021 • Heng-Jui Chang, Shu-wen Yang, Hung-Yi Lee

Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks.

Multi-Task Learning • Representation Learning
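
A compact sketch of the layer-wise distillation idea: a small student predicts several hidden layers of a frozen teacher through separate prediction heads, combining an L1 term with a cosine-similarity term. The dimensions, choice of teacher layers, and loss weighting below are assumptions for illustration, not the released recipe.

```python
# Layer-wise distillation loss sketch (illustrative, see caveats above).
import torch
import torch.nn.functional as F

def layerwise_distill_loss(student_feat, teacher_layers, heads, lam=1.0):
    """student_feat: (B, T, D); teacher_layers: list of (B, T, D) targets."""
    loss = 0.0
    for head, target in zip(heads, teacher_layers):
        pred = head(student_feat)                              # per-layer head
        cos = F.cosine_similarity(pred, target, dim=-1).mean()
        loss = loss + F.l1_loss(pred, target) - lam * torch.log(torch.sigmoid(cos))
    return loss

heads = [torch.nn.Linear(256, 256) for _ in range(3)]   # e.g. 3 teacher layers
student = torch.randn(2, 100, 256)
teacher = [torch.randn(2, 100, 256) for _ in range(3)]
print(layerwise_distill_loss(student, teacher, heads))
```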

S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations

2 code implementations • 12 Oct 2021 • Wen-Chin Huang, Shu-wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda

In this work, we provide a series of in-depth analyses by benchmarking on the two tasks in VCC2020, namely intra-/cross-lingual any-to-one (A2O) VC, as well as an any-to-any (A2A) setting.

Benchmarking • Voice Conversion

Self-supervised Representation Learning for Speech Processing

1 code implementation • NAACL (ACL) 2022 • Hung-Yi Lee, Abdelrahman Mohamed, Shinji Watanabe, Tara Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, Katrin Kirchhoff

Due to the growing popularity of SSL, and the shared mission of the areas in bringing speech and language technologies to more use cases with better quality and scaling the technologies for under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievement in speech processing.

Representation Learning

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

1 code implementation • 12 Nov 2020 • Chung-Ming Chien, Hung-Yi Lee

Prosody modeling is an essential component in modern text-to-speech (TTS) frameworks.

Speech Synthesis

Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech

1 code implementation • 6 Mar 2021 • Chung-Ming Chien, Jheng-Hao Lin, Chien-yu Huang, Po-chun Hsu, Hung-Yi Lee

The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances with voice and speaking style similar to a reference speaker given only a few reference samples.

Voice Cloning • Voice Conversion

Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding

1 code implementation • 28 Oct 2019 • Alexander H. Liu, Tzu-Wei Sung, Shun-Po Chuang, Hung-Yi Lee, Lin-shan Lee

This allows the decoder to consider the semantic consistency during decoding by absorbing the information carried by the transformed decoder feature, which is learned to be close to the target word embedding.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

Temporal Pattern Attention for Multivariate Time Series Forecasting

4 code implementations • 12 Sep 2018 • Shun-Yao Shih, Fan-Keng Sun, Hung-Yi Lee

To obtain accurate prediction, it is crucial to model long-term dependency in time series data, which can be achieved to some good extent by recurrent neural network (RNN) with attention mechanism.

Multivariate Time Series Forecasting • Time Series +1
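
The RNN-plus-attention recipe can be sketched as below: an LSTM encodes the multivariate series, and attention over its past hidden states feeds the forecast head. This is a simplified illustration only; the paper's temporal pattern attention additionally applies CNN filters across time, which is omitted here, and all sizes are made up.

```python
# Simplified RNN-with-attention forecaster (the paper's CNN filters are omitted).
import torch
import torch.nn as nn

class AttnForecaster(nn.Module):
    def __init__(self, n_series, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_series, hidden, batch_first=True)
        self.score = nn.Linear(hidden, hidden, bias=False)
        self.head = nn.Linear(2 * hidden, n_series)      # next-step forecast

    def forward(self, x):                                # x: (batch, time, n_series)
        states, _ = self.rnn(x)                          # (batch, time, hidden)
        query = states[:, -1]                            # last state as the query
        attn = torch.softmax((self.score(states) * query.unsqueeze(1)).sum(-1), dim=1)
        context = (attn.unsqueeze(-1) * states).sum(1)   # attention-weighted history
        return self.head(torch.cat([context, query], dim=-1))

model = AttnForecaster(n_series=8)
pred = model(torch.randn(16, 48, 8))                     # one-step forecast of 8 series
```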

One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization

11 code implementations • 10 Apr 2019 • Ju-chieh Chou, Cheng-chieh Yeh, Hung-Yi Lee

Recently, voice conversion (VC) without parallel data has been successfully adapted to multi-target scenario in which a single model is trained to convert the input voice to many different speakers.

Voice Conversion

Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations

4 code implementations • 9 Apr 2018 • Ju-chieh Chou, Cheng-chieh Yeh, Hung-Yi Lee, Lin-shan Lee

The decoder then takes the speaker-independent latent representation and the target speaker embedding as the input to generate the voice of the target speaker with the linguistic content of the source utterance.

Voice Conversion

Tree Transformer: Integrating Tree Structures into Self-Attention

3 code implementations • IJCNLP 2019 • Yau-Shian Wang, Hung-Yi Lee, Yun-Nung Chen

This paper proposes Tree Transformer, which adds an extra constraint to attention heads of the bidirectional Transformer encoder in order to encourage the attention heads to follow tree structures.

Language Modelling

FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention

2 code implementations • 27 Oct 2020 • Yist Y. Lin, Chung-Ming Chien, Jheng-Hao Lin, Hung-Yi Lee, Lin-shan Lee

Any-to-any voice conversion aims to convert the voice from and to any speakers even unseen during training, which is much more challenging compared to one-to-one or many-to-many tasks, but much more attractive in real-world scenarios.

Disentanglement • Speaker Verification +1

Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

1 code implementation • 7 Nov 2021 • Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-Yi Lee

On the one hand, speaker adaptation methods fine-tune a trained multi-speaker text-to-speech (TTS) model with few enrolled samples.

Meta-Learning • Speech Synthesis

Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

1 code implementation • 20 Feb 2024 • Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-Yi Lee

A sound codec's dual role in minimizing data transmission latency and serving as a tokenizer underscores its critical importance.

Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion

1 code implementation • 28 May 2019 • Andy T. Liu, Po-chun Hsu, Hung-Yi Lee

We found that the proposed encoding method offers automatic extraction of speech content from speaker style, and is sufficient to cover full linguistic content in a given language.

Voice Conversion

AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization

1 code implementation • 31 Oct 2020 • Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, Hung-Yi Lee

With a proper activation as an information bottleneck on content embeddings, the trade-off between the synthesis quality and the speaker similarity of the converted speech is improved drastically.

Audio and Speech Processing • Sound
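
The instance-normalization bottleneck can be illustrated in a few lines: normalizing each channel over time strips per-utterance statistics from the content features, and the removed mean and standard deviation act as a speaker/style code. The function below is a hypothetical sketch of that idea, not the authors' code.

```python
# Instance normalization as an information bottleneck (illustrative sketch).
import torch

def instance_norm_bottleneck(content, eps=1e-5):
    """content: (batch, channels, time) encoder features."""
    mu = content.mean(dim=2, keepdim=True)           # per-channel mean over time
    sigma = content.std(dim=2, keepdim=True) + eps   # per-channel std over time
    normalized = (content - mu) / sigma              # speaker-normalized content
    return normalized, (mu, sigma)                   # stats carry speaker/style info

feats = torch.randn(4, 256, 120)
content, (mu, sigma) = instance_norm_bottleneck(feats)
# A decoder could re-apply a target speaker's (mu, sigma) to convert the voice.
```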

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

1 code implementation • 3 Oct 2022 • Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-Yi Lee, David Harwath

Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly.

Language Modelling • Retrieval +1

SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks

1 code implementation • 31 Mar 2022 • Kai-Wei Chang, Wei-Cheng Tseng, Shang-Wen Li, Hung-Yi Lee

We report in this paper the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM).

Language Modelling • Self-Supervised Learning

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

1 code implementation • 18 Sep 2023 • Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee

To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark.

VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net architecture

1 code implementation • 7 Jun 2020 • Da-Yi Wu, Yen-Hao Chen, Hung-Yi Lee

Voice conversion (VC) is a task that transforms the source speaker's timbre, accent, and tones in audio into another one's while preserving the linguistic content.

Disentanglement • Quantization +1

WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

1 code implementation • 15 May 2020 • Po-chun Hsu, Hung-Yi Lee

Because we design a heavily compressed flow-based model, the proposed model requires far fewer computational resources than other waveform generation models during both training and inference; even though the model is highly compressed, the post-filter maintains the quality of the generated waveform.

Speech Synthesis • Text-To-Speech Synthesis • Audio and Speech Processing • Sound

Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks

1 code implementation • EMNLP 2018 • Yau-Shian Wang, Hung-Yi Lee

The generator encodes the input text into a shorter word sequence, and the reconstructor recovers the generator input from the generator output.

Abstractive Text Summarization

Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training

1 code implementation • 29 Oct 2020 • Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-Yi Lee

Speech separation has been well developed with the very successful permutation invariant training (PIT) approach, although the frequent label assignment switching that happens during PIT training remains a problem when better convergence speed and achievable performance are desired.

Ranked #6 on Speech Separation on Libri2Mix (using extra training data)

Speaker Separation • Speech Enhancement +1
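
For reference, the permutation invariant training objective mentioned above reduces, in the two-speaker case, to taking the cheaper of the two possible speaker-to-output assignments. Below is a minimal sketch with an MSE criterion (real systems typically use a scale-invariant SNR instead):

```python
# Minimal two-speaker PIT loss: evaluate both assignments, keep the cheaper one.
import torch

def pit_mse_loss(est, ref):
    """est, ref: (batch, 2, time) estimated and reference sources."""
    perm_a = ((est - ref) ** 2).mean(dim=(1, 2))                 # identity assignment
    perm_b = ((est - ref.flip(dims=[1])) ** 2).mean(dim=(1, 2))  # swapped assignment
    return torch.minimum(perm_a, perm_b).mean()

est, ref = torch.randn(8, 2, 16000), torch.randn(8, 2, 16000)
print(pit_mse_loss(est, ref))
```

Because the winning permutation can flip between training steps, the label assignment keeps switching; stabilizing that assignment is what the self-supervised pre-training in this paper targets.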

ODSQA: Open-domain Spoken Question Answering Dataset

1 code implementation • 7 Aug 2018 • Chia-Hsuan Lee, Shang-Ming Wang, Huan-Cheng Chang, Hung-Yi Lee

Reading comprehension by machine has been widely studied, but machine comprehension of spoken content is still a less investigated problem.

Data Augmentation • Question Answering +1

Multimodal Transformer Distillation for Audio-Visual Synchronization

2 code implementations • 27 Oct 2022 • Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-Yi Lee, Jyh-Shing Roger Jang

This paper proposes the MTDVocaLiST model, which is trained with our proposed multimodal Transformer distillation (MTD) loss.

Audio-Visual Synchronization

MelHuBERT: A simplified HuBERT on Mel spectrograms

1 code implementation • 17 Nov 2022 • Tzu-Quan Lin, Hung-Yi Lee, Hao Tang

Self-supervised models have had great success in learning speech representations that can generalize to various downstream tasks.

Automatic Speech Recognition • Self-Supervised Learning +3

DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs

1 code implementation • IJCNLP 2019 • Yi-Lin Tuan, Yun-Nung Chen, Hung-Yi Lee

This paper proposes a new task about how to apply dynamic knowledge graphs in neural conversation models and presents a novel TV series conversation corpus (DyKgChat) for the task.

Benchmarking • Dialogue Generation +1

Adversarial Attacks on Spoofing Countermeasures of automatic speaker verification

1 code implementation • 19 Oct 2019 • Songxiang Liu, Haibin Wu, Hung-Yi Lee, Helen Meng

High-performance spoofing countermeasure systems for automatic speaker verification (ASV) have been proposed in the ASVspoof 2019 challenge.

Speaker Verification

Defending Your Voice: Adversarial Attack on Voice Conversion

1 code implementation • 18 May 2020 • Chien-yu Huang, Yist Y. Lin, Hung-Yi Lee, Lin-shan Lee

We introduce human imperceptible noise into the utterances of a speaker whose voice is to be defended.

Adversarial Attack • Voice Conversion

Improving Conditional Sequence Generative Adversarial Networks by Stepwise Evaluation

1 code implementation • 16 Aug 2018 • Yi-Lin Tuan, Hung-Yi Lee

To stabilize the training of SeqGAN, Monte Carlo tree search (MCTS) or reward at every generation step (REGS) is used to evaluate the goodness of a generated subsequence.

Dialogue Generation

Query-based Attention CNN for Text Similarity Map

2 code implementations • 15 Sep 2017 • Tzu-Chien Liu, Yu-Hsueh Wu, Hung-Yi Lee

This network is composed of a compare mechanism, a two-staged CNN architecture with an attention mechanism, and a prediction layer.

Question Answering • Sentence +1

Learning Chinese Word Representations From Glyphs Of Characters

1 code implementation • EMNLP 2017 • Tzu-Ray Su, Hung-Yi Lee

The character glyph features are directly learned from the bitmaps of characters by convolutional auto-encoder(convAE), and the glyph features improve Chinese word representations which are already enhanced by character embeddings.

TaylorGAN: Neighbor-Augmented Policy Update Towards Sample-Efficient Natural Language Generation

1 code implementation • NeurIPS 2020 • Chun-Hsing Lin, Siang-Ruei Wu, Hung-Yi Lee, Yun-Nung Chen

Score function-based natural language generation (NLG) approaches such as REINFORCE, in general, suffer from low sample efficiency and training instability problems.

Text Generation

TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural Language Generation

1 code implementation • 27 Nov 2020 • Chun-Hsing Lin, Siang-Ruei Wu, Hung-Yi Lee, Yun-Nung Chen

Score function-based natural language generation (NLG) approaches such as REINFORCE, in general, suffer from low sample efficiency and training instability problems.

Text Generation

Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition

2 code implementations • 27 Mar 2022 • Guan-Ting Lin, Shang-Wen Li, Hung-Yi Lee

Although deep learning-based end-to-end Automatic Speech Recognition (ASR) has shown remarkable performance in recent years, it suffers severe performance regression on test samples drawn from different data distributions.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering

1 code implementation • 16 Apr 2019 • Gene-Ping Yang, Chao-I Tuan, Hung-Yi Lee, Lin-shan Lee

Substantial effort has been reported based on approaches over spectrogram, which is well known as the standard time-and-frequency cross-domain representation for speech signals.

Clustering • Speech Separation

Interrupted and cascaded permutation invariant training for speech separation

1 code implementation • 28 Oct 2019 • Gene-Ping Yang, Szu-Lin Wu, Yao-Wen Mao, Hung-Yi Lee, Lin-shan Lee

Permutation Invariant Training (PIT) has long been a stepping stone method for training speech separation model in handling the label ambiguity problem.

Speech Separation

Noise Adaptive Speech Enhancement using Domain Adversarial Training

1 code implementation • 19 Jul 2018 • Chien-Feng Liao, Yu Tsao, Hung-Yi Lee, Hsin-Min Wang

The proposed noise adaptive SE system contains an encoder-decoder-based enhancement model and a domain discriminator model.

Sound • Audio and Speech Processing

Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation

1 code implementation • 22 Mar 2022 • Chih-Chiang Chang, Hung-Yi Lee

Simultaneous speech translation (SimulST) is a challenging task aiming to translate streaming speech before the complete input is observed.

Translation

Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech

2 code implementations • 30 Oct 2018 • Li-Wei Chen, Hung-Yi Lee, Yu Tsao

This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed.

Speech Recognition • Voice Conversion

Order-free Learning Alleviating Exposure Bias in Multi-label Classification

1 code implementation • 8 Sep 2019 • Che-Ping Tsai, Hung-Yi Lee

In this paper, we propose a new framework for MLC which does not rely on a predefined label order and thus alleviates exposure bias.

General Classification • Multi-Label Classification

T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5

1 code implementation • 1 Nov 2022 • Chan-Jan Hsu, Ho-Lam Chung, Hung-Yi Lee, Yu Tsao

In spoken language understanding (SLU), a natural solution is concatenating pre-trained speech models (e.g., HuBERT) and pre-trained language models (PLM, e.g., T5).

Language Modelling • Question Answering +1

A Closer Look into Automatic Evaluation Using Large Language Models

1 code implementation • 9 Oct 2023 • Cheng-Han Chiang, Hung-Yi Lee

In this paper, we analyze LLM evaluation (Chiang and Lee, 2023) and G-Eval (Liu et al., 2023), and we discuss how details of the evaluation process change how well the ratings given by LLMs correlate with human ratings.
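
Purely as an illustration of the correlation analysis involved, one can compare LLM and human ratings of the same texts with standard statistics; the rating values below are made up for the example.

```python
# Correlating LLM ratings with human ratings (toy, made-up numbers).
from scipy.stats import pearsonr, spearmanr

human = [4, 2, 5, 3, 1, 4, 5, 2]   # hypothetical human Likert ratings of eight texts
llm = [4, 3, 5, 3, 2, 4, 4, 2]     # hypothetical LLM ratings of the same texts

print("Pearson r:", pearsonr(human, llm)[0])
print("Spearman rho:", spearmanr(human, llm)[0])
```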

Auto-KWS 2021 Challenge: Task, Datasets, and Baselines

1 code implementation • 31 Mar 2021 • Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-Yi Lee, Lei Xie

Auto-KWS 2021 challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to a customized keyword spotting task.

AutoML • BIG-bench Machine Learning +1

Pretrained Language Model Embryology: The Birth of ALBERT

1 code implementation • EMNLP 2020 • Cheng-Han Chiang, Sung-Feng Huang, Hung-Yi Lee

These findings suggest that knowledge of a pretrained model varies during pretraining, and having more pretrain steps does not necessarily provide a model with more comprehensive knowledge.

Language Modelling • POS +1

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining

1 code implementation • 26 Oct 2020 • Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass

Much recent work on Spoken Language Understanding (SLU) is limited in at least one of three ways: models were trained on oracle text input and neglected ASR errors, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.

Language Modelling • Spoken Language Understanding

The Efficacy of Self-Supervised Speech Models for Audio Representations

1 code implementation • 26 Sep 2022 • Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsu-Yuan Hsu, Hung-Yi Lee

Extensive experiments on speech and non-speech audio datasets are conducted to investigate the representation abilities of our ensemble method and its single constituent model.

Pitch Classification • Representation Learning +1

J-Net: Randomly weighted U-Net for audio source separation

1 code implementation • 29 Nov 2019 • Bo-Wen Chen, Yen-Min Hsu, Hung-Yi Lee

According to these discoveries, we pose two questions: what is the value of randomly weighted networks in difficult generative audio tasks such as audio source separation and does such positive correlation still exist when it comes to large random networks and their trained counterparts?

Audio Source Separation

Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation

1 code implementation • Findings (ACL) 2021 • Shun-Po Chuang, Yung-Sung Chuang, Chih-Chiang Chang, Hung-Yi Lee

We study the possibilities of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC), and use CTC-based automatic speech recognition as an auxiliary task to improve the performance.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +3
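
The connectionist temporal classification objective used for both the translation model and the auxiliary ASR task is available directly in PyTorch; a toy call with assumed shapes looks like this.

```python
# CTC loss over a toy batch; torch.nn.CTCLoss expects (time, batch, classes)
# log-probabilities, with index 0 reserved here for the blank symbol.
import torch

T, B, C = 50, 4, 100                               # frames, batch, vocabulary size
log_probs = torch.randn(T, B, C).log_softmax(-1)
targets = torch.randint(1, C, (B, 12))             # token ids (0 = blank)
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 12, dtype=torch.long)

loss = torch.nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
```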

Membership Inference Attacks Against Self-supervised Speech Models

1 code implementation • 9 Nov 2021 • Wei-Cheng Tseng, Wei-Tsung Kao, Hung-Yi Lee

Recently, adapting the idea of self-supervised learning (SSL) on continuous speech has started gaining attention.

Self-Supervised Learning

Anticipation-Free Training for Simultaneous Machine Translation

1 code implementation • IWSLT (ACL) 2022 • Chih-Chiang Chang, Shun-Po Chuang, Hung-Yi Lee

Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.

Hallucination • Machine Translation +2

Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder

1 code implementation • 3 Mar 2016 • Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-Yi Lee, Lin-shan Lee

The vector representations of fixed dimensionality for words (in text) offered by Word2Vec have been shown to be very useful in many application scenarios, in particular due to the semantic information they carry.

Denoising • Dynamic Time Warping
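
A bare-bones sketch of the sequence-to-sequence autoencoder idea: a GRU encoder compresses a variable-length acoustic segment into one fixed-dimensional vector, and a GRU decoder reconstructs the segment from it. The feature dimension (39-dim MFCC frames) and model sizes are assumptions for illustration.

```python
# Sequence-to-sequence autoencoder sketch for fixed-dimensional audio embeddings.
import torch
import torch.nn as nn

class SegmentAutoencoder(nn.Module):
    def __init__(self, n_feats=39, dim=128):          # e.g. 39-dim MFCC frames
        super().__init__()
        self.enc = nn.GRU(n_feats, dim, batch_first=True)
        self.dec = nn.GRU(n_feats, dim, batch_first=True)
        self.out = nn.Linear(dim, n_feats)

    def forward(self, x):                             # x: (batch, time, n_feats)
        _, h = self.enc(x)                            # h: (1, batch, dim) embedding
        y, _ = self.dec(torch.zeros_like(x), h)       # decode from the embedding
        return self.out(y), h.squeeze(0)              # reconstruction, segment vector

model = SegmentAutoencoder()
x = torch.randn(4, 60, 39)
recon, emb = model(x)
loss = ((recon - x) ** 2).mean()                      # reconstruction objective
```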

Voting for the right answer: Adversarial defense for speaker verification

1 code implementation • 15 Jun 2021 • Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-Yi Lee

Automatic speaker verification (ASV) is a well-developed technology for biometric identification and has been ubiquitously implemented in security-critical applications, such as banking and access control.

Adversarial Defense • Speaker Verification

Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries

1 code implementation • 22 Mar 2017 • Yu-Hsuan Wang, Cheng-Tao Chung, Hung-Yi Lee

In this paper we analyze the gate activation signals inside the gated recurrent neural networks, and find the temporal structure of such signals is highly correlated with the phoneme boundaries.
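
To make the analysis concrete, here is one hypothetical way to expose the update-gate signal z_t of a single-layer PyTorch GRU by re-running its cell equations (PyTorch stores gates in reset/update/new order); an analysis like the paper's would then correlate this signal with phoneme boundaries.

```python
# Recompute a single-layer GRU step by step to expose update-gate activations.
import torch

def update_gate_signal(gru, x):
    """gru: nn.GRU(input_size, hidden_size, batch_first=True); x: (B, T, input)."""
    W, U = gru.weight_ih_l0, gru.weight_hh_l0        # (3H, input), (3H, H)
    b_i, b_h = gru.bias_ih_l0, gru.bias_hh_l0
    H = gru.hidden_size
    h = torch.zeros(x.size(0), H)
    zs = []
    for t in range(x.size(1)):
        gi = x[:, t] @ W.t() + b_i                   # input contributions
        gh = h @ U.t() + b_h                         # hidden contributions
        r = torch.sigmoid(gi[:, :H] + gh[:, :H])             # reset gate
        z = torch.sigmoid(gi[:, H:2*H] + gh[:, H:2*H])       # update gate
        n = torch.tanh(gi[:, 2*H:] + r * gh[:, 2*H:])        # candidate state
        h = (1 - z) * n + z * h
        zs.append(z)
    return torch.stack(zs, dim=1)                    # (B, T, H) gate signal

gru = torch.nn.GRU(40, 64, batch_first=True)
signal = update_gate_signal(gru, torch.randn(2, 100, 40))
```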

Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

1 code implementation • 9 Aug 2018 • Cheng-chieh Yeh, Po-chun Hsu, Ju-chieh Chou, Hung-Yi Lee, Lin-shan Lee

In this way, the length constraint mentioned above is removed to offer rhythm-flexible voice conversion without requiring parallel data.

Sound • Audio and Speech Processing

How to Estimate Model Transferability of Pre-Trained Speech Models?

1 code implementation • 1 Jun 2023 • Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-Yi Lee, Tara N. Sainath

In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks.

Understanding Self-Attention of Self-Supervised Audio Transformers

2 code implementations • 5 Jun 2020 • Shu-wen Yang, Andy T. Liu, Hung-Yi Lee

Self-supervised Audio Transformers (SAT) enable great success in many downstream speech applications like ASR, but how they work has not been widely explored yet.

Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences

1 code implementation • 15 Mar 2023 • Yuan Tseng, Cheng-I Lai, Hung-Yi Lee

The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees, such that each node is a span of audio that corresponds to a constituent.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +3

MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models

1 code implementation • 30 May 2023 • Yu-Hsiang Wang, Huang-Yu Chen, Kai-Wei Chang, Winston Hsu, Hung-Yi Lee

In this paper, we introduce MiniSUPERB, a lightweight benchmark that efficiently evaluates SSL speech models with results comparable to SUPERB at significantly lower computational cost.

Self-Supervised Learning

Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously

1 code implementation • 3 Jun 2023 • Cheng-Han Chiang, Wei-Ping Huang, Hung-Yi Lee

This paper emphasizes the importance of reporting experiment details in subjective evaluations and demonstrates how such details can significantly impact evaluation results in the field of speech synthesis.

Speech Synthesis

On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets

1 code implementation • 8 Sep 2021 • Cheng-Han Chiang, Hung-Yi Lee

In this work, we study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to their counterparts trained from scratch on downstream tasks.

Compressing Transformer-based self-supervised models for speech processing

1 code implementation • 17 Nov 2022 • Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-Yi Lee, Hao Tang

Despite the success of Transformers in self-supervised learning with applications to various downstream tasks, the computational cost of training and inference remains a major challenge for applying these models to a wide spectrum of devices.

Knowledge Distillation • Model Compression +1

Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

1 code implementation • 4 Oct 2023 • Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-Yi Lee

We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders.

Language Modelling

Merging Facts, Crafting Fallacies: Evaluating the Contradictory Nature of Aggregated Factual Claims in Long-Form Generations

1 code implementation • 8 Feb 2024 • Cheng-Han Chiang, Hung-Yi Lee

We show that LLMs can generate paragraphs that contain verifiable facts, but the facts are combined to form a non-factual paragraph due to entity ambiguity.

Query-by-example Spoken Term Detection using Attention-based Multi-hop Networks

no code implementations • 1 Sep 2017 • Chia-Wei Ao, Hung-Yi Lee

Retrieving spoken content with spoken queries, or query-by-example spoken term detection (STD), is attractive because it makes possible the matching of signals directly on the acoustic level without transcribing them into text.

Supervised and Unsupervised Transfer Learning for Question Answering

no code implementations • NAACL 2018 • Yu-An Chung, Hung-Yi Lee, James Glass

Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well-studied.

Question Answering • speech-recognition +2

Scalable Sentiment for Sequence-to-sequence Chatbot Response with Performance Analysis

no code implementations • 7 Apr 2018 • Chih-Wei Lee, Yau-Shian Wang, Tsung-Yuan Hsu, Kuan-Yu Chen, Hung-Yi Lee, Lin-shan Lee

Conventional seq2seq chatbot models only try to find the sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences.

Chatbot • reinforcement-learning +1

Joint Learning of Interactive Spoken Content Retrieval and Trainable User Simulator

no code implementations • 1 Apr 2018 • Pei-Hung Chung, Kuan Tung, Ching-Lun Tai, Hung-Yi Lee

User-machine interaction is crucial for information retrieval, especially for spoken content retrieval, because spoken content is difficult to browse, and speech recognition has a high degree of uncertainty.

Information Retrieval • Q-Learning +3

Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings

no code implementations • 1 Apr 2018 • Da-Rong Liu, Kuan-Yu Chen, Hung-Yi Lee, Lin-shan Lee

Unsupervised discovery of acoustic tokens from audio corpora without annotation and learning vector representations for these tokens have been widely studied.

Generative Adversarial Network

Mitigating the Impact of Speech Recognition Errors on Chatbot using Sequence-to-Sequence Model

no code implementations • 22 Sep 2017 • Pin-Jung Chen, I-Hung Hsu, Yi-Yao Huang, Hung-Yi Lee

We apply sequence-to-sequence model to mitigate the impact of speech recognition errors on open domain end-to-end dialog generation.

Chatbot • Domain Adaptation +2

Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification

no code implementations • 16 Sep 2017 • Bo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-Yi Lee, Lin-shan Lee

Connectionist temporal classification (CTC) is a powerful approach for sequence-to-sequence learning, and has been popularly used in speech recognition.

Abstractive Text Summarization • General Classification +2

Personalized word representations Carrying Personalized Semantics Learned from Social Network Posts

no code implementations • 29 Oct 2017 • Zih-Wei Lin, Tzu-Wei Sung, Hung-Yi Lee, Lin-shan Lee

In this framework, universal background word vectors are first learned from the background corpora, and then adapted by the personalized corpus for each individual user to learn the personalized word vectors.

Sentence • Sentence Completion

Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data

no code implementations • 19 Jul 2017 • Chia-Hao Shen, Janet Y. Sung, Hung-Yi Lee

We train SA from one language (source language) and use it to extract the vector representation of the audio segments of another language (target language).

Hierarchical Attention Model for Improved Machine Comprehension of Spoken Content

no code implementations • 28 Aug 2016 • Wei Fang, Jui-Yang Hsu, Hung-Yi Lee, Lin-shan Lee

Multimedia or spoken content presents more attractive information than plain text content, but the former is more difficult to display on a screen and be selected by a user.

Reading Comprehension

Abstractive Headline Generation for Spoken Content by Attentive Recurrent Neural Networks with ASR Error Modeling

no code implementations • 26 Dec 2016 • Lang-Chi Yu, Hung-Yi Lee, Lin-shan Lee

In this way, the model for abstractive headline generation for spoken content can be learned from abundant text data and the ASR data for some recognizers.

Abstractive Text Summarization • Document Summarization +1

Interactive Spoken Content Retrieval by Deep Reinforcement Learning

no code implementations • 16 Sep 2016 • Yen-chen Wu, Tzu-Hsiang Lin, Yang-De Chen, Hung-Yi Lee, Lin-shan Lee

In our previous work, some hand-crafted states estimated from the present retrieval results are used to determine the proper actions.

Q-Learning • reinforcement-learning +4

Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine

no code implementations • 23 Aug 2016 • Bo-Hsiang Tseng, Sheng-syun Shen, Hung-Yi Lee, Lin-shan Lee

Multimedia or spoken content presents more attractive information than plain text content, but it's more difficult to display on a screen and be selected by a user.

Reading Comprehension • Sentence

An Iterative Deep Learning Framework for Unsupervised Discovery of Speech Features and Linguistic Units with Applications on Spoken Term Detection

no code implementations • 1 Feb 2016 • Cheng-Tao Chung, Cheng-Yu Tsai, Hsiang-Hung Lu, Chia-Hsiang Liu, Hung-Yi Lee, Lin-shan Lee

The multiple sets of token labels are then used as the targets of a Multi-target Deep Neural Network (MDNN) trained on low-level acoustic features.

Towards Structured Deep Neural Network for Automatic Speech Recognition

no code implementations • 8 Nov 2015 • Yi-Hsiu Liao, Hung-Yi Lee, Lin-shan Lee

In this paper we propose the Structured Deep Neural Network (structured DNN) as a structured and deep learning framework.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

Towards Structured Deep Neural Network for Automatic Speech Recognition

no code implementations • 3 Jun 2015 • Yi-Hsiu Liao, Hung-Yi Lee, Lin-shan Lee

In this paper we propose the Structured Deep Neural Network (Structured DNN) as a structured and deep learning algorithm, learning to find the best structured object (such as a label sequence) given a structured input (such as a vector sequence) by globally considering the mapping relationships between the structure rather than item by item.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection

no code implementations • 7 Aug 2018 • Yu-Hsuan Wang, Hung-Yi Lee, Lin-shan Lee

In this paper, we extend audio Word2Vec from word-level to utterance-level by proposing a new segmental audio Word2Vec, in which unsupervised spoken word boundary segmentation and audio Word2Vec are jointly learned and mutually enhanced, so an utterance can be directly represented as a sequence of vectors carrying phonetic structure information.

Segmentation

Proximal Policy Optimization and its Dynamic Version for Sequence Generation

no code implementations • 24 Aug 2018 • Yi-Lin Tuan, Jinzhi Zhang, Yujia Li, Hung-Yi Lee

In sequence generation task, many works use policy gradient for model optimization to tackle the intractable backpropagation issue when maximizing the non-differentiable evaluation metrics or fooling the discriminator in adversarial learning.

Chatbot • Model Optimization +2

Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model

no code implementations • 2 Nov 2018 • Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee

In this paper we propose a novel Adversarial Training (AT) approach for end-to-end speech recognition using a Criticizing Language Model (CLM).

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

Improved Audio Embeddings by Adjacency-Based Clustering with Applications in Spoken Term Detection

no code implementations • 7 Nov 2018 • Sung-Feng Huang, Yi-Chen Chen, Hung-Yi Lee, Lin-shan Lee

Embedding audio signal segments into vectors with fixed dimensionality is attractive because all following processing will be easier and more efficient, for example modeling, classifying or indexing.

Clustering

Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models

no code implementations • 8 Apr 2019 • Kuan-Yu Chen, Che-Ping Tsai, Da-Rong Liu, Hung-Yi Lee, Lin-shan Lee

Producing a large annotated speech corpus for training ASR systems remains difficult for the more than 95% of the world's languages that are low-resourced, but collecting a relatively large unlabeled data set for such languages is more achievable.

Generative Adversarial Network • speech-recognition +2

From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings

no code implementations • 10 Apr 2019 • Yi-Chen Chen, Sung-Feng Huang, Hung-Yi Lee, Lin-shan Lee

However, we note human babies start to learn the language by the sounds (or phonetic structures) of a small number of exemplar words, and "generalize" such knowledge to other words without hearing a large amount of data.

speech-recognition • Speech Recognition +1

End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning

no code implementations • 13 Apr 2019 • Tao Tu, Yuan-Jui Chen, Cheng-chieh Yeh, Hung-Yi Lee

In this paper, we aim to build TTS systems for such low-resource (target) languages where only very limited paired data are available.

Cross-Lingual Transfer • Transfer Learning

Cross-Lingual Transfer Learning for Question Answering

no code implementations • 13 Jul 2019 • Chia-Hsuan Lee, Hung-Yi Lee

In this paper, we explore the problem of cross-lingual transfer learning for QA, where a source language task with plentiful annotations is utilized to improve the performance of a QA model on a target language task with limited available annotations.

Cross-Lingual Transfer • Machine Translation +4

SpeechBERT: An Audio-and-text Jointly Learned Language Model for End-to-end Spoken Question Answering

no code implementations • 25 Oct 2019 • Yung-Sung Chuang, Chi-Liang Liu, Hung-Yi Lee, Lin-shan Lee

In addition to the potential of end-to-end SQA, the SpeechBERT can also be considered for many other spoken language understanding tasks just as BERT for many text processing tasks.

Language Modelling • Question Answering +2

Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning

no code implementations • 28 Oct 2019 • Alexander H. Liu, Tao Tu, Hung-Yi Lee, Lin-shan Lee

In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) to learn from primarily unpaired audio data and produce sequences of representations very close to phoneme sequences of speech utterances.

Clustering • Quantization +4

Training a code-switching language model with monolingual data

no code implementations • 14 Nov 2019 • Shun-Po Chuang, Tzu-Wei Sung, Hung-Yi Lee

A lack of code-switching data complicates the training of code-switching (CS) language models.

Language Modelling • Translation +1

Towards Robust Neural Vocoding for Speech Generation: A Survey

no code implementations • 5 Dec 2019 • Po-chun Hsu, Chun-hsuan Wang, Andy T. Liu, Hung-Yi Lee

We found out that the speaker variety is much more important for achieving a universal vocoder than the language.

Speech Synthesis • Voice Conversion

MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing

no code implementations • 9 Dec 2019 • Chao-I Tuan, Yuan-Kuei Wu, Hung-Yi Lee, Yu Tsao

Our experimental results first confirmed the robustness of our MiTAS on two types of perturbations in mixed audio.

Speech Separation

BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT

no code implementations • 25 Jan 2020 • Wei-Tsung Kao, Tsung-Han Wu, Po-Han Chi, Chun-Cheng Hsieh, Hung-Yi Lee

Although Bidirectional Encoder Representations from Transformers (BERT) has achieved tremendous success in many natural language processing (NLP) tasks, it remains a black box.

Sentence

Defense against adversarial attacks on spoofing countermeasures of ASV

no code implementations • 6 Mar 2020 • Haibin Wu, Songxiang Liu, Helen Meng, Hung-Yi Lee

Various forefront countermeasure methods for automatic speaker verification (ASV) with considerable performance in anti-spoofing are proposed in the ASVspoof 2019 challenge.

Speaker Verification

A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT

no code implementations • 20 Apr 2020 • Chi-Liang Liu, Tsung-Yuan Hsu, Yung-Sung Chuang, Hung-Yi Lee

Recently, multilingual BERT works remarkably well on cross-lingual transfer tasks, superior to static non-contextualized word embeddings.

Cross-Lingual Transfer • Translation +1

End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training

no code implementations • 5 May 2020 • Heng-Jui Chang, Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee

Whispering is an important mode of human speech, but no end-to-end recognition results for it were reported yet, probably due to the scarcity of available whispered speech data.

speech-recognition • Speech Recognition +1

DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation

no code implementations • 13 May 2020 • Yi-Chen Chen, Jui-Yang Hsu, Cheng-Kuang Lee, Hung-Yi Lee

In order to examine the generalizability of DARTS-ASR, we apply our approach not only on many languages to perform monolingual ASR, but also on a multilingual ASR setting.

speech-recognition • Speech Recognition

Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation

no code implementations • 16 May 2020 • Tao Tu, Yuan-Jui Chen, Alexander H. Liu, Hung-Yi Lee

The experiment results demonstrate that with only an hour of paired speech data, whether the paired data is from multiple speakers or a single speaker, the proposed model can generate intelligible speech in different voices.

Speech Synthesis • Text-To-Speech Synthesis

Investigation of Sentiment Controllable Chatbot

no code implementations • 11 Jul 2020 • Hung-Yi Lee, Cheng-Hao Ho, Chien-Fu Lin, Chiung-Chih Chang, Chih-Wei Lee, Yau-Shian Wang, Tsung-Yuan Hsu, Kuan-Yu Chen

Conventional seq2seq chatbot models attempt only to find sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences.

Chatbot • reinforcement-learning +1

What makes multilingual BERT multilingual?

no code implementations • 20 Oct 2020 • Chi-Liang Liu, Tsung-Yuan Hsu, Yung-Sung Chuang, Hung-Yi Lee

Recently, multilingual BERT works remarkably well on cross-lingual transfer tasks, superior to static non-contextualized word embeddings.

Cross-Lingual Transfer • Word Embeddings

Pre-Training a Language Model Without Human Language

no code implementations • 22 Dec 2020 • Cheng-Han Chiang, Hung-Yi Lee

In this paper, we study how the intrinsic nature of pre-training data contributes to the fine-tuned downstream performance.

Language Modelling

Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability

no code implementations • 12 Mar 2021 • Wei-Tsung Kao, Hung-Yi Lee

This paper investigates whether the power of the models pre-trained on text data, such as BERT, can be transferred to general token sequence classification applications.

General Classification • text-classification +1

Towards Lifelong Learning of End-to-end ASR

no code implementations • 4 Apr 2021 • Heng-Jui Chang, Hung-Yi Lee, Lin-shan Lee

We can collect new data describing the new environment and fine-tune the system, but this naturally leads to higher error rates for the earlier datasets, referred to as catastrophic forgetting.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

How Far Are We from Robust Voice Conversion: A Survey

no code implementations • 24 Nov 2020 • Tzu-Hsien Huang, Jheng-Hao Lin, Chien-yu Huang, Hung-Yi Lee

Voice conversion technologies have been greatly improved in recent years with the help of deep learning, but their capabilities of producing natural sounding utterances in different conditions remain unclear.

Speaker Identification • Voice Conversion

Improving Cross-Lingual Reading Comprehension with Self-Training

no code implementations • 8 May 2021 • Wei-Cheng Huang, Chien-yu Huang, Hung-Yi Lee

Substantial improvements have been made in machine reading comprehension, where the machine answers questions based on a given context.

Machine Reading Comprehension

Parallelized Reverse Curriculum Generation

no code implementations • 4 Aug 2021 • Zih-Yun Chiu, Yi-Lin Tuan, Hung-Yi Lee, Li-Chen Fu

For reinforcement learning (RL), it is challenging for an agent to master a task that requires a specific series of actions due to sparse rewards.

Reinforcement Learning (RL)

Analyzing the Robustness of Unsupervised Speech Recognition

no code implementations • 7 Oct 2021 • Guan-Ting Lin, Chan-Jan Hsu, Da-Rong Liu, Hung-Yi Lee, Yu Tsao

In this work, we further analyze the training robustness of unsupervised ASR on the domain mismatch scenarios in which the domains of unpaired speech and text are different.

Generative Adversarial Network • speech-recognition +2

CheerBots: Chatbots toward Empathy and Emotion using Reinforcement Learning

no code implementations • 8 Oct 2021 • Jiun-Hao Jhan, Chao-Peng Liu, Shyh-Kang Jeng, Hung-Yi Lee

Apart from the coherence and fluency of responses, an empathetic chatbot places more emphasis on people's feelings.

Chatbot • reinforcement-learning +2

Toward Degradation-Robust Voice Conversion

no code implementations • 14 Oct 2021 • Chien-yu Huang, Kai-Wei Chang, Hung-Yi Lee

However, in real-world scenarios, it is difficult to collect clean utterances of a speaker, and they are usually degraded by noises or reverberations.

Denoising • Speech Enhancement +1

Don't speak too fast: The impact of data bias on self-supervised speech models

no code implementations • 15 Oct 2021 • Yen Meng, Yi-Hui Chou, Andy T. Liu, Hung-Yi Lee

Self-supervised Speech Models (S3Ms) have been proven successful in many speech downstream tasks, like ASR.

Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models’ Transferability

no code implementations • Findings (EMNLP) 2021 • Wei-Tsung Kao, Hung-Yi Lee

This paper investigates whether the power of the models pre-trained on text data, such as BERT, can be transferred to general token sequence classification applications.

text-classification • Text Classification

Characterizing the adversarial vulnerability of speech self-supervised learning

no code implementations • 8 Nov 2021 • Haibin Wu, Bo Zheng, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

As the paradigm of the self-supervised learning upstream model followed by downstream tasks arouses more attention in the speech community, characterizing the adversarial robustness of such paradigm is of high priority.

Adversarial Robustness • Benchmarking +2

TopicGAN: Unsupervised Text Generation from Explainable Latent Topics

no code implementations • 27 Sep 2018 • Yau-Shian Wang, Yun-Nung Chen, Hung-Yi Lee

Learning discrete representations of data and then generating data from the discovered representations have been increasingly studied because the obtained discrete representations can benefit unsupervised learning.

Image Generation • Text Generation

Spoofing-Aware Speaker Verification by Multi-Level Fusion

no code implementations • 29 Mar 2022 • Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

In the second-level fusion, the CM score and ASV scores directly from ASV systems will be concatenated into a prediction block for the final decision.

Speaker Verification

Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation

no code implementations • 30 Mar 2022 • Kuan Po Huang, Yu-Kuan Fu, Yu Zhang, Hung-Yi Lee

Speech distortions are a long-standing problem that degrades the performance of speech processing models trained with supervision.

Domain Adaptation

Re-Examining Human Annotations for Interpretable NLP

no code implementations • 10 Apr 2022 • Cheng-Han Chiang, Hung-Yi Lee

Our results reveal that the annotation quality is highly subject to the workers' qualification, and workers can be guided to provide certain annotations by the instructions.

Understanding, Detecting, and Separating Out-of-Distribution Samples and Adversarial Samples in Text Classification

no code implementations • 9 Apr 2022 • Cheng-Han Chiang, Hung-Yi Lee

Based on our observations, we propose a simple method to separate ID, OOD, and Adv samples using the hidden representations and output probabilities of the model.

text-classification • Text Classification

XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding

no code implementations • ACL 2022 • Chan-Jan Hsu, Hung-Yi Lee, Yu Tsao

Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks.

Natural Language Understanding

Self-Supervised Speech Representation Learning: A Review

no code implementations • 21 May 2022 • Abdelrahman Mohamed, Hung-Yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +3

Structured Prompt Tuning

no code implementations • 24 May 2022 • Chi-Liang Liu, Hung-Yi Lee, Wen-tau Yih

We propose structured prompt tuning, a simple and effective method to improve prompt tuning.

Searching for the Essence of Adversarial Perturbations

no code implementations • 30 May 2022 • Dennis Y. Menn, Tzu-hsun Feng, Hung-Yi Lee

Neural networks have demonstrated state-of-the-art performance in various machine learning fields.

Autonomous Driving

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

no code implementations • 18 Jun 2022 • Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-Yi Lee, Helen Meng

However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models focus solely on the standalone anti-spoofing task, ignoring the subsequent speaker verification process.

Open-Ended Question Answering • Speaker Verification

Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding

no code implementations • 27 Jun 2022 • Wei-Ping Huang, Po-Chun Chen, Sung-Feng Huang, Hung-Yi Lee

This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech (TTS) problem under the few-shot setting.

Few-Shot Learning • Transfer Learning

Are Synonym Substitution Attacks Really Synonym Substitution Attacks?

no code implementations • 6 Oct 2022 • Cheng-Han Chiang, Hung-Yi Lee

In this paper, we explore the following question: Are synonym substitution attacks really synonym substitution attacks (SSAs)?

Sentence

On Compressing Sequences for Self-Supervised Speech Models

no code implementations • 13 Oct 2022 • Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-Yi Lee, Hao Tang

Subsampling while training self-supervised models not only improves the overall performance on downstream tasks under certain frame rates, but also brings significant speed-up in inference.

Self-Supervised Learning
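
One of the simplest forms of the subsampling discussed above is fixed-rate average pooling along the time axis; the toy snippet below halves the frame rate of a feature sequence (the shapes are illustrative assumptions).

```python
# Halve the frame rate of (batch, time, dim) features by average pooling in time.
import torch
import torch.nn.functional as F

feats = torch.randn(4, 100, 256)                         # 100 frames, 256-dim each
pooled = F.avg_pool1d(feats.transpose(1, 2), kernel_size=2, stride=2)
pooled = pooled.transpose(1, 2)                          # -> (4, 50, 256)
```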

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

no code implementations • 2 Nov 2022 • Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-Yi Lee, David Harwath

This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.

Image Retrieval • Retrieval +1

Once-for-All Sequence Compression for Self-Supervised Speech Models

no code implementations • 4 Nov 2022 • Hsuan-Jui Chen, Yen Meng, Hung-Yi Lee

The sequence length along the time axis is often the dominant factor of the computation in speech processing.

Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

no code implementations • 6 Nov 2022 • Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, Hung-Yi Lee

To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +3

Model Extraction Attack against Self-supervised Speech Models

no code implementations • 29 Nov 2022 • Tsu-Yuan Hsu, Chen-An Li, Tung-Yu Wu, Hung-Yi Lee

In the first stage, SSL is conducted on the large-scale unlabeled corpus to pre-train a small speech model.

Model extraction • Self-Supervised Learning

General Framework for Self-Supervised Model Priming for Parameter-Efficient Fine-tuning

no code implementations • 2 Dec 2022 • Shih-Cheng Huang, Shih-Heng Wang, Min-Han Shih, Saurav Sahay, Hung-Yi Lee

To tackle these issues, we propose a general framework to enhance the few-shot adaptation and cross-domain generalization ability of parameter-efficient methods.

Domain Generalization

CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models

no code implementations • 1 Dec 2022 • Zih-Ching Chen, Yu-Shun Sung, Hung-Yi Lee

However, such efficient tuning techniques only provide adaptation at the Transformer layers and fail to perform adaptation at the feature extractor.

Self-Supervised Learning

SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks

no code implementations • 20 Dec 2022 • Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe

In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.

Dialog Act Classification • Question Answering +4

Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

no code implementations • 30 Jan 2023 • Guan-Ting Liu, En-Pei Hu, Pu-Jen Cheng, Hung-Yi Lee, Shao-Hua Sun

Aiming to produce reinforcement learning (RL) policies that are human-interpretable and can generalize better to novel scenarios, Trivedi et al. (2021) present a method (LEAPS) that first learns a program embedding space to continuously parameterize diverse programs from a pre-generated program dataset, and then searches for a task-solving program in the learned program embedding space when given a task.

reinforcement-learning • Reinforcement Learning (RL)

Ensemble knowledge distillation of self-supervised speech models

no code implementations • 24 Feb 2023 • Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-Yi Lee

We tried two different aggregation techniques, layerwise-average and layerwise-concatenation, to the representations of different teacher models and found that the former was more effective.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +4

SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks

no code implementations • 1 Mar 2023 • Kai-Wei Chang, Yu-Kai Wang, Hua Shen, Iu-thing Kang, Wei-Cheng Tseng, Shang-Wen Li, Hung-Yi Lee

For speech processing, SpeechPrompt shows its high parameter efficiency and competitive performance on a few speech classification tasks.

Ranked #17 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Classification • Language Modelling +1

Can Large Language Models Be an Alternative to Human Evaluations?

no code implementations • 3 May 2023 • Cheng-Han Chiang, Hung-Yi Lee

We show that the result of LLM evaluation is consistent with the results obtained by expert human evaluation: the texts rated higher by human experts are also rated higher by the LLMs.

Story Generation

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

no code implementations • 18 May 2023 • Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe

Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks.

Automatic Speech Recognition • Language Identification +3
