1 code implementation • NAACL (ACL) 2022 • Hung-Yi Lee, Abdelrahman Mohamed, Shinji Watanabe, Tara Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, Katrin Kirchhoff
Given the growing popularity of SSL, and the shared mission of the speech and language communities to bring these technologies to more use cases with better quality and to scale them to under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievements in speech processing.
no code implementations • Findings (EMNLP) 2021 • Wei-Tsung Kao, Hung-Yi Lee
This paper investigates whether the power of the models pre-trained on text data, such as BERT, can be transferred to general token sequence classification applications.
no code implementations • 1 Jun 2023 • Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shou-Yiin Chang, Rohit Prabhavalkar, Hung-Yi Lee, Tara N. Sainath
In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks.
no code implementations • 30 May 2023 • Yu-Hsiang Wang, Huang-Yu Chen, Kai-Wei Chang, Winston Hsu, Hung-Yi Lee
Meanwhile, the computational cost is reduced by 97% in terms of MACs (the number of multiply-accumulate operations) on the tasks we choose.
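A 97% reduction of this kind is simply a ratio of MAC counts before and after compression; a minimal sketch, with illustrative counts that are not figures from the paper:

```python
# Hypothetical sketch: expressing a computational saving as a relative
# reduction in MACs (multiply-accumulate operations). The counts below
# are made up for illustration.

def mac_reduction(original_macs, pruned_macs):
    """Fraction of MACs removed relative to the original model."""
    return 1.0 - pruned_macs / original_macs

# e.g. a model compressed from 100 GMACs down to 3 GMACs
saving = mac_reduction(100e9, 3e9)  # 0.97, i.e. a 97% reduction
```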
no code implementations • 29 May 2023 • Guan-Wei Wu, Guan-Ting Lin, Shang-Wen Li, Hung-Yi Lee
However, the absence of intermediate targets and training guidance for textless SLU often results in suboptimal performance.
Automatic Speech Recognition (ASR)
no code implementations • 22 May 2023 • Haibin Wu, Jiawen Kang, Lingwei Meng, Helen Meng, Hung-Yi Lee
Automatic speaker verification (ASV) plays a critical role in security-sensitive environments.
no code implementations • 18 May 2023 • Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks.
no code implementations • 12 May 2023 • Yu-Kuan Fu, Liang-Hsuan Tseng, Jiatong Shi, Chen-An Li, Tsu-Yuan Hsu, Shinji Watanabe, Hung-Yi Lee
We use fully unpaired data to train our unsupervised systems and evaluate our results on CoVoST 2 and CVSS.
no code implementations • 3 May 2023 • Cheng-Han Chiang, Hung-Yi Lee
We show that the results of LLM evaluation are consistent with those obtained by expert human evaluation: the texts rated higher by human experts are also rated higher by the LLMs.
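Agreement between LLM and expert rankings of this kind is commonly quantified with a rank correlation; a stdlib-only sketch of Spearman's rho (assumes tie-free scores; the ratings below are invented for illustration, not data from the paper):

```python
def spearman_rho(a, b):
    """Spearman rank correlation for two tie-free lists of scores."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

human = [4.5, 3.0, 2.0, 5.0]  # illustrative expert ratings
llm = [4.0, 2.5, 1.0, 4.8]    # illustrative LLM ratings, same ordering
agreement = spearman_rho(human, llm)  # 1.0: identical rankings
```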
1 code implementation • 15 Mar 2023 • Yuan Tseng, Cheng-I Lai, Hung-Yi Lee
The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees, such that each node is a span of audio that corresponds to a constituent.
Automatic Speech Recognition (ASR)
no code implementations • 1 Mar 2023 • Kai-Wei Chang, Yu-Kai Wang, Hua Shen, Iu-thing Kang, Wei-Cheng Tseng, Shang-Wen Li, Hung-Yi Lee
For speech processing, SpeechPrompt shows its high parameter efficiency and competitive performance on a few speech classification tasks.
no code implementations • 24 Feb 2023 • Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-Yi Lee
We applied two different aggregation techniques, layerwise average and layerwise concatenation, to the representations of the different teacher models and found that the former was more effective.
Automatic Speech Recognition (ASR)
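The two aggregation schemes can be sketched in plain Python; the per-layer feature vectors below are plain lists with made-up values (real systems operate on tensors), so this is an illustration of the idea rather than the paper's implementation:

```python
# Hypothetical sketch: combining hidden representations from multiple
# teacher models before distillation. Each teacher yields a list of
# per-layer feature vectors.

def layerwise_average(teachers):
    """Element-wise average of corresponding layers across teachers."""
    n = len(teachers)
    return [
        [sum(dims) / n for dims in zip(*layers)]
        for layers in zip(*teachers)
    ]

def layerwise_concatenation(teachers):
    """Concatenate corresponding layers along the feature axis."""
    return [
        [x for layer in layers for x in layer]
        for layers in zip(*teachers)
    ]

t1 = [[1.0, 3.0]]  # one layer, 2-dim features
t2 = [[3.0, 5.0]]
avg = layerwise_average([t1, t2])        # [[2.0, 4.0]]
cat = layerwise_concatenation([t1, t2])  # [[1.0, 3.0, 3.0, 5.0]]
```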
no code implementations • 12 Feb 2023 • Hsuan Su, Shachi H Kumar, Sahisnu Mazumder, Wenda Chen, Ramesh Manuvinakurike, Eda Okur, Saurav Sahay, Lama Nachman, Shang-Tse Chen, Hung-Yi Lee
With the power of large pretrained language models, various research works have integrated knowledge into dialogue systems.
no code implementations • 30 Jan 2023 • Guan-Ting Liu, En-Pei Hu, Pu-Jen Cheng, Hung-Yi Lee, Shao-Hua Sun
Aiming to produce reinforcement learning (RL) policies that are human-interpretable and can generalize better to novel scenarios, Trivedi et al. (2021) present a method (LEAPS) that first learns a program embedding space to continuously parameterize diverse programs from a pre-generated program dataset, and then searches for a task-solving program in the learned program embedding space when given a task.
no code implementations • 20 Dec 2022 • Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.
no code implementations • 2 Dec 2022 • Shih-Cheng Huang, Shih-Heng Wang, Min-Han Shih, Saurav Sahay, Hung-Yi Lee
To tackle these issues, we propose a general framework to enhance the few-shot adaptation and cross-domain generalization ability of parameter-efficient methods.
no code implementations • 1 Dec 2022 • Zih-Ching Chen, Yu-Shun Sung, Hung-Yi Lee
However, such efficient tuning techniques only provide adaptation at the Transformer layers and fail to adapt the feature extractor.
1 code implementation • 30 Nov 2022 • Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-Yi Lee, Shinji Watanabe, Sanjeev Khudanpur
This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR).
Automatic Speech Recognition (ASR)
no code implementations • 29 Nov 2022 • Tsu-Yuan Hsu, Chen-An Li, Tung-Yu Wu, Hung-Yi Lee
In the first stage, SSL is conducted on the large-scale unlabeled corpus to pre-train a small speech model.
no code implementations • 17 Nov 2022 • Tzu-Quan Lin, Hung-Yi Lee, Hao Tang
Self-supervised models have had great success in learning speech representations that can generalize to various downstream tasks.
no code implementations • 17 Nov 2022 • Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-Yi Lee, Hao Tang
Despite the success of Transformers in self-supervised learning with applications to various downstream tasks, the computational cost of training and inference remains a major challenge for applying these models to a wide spectrum of devices.
no code implementations • 15 Nov 2022 • Derek Xu, Shuyan Dong, Changhan Wang, Suyoun Kim, Zhaojiang Lin, Akshat Shrivastava, Shang-Wen Li, Liang-Hsuan Tseng, Alexei Baevski, Guan-Ting Lin, Hung-Yi Lee, Yizhou Sun, Wei Wang
Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information.
Automatic Speech Recognition (ASR)
no code implementations • 6 Nov 2022 • Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, Hung-Yi Lee
To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models.
Automatic Speech Recognition (ASR)
no code implementations • 4 Nov 2022 • Hsuan-Jui Chen, Yen Meng, Hung-Yi Lee
The sequence length along the time axis is often the dominant factor of the computation in speech processing.
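Sequence length dominates because self-attention cost grows with the square of the number of frames, which is why shortening the time axis pays off; a toy sketch (the cost model and frame counts are illustrative assumptions, not the paper's measurements):

```python
def self_attention_macs(seq_len, dim):
    """Rough cost model: attention scales quadratically in sequence length."""
    return seq_len * seq_len * dim

def subsample(frames, stride=2):
    """Keep every `stride`-th frame to shorten the sequence."""
    return frames[::stride]

frames = list(range(1000))           # 1000 input frames
short = subsample(frames, stride=2)  # 500 frames: ~4x cheaper attention
```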
no code implementations • 2 Nov 2022 • Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-Yi Lee, David Harwath
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.
1 code implementation • 1 Nov 2022 • Chan-Jan Hsu, Ho-Lam Chung, Hung-Yi Lee, Yu Tsao
In spoken language understanding (SLU), a natural solution is to concatenate pre-trained speech models (e.g., HuBERT) and pre-trained language models (PLMs, e.g., T5).
no code implementations • 27 Oct 2022 • Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-Yi Lee, Jyh-Shing Roger Jang
The MTD loss enables the MTDVocaLiST model to deeply mimic the cross-attention distribution and value relation in the Transformer of VocaLiST.
no code implementations • 16 Oct 2022 • Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-Yi Lee
We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised speech representation for better performance, generalization, and efficiency.
1 code implementation • 14 Oct 2022 • Kuan-Po Huang, Yu-Kuan Fu, Tsu-Yuan Hsu, Fabian Ritter Gutierrez, Fan-Lin Wang, Liang-Hsuan Tseng, Yu Zhang, Hung-Yi Lee
Self-supervised learned (SSL) speech pre-trained models perform well across various speech processing tasks.
1 code implementation • 13 Oct 2022 • Guan-Ting Lin, Chi-Luen Feng, Wei-Ping Huang, Yuan Tseng, Tzu-Han Lin, Chen-An Li, Hung-Yi Lee, Nigel G. Ward
We find that 13 of the 15 SSL models outperformed the baseline on all the prosody-related tasks.
no code implementations • 13 Oct 2022 • Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-Yi Lee, Hao Tang
Subsampling while training self-supervised models not only improves the overall performance on downstream tasks under certain frame rates, but also brings significant speed-up in inference.
no code implementations • 10 Oct 2022 • Zih-Ching Chen, Chin-Lun Fu, Chih-Ying Liu, Shang-Wen Li, Hung-Yi Lee
In downstream tasks, the parameters of SSL models are frozen, and only the adapters are trained.
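A common adapter design, a residual bottleneck inserted into the frozen network, can be sketched in plain Python; the shapes and ReLU bottleneck are generic assumptions about adapter modules in general, not details taken from this paper:

```python
def relu(x):
    return max(0.0, x)

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def adapter_forward(h, w_down, w_up):
    """Bottleneck adapter with a residual connection:
    h + W_up . relu(W_down . h). Only w_down / w_up are trained;
    the surrounding SSL model's parameters stay frozen."""
    z = [relu(x) for x in matvec(w_down, h)]
    delta = matvec(w_up, z)
    return [hi + di for hi, di in zip(h, delta)]
```

With identity matrices the adapter simply doubles the hidden state, which makes the residual path easy to verify by hand.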
no code implementations • 6 Oct 2022 • Cheng-Han Chiang, Hung-Yi Lee
In this paper, we explore the following question: Are synonym substitution attacks really synonym substitution attacks (SSAs)?
1 code implementation • 3 Oct 2022 • Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-Yi Lee, David Harwath
Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly.
no code implementations • 3 Oct 2022 • Xuanjun Chen, Haibin Wu, Helen Meng, Hung-Yi Lee, Jyh-Shing Roger Jang
Audio-visual active speaker detection (AVASD) is well-developed, and now is an indispensable front-end for several multi-modal applications.
Adversarial Robustness
Audio-Visual Active Speaker Detection
1 code implementation • 26 Sep 2022 • Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsu-Yuan Hsu, Hung-Yi Lee
Extensive experiments on speech and non-speech audio datasets are conducted to investigate the representation abilities of our ensemble method and its single constituent model.
no code implementations • 29 Jul 2022 • Da-Rong Liu, Po-chun Hsu, Yi-Chen Chen, Sung-Feng Huang, Shun-Po Chuang, Da-Yi Wu, Hung-Yi Lee
GAN training is adopted in the first stage to find the mapping relationship between unpaired speech and phone sequence.
no code implementations • 27 Jun 2022 • Wei-Ping Huang, Po-Chun Chen, Sung-Feng Huang, Hung-Yi Lee
This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech (TTS) problem under the few-shot setting.
no code implementations • 18 Jun 2022 • Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-Yi Lee, Helen Meng
However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models focus solely on the standalone anti-spoofing task, ignoring the subsequent speaker verification process.
no code implementations • 8 Jun 2022 • Hsuan Su, PoHan Chi, Shih-Cheng Huang, Chung Ho Lam, Saurav Sahay, Shang-Tse Chen, Hung-Yi Lee
Much literature has shown that prompt-based learning is an efficient method to make use of large pre-trained language models.
no code implementations • 30 May 2022 • Dennis Y. Menn, Tzu-hsun Feng, Hung-Yi Lee
Neural networks have demonstrated state-of-the-art performance in various machine learning fields.
no code implementations • 24 May 2022 • Chi-Liang Liu, Hung-Yi Lee, Wen-tau Yih
We propose structured prompt tuning, a simple and effective method to improve prompt tuning.
no code implementations • 21 May 2022 • Abdelrahman Mohamed, Hung-Yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe
Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years.
Automatic Speech Recognition (ASR)
no code implementations • 8 May 2022 • Chi-Luen Feng, Po-chun Hsu, Hung-Yi Lee
We found that HuBERT stores speaker information in representations whose positions correspond to silences in a waveform.
no code implementations • NAACL 2022 • Hung-Yi Lee, Shang-Wen Li, Ngoc Thang Vu
Deep learning has been the mainstream technique in natural language processing (NLP) area.
1 code implementation • Findings (NAACL) 2022 • Chin-Lun Fu, Zih-Ching Chen, Yun-Ru Lee, Hung-Yi Lee
Transformer-based pre-trained models with millions of parameters require large storage.
no code implementations • ACL 2022 • Chan-Jan Hsu, Hung-Yi Lee, Yu Tsao
Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks.
no code implementations • 10 Apr 2022 • Cheng-Han Chiang, Hung-Yi Lee
Our results reveal that the annotation quality is highly subject to the workers' qualification, and workers can be guided to provide certain annotations by the instructions.
no code implementations • 9 Apr 2022 • Cheng-Han Chiang, Hung-Yi Lee
Based on our observations, we propose a simple method to separate ID, OOD, and Adv samples using the hidden representations and output probabilities of the model.
no code implementations • 7 Apr 2022 • Wei-Cheng Tseng, Wei-Tsung Kao, Hung-Yi Lee
Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems.
1 code implementation • 1 Apr 2022 • Fan-Lin Wang, Po-chun Hsu, Da-Rong Liu, Hung-Yi Lee
Most recent speech synthesis systems are composed of a synthesizer and a vocoder.
no code implementations • 1 Apr 2022 • Wei-Tsung Kao, Yuan-Kuei Wu, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-Yi Lee
User-defined keyword spotting is a task to detect new spoken terms defined by users.
1 code implementation • 31 Mar 2022 • Kai-Wei Chang, Wei-Cheng Tseng, Shang-Wen Li, Hung-Yi Lee
We report in this paper the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM).
no code implementations • 30 Mar 2022 • Kuan Po Huang, Yu-Kuan Fu, Yu Zhang, Hung-Yi Lee
Speech distortions are a long-standing problem that degrades the performance of speech processing models trained with supervision.
no code implementations • 29 Mar 2022 • Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng
In the second-level fusion, the CM score and ASV scores directly from ASV systems will be concatenated into a prediction block for the final decision.
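The described second-level fusion can be sketched as concatenating scores into one feature vector for a final scorer; the linear prediction block and all weights below are hypothetical placeholders, not the paper's fusion model:

```python
# Hypothetical sketch of second-level score fusion: the countermeasure (CM)
# score and the scores from several ASV systems form one feature vector
# that a prediction block maps to a final decision score.

def fusion_features(cm_score, asv_scores):
    """Concatenate the CM score with all ASV system scores."""
    return [cm_score] + list(asv_scores)

def predict(features, weights, bias=0.0):
    """Toy linear prediction block over the fused scores."""
    return bias + sum(w * f for w, f in zip(weights, features))

feats = fusion_features(0.9, [0.6, 0.7])        # [0.9, 0.6, 0.7]
score = predict(feats, weights=[0.5, 0.25, 0.25])
```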
1 code implementation • 27 Mar 2022 • Guan-Ting Lin, Shang-Wen Li, Hung-Yi Lee
Although deep learning-based end-to-end Automatic Speech Recognition (ASR) has shown remarkable performance in recent years, it suffers severe performance regression on test samples drawn from different data distributions.
Automatic Speech Recognition (ASR)
1 code implementation • 22 Mar 2022 • Chih-Chiang Chang, Hung-Yi Lee
Simultaneous speech translation (SimulST) is a challenging task aiming to translate streaming speech before the complete input is observed.
1 code implementation • ACL 2022 • Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee
In this paper, we introduce SUPERB-SG, a new benchmark focused on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB.
1 code implementation • 9 Mar 2022 • Guan-Ting Lin, Yung-Sung Chuang, Ho-Lam Chung, Shu-wen Yang, Hsuan-Jui Chen, Shuyan Dong, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Lin-shan Lee
We empirically show that DUAL yields results comparable to those obtained by cascading an ASR model with a text QA model, and that it is robust to real-world data.
Automatic Speech Recognition (ASR)
no code implementations • 14 Feb 2022 • Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao, Hsin-Min Wang, Helen Meng
ADD 2022 is also the first challenge to propose the partially fake audio detection task.
1 code implementation • IWSLT (ACL) 2022 • Chih-Chiang Chang, Shun-Po Chuang, Hung-Yi Lee
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
1 code implementation • 9 Nov 2021 • Wei-Cheng Tseng, Wei-Tsung Kao, Hung-Yi Lee
Recently, adapting the idea of self-supervised learning (SSL) on continuous speech has started gaining attention.
no code implementations • 8 Nov 2021 • Haibin Wu, Bo Zheng, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng
As the paradigm of a self-supervised upstream model followed by downstream tasks attracts more attention in the speech community, characterizing the adversarial robustness of this paradigm is a high priority.
1 code implementation • 7 Nov 2021 • Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-Yi Lee
On the one hand, speaker adaptation methods fine-tune a trained multi-speaker text-to-speech (TTS) model with few enrolled samples.
no code implementations • 18 Oct 2021 • Yi-Chen Chen, Shu-wen Yang, Cheng-Kuang Lee, Simon See, Hung-Yi Lee
It has been shown that an SSL pretraining model can achieve excellent performance in various downstream tasks of speech processing.
no code implementations • 15 Oct 2021 • Yen Meng, Yi-Hui Chou, Andy T. Liu, Hung-Yi Lee
Self-supervised Speech Models (S3Ms) have been proven successful in many speech downstream tasks, like ASR.
no code implementations • 14 Oct 2021 • Chien-yu Huang, Kai-Wei Chang, Hung-Yi Lee
However, in real-world scenarios, it is difficult to collect clean utterances of a speaker, and they are usually degraded by noises or reverberations.
1 code implementation • 12 Oct 2021 • Wen-Chin Huang, Shu-wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda
In this work, we provide a series of in-depth analyses by benchmarking on the two tasks in VCC2020, namely intra-/cross-lingual any-to-one (A2O) VC, as well as an any-to-any (A2A) setting.
no code implementations • 9 Oct 2021 • Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-Yi Lee, Shinji Watanabe
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
Automatic Speech Recognition (ASR)
no code implementations • 8 Oct 2021 • Jiun-Hao Jhan, Chao-Peng Liu, Shyh-Kang Jeng, Hung-Yi Lee
Apart from the coherence and fluency of responses, an empathetic chatbot emphasizes more on people's feelings.
no code implementations • 7 Oct 2021 • Guan-Ting Lin, Chan-Jan Hsu, Da-Rong Liu, Hung-Yi Lee, Yu Tsao
In this work, we further analyze the training robustness of unsupervised ASR on the domain mismatch scenarios in which the domains of unpaired speech and text are different.
no code implementations • 7 Oct 2021 • Liang-Hsuan Tseng, Yu-Kuan Fu, Heng-Jui Chang, Hung-Yi Lee
Code-switching (CS) is common in daily conversations where more than one language is used within a sentence.
1 code implementation • 5 Oct 2021 • Heng-Jui Chang, Shu-wen Yang, Hung-Yi Lee
Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks.
1 code implementation • 8 Sep 2021 • Cheng-Han Chiang, Hung-Yi Lee
In this work, we study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to their counterparts trained from scratch on downstream tasks.
no code implementations • 4 Aug 2021 • Zih-Yun Chiu, Yi-Lin Tuan, Hung-Yi Lee, Li-Chen Fu
For reinforcement learning (RL), it is challenging for an agent to master a task that requires a specific series of actions due to sparse rewards.
no code implementations • ACL 2021 • Hung-Yi Lee, Ngoc Thang Vu, Shang-Wen Li
Meta-learning is one of the most important new techniques in machine learning in recent years.
1 code implementation • 1 Jul 2021 • Haibin Wu, Po-chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee
We also show that the neural vocoder adopted in the detection framework is dataset-independent.
1 code implementation • 15 Jun 2021 • Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-Yi Lee
Automatic speaker verification (ASV) is a well-developed technology for biometric identification that has been ubiquitously deployed in security-critical applications, such as banking and access control.
1 code implementation • ACL (WOAH) 2021 • Yung-Sung Chuang, Mingye Gao, Hongyin Luo, James Glass, Hung-Yi Lee, Yun-Nung Chen, Shang-Wen Li
Automatic detection of toxic language plays an essential role in protecting social media users, especially minority groups, from verbal abuse.
no code implementations • 1 Jun 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee
This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms.
1 code implementation • Findings (ACL) 2021 • Shun-Po Chuang, Yung-Sung Chuang, Chih-Chiang Chang, Hung-Yi Lee
We study the possibilities of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC), and use CTC-based automatic speech recognition as an auxiliary task to improve the performance.
Automatic Speech Recognition (ASR)
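The CTC criterion used here maps frame-level label sequences to output sequences by merging repeats and removing blanks; a minimal sketch of that collapsing rule (label ids are illustrative, and this covers only the decoding rule, not the full CTC loss):

```python
def ctc_collapse(frame_labels, blank=0):
    """CTC decoding rule: merge consecutive repeated labels, then drop blanks.
    A blank between two identical labels keeps them distinct in the output."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# frames: "1 1 _ 1 2 2 _"  ->  tokens: [1, 1, 2]
decoded = ctc_collapse([1, 1, 0, 1, 2, 2, 0])
```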
no code implementations • 8 May 2021 • Wei-Cheng Huang, Chien-yu Huang, Hung-Yi Lee
Substantial improvements have been made in machine reading comprehension, where the machine answers questions based on a given context.
1 code implementation • 7 May 2021 • Yi-Chen Chen, Po-Han Chi, Shu-wen Yang, Kai-Wei Chang, Jheng-Hao Lin, Sung-Feng Huang, Da-Rong Liu, Chi-Liang Liu, Cheng-Kuang Lee, Hung-Yi Lee
The multi-task learning of a wide variety of speech processing tasks with a universal model has not been studied.
4 code implementations • 3 May 2021 • Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee
SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data.
6 code implementations • 7 Apr 2021 • Wei-Cheng Tseng, Chien-yu Huang, Wei-Tsung Kao, Yist Y. Lin, Hung-Yi Lee
In this paper, we use self-supervised pre-trained models for MOS prediction.
3 code implementations • 7 Apr 2021 • Jheng-Hao Lin, Yist Y. Lin, Chung-Ming Chien, Hung-Yi Lee
AutoVC uses d-vectors to extract speaker information, while FragmentVC uses self-supervised learning (SSL) features such as wav2vec 2.0 to extract phonetic content information.
no code implementations • 6 Apr 2021 • Shun-Po Chuang, Heng-Jui Chang, Sung-Feng Huang, Hung-Yi Lee
Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people.
no code implementations • 4 Apr 2021 • Heng-Jui Chang, Hung-Yi Lee, Lin-shan Lee
We can collect new data describing the new environment and fine-tune the system, but this naturally leads to higher error rates for the earlier datasets, referred to as catastrophic forgetting.
Automatic Speech Recognition (ASR)
1 code implementation • 31 Mar 2021 • Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-Yi Lee, Lei Xie
Auto-KWS 2021 challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to a customized keyword spotting task.
no code implementations • NAACL 2021 • Hsuan Su, Jiun-Hao Jhan, Fan-Yun Sun, Saurav Sahay, Hung-Yi Lee
Our framework includes a guiding chatbot and an interlocutor model that plays the role of humans.
no code implementations • 12 Mar 2021 • Wei-Tsung Kao, Hung-Yi Lee
This paper investigates whether the power of the models pre-trained on text data, such as BERT, can be transferred to general token sequence classification applications.
1 code implementation • 6 Mar 2021 • Chung-Ming Chien, Jheng-Hao Lin, Chien-yu Huang, Po-chun Hsu, Hung-Yi Lee
The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances with voice and speaking style similar to a reference speaker given only a few reference samples.
no code implementations • 14 Feb 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee
Automatic speaker verification (ASV) is one of the core technologies in biometric identification.
no code implementations • 22 Dec 2020 • Cheng-Han Chiang, Hung-Yi Lee
In this paper, we study how the intrinsic nature of pre-training data contributes to the fine-tuned downstream performance.
1 code implementation • NeurIPS 2020 • Chun-Hsing Lin, Siang-Ruei Wu, Hung-Yi Lee, Yun-Nung Chen
Score function-based natural language generation (NLG) approaches such as REINFORCE, in general, suffer from low sample efficiency and training instability problems.
1 code implementation • 27 Nov 2020 • Chun-Hsing Lin, Siang-Ruei Wu, Hung-Yi Lee, Yun-Nung Chen
Score function-based natural language generation (NLG) approaches such as REINFORCE, in general, suffer from low sample efficiency and training instability problems.
no code implementations • 24 Nov 2020 • Tzu-Hsien Huang, Jheng-Hao Lin, Chien-yu Huang, Hung-Yi Lee
Voice conversion technologies have been greatly improved in recent years with the help of deep learning, but their capabilities of producing natural sounding utterances in different conditions remain unclear.
1 code implementation • 12 Nov 2020 • Chung-Ming Chien, Hung-Yi Lee
Prosody modeling is an essential component in modern text-to-speech (TTS) frameworks.
1 code implementation • 31 Oct 2020 • Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, Hung-Yi Lee
With a proper activation as an information bottleneck on content embeddings, the trade-off between the synthesis quality and the speaker similarity of the converted speech is improved drastically.
Audio and Speech Processing • Sound
1 code implementation • 29 Oct 2020 • Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-Yi Lee
Speech separation is well developed thanks to the very successful permutation invariant training (PIT) approach, although the frequent label-assignment switching during PIT training remains a problem when better convergence speed and higher achievable performance are desired.
Ranked #4 on Speech Separation on Libri2Mix (using extra training data)
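The PIT objective itself is compact: score every assignment of estimated sources to reference sources and train against the cheapest one. A stdlib-only sketch with toy two-sample "signals" (the MSE criterion and brute-force permutation search are standard PIT choices, not details specific to this paper):

```python
from itertools import permutations

def mse(a, b):
    """Mean squared error between two equally long signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pit_loss(estimates, targets):
    """Permutation invariant training loss: evaluate every mapping of
    estimated sources to reference sources and keep the minimum."""
    n = len(targets)
    return min(
        sum(mse(estimates[i], targets[p[i]]) for i in range(n)) / n
        for p in permutations(range(n))
    )

# The estimates match the targets under the swapped assignment, so the loss is 0.
loss = pit_loss([[1.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]])
```

The label-assignment switching mentioned above corresponds to the argmin permutation changing between training steps.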
2 code implementations • 27 Oct 2020 • Yist Y. Lin, Chung-Ming Chien, Jheng-Hao Lin, Hung-Yi Lee, Lin-shan Lee
Any-to-any voice conversion aims to convert the voice from and to any speakers even unseen during training, which is much more challenging compared to one-to-one or many-to-many tasks, but much more attractive in real-world scenarios.
1 code implementation • 26 Oct 2020 • Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass
Much recent work on Spoken Language Understanding (SLU) is limited in at least one of three ways: models were trained on oracle text input and neglected ASR errors, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.
2 code implementations • EMNLP (MRQA) 2021 • Chi-Liang Liu, Hung-Yi Lee
In this paper, we study the possibility of almost unsupervised Multiple Choices Question Answering (MCQA).
1 code implementation • 20 Oct 2020 • Chi-Liang Liu, Tsung-Yuan Hsu, Yung-Sung Chuang, Chung-Yi Li, Hung-Yi Lee
Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information.
no code implementations • 20 Oct 2020 • Chi-Liang Liu, Tsung-Yuan Hsu, Yung-Sung Chuang, Hung-Yi Lee
Recently, multilingual BERT works remarkably well on cross-lingual transfer tasks, superior to static non-contextualized word embeddings.
1 code implementation • EMNLP 2020 • Cheng-Han Chiang, Sung-Feng Huang, Hung-Yi Lee
These findings suggest that knowledge of a pretrained model varies during pretraining, and having more pretrain steps does not necessarily provide a model with more comprehensive knowledge.
6 code implementations • 12 Jul 2020 • Andy T. Liu, Shang-Wen Li, Hung-Yi Lee
We present a large-scale comparison of various self-supervised models.
no code implementations • 11 Jul 2020 • Hung-Yi Lee, Cheng-Hao Ho, Chien-Fu Lin, Chiung-Chih Chang, Chih-Wei Lee, Yau-Shian Wang, Tsung-Yuan Hsu, Kuan-Yu Chen
Conventional seq2seq chatbot models attempt only to find sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences.
no code implementations • 9 Jun 2020 • Tsung-Han Wu, Chun-Chen Hsieh, Yen-Hao Chen, Po-Han Chi, Hung-Yi Lee
In this paper, we seek solutions for reducing the computation complexity of transformer-based models for speech representation learning.
1 code implementation • 7 Jun 2020 • Da-Yi Wu, Yen-Hao Chen, Hung-Yi Lee
Voice conversion (VC) is a task that transforms the source speaker's timbre, accent, and tones in audio into another one's while preserving the linguistic content.
2 code implementations • 5 Jun 2020 • Shu-wen Yang, Andy T. Liu, Hung-Yi Lee
Self-supervised Audio Transformers (SAT) have enabled great success in many downstream speech applications like ASR, but how they work has not been widely explored yet.
5 code implementations • 5 Jun 2020 • Haibin Wu, Andy T. Liu, Hung-Yi Lee
To explore this issue, we proposed to employ Mockingjay, a self-supervised learning based model, to protect anti-spoofing models against adversarial attacks in the black-box scenario.
no code implementations • ACL 2020 • Shun-Po Chuang, Tzu-Wei Sung, Alexander H. Liu, Hung-Yi Lee
Speech translation (ST) aims to learn transformations from speech in the source language to the text in the target language.
3 code implementations • 18 May 2020 • Po-Han Chi, Pei-Hung Chung, Tsung-Han Wu, Chun-Cheng Hsieh, Yen-Hao Chen, Shang-Wen Li, Hung-Yi Lee
We use the representations with two downstream tasks, speaker identification, and phoneme classification.
1 code implementation • 18 May 2020 • Chien-yu Huang, Yist Y. Lin, Hung-Yi Lee, Lin-shan Lee
We introduce human imperceptible noise into the utterances of a speaker whose voice is to be defended.
no code implementations • 16 May 2020 • Tao Tu, Yuan-Jui Chen, Alexander H. Liu, Hung-Yi Lee
The experiment results demonstrate that with only an hour of paired speech data, no matter the paired data is from multiple speakers or a single speaker, the proposed model can generate intelligible speech in different voices.
1 code implementation • 15 May 2020 • Po-chun Hsu, Hung-Yi Lee
As we design a flow-based model that is heavily compressed, the proposed model requires much less computational resources compared to other waveform generation models during both training and inference time; even though the model is highly compressed, the post-filter maintains the quality of generated waveform.
Speech Synthesis
Text-To-Speech Synthesis
Audio and Speech Processing
Sound
no code implementations • 13 May 2020 • Yi-Chen Chen, Jui-Yang Hsu, Cheng-Kuang Lee, Hung-Yi Lee
To examine the generalizability of DARTS-ASR, we apply our approach not only to many languages for monolingual ASR, but also to a multilingual ASR setting.
no code implementations • 5 May 2020 • Heng-Jui Chang, Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee
Whispering is an important mode of human speech, but no end-to-end recognition results for it have been reported yet, probably due to the scarcity of available whispered speech data.
no code implementations • 28 Apr 2020 • Yau-Shian Wang, Hung-Yi Lee, Yun-Nung Chen
Also, the performance is on par with a recently proposed weakly-supervised text classification method.
no code implementations • 20 Apr 2020 • Chi-Liang Liu, Tsung-Yuan Hsu, Yung-Sung Chuang, Hung-Yi Lee
Recently, multilingual BERT has worked remarkably well on cross-lingual transfer tasks, proving superior to static non-contextualized word embeddings.
no code implementations • 6 Mar 2020 • Haibin Wu, Songxiang Liu, Helen Meng, Hung-Yi Lee
Various forefront countermeasure methods for automatic speaker verification (ASV) with considerable anti-spoofing performance were proposed in the ASVspoof 2019 challenge.
no code implementations • 25 Jan 2020 • Wei-Tsung Kao, Tsung-Han Wu, Po-Han Chi, Chun-Cheng Hsieh, Hung-Yi Lee
Although Bidirectional Encoder Representations from Transformers (BERT) have achieved tremendous success in many natural language processing (NLP) tasks, it remains a black box.
no code implementations • 9 Dec 2019 • Chao-I Tuan, Yuan-Kuei Wu, Hung-Yi Lee, Yu Tsao
Our experimental results first confirmed the robustness of our MiTAS on two types of perturbations in mixed audio.
no code implementations • 5 Dec 2019 • Po-chun Hsu, Chun-hsuan Wang, Andy T. Liu, Hung-Yi Lee
We found that speaker variety is much more important than language for achieving a universal vocoder.
1 code implementation • 29 Nov 2019 • Bo-Wen Chen, Yen-Min Hsu, Hung-Yi Lee
Based on these discoveries, we pose two questions: what is the value of randomly weighted networks in difficult generative audio tasks such as audio source separation, and does such a positive correlation still exist for large random networks and their trained counterparts?
no code implementations • 14 Nov 2019 • Shun-Po Chuang, Tzu-Wei Sung, Hung-Yi Lee
A lack of code-switching data complicates the training of code-switching (CS) language models.
1 code implementation • 4 Nov 2019 • Chung-Yi Li, Pei-Chieh Yuan, Hung-Yi Lee
End-to-end speech recognition systems have achieved competitive results compared to traditional systems.
no code implementations • 28 Oct 2019 • Alexander H. Liu, Tao Tu, Hung-Yi Lee, Lin-shan Lee
In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) to learn from primarily unpaired audio data and produce sequences of representations very close to phoneme sequences of speech utterances.
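The quantization step underlying such discrete representation learning can be sketched minimally with a nearest-codeword rule; the function name, toy codebook, and shapes below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def quantize(frames, codebook):
    """Map each frame vector to its nearest codeword (Euclidean distance).

    frames:   (n_frames, dim) continuous frame representations
    codebook: (n_codes, dim)  learned, phoneme-like target vectors
    Returns the discrete index sequence and the quantized vectors.
    """
    # Pairwise squared distances between every frame and every codeword.
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
frames = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8]])
idx, q = quantize(frames, codebook)
# idx is a discrete sequence standing in for phoneme-like units.
```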
1 code implementation • 28 Oct 2019 • Alexander H. Liu, Tzu-Wei Sung, Shun-Po Chuang, Hung-Yi Lee, Lin-shan Lee
This allows the decoder to consider the semantic consistency during decoding by absorbing the information carried by the transformed decoder feature, which is learned to be close to the target word embedding.
Automatic Speech Recognition (ASR)
1 code implementation • 28 Oct 2019 • Gene-Ping Yang, Szu-Lin Wu, Yao-Wen Mao, Hung-Yi Lee, Lin-shan Lee
Permutation Invariant Training (PIT) has long been a stepping-stone method for training speech separation models, addressing the label ambiguity problem.
Ranked #17 on Speech Separation on WSJ0-2mix
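The label ambiguity that PIT addresses can be sketched minimally in NumPy: try every assignment of estimated sources to references and score against the best one. Names and shapes are illustrative, not from the paper's code:

```python
from itertools import permutations

import numpy as np

def pit_mse(est, ref):
    """Permutation-invariant MSE between estimated and reference sources.

    est, ref: arrays of shape (n_sources, n_samples). Tries every
    assignment of estimates to references and returns the lowest
    average MSE together with the best permutation.
    """
    n = est.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(n)):
        loss = np.mean((est[list(perm)] - ref) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Toy example: two "sources", estimates emitted in swapped order.
ref = np.array([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]])
est = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
loss, perm = pit_mse(est, ref)
# Swapping the estimates matches the references exactly.
```

Brute-forcing all n! permutations is only viable for small source counts, which is the usual separation setting.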
no code implementations • 26 Oct 2019 • Jui-Yang Hsu, Yuan-Jui Chen, Hung-Yi Lee
In this paper, we propose applying a meta-learning approach to low-resource automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 25 Oct 2019 • Yung-Sung Chuang, Chi-Liang Liu, Hung-Yi Lee, Lin-shan Lee
In addition to its potential for end-to-end SQA, SpeechBERT can also be considered for many other spoken language understanding tasks, just as BERT is for many text processing tasks.
Ranked #3 on Spoken Language Understanding on Spoken-SQuAD
7 code implementations • 25 Oct 2019 • Andy T. Liu, Shu-wen Yang, Po-Han Chi, Po-chun Hsu, Hung-Yi Lee
We present Mockingjay as a new speech representation learning approach, where bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech.
1 code implementation • 19 Oct 2019 • Songxiang Liu, Haibin Wu, Hung-Yi Lee, Helen Meng
High-performance spoofing countermeasure systems for automatic speaker verification (ASV) have been proposed in the ASVspoof 2019 challenge.
1 code implementation • IJCNLP 2019 • Yi-Lin Tuan, Yun-Nung Chen, Hung-Yi Lee
This paper proposes a new task of applying dynamic knowledge graphs in neural conversation models and presents a novel TV series conversation corpus (DyKgChat) for the task.
no code implementations • IJCNLP 2019 • Tsung-Yuan Hsu, Chi-Liang Liu, Hung-Yi Lee
Because it is not feasible to collect training data for every language, there is a growing interest in cross-lingual transfer learning.
3 code implementations • IJCNLP 2019 • Yau-Shian Wang, Hung-Yi Lee, Yun-Nung Chen
This paper proposes Tree Transformer, which adds an extra constraint to attention heads of the bidirectional Transformer encoder in order to encourage the attention heads to follow tree structures.
1 code implementation • 8 Sep 2019 • Che-Ping Tsai, Hung-Yi Lee
In this paper, we propose a new framework for MLC which does not rely on a predefined label order and thus alleviates exposure bias.
1 code implementation • ICLR 2020 • Fan-Keng Sun, Cheng-Hao Ho, Hung-Yi Lee
We present LAMOL, a simple yet effective method for lifelong language learning (LLL) based on language modeling.
Ranked #4 on Continual Learning on ASC (19 tasks)
no code implementations • IJCNLP 2019 • Hongren Mao, Hung-Yi Lee
Paraphrase generation is an interesting and challenging NLP task which has numerous practical applications.
no code implementations • 13 Jul 2019 • Chia-Hsuan Lee, Hung-Yi Lee
In this paper, we explore the problem of cross-lingual transfer learning for QA, where a source language task with plentiful annotations is utilized to improve the performance of a QA model on a target language task with limited available annotations.
1 code implementation • 28 May 2019 • Andy T. Liu, Po-chun Hsu, Hung-Yi Lee
We found that the proposed encoding method offers automatic extraction of speech content from speaker style, and is sufficient to cover full linguistic content in a given language.
1 code implementation • 16 Apr 2019 • Chia-Hsuan Lee, Yun-Nung Chen, Hung-Yi Lee
Spoken question answering (SQA) is challenging due to complex reasoning on top of the spoken documents.
Ranked #3 on Spoken Language Understanding on Spoken-SQuAD
Automatic Speech Recognition (ASR)
1 code implementation • 16 Apr 2019 • Gene-Ping Yang, Chao-I Tuan, Hung-Yi Lee, Lin-shan Lee
Substantial effort has been reported based on approaches over the spectrogram, well known as the standard time-and-frequency representation for speech signals.
Ranked #19 on Speech Separation on WSJ0-2mix
no code implementations • 13 Apr 2019 • Tao Tu, Yuan-Jui Chen, Cheng-chieh Yeh, Hung-Yi Lee
In this paper, we aim to build TTS systems for such low-resource (target) languages where only very limited paired data are available.
10 code implementations • 10 Apr 2019 • Ju-chieh Chou, Cheng-chieh Yeh, Hung-Yi Lee
Recently, voice conversion (VC) without parallel data has been successfully adapted to multi-target scenario in which a single model is trained to convert the input voice to many different speakers.
no code implementations • 10 Apr 2019 • Yi-Chen Chen, Sung-Feng Huang, Hung-Yi Lee, Lin-shan Lee
However, we note that human babies start to learn language from the sounds (or phonetic structures) of a small number of exemplar words, and "generalize" such knowledge to other words without hearing a large amount of data.
no code implementations • 8 Apr 2019 • Kuan-Yu Chen, Che-Ping Tsai, Da-Rong Liu, Hung-Yi Lee, Lin-shan Lee
Producing a large annotated speech corpus for training ASR systems remains difficult for the more than 95% of the world's languages that are low-resourced, but collecting a relatively large unlabeled data set for such languages is more achievable.
no code implementations • 12 Nov 2018 • Che-Ping Tsai, Hung-Yi Lee
The discriminator learns to model label dependency by discriminating real and generated label sets.
no code implementations • 7 Nov 2018 • Sung-Feng Huang, Yi-Chen Chen, Hung-Yi Lee, Lin-shan Lee
Embedding audio signal segments into vectors of fixed dimensionality is attractive because all subsequent processing, for example modeling, classification, or indexing, becomes easier and more efficient.
2 code implementations • 6 Nov 2018 • Ching-Ting Chang, Shun-Po Chuang, Hung-Yi Lee
To mitigate the issue without expensive human annotation, we propose an unsupervised method for code-switching data augmentation.
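As a rough illustration only, lexicon-based word substitution can synthesize code-switched text; the paper's actual method is unsupervised and considerably more involved, and the toy lexicon below is invented for the sketch:

```python
import random

# Toy bilingual lexicon (invented); real systems would learn alignments.
lexicon = {"i": "我", "love": "愛", "you": "你"}

def augment(sentence, p=0.5, rng=None):
    """Synthesize code-switched text by swapping words via a lexicon.

    Each word present in the lexicon is replaced with probability p,
    yielding mixed-language sentences for language-model training.
    """
    rng = rng or random.Random(0)
    out = []
    for w in sentence.split():
        if w in lexicon and rng.random() < p:
            out.append(lexicon[w])
        else:
            out.append(w)
    return " ".join(out)

print(augment("i love you", p=1.0))  # all lexicon words swapped
```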
no code implementations • 2 Nov 2018 • Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee
In this paper, we propose a novel Adversarial Training (AT) approach for end-to-end speech recognition using a Criticizing Language Model (CLM).
Automatic Speech Recognition (ASR)
1 code implementation • 30 Oct 2018 • Li-Wei Chen, Hung-Yi Lee, Yu Tsao
This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed.
no code implementations • 30 Oct 2018 • Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-Yi Lee, Lin-shan Lee
This can be learned by aligning a small number of spoken words and the corresponding text words in the embedding spaces.
1 code implementation • EMNLP 2018 • Yau-Shian Wang, Hung-Yi Lee
The generator encodes the input text into a shorter word sequence, and the reconstructor recovers the generator input from the generator output.
no code implementations • 27 Sep 2018 • Yau-Shian Wang, Yun-Nung Chen, Hung-Yi Lee
Learning discrete representations of data and then generating data from the discovered representations have been increasingly studied because the obtained discrete representations can benefit unsupervised learning.
4 code implementations • 12 Sep 2018 • Shun-Yao Shih, Fan-Keng Sun, Hung-Yi Lee
To obtain accurate predictions, it is crucial to model long-term dependencies in time series data, which can be achieved to some extent by recurrent neural networks (RNNs) with an attention mechanism.
Ranked #4 on Univariate Time Series Forecasting on Electricity
Multivariate Time Series Forecasting
Univariate Time Series Forecasting
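The attention mechanism referred to above can be sketched as a weighted summary of past hidden states; this is generic dot-product attention over toy data, not the paper's temporal pattern attention:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(hidden_states, query):
    """Attention-weighted summary of past RNN hidden states.

    hidden_states: (T, d) one state per past time step
    query:         (d,)   e.g. the current hidden state
    Scores each past step against the query, then returns the
    weighted sum used as extra context for the next prediction.
    """
    scores = hidden_states @ query          # (T,) dot-product scores
    weights = softmax(scores)               # normalized to sum to 1
    return weights @ hidden_states, weights

T, d = 5, 4
rng = np.random.default_rng(0)
h = rng.normal(size=(T, d))                 # stand-in RNN states
context, w = attend(h, h[-1])
# context is a convex combination of past states.
```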
no code implementations • 24 Aug 2018 • Yi-Lin Tuan, Jinzhi Zhang, Yujia Li, Hung-Yi Lee
In sequence generation tasks, many works use policy gradients for model optimization to tackle the intractable backpropagation issue when maximizing non-differentiable evaluation metrics or fooling the discriminator in adversarial learning.
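The policy-gradient workaround can be shown on a two-token toy problem: the reward is a non-differentiable lookup, yet the REINFORCE estimator still moves probability mass toward the rewarded token. All names and numbers here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Two-token toy "vocabulary"; the reward is a table lookup, so there
# is no gradient through it — only through the sampling policy.
logits = np.zeros(2)
reward = np.array([0.0, 1.0])   # token 1 is the "good" token

for _ in range(500):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)
    # REINFORCE: grad of log pi(a) w.r.t. logits is (one_hot(a) - probs),
    # scaled by the (non-differentiable) reward of the sampled action.
    grad = (np.eye(2)[a] - probs) * reward[a]
    logits += 0.5 * grad

# The policy ends up strongly preferring the rewarded token.
```

In SeqGAN-style training the scalar reward comes from a discriminator instead of a table, but the estimator is the same.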
1 code implementation • 16 Aug 2018 • Yi-Lin Tuan, Hung-Yi Lee
To stabilize the training of SeqGAN, Monte Carlo tree search (MCTS) or reward at every generation step (REGS) is used to evaluate the goodness of a generated subsequence.
no code implementations • 13 Aug 2018 • Chia-Hung Wan, Shun-Po Chuang, Hung-Yi Lee
Humans can imagine a scene from a sound.
1 code implementation • 9 Aug 2018 • Cheng-chieh Yeh, Po-chun Hsu, Ju-chieh Chou, Hung-Yi Lee, Lin-shan Lee
In this way, the length constraint mentioned above is removed to offer rhythm-flexible voice conversion without requiring parallel data.
Sound
Audio and Speech Processing
1 code implementation • 7 Aug 2018 • Chia-Hsuan Lee, Shang-Ming Wang, Huan-Cheng Chang, Hung-Yi Lee
Reading comprehension by machine has been widely studied, but machine comprehension of spoken content is still a less investigated problem.
no code implementations • 7 Aug 2018 • Yu-Hsuan Wang, Hung-Yi Lee, Lin-shan Lee
In this paper, we extend audio Word2Vec from the word level to the utterance level by proposing a new segmental audio Word2Vec, in which unsupervised spoken-word boundary segmentation and audio Word2Vec are jointly learned and mutually enhanced, so that an utterance can be directly represented as a sequence of vectors carrying phonetic structure information.
no code implementations • 21 Jul 2018 • Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-Yi Lee, Lin-shan Lee
Stage 1 performs phonetic embedding with speaker characteristics disentangled.
1 code implementation • 19 Jul 2018 • Chien-Feng Liao, Yu Tsao, Hung-Yi Lee, Hsin-Min Wang
The proposed noise adaptive SE system contains an encoder-decoder-based enhancement model and a domain discriminator model.
Sound
Audio and Speech Processing
4 code implementations • 9 Apr 2018 • Ju-chieh Chou, Cheng-chieh Yeh, Hung-Yi Lee, Lin-shan Lee
The decoder then takes the speaker-independent latent representation and the target speaker embedding as the input to generate the voice of the target speaker with the linguistic content of the source utterance.
no code implementations • 7 Apr 2018 • Chih-Wei Lee, Yau-Shian Wang, Tsung-Yuan Hsu, Kuan-Yu Chen, Hung-Yi Lee, Lin-shan Lee
Conventional seq2seq chatbot models only try to find the sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences.
no code implementations • 1 Apr 2018 • Pei-Hung Chung, Kuan Tung, Ching-Lun Tai, Hung-Yi Lee
User-machine interaction is crucial for information retrieval, especially for spoken content retrieval, because spoken content is difficult to browse, and speech recognition has a high degree of uncertainty.
1 code implementation • 1 Apr 2018 • Chia-Hsuan Li, Szu-Lin Wu, Chi-Liang Liu, Hung-Yi Lee
Reading comprehension has been widely studied.
Ranked #4 on Spoken Language Understanding on Spoken-SQuAD
no code implementations • 1 Apr 2018 • Da-Rong Liu, Kuan-Yu Chen, Hung-Yi Lee, Lin-shan Lee
Unsupervised discovery of acoustic tokens from audio corpora without annotation and learning vector representations for these tokens have been widely studied.