no code implementations • ROCLING 2021 • Chin-Ying Wu, Yung-Chang Hsu, Berlin Chen
With the recent breakthroughs in deep learning technologies, research on machine reading comprehension (MRC) has attracted much attention and found versatile applications in many use cases.
no code implementations • ROCLING 2021 • Hsin-Wei Wang, Bi-Cheng Yan, Yung-Chang Hsu, Berlin Chen
In the first stage, the speech uttered by an L2 learner is processed by an end-to-end ASR module to produce N-best phone sequence hypotheses.
Automatic Speech Recognition (ASR) +1
no code implementations • ROCLING 2021 • You-Sheng Tsao, Tien-Hong Lo, Jiun-Ting Li, Shi-Yan Weng, Berlin Chen
With the widespread commercialization of smart devices, research on environmental sound classification has gained more and more attention in recent years.
no code implementations • ROCLING 2022 • Yi-Cheng Wang, Tzu-Ting Yang, Hsin-Wei Wang, Yung-Chang Hsu, Berlin Chen
DSI dramatically simplifies the whole retrieval process by encoding all information about the document collection into the parameter space of a single Transformer model, on top of which DSI can in turn generate the relevant document identities (IDs) in an autoregressive manner in response to a user query.
no code implementations • ROCLING 2022 • Tzu-I Wu, Tien-Hong Lo, Fu-An Chao, Yao-Ting Sung, Berlin Chen
Due to the surge in global demand for English as a second language (ESL), the development of automated methods for grading speaking proficiency has gained considerable attention.
Automatic Speech Recognition (ASR) +1
no code implementations • 11 Sep 2024 • Jiun-Ting Li, Bi-Cheng Yan, Tien-Hong Lo, Yi-Cheng Wang, Yung-Chang Hsu, Berlin Chen
Automated speaking assessment in conversation tests (ASAC) aims to evaluate the overall speaking proficiency of an L2 (second-language) speaker in a setting where an interlocutor interacts with one or more candidates.
no code implementations • 11 Sep 2024 • Tien-Hong Lo, Meng-Ting Tsai, Berlin Chen
Second language (L2) learners can improve their pronunciation by imitating golden speech, especially when the speech aligns with their respective speech characteristics.
no code implementations • 10 Sep 2024 • Yi-Cheng Wang, Li-Ting Pai, Bi-Cheng Yan, Hsin-Wei Wang, Chi-Han Lin, Berlin Chen
End-to-end (E2E) automatic speech recognition (ASR) models have become standard practice for various commercial applications.
Automatic Speech Recognition (ASR) +1
1 code implementation • 3 Sep 2024 • Chien-Chun Wang, Li-Wei Chen, Hung-Shin Lee, Berlin Chen, Hsin-Min Wang
Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain, leading to a mismatch between training and test conditions.
no code implementations • 16 Jun 2024 • Chung-Wen Wu, Berlin Chen
To address this challenge, we approach ASA as an ordinal classification task, introducing Weighted Vectors Ranking Similarity (W-RankSim) as a novel regularization technique.
no code implementations • 5 Jun 2024 • Bi-Cheng Yan, Wei-Cheng Chao, Jiun-Ting Li, Yi-Cheng Wang, Hsin-Wei Wang, Meng-Shin Lin, Berlin Chen
Automatic pronunciation assessment (APA) aims to evaluate the pronunciation proficiency of a second language (L2) learner in a target language.
no code implementations • 11 Apr 2024 • Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen
Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech.
Automatic Speech Recognition (ASR) +2
no code implementations • 26 Mar 2024 • Yi-Cheng Wang, Hsin-Wei Wang, Bi-Cheng Yan, Chi-Han Lin, Berlin Chen
End-to-end automatic speech recognition (E2E ASR) systems often suffer from mistranscription of domain-specific phrases, such as named entities, sometimes leading to catastrophic failures in downstream tasks.
no code implementations • 21 Mar 2024 • PeiYing Lee, HauYun Guo, Berlin Chen
End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling.
no code implementations • 27 Feb 2024 • Tzu-Ting Yang, Hsin-Wei Wang, Yi-Cheng Wang, Chi-Han Lin, Berlin Chen
With the massive developments of end-to-end (E2E) neural networks, recent years have witnessed unprecedented breakthroughs in automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 15 Dec 2023 • Tzu-Ting Yang, Hsin-Wei Wang, Berlin Chen
In recent years, end-to-end speech recognition has emerged as a technology that integrates the acoustic model, pronunciation dictionary, and language model components of a traditional automatic speech recognition (ASR) system.
no code implementations • 3 Oct 2023 • Bi-Cheng Yan, Hsin-Wei Wang, Yi-Cheng Wang, Jiun-Ting Li, Chi-Han Lin, Berlin Chen
Automatic pronunciation assessment (APA) aims to quantify the pronunciation proficiency of a second language (L2) learner in a target language.
no code implementations • 4 Sep 2023 • Yi-Cheng Wang, Tzu-Ting Yang, Hsin-Wei Wang, Bi-Cheng Yan, Berlin Chen
Voice input has become increasingly popular on mobile devices and seems to be almost entirely supplanting text input.
Automatic Speech Recognition (ASR) +4
no code implementations • 29 May 2023 • Fu-An Chao, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung, Berlin Chen
Automatic Pronunciation Assessment (APA) plays a vital role in Computer-assisted Pronunciation Training (CAPT) when evaluating a second language (L2) learner's speaking proficiency.
no code implementations • 2 Jul 2022 • Berlin Chen, Cyrus Mostajeran, Salem Said
We present a novel algorithm for learning the parameters of hidden Markov models (HMMs) in a geometric setting where the observations take values in Riemannian manifolds.
no code implementations • 5 Nov 2021 • Bi-Cheng Yan, Hsin-Wei Wang, Shih-Hsuan Chiu, Hsuan-Sheng Chiu, Berlin Chen
Conversational speech normally has loose syntactic structures at the utterance level, yet it exhibits topical coherence across consecutive utterances.
Automatic Speech Recognition (ASR) +2
no code implementations • 1 Nov 2021 • Hsin-Wei Wang, Bi-Cheng Yan, Hsuan-Sheng Chiu, Yung-Chang Hsu, Berlin Chen
In addition, we design and develop a pronunciation modeling network stacked on top of the NAR E2E models of our method to further boost the effectiveness of MD&D.
no code implementations • 17 Oct 2021 • Tien-Hong Lo, Yao-Ting Sung, Berlin Chen
Recently, end-to-end (E2E) models, which take spectral vector sequences of L2 (second-language) learners' utterances as input and produce the corresponding phone-level sequences as output, have attracted much research attention in developing mispronunciation detection (MD) systems.
no code implementations • 31 Aug 2021 • Bi-Cheng Yan, Shao-Wei Fan Jiang, Fu-An Chao, Berlin Chen
End-to-end (E2E) neural models are increasingly attracting attention as a promising modeling approach for mispronunciation detection and diagnosis (MDD).
no code implementations • 26 Aug 2021 • Fu-An Chao, Jeih-weih Hung, Berlin Chen
In recent decades, many studies have suggested that phase information is crucial for speech enhancement (SE), and time-domain single-channel speech enhancement techniques have shown promise in noise suppression and robust automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
1 code implementation • 4 Jul 2021 • Fu-An Chao, Shao-Wei Fan Jiang, Bi-Cheng Yan, Jeih-weih Hung, Berlin Chen
Due to the unprecedented breakthroughs brought about by deep learning, speech enhancement (SE) techniques have been developed rapidly and play an important role prior to acoustic modeling to mitigate noise effects on speech.
Automatic Speech Recognition (ASR) +2
no code implementations • 13 Jun 2021 • Shih-Hsuan Chiu, Tien-Hong Lo, Fu-An Chao, Berlin Chen
In view of this, this paper seeks to represent the historical context information of an utterance as graph-structured data so as to distill cross-utterance, global word interaction relationships.
Automatic Speech Recognition (ASR) +3
no code implementations • 11 Apr 2021 • Shih-Hsuan Chiu, Berlin Chen
More recently, Bidirectional Encoder Representations from Transformers (BERT) was proposed and has achieved impressive success on many natural language processing (NLP) tasks such as question answering and language understanding, due mainly to its effective pre-training then fine-tuning paradigm as well as strong local contextual modeling ability.
Automatic Speech Recognition (ASR) +2
no code implementations • IJCLCLP 2021 • Fu-An Chao, Tien-Hong Lo, Shi-Yan Weng, Shih-Hsuan Chiu, Yao-Ting Sung, Berlin Chen
This paper describes the NTNU ASR system participating in the Formosa Speech Recognition Challenge 2020 (FSR-2020) supported by the Formosa Speech in the Wild project (FSW).
no code implementations • 4 Mar 2021 • Bi-Cheng Yan, Berlin Chen
Furthermore, our model achieves mispronunciation detection performance comparable to that of state-of-the-art E2E MDD models that take standard handcrafted acoustic features as input.
no code implementations • 4 Nov 2020 • Yu-Sen Cheng, Chun-Liang Shih, Tien-Hong Lo, Wen-Ting Tseng, Berlin Chen
In this report, we describe our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020.
no code implementations • 27 Oct 2020 • Shi-Yan Weng, Berlin Chen
The attention-based encoder-decoder modeling paradigm has achieved promising results on a variety of speech processing tasks, such as automatic speech recognition (ASR) and text-to-speech (TTS), among others.
Automatic Speech Recognition (ASR) +4
no code implementations • 27 Oct 2020 • Wen-Ting Tseng, Tien-Hong Lo, Yung-Chang Hsu, Berlin Chen
To this end, predominant approaches to FAQ retrieval typically rank question-answer pairs by considering either the similarity between the query and a question (q-Q), the relevance between the query and the associated answer of a question (q-A), or combining the clues gathered from the q-Q similarity measure and the q-A relevance measure.
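The third option described above, combining the q-Q similarity and q-A relevance clues, can be illustrated with a simple linear interpolation. This is a minimal sketch only: the function name `rank_faq`, the weight `alpha`, and the choice of linear interpolation are assumptions made for illustration, not the method of the paper, and the scores are assumed to be precomputed by some similarity and relevance models.

```python
def rank_faq(qq_scores, qa_scores, alpha=0.5):
    """Rank FAQ pairs by a weighted mix of two clues (hypothetical helper).

    qq_scores[i]: similarity between the query and question i (q-Q).
    qa_scores[i]: relevance between the query and answer i (q-A).
    alpha: assumed interpolation weight; 1.0 uses q-Q only, 0.0 uses q-A only.
    Returns the indices of the FAQ pairs, best first.
    """
    combined = [
        alpha * qq + (1.0 - alpha) * qa
        for qq, qa in zip(qq_scores, qa_scores)
    ]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


# Toy scores for three question-answer pairs (made-up values).
qq = [0.2, 0.9, 0.5]
qa = [0.7, 0.1, 0.8]
ranking = rank_faq(qq, qa, alpha=0.5)
```

Setting `alpha` to 1.0 or 0.0 recovers the pure q-Q and pure q-A rankings, respectively, so the same helper covers all three strategies named in the abstract.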
no code implementations • 1 Jun 2020 • Shi-Yan Weng, Tien-Hong Lo, Berlin Chen
Tremendous amounts of multimedia associated with speech information are driving an urgent need to develop efficient and effective automatic summarization methods.
Automatic Speech Recognition (ASR) +5
no code implementations • 25 May 2020 • Bi-Cheng Yan, Meng-Che Wu, Hsiao-Tsung Hung, Berlin Chen
Mispronunciation detection and diagnosis (MDD) is a core component of computer-assisted pronunciation training (CAPT).
Automatic Speech Recognition (ASR) +2
no code implementations • 18 May 2020 • Tien-Hong Lo, Shi-Yan Weng, Hsiu-jui Chang, Berlin Chen
Recently, end-to-end (E2E) automatic speech recognition (ASR) systems have garnered tremendous attention because of their great success and unified modeling paradigms in comparison to conventional hybrid DNN-HMM ASR systems.
Automatic Speech Recognition (ASR) +2
no code implementations • 18 May 2020 • Tien-Hong Lo, Fu-An Chao, Shi-Yan Weng, Berlin Chen
This paper describes the NTNU ASR system participating in the Interspeech 2020 Non-Native Children's Speech ASR Challenge supported by the SIG-CHILD group of ISCA.
2 code implementations • ICLR 2019 • Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, Ellie Pavlick
The jiant toolkit for general-purpose text understanding models
no code implementations • ICLR 2019 • Samuel R. Bowman, Ellie Pavlick, Edouard Grave, Benjamin Van Durme, Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen
Work on the problem of contextualized word representation—the development of reusable neural network components for sentence understanding—has recently seen a surge of progress centered on the unsupervised pretraining task of language modeling with methods like ELMo (Peters et al., 2018).
no code implementations • ACL 2019 • Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman
Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling.
no code implementations • COLING 2016 • Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang
The D-EV model not only inherits the advantages of the EV model but also can infer a more robust representation for a given spoken paragraph against imperfect speech recognition.
no code implementations • 22 Jul 2016 • Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Hsin-Hsi Chen
Word embedding methods revolve around learning continuous distributed vector representations of words with neural networks, which can capture semantic and/or syntactic cues, and in turn be used to induce similarity measures among words, sentences and documents in context.
no code implementations • 20 Jan 2016 • Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang
Apart from MMR, there is a dearth of research concentrating on reducing redundancy or increasing diversity for the spoken document summarization task, as far as we are aware.
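For readers unfamiliar with MMR (maximal marginal relevance), the greedy selection rule can be sketched as follows. This is an illustrative sketch of the standard MMR criterion, not the method proposed in the paper: the function name `mmr_select`, the trade-off weight `lam`, and the assumption that relevance and pairwise similarity scores are precomputed are all choices made here for illustration.

```python
def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedy MMR selection of up to k sentences (hypothetical helper).

    relevance[i]: relevance of sentence i to the whole document.
    similarity[i][j]: similarity between sentences i and j.
    lam: assumed trade-off weight; higher values favor relevance,
         lower values favor diversity.
    Returns the selected sentence indices in selection order.
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example: sentences 0 and 1 are near-duplicates, sentence 2 is distinct.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
summary = mmr_select(relevance, similarity, k=2, lam=0.5)
```

With `lam=0.5` the near-duplicate sentence 1 is skipped in favor of the less relevant but more novel sentence 2, which is the redundancy-reducing behavior the abstract refers to.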
no code implementations • ROCLING/IJCLCLP 2015 • Yao-Chi Hsu, Ming-Han Yang, Hsiao-Tsung Hung, Yuwen Hsiung, Yao-Ting Hung, Berlin Chen
no code implementations • 14 Jun 2015 • Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen
Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express the most important theme of the document, has been an active area of research and experimentation.