no code implementations • TU (COLING) 2022 • Linh The Nguyen, Dat Quoc Nguyen
We present an empirical study investigating the influence of automatic speech recognition (ASR) errors on the spoken implicit discourse relation recognition (IDRR) task.
1 code implementation • 6 Nov 2023 • Dat Quoc Nguyen, Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Dinh Phung, Hung Bui
The base model, PhoGPT-4B, with exactly 3.7B parameters, is pre-trained from scratch on a Vietnamese corpus of 102B tokens, with an 8192 context length, employing a vocabulary of 20480 token types.
2 code implementations • 31 May 2023 • Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen
We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task.
1 code implementation • 8 Aug 2022 • Linh The Nguyen, Nguyen Luong Tran, Long Doan, Manh Luong, Dat Quoc Nguyen
In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-length audio, English source transcript sentence, Vietnamese target subtitle sentence).
1 code implementation • EMNLP 2021 • Long Doan, Linh The Nguyen, Nguyen Luong Tran, Thai Hoang, Dat Quoc Nguyen
We introduce a high-quality and large-scale Vietnamese-English parallel dataset of 3.02M sentence pairs, which is 2.9M pairs larger than the benchmark Vietnamese-English machine translation corpus IWSLT15.
1 code implementation • NAACL 2021 • Linh The Nguyen, Dat Quoc Nguyen
We present the first multi-task learning model -- named PhoNLP -- for joint Vietnamese part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing.
1 code implementation • EMNLP (WNUT) 2020 • Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen, Long Doan
In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets.
no code implementations • ACL 2019 • Linh The Nguyen, Linh Van Ngo, Khoat Than, Thien Huu Nguyen
It has been shown that implicit connectives can be exploited to improve the performance of the models for implicit discourse relation recognition (IDRR).