1 code implementation • 8 Aug 2022 • Linh The Nguyen, Nguyen Luong Tran, Long Doan, Manh Luong, Dat Quoc Nguyen
In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-lengthed audio, English source transcript sentence, Vietnamese target subtitle sentence).
1 code implementation • EMNLP 2021 • Long Doan, Linh The Nguyen, Nguyen Luong Tran, Thai Hoang, Dat Quoc Nguyen
We introduce a high-quality and large-scale Vietnamese-English parallel dataset of 3. 02M sentence pairs, which is 2. 9M pairs larger than the benchmark Vietnamese-English machine translation corpus IWSLT15.
2 code implementations • 20 Sep 2021 • Nguyen Luong Tran, Duong Minh Le, Dat Quoc Nguyen
We present BARTpho with two versions, BARTpho-syllable and BARTpho-word, which are the first public large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese.
Ranked #3 on Abstractive Text Summarization on vietnews