Search Results for author: Linh The Nguyen

Found 8 papers, 6 papers with code

Investigating the Impact of ASR Errors on Spoken Implicit Discourse Relation Recognition

no code implementations TU (COLING) 2022 Linh The Nguyen, Dat Quoc Nguyen

We present an empirical study investigating the influence of automatic speech recognition (ASR) errors on the spoken implicit discourse relation recognition (IDRR) task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

PhoGPT: Generative Pre-training for Vietnamese

1 code implementation 6 Nov 2023 Dat Quoc Nguyen, Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Dinh Phung, Hung Bui

The base model, PhoGPT-4B, with exactly 3.7B parameters, is pre-trained from scratch on a Vietnamese corpus of 102B tokens, with an 8192 context length, employing a vocabulary of 20480 token types.

Instruction Following

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

2 code implementations 31 May 2023 Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen

We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task.

A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation

1 code implementation 8 Aug 2022 Linh The Nguyen, Nguyen Luong Tran, Long Doan, Manh Luong, Dat Quoc Nguyen

In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-lengthed audio, English source transcript sentence, Vietnamese target subtitle sentence).

Sentence Translation

PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation

1 code implementation EMNLP 2021 Long Doan, Linh The Nguyen, Nguyen Luong Tran, Thai Hoang, Dat Quoc Nguyen

We introduce a high-quality and large-scale Vietnamese-English parallel dataset of 3.02M sentence pairs, which is 2.9M pairs larger than the benchmark Vietnamese-English machine translation corpus IWSLT15.

Denoising Machine Translation +2

PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing

1 code implementation NAACL 2021 Linh The Nguyen, Dat Quoc Nguyen

We present the first multi-task learning model -- named PhoNLP -- for joint Vietnamese part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing.

Dependency Parsing Language Modelling +7

Employing the Correspondence of Relations and Connectives to Identify Implicit Discourse Relations via Label Embeddings

no code implementations ACL 2019 Linh The Nguyen, Linh Van Ngo, Khoat Than, Thien Huu Nguyen

It has been shown that implicit connectives can be exploited to improve the performance of the models for implicit discourse relation recognition (IDRR).

Multi-Task Learning
