Search Results for author: Linh The Nguyen

Found 8 papers, 6 papers with code

Investigating the Impact of ASR Errors on Spoken Implicit Discourse Relation Recognition

no code implementations • TU (COLING) 2022 • Linh The Nguyen, Dat Quoc Nguyen

We present an empirical study investigating the influence of automatic speech recognition (ASR) errors on the spoken implicit discourse relation recognition (IDRR) task.

Automatic Speech Recognition (ASR) • +2

PhoGPT: Generative Pre-training for Vietnamese

1 code implementation • 6 Nov 2023 • Dat Quoc Nguyen, Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Dinh Phung, Hung Bui

The base model, PhoGPT-4B, with exactly 3.7B parameters, is pre-trained from scratch on a Vietnamese corpus of 102B tokens, with an 8192 context length, employing a vocabulary of 20480 token types.

Instruction Following

720

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

2 code implementations • 31 May 2023 • Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen

We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task.

4,039

A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation

1 code implementation • 8 Aug 2022 • Linh The Nguyen, Nguyen Luong Tran, Long Doan, Manh Luong, Dat Quoc Nguyen

In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-length audio, English source transcript sentence, Vietnamese target subtitle sentence).

Sentence Translation

PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation

1 code implementation • EMNLP 2021 • Long Doan, Linh The Nguyen, Nguyen Luong Tran, Thai Hoang, Dat Quoc Nguyen

We introduce a high-quality and large-scale Vietnamese-English parallel dataset of 3.02M sentence pairs, which is 2.9M pairs larger than the benchmark Vietnamese-English machine translation corpus IWSLT15.

Denoising • Machine Translation • +2

PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing

1 code implementation • NAACL 2021 • Linh The Nguyen, Dat Quoc Nguyen

We present the first multi-task learning model -- named PhoNLP -- for joint Vietnamese part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing.

Dependency Parsing • Language Modelling • +7

131

WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets

1 code implementation • EMNLP (WNUT) 2020 • Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen, Long Doan

In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets.

Task 2 • Text Classification

Employing the Correspondence of Relations and Connectives to Identify Implicit Discourse Relations via Label Embeddings

no code implementations • ACL 2019 • Linh The Nguyen, Linh Van Ngo, Khoat Than, Thien Huu Nguyen

It has been shown that implicit connectives can be exploited to improve the performance of the models for implicit discourse relation recognition (IDRR).

Multi-Task Learning
