Search Results for author: Xuan-Phi Nguyen

Found 15 papers, 7 papers with code

SeaLLMs -- Large Language Models for Southeast Asia

1 code implementation · 1 Dec 2023 · Xuan-Phi Nguyen, Wenxuan Zhang, Xin Li, Mahani Aljunied, Qingyu Tan, Liying Cheng, Guanzheng Chen, Yue Deng, Sen Yang, Chaoqun Liu, Hang Zhang, Lidong Bing

Despite the remarkable achievements of large language models (LLMs) in various tasks, there remains a linguistic bias that favors high-resource languages, such as English, often at the expense of low-resource and regional languages.

Instruction Following

Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization

1 code implementation · 22 May 2023 · Chenhui Shen, Liying Cheng, Xuan-Phi Nguyen, Yang You, Lidong Bing

With the recent undeniable advancement in reasoning abilities in large language models (LLMs) like ChatGPT and GPT-4, there is a growing trend for using LLMs on various tasks.

Abstractive Text Summarization

A Hierarchical Encoding-Decoding Scheme for Abstractive Multi-document Summarization

1 code implementation · 15 May 2023 · Chenhui Shen, Liying Cheng, Xuan-Phi Nguyen, Yang You, Lidong Bing

Pre-trained language models (PLMs) have achieved outstanding results in abstractive single-document summarization (SDS).

Document Summarization · Multi-Document Summarization

Improving Speech-to-Speech Translation Through Unlabeled Text

no code implementations · 26 Oct 2022 · Xuan-Phi Nguyen, Sravya Popuri, Changhan Wang, Yun Tang, Ilia Kulikov, Hongyu Gong

Direct speech-to-speech translation (S2ST) is among the most challenging problems in the translation paradigm due to the significant scarcity of S2ST data.

Machine Translation · speech-recognition +3

Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model

1 code implementation · 31 May 2022 · Xuan-Phi Nguyen, Shafiq Joty, Wu Kui, Ai Ti Aw

Much recent work on unsupervised machine translation (UMT) implies that competent unsupervised translation of low-resource and unrelated languages, such as Nepali or Sinhala, is only possible if the model is trained in a massive multilingual environment, where these low-resource languages are mixed with high-resource counterparts.

Disentanglement · Translation +1

Contrastive Clustering to Mine Pseudo Parallel Data for Unsupervised Translation

no code implementations · ICLR 2022 · Xuan-Phi Nguyen, Hongyu Gong, Yun Tang, Changhan Wang, Philipp Koehn, Shafiq Joty

Modern unsupervised machine translation systems mostly train their models by generating synthetic parallel training data from large unlabeled monolingual corpora of different languages through various means, such as iterative back-translation.

Clustering · Translation +1
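The iterative back-translation step mentioned in the abstract above can be sketched in miniature. The dictionary "models" and sentences below are hypothetical stand-ins; real UMT systems use neural seq2seq models, but the synthetic-data loop has the same shape:

```python
# Toy sketch of back-translation for unsupervised MT (illustrative only).
# A "model" here is a plain word-for-word dictionary standing in for a
# trained translation model.

def translate(model, sentence):
    """Word-by-word translation; unknown words pass through unchanged."""
    return " ".join(model.get(w, w) for w in sentence.split())

def back_translation_round(backward_model, tgt_mono):
    """Translate target-side monolingual text back into the source language,
    yielding synthetic (source, target) pairs to train the forward model on."""
    return [(translate(backward_model, t), t) for t in tgt_mono]

# Hypothetical fr->en word table and French monolingual corpus.
fr2en = {"le": "the", "chat": "cat", "dort": "sleeps"}
fr_mono = ["le chat dort"]

synthetic_pairs = back_translation_round(fr2en, fr_mono)
print(synthetic_pairs)  # [('the cat sleeps', 'le chat dort')]
```

In the full iterative procedure, the forward model retrained on these synthetic pairs is then used to generate data for the backward model, and the two alternate.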

A Conditional Splitting Framework for Efficient Constituency Parsing

no code implementations · ACL 2021 · Thanh-Tung Nguyen, Xuan-Phi Nguyen, Shafiq Joty, XiaoLi Li

We introduce a generic seq2seq parsing framework that casts constituency parsing problems (syntactic and discourse parsing) into a series of conditional splitting decisions.

Constituency Parsing · Discourse Segmentation +1
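The idea of casting parsing into a series of splitting decisions, as the abstract above describes, can be illustrated with a toy recursion. Here `choose_split` is a hypothetical stand-in for the model's learned (conditional) splitting decision:

```python
# Toy sketch of parsing as a sequence of splitting decisions
# (illustrative only; the actual framework learns where to split).

def split_parse(tokens, choose_split):
    """Recursively split a token span into a binary bracketing.
    choose_split(span) returns a split index k with 0 < k < len(span)."""
    if len(tokens) == 1:
        return tokens[0]
    k = choose_split(tokens)
    return (split_parse(tokens[:k], choose_split),
            split_parse(tokens[k:], choose_split))

# Hypothetical splitter: always split the span at its midpoint.
midpoint = lambda span: len(span) // 2

tree = split_parse("the cat sleeps soundly".split(), midpoint)
print(tree)  # (('the', 'cat'), ('sleeps', 'soundly'))
```

Each recursive call conditions only on the current span, so the whole parse is produced as a series of local split decisions rather than by scoring all possible spans.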

RST Parsing from Scratch

1 code implementation · NAACL 2021 · Thanh-Tung Nguyen, Xuan-Phi Nguyen, Shafiq Joty, XiaoLi Li

We introduce a novel top-down end-to-end formulation of document-level discourse parsing in the Rhetorical Structure Theory (RST) framework.

Discourse Segmentation · Segmentation

Efficient Constituency Parsing by Pointing

no code implementations · ACL 2020 · Thanh-Tung Nguyen, Xuan-Phi Nguyen, Shafiq Joty, Xiao-Li Li

We propose a novel constituency parsing model that casts the parsing problem into a series of pointing tasks.

Constituency Parsing

Cross-model Back-translated Distillation for Unsupervised Machine Translation

1 code implementation · 3 Jun 2020 · Xuan-Phi Nguyen, Shafiq Joty, Thanh-Tung Nguyen, Wu Kui, Ai Ti Aw

Recent unsupervised machine translation (UMT) systems usually employ three main principles: initialization, language modeling and iterative back-translation, though they may apply them differently.

Denoising · Language Modelling +2

Tree-structured Attention with Hierarchical Accumulation

no code implementations · ICLR 2020 · Xuan-Phi Nguyen, Shafiq Joty, Steven C. H. Hoi, Richard Socher

Incorporating hierarchical structures like constituency trees has been shown to be effective for various natural language processing (NLP) tasks.

text-classification · Text Classification +1

Data Diversification: A Simple Strategy For Neural Machine Translation

2 code implementations · NeurIPS 2020 · Xuan-Phi Nguyen, Shafiq Joty, Wu Kui, Ai Ti Aw

Our method achieves state-of-the-art BLEU scores of 30.7 and 43.7 in the WMT'14 English-German and English-French translation tasks, respectively.

Knowledge Distillation · Machine Translation +2

Enhancing Attention with Explicit Phrasal Alignments

no code implementations · 25 Sep 2019 · Xuan-Phi Nguyen, Shafiq Joty, Thanh-Tung Nguyen

The attention mechanism is an indispensable component of any state-of-the-art neural machine translation system.

Language Modelling · Machine Translation +1
