1 code implementation • COLING (WNUT) 2022 • Mai Dao, Thinh Hung Truong, Dat Quoc Nguyen
In this paper, we present the first empirical study for Vietnamese disfluency detection.
no code implementations • TU (COLING) 2022 • Linh The Nguyen, Dat Quoc Nguyen
We present an empirical study investigating the influence of automatic speech recognition (ASR) errors on the spoken implicit discourse relation recognition (IDRR) task.
Automatic Speech Recognition (ASR)
1 code implementation • 21 Oct 2024 • Quang Hieu Pham, Hoang Ngo, Anh Tuan Luu, Dat Quoc Nguyen
To analyze how current large language models (LLMs) align with our recommendation, we introduce WhoQA, a public benchmark dataset to examine model behavior in knowledge conflict situations.
1 code implementation • 21 May 2024 • Hoang Ngo, Dat Quoc Nguyen
We present the first domain-adapted and fully-trained large language model, RecGPT-7B, and its instruction-following variant, RecGPT-7B-Instruct, for text-based recommendation.
no code implementations • 28 Mar 2024 • Nhu Vo, Dat Quoc Nguyen, Dung D. Le, Massimo Piccardi, Wray Buntine
Machine translation for Vietnamese-English in the medical domain is still an under-explored research area.
1 code implementation • 27 Mar 2024 • Thanh-Thien Le, Linh The Nguyen, Dat Quoc Nguyen
We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition.
1 code implementation • 14 Dec 2023 • Thinh Pham, Dat Quoc Nguyen
JPIS incorporates the supporting profile information into its encoder and introduces a slot-to-intent attention mechanism to transfer slot information representations to intent detection.
1 code implementation • 10 Dec 2023 • Thinh Pham, Chi Tran, Dat Quoc Nguyen
Research on detecting multiple intents and filling slots is becoming more popular because of its relevance to complicated real-world situations.
Ranked #1 on Slot Filling on MixATIS
1 code implementation • 6 Nov 2023 • Dat Quoc Nguyen, Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Dinh Phung, Hung Bui
The base model, PhoGPT-4B, with exactly 3.7B parameters, is pre-trained from scratch on a Vietnamese corpus of 102B tokens, with an 8192 context length, employing a vocabulary of 20480 token types.
2 code implementations • 31 May 2023 • Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen
We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task.
1 code implementation • 17 Oct 2022 • Vinh Tong, Dat Quoc Nguyen, Trung Thanh Huynh, Tam Thanh Nguyen, Quoc Viet Hung Nguyen, Mathias Niepert
The proposed model combines two components that jointly accomplish KG completion and alignment.
Ranked #1 on Knowledge Graph Completion on DPB-5L (French)
1 code implementation • 17 Sep 2022 • Mai Hoang Dao, Thinh Hung Truong, Dat Quoc Nguyen
We present the first empirical study investigating the influence of disfluency detection on downstream tasks of intent detection and slot filling.
1 code implementation • 8 Aug 2022 • Linh The Nguyen, Nguyen Luong Tran, Long Doan, Manh Luong, Dat Quoc Nguyen
In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-length audio, English source transcript sentence, Vietnamese target subtitle sentence).
1 code implementation • 16 Dec 2021 • Vinh Tong, Dai Quoc Nguyen, Dinh Phung, Dat Quoc Nguyen
WGE also constructs another single undirected graph from relation-focused constraints, which views entities and relations as nodes.
1 code implementation • EMNLP 2021 • Long Doan, Linh The Nguyen, Nguyen Luong Tran, Thai Hoang, Dat Quoc Nguyen
We introduce a high-quality and large-scale Vietnamese-English parallel dataset of 3.02M sentence pairs, which is 2.9M pairs larger than the benchmark Vietnamese-English machine translation corpus IWSLT15.
3 code implementations • 20 Sep 2021 • Nguyen Luong Tran, Duong Minh Le, Dat Quoc Nguyen
We present BARTpho with two versions, BARTpho-syllable and BARTpho-word, which are the first public large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese.
Ranked #4 on Abstractive Text Summarization on vietnews
1 code implementation • 15 Apr 2021 • Dai Quoc Nguyen, Vinh Tong, Dinh Phung, Dat Quoc Nguyen
We introduce a novel embedding model, named NoGE, which aims to integrate co-occurrence among entities and relations into graph neural networks to improve knowledge graph completion (i.e., link prediction).
1 code implementation • NAACL 2021 • Thinh Hung Truong, Mai Hoang Dao, Dat Quoc Nguyen
The current COVID-19 pandemic has led to the creation of many corpora that facilitate NLP research and downstream applications to help fight the pandemic.
1 code implementation • 5 Apr 2021 • Mai Hoang Dao, Thinh Hung Truong, Dat Quoc Nguyen
Intent detection and slot filling are important tasks in spoken and natural language understanding.
Ranked #2 on Intent Classification and Slot Filling on ATIS (vi)
1 code implementation • NAACL 2021 • Linh The Nguyen, Dat Quoc Nguyen
We present the first multi-task learning model, named PhoNLP, for joint Vietnamese part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing.
1 code implementation • EMNLP (WNUT) 2020 • Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen, Long Doan
In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Anh Tuan Nguyen, Mai Hoang Dao, Dat Quoc Nguyen
We compare the two baselines with key configurations and find that: automatic Vietnamese word segmentation improves the parsing results of both baselines; the normalized pointwise mutual information (NPMI) score (Bouma, 2009) is useful for schema linking; latent syntactic features extracted from a neural dependency parser for Vietnamese also improve the results; and the monolingual language model PhoBERT for Vietnamese (Nguyen and Nguyen, 2020) helps produce higher performances than the recent best multilingual language model XLM-R (Conneau et al., 2020).
no code implementations • 1 Sep 2020 • Mai Hoang Dao, Dat Quoc Nguyen
This paper describes our VinAI system for the ChEMU task 1 of named entity recognition (NER) in chemical reactions.
2 code implementations • 13 Jul 2020 • Thanh Vu, Dat Quoc Nguyen, Anthony Nguyen
In this paper, we propose a new label attention model for automatic ICD coding, which can handle both the various lengths and the interdependence of the ICD code related text fragments.
Ranked #7 on Medical Code Prediction on MIMIC-III
3 code implementations • EMNLP 2020 • Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen
We present BERTweet, the first public large-scale pre-trained language model for English Tweets.
Ranked #1 on Sentiment Analysis on TweetEval
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Dat Quoc Nguyen, Anh Tuan Nguyen
We present PhoBERT with two versions, PhoBERT-base and PhoBERT-large, the first public large-scale monolingual language models pre-trained for Vietnamese.
no code implementations • 26 Nov 2019 • Dai Quoc Nguyen, Dat Quoc Nguyen, Son Bao Pham
This paper introduces a Vietnamese text-based conversational agent architecture for a specific knowledge domain, which is integrated into a question answering system.
no code implementations • 26 Nov 2019 • Dai Quoc Nguyen, Dat Quoc Nguyen, Son Bao Pham
Question answering systems aim to produce exact answers to users' questions instead of a list of related documents as used by current search engines.
no code implementations • 26 Nov 2019 • Tien-Thanh Vu, Dat Quoc Nguyen
A price information retrieval (IR) system allows users to search and view differences among prices of specific products.
no code implementations • 26 Nov 2019 • Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham, The Duy Bui
Search engines have become an indispensable tool for browsing information on the Internet.
1 code implementation • 12 Nov 2019 • Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung
In this paper, we focus on learning low-dimensional embeddings for nodes in graph-structured data.
Ranked #55 on Node Classification on Pubmed
1 code implementation • WS 2019 • Zenan Zhai, Dat Quoc Nguyen, Saber A. Akhondi, Camilo Thorne, Christian Druckenbrodt, Trevor Cohn, Michelle Gregory, Karin Verspoor
In this paper, we explore the NER performance of a BiLSTM-CRF model utilising pre-trained word embeddings, character-level word representations and contextualized ELMo word representations for chemical patents.
no code implementations • ALTA 2019 • Hiyori Yoshikawa, Dat Quoc Nguyen, Zenan Zhai, Christian Druckenbrodt, Camilo Thorne, Saber A. Akhondi, Timothy Baldwin, Karin Verspoor
Extracting chemical reactions from patents is a crucial task for chemists working on chemical exploration.
no code implementations • ALTA 2019 • Dat Quoc Nguyen
We propose the first multi-task learning model for joint Vietnamese word segmentation, part-of-speech (POS) tagging and dependency parsing.
1 code implementation • 29 Dec 2018 • Dat Quoc Nguyen, Karin Verspoor
We propose a neural network model for joint extraction of named entities and relations between them, without any hand-crafted features.
Ranked #5 on Relation Extraction on CoNLL04
no code implementations • TACL 2015 • Dat Quoc Nguyen, Richard Billingsley, Lan Du, Mark Johnson
Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks.
no code implementations • WS 2018 • Zenan Zhai, Dat Quoc Nguyen, Karin Verspoor
We compare the use of LSTM-based and CNN-based character-level word embeddings in BiLSTM-CRF models to approach chemical and disease named entity recognition (NER) tasks.
2 code implementations • NAACL 2019 • Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung
In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object).
Ranked #42 on Link Prediction on WN18RR
2 code implementations • 11 Aug 2018 • Dat Quoc Nguyen, Karin Verspoor
Results: We perform an empirical study comparing state-of-the-art traditional feature-based and neural network-based models for two core natural language processing tasks of part-of-speech (POS) tagging and dependency parsing on two benchmark biomedical corpora, GENIA and CRAFT.
Ranked #1 on Dependency Parsing on GENIA - LAS
1 code implementation • 11 Aug 2018 • Dat Quoc Nguyen
In this technical report, we present jLDADMM, an easy-to-use Java toolkit for conventional topic models.
1 code implementation • CONLL 2018 • Dat Quoc Nguyen, Karin Verspoor
We propose a novel neural network model for joint part-of-speech (POS) tagging and dependency parsing.
Ranked #15 on Dependency Parsing on Penn Treebank
no code implementations • WS 2018 • Dat Quoc Nguyen, Karin Verspoor
We investigate the incorporation of character-based word representations into a standard CNN-based relation extraction model.
1 code implementation • SEMEVAL 2018 • Thanh Vu, Dat Quoc Nguyen, Xuan-Son Vu, Dai Quoc Nguyen, Michael Catt, Michael Trenell
This paper describes our NIHRIO system for SemEval-2018 Task 3 "Irony detection in English tweets".
2 code implementations • NAACL 2018 • Thanh Vu, Dat Quoc Nguyen, Dai Quoc Nguyen, Mark Dras, Mark Johnson
We present an easy-to-use and fast toolkit, namely VnCoreNLP, a Java NLP annotation pipeline for Vietnamese.
2 code implementations • NAACL 2018 • Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung
This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps.
Ranked #60 on Link Prediction on WN18RR
1 code implementation • ALTA 2017 • Dat Quoc Nguyen, Thanh Vu, Dai Quoc Nguyen, Mark Dras, Mark Johnson
This paper presents an empirical comparison of two strategies for Vietnamese Part-of-Speech (POS) tagging from unsegmented text: (i) a pipeline strategy where we consider the output of a word segmenter as the input of a POS tagger, and (ii) a joint strategy where we predict a combined segmentation and POS tag for each syllable.
1 code implementation • LREC 2018 • Dat Quoc Nguyen, Dai Quoc Nguyen, Thanh Vu, Mark Dras, Mark Johnson
We propose a novel approach to Vietnamese word segmentation.
1 code implementation • IJCNLP 2017 • Dai Quoc Nguyen, Dat Quoc Nguyen, Cuong Xuan Chu, Stefan Thater, Manfred Pinkal
This paper presents an approach to the task of predicting an event description from a preceding sentence in a text.
no code implementations • SEMEVAL 2017 • Dai Quoc Nguyen, Dat Quoc Nguyen, Ashutosh Modi, Stefan Thater, Manfred Pinkal
Our model generalizes previous work in that it allows different weights to be induced for different senses of a word.
1 code implementation • CONLL 2017 • Dat Quoc Nguyen, Mark Dras, Mark Johnson
We present a novel neural network model that learns POS tagging and graph-based dependency parsing jointly.
Ranked #5 on Part-Of-Speech Tagging on UD
2 code implementations • COLING (TextGraphs) 2020 • Dat Quoc Nguyen
Knowledge graphs (KGs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks.
1 code implementation • 12 Dec 2016 • Thanh Vu, Dat Quoc Nguyen, Mark Johnson, Dawei Song, Alistair Willis
Recent research has shown that the performance of search personalization depends on the richness of user profiles which normally represent the user's topical interests.
no code implementations • ALTA 2016 • Dat Quoc Nguyen, Mark Dras, Mark Johnson
This paper presents an empirical comparison of different dependency parsers for Vietnamese, which has some unusual characteristics such as copula drop and verb serialization.
1 code implementation • NAACL 2016 • Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, Mark Johnson
Knowledge bases of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks.
no code implementations • CONLL 2016 • Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, Mark Johnson
Knowledge bases are useful resources for many natural language processing tasks; however, they are far from complete.
no code implementations • 12 Dec 2014 • Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham
Recent years have witnessed a new trend of building ontology-based question answering systems.
1 code implementation • 12 Dec 2014 • Dat Quoc Nguyen, Dai Quoc Nguyen, Dang Duc Pham, Son Bao Pham
In this paper, we propose a new approach to construct a system of transformation rules for the Part-of-Speech (POS) tagging task.