Search Results for author: Kiet Van Nguyen

Found 65 papers, 27 papers with code

ViNLI: A Vietnamese Corpus for Studies on Open-Domain Natural Language Inference

no code implementations COLING 2022 Tin Van Huynh, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

In this paper, we introduce ViNLI (Vietnamese Natural Language Inference), an open-domain and high-quality corpus for evaluating Vietnamese NLI models, which is created and evaluated with a strict process of quality control.

Natural Language Inference Sentence +1

ViANLI: Adversarial Natural Language Inference for Vietnamese

no code implementations 25 Jun 2024 Tin Van Huynh, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

The development of Natural Language Inference (NLI) datasets and models has been inspired by innovations in annotation design.

Adversarial Natural Language Inference Natural Language Inference +3

ViWikiFC: Fact-Checking for Vietnamese Wikipedia-Based Textual Knowledge Source

no code implementations 13 May 2024 Hung Tuan Le, Long Truong To, Manh Trong Nguyen, Kiet Van Nguyen

BM25 and InfoXLM (Large) achieved the best results in two tasks: BM25 achieved an accuracy of 88.30% for SUPPORTS, 86.93% for REFUTES, and only 56.67% for the NEI label in the evidence retrieval task, while InfoXLM (Large) achieved an F1 score of 86.51%.

Fact Checking Fact Verification +4

New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis

1 code implementation 1 May 2024 Quy Hoang Nguyen, Minh-Van Truong Nguyen, Kiet Van Nguyen

To address this, we introduce a new Vietnamese multimodal dataset, named ViMACSA, which consists of 4,876 text-image pairs with 14,618 fine-grained annotations for both text and image in the hotel domain.

Aspect Category Sentiment Analysis Multimodal Sentiment Analysis +3

ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images

1 code implementation 29 Apr 2024 Huy Quang Pham, Thang Kien-Bao Nguyen, Quan Van Nguyen, Dan Quang Tran, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

To this end, we introduce a novel dataset, ViOCRVQA (Vietnamese Optical Character Recognition - Visual Question Answering dataset), consisting of 28,000+ images and 120,000+ question-answer pairs.

Optical Character Recognition Optical Character Recognition (OCR) +2

VLUE: A New Benchmark and Multi-task Knowledge Transfer Learning for Vietnamese Natural Language Understanding

no code implementations 23 Mar 2024 Phong Nguyen-Thuan Do, Son Quoc Tran, Phu Gia Hoang, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

The success of Natural Language Understanding (NLU) benchmarks in various languages, such as GLUE for English, CLUE for Chinese, KLUE for Korean, and IndoNLU for Indonesian, has facilitated the evaluation of new NLU models across a wide range of tasks.

Natural Language Understanding text-classification +3

VlogQA: Task, Dataset, and Baseline Models for Vietnamese Spoken-Based Machine Reading Comprehension

1 code implementation 5 Feb 2024 Thinh Phuoc Ngo, Khoa Tran Anh Dang, Son T. Luu, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

This paper presents the development process of a Vietnamese spoken language corpus for machine reading comprehension (MRC) tasks and provides insights into the challenges and opportunities associated with using real-world data for machine reading comprehension tasks.

Machine Reading Comprehension

ViLexNorm: A Lexical Normalization Corpus for Vietnamese Social Media Text

1 code implementation 29 Jan 2024 Thanh-Nhi Nguyen, Thanh-Phong Le, Kiet Van Nguyen

In this work, we introduce Vietnamese Lexical Normalization (ViLexNorm), the first-ever corpus developed for the Vietnamese lexical normalization task.

Lexical Normalization Vietnamese Social Media Text Processing

ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing

1 code implementation 17 Oct 2023 Quoc-Nam Nguyen, Thang Chau Phan, Duc-Vu Nguyen, Kiet Van Nguyen

English and Chinese, known as resource-rich languages, have witnessed the strong development of transformer-based language models for natural language processing tasks.

Vietnamese Language Models Vietnamese Social Media Text Processing +1

XGV-BERT: Leveraging Contextualized Language Model and Graph Neural Network for Efficient Software Vulnerability Detection

no code implementations 26 Sep 2023 Vu Le Anh Quan, Chau Thuan Phat, Kiet Van Nguyen, Phan The Duy, Van-Hau Pham

Hence, in this work, we propose XGV-BERT, a framework that combines the pre-trained CodeBERT model and Graph Neural Network (GCN) to detect software vulnerabilities.

Graph Neural Network Language Modelling +2

ViCGCN: Graph Convolutional Network with Contextualized Language Models for Social Media Mining in Vietnamese

1 code implementation 6 Sep 2023 Chau-Thang Phan, Quoc-Nam Nguyen, Chi-Thanh Dang, Trong-Hop Do, Kiet Van Nguyen

Our proposed ViCGCN approach demonstrates a significant improvement of up to 6.21%, 4.61%, and 2.63% over the best Contextualized Language Models, including multilingual and monolingual, on three benchmark datasets, UIT-VSMEC, UIT-ViCTSD, and UIT-VSFC, respectively.

Language Modelling text-classification +2

Link Prediction for Wikipedia Articles as a Natural Language Inference Task

1 code implementation 31 Aug 2023 Chau-Thang Phan, Quoc-Nam Nguyen, Kiet Van Nguyen

Drawing inspiration from recent advancements in natural language processing and understanding, we cast link prediction as an NLI task, wherein the presence of a link between two articles is treated as a premise, and the task is to determine whether this premise holds based on the information presented in the articles.

Link Prediction Natural Language Inference +2

BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering

no code implementations 28 Jul 2023 Khiem Vinh Tran, Kiet Van Nguyen, Ngan Luu Thuy Nguyen

Visual Question Answering (VQA) is an intricate and demanding task that integrates natural language processing (NLP) and computer vision (CV), capturing the interest of researchers.

Question Answering Vietnamese Visual Question Answering

PAT: Parallel Attention Transformer for Visual Question Answering in Vietnamese

no code implementations 17 Jul 2023 Nghia Hieu Nguyen, Kiet Van Nguyen

Based on these two novel modules, we introduce the Parallel Attention Transformer (PAT), achieving the best accuracy compared to all baselines on the benchmark ViVQA dataset and other SOTA methods including SAAA and MCAN.

Question Answering Vietnamese Visual Question Answering

A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education

1 code implementation 31 Mar 2023 Son T. Luu, Khoi Trong Hoang, Tuong Quang Pham, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

From the results of the error analysis, we found that the main challenge for reading comprehension models is understanding the implicit context in texts and linking it together to find the correct answers.

Machine Reading Comprehension Multiple-choice +1

EVJVQA Challenge: Multilingual Visual Question Answering

no code implementations 23 Feb 2023 Ngan Luu-Thuy Nguyen, Nghia Hieu Nguyen, Duong T. D. Vo, Khanh Quoc Tran, Kiet Van Nguyen

Visual Question Answering (VQA) is a challenging task of natural language processing (NLP) and computer vision (CV), attracting significant attention from researchers.

Language Modelling Question Answering +2

ViHOS: Hate Speech Spans Detection for Vietnamese

1 code implementation 24 Jan 2023 Phu Gia Hoang, Canh Duc Luu, Khanh Quoc Tran, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

The rise in hateful and offensive language directed at other users is one of the adverse side effects of the increased use of social networking platforms.

Sequence-to-sequence Language Modeling XLM-R

UIT-HWDB: Using Transferring Method to Construct A Novel Benchmark for Evaluating Unconstrained Handwriting Image Recognition in Vietnamese

1 code implementation 10 Nov 2022 Nghia Hieu Nguyen, Duong T. D. Vo, Kiet Van Nguyen

Recognizing handwriting images is challenging due to the vast variation in writing style across many people and distinct linguistic aspects of writing languages.

Handwriting Recognition

SMTCE: A Social Media Text Classification Evaluation Benchmark and BERTology Models for Vietnamese

no code implementations 21 Sep 2022 Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Inspired by the success of GLUE, we introduce the Social Media Text Classification Evaluation (SMTCE) benchmark, a collection of datasets and models across a diverse set of SMTC tasks.

text-classification Text Classification +1

Vietnamese Hate and Offensive Detection using PhoBERT-CNN and Social Media Streaming Data

1 code implementation 1 Jun 2022 Khanh Q. Tran, An T. Nguyen, Phu Gia Hoang, Canh Duc Luu, Trong-Hop Do, Kiet Van Nguyen

Secondly, a novel hate speech detection (HSD) model, which is the combination of a pre-trained PhoBERT model and a Text-CNN model, was proposed for solving tasks in Vietnamese.

Hate Speech Detection Vietnamese Social Media Text Processing

XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-based Textual Knowledge Source

no code implementations 14 Apr 2022 Kiet Van Nguyen, Phong Nguyen-Thuan Do, Nhat Duy Nguyen, Tin Van Huynh, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen

Question answering (QA) is a natural language understanding task within the fields of information retrieval and information extraction that has attracted much attention from the computational linguistics and artificial intelligence research community in recent years because of the strong development of machine reading comprehension-based models.

Information Retrieval Machine Reading Comprehension +3

VLSP 2021 - ViMRC Challenge: Vietnamese Machine Reading Comprehension

no code implementations 22 Mar 2022 Kiet Van Nguyen, Son Quoc Tran, Luan Thanh Nguyen, Tin Van Huynh, Son T. Luu, Ngan Luu-Thuy Nguyen

To address the weakness, we provide the research community with a benchmark dataset named UIT-ViQuAD 2.0 for evaluating the MRC task and question answering systems for the Vietnamese language.

Language Modelling Machine Reading Comprehension +7

Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-stage Span Labeling

no code implementations PACLIC 2021 Duc-Vu Nguyen, Linh-Bao Vo, Ngoc-Linh Tran, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Previous studies on joint Chinese word segmentation and part-of-speech tagging mainly follow the character-based tagging model focusing on modeling n-gram features.

Chinese Word Segmentation Part-Of-Speech Tagging +2

VinaFood21: A Novel Dataset for Evaluating Vietnamese Food Recognition

no code implementations 6 Aug 2021 Thuan Trong Nguyen, Thuan Q. Nguyen, Dung Vo, Vi Nguyen, Ngoc Ho, Nguyen D. Vo, Kiet Van Nguyen, Khang Nguyen

We use 10,044 images for model training and 6,682 test images to classify each food in the VinaFood21 dataset and achieved an average accuracy of 74.81% when fine-tuning CNN EfficientNet-B0.

Diversity Food Recognition

Sentence Extraction-Based Machine Reading Comprehension for Vietnamese

no code implementations 19 May 2021 Phong Nguyen-Thuan Do, Nhat Duy Nguyen, Tin Van Huynh, Kiet Van Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen

We propose a conversion algorithm to create the dataset for sentence extraction-based machine reading comprehension and three types of approaches for sentence extraction-based machine reading comprehension in Vietnamese.

Machine Reading Comprehension Question Answering +2

Conversational Machine Reading Comprehension for Vietnamese Healthcare Texts

1 code implementation 4 May 2021 Son T. Luu, Mao Nguyen Bui, Loi Duc Nguyen, Khiem Vinh Tran, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

To help machines understand conversation texts, we present UIT-ViCoQA, a new corpus for conversational machine reading comprehension in the Vietnamese language.

Chatbot Machine Reading Comprehension +2

Constructive and Toxic Speech Detection for Open-domain Social Media Comments in Vietnamese

no code implementations 18 Mar 2021 Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

For these tasks, we propose a system for constructive and toxic speech detection using PhoBERT, the state-of-the-art transfer learning model in Vietnamese NLP.

Constructive Comment Classification General Classification +2

Augmenting Part-of-speech Tagging with Syntactic Information for Vietnamese and Chinese

1 code implementation 24 Feb 2021 Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

In this paper, we implement this idea to improve word segmentation and part-of-speech tagging for the Vietnamese language by employing a simplified constituency parser.

Part-Of-Speech Tagging Segmentation

ReINTEL Challenge 2020: Exploiting Transfer Learning Models for Reliable Intelligence Identification on Vietnamese Social Network Sites

no code implementations VLSP 2020 Kim Thi-Thanh Nguyen, Kiet Van Nguyen

This paper presents the system that we propose for the Reliable Intelligence Identification on Vietnamese Social Network Sites (ReINTEL) task of the Vietnamese Language and Speech Processing 2020 (VLSP 2020) Shared Task.

Fake News Detection Reliable Intelligence Identification +1

A Vietnamese Dataset for Evaluating Machine Reading Comprehension

no code implementations 30 Sep 2020 Kiet Van Nguyen, Duc-Vu Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen

Due to the lack of benchmark datasets for Vietnamese, we present the Vietnamese Question Answering Dataset (UIT-ViQuAD), a new dataset for evaluating MRC models on a low-resource language such as Vietnamese.

Machine Reading Comprehension Question Answering +3

Empirical Study of Text Augmentation on Social Media Text in Vietnamese

1 code implementation 25 Sep 2020 Son T. Luu, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Thus, when data about user comments on social networks is collected, it is usually skewed toward one label, which makes the dataset imbalanced and degrades the model's performance.

General Classification Hate Speech Detection +5

An Experimental Study of Deep Neural Network Models for Vietnamese Multiple-Choice Reading Comprehension

no code implementations 20 Aug 2020 Son T. Luu, Kiet Van Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen

In this paper, we conduct several experiments on neural network-based models to understand the impact of word representation on Vietnamese multiple-choice machine reading comprehension.

Machine Reading Comprehension Multiple-choice +1

Vietnamese Word Segmentation with SVM: Ambiguity Reduction and Suffix Capture

1 code implementation 14 Jun 2020 Duc-Vu Nguyen, Dang Van Thin, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

In this paper, we approach Vietnamese word segmentation as a binary classification task by using the Support Vector Machine classifier.

Binary Classification Segmentation +2

UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning

3 code implementations 1 Feb 2020 Quan Hoang Lam, Quang Duy Le, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

This paper contributes to research on the image captioning task by extending the dataset to a different language, Vietnamese.

Vietnamese Datasets Vietnamese Image Captioning

Comparison Between Traditional Machine Learning Models And Neural Network Models For Vietnamese Hate Speech Detection

1 code implementation 31 Jan 2020 Son T. Luu, Hung P. Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Consequently, we compare traditional machine learning and deep learning models on a large dataset of user comments on social networks in Vietnamese, identify the advantages and disadvantages of each model by comparing their F1-scores, and then pick the two best-performing models, one from the traditional machine learning models and one from the deep neural models.

BIG-bench Machine Learning Hate Speech Detection

Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension

no code implementations 16 Jan 2020 Kiet Van Nguyen, Khiem Vinh Tran, Son T. Luu, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen

Although Vietnamese is the 17th most popular native-speaker language in the world, there are not many research studies on Vietnamese machine reading comprehension (MRC), the task of understanding a text and answering questions about it.

Machine Reading Comprehension Multiple-choice +3

Job Prediction: From Deep Neural Network Models to Applications

no code implementations 27 Dec 2019 Tin Van Huynh, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen, Anh Gia-Tuan Nguyen

In addition, we also proposed a simple and effective ensemble model combining different deep neural network models.

Job classification Job Prediction +1

Emotion Recognition for Vietnamese Social Media Text

no code implementations 21 Nov 2019 Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

In this task, the result is expressed neither as a polarity (positive or negative) nor as a rating (from 1 to 5), but at a more detailed level of analysis, as emotions such as sadness, enjoyment, anger, disgust, fear, and surprise.

Emotion Recognition Sentiment Analysis

Error Analysis for Vietnamese Named Entity Recognition on Deep Neural Network Models

no code implementations 17 Nov 2019 Binh An Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

In recent years, Vietnamese Named Entity Recognition (NER) systems have had a great breakthrough when using Deep Neural Network methods.

named-entity-recognition Named Entity Recognition +2

Vietnamese transition-based dependency parsing with supertag features

no code implementations 9 Nov 2019 Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

In recent years, dependency parsing has been a fascinating research topic with many applications in natural language processing.

Transition-Based Dependency Parsing

Hate Speech Detection on Vietnamese Social Media Text using the Bidirectional-LSTM Model

1 code implementation 9 Nov 2019 Hang Thi-Thuy Do, Huy Duc Huynh, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen, Anh Gia-Tuan Nguyen

In this paper, we describe our system, which participated in the Hate Speech Detection on Social Networks shared task of the VLSP 2019 evaluation campaign.

BIG-bench Machine Learning Hate Speech Detection +1

Error Analysis for Vietnamese Dependency Parsing

no code implementations 9 Nov 2019 Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Dependency parsing is needed in different applications of natural language processing.

Dependency Parsing
