Search Results for author: Sarana Nutanong

Found 12 papers, 9 papers with code

Handling Cross- and Out-of-Domain Samples in Thai Word Segmentation

1 code implementation • Findings (ACL) 2021 • Peerat Limkonchotiwat, Wannaphong Phatthiyaphaibun, Raheem Sarwar, Ekapol Chuangsuwanich, Sarana Nutanong

Thai Word Segmentation

Thai Nested Named Entity Recognition Corpus

1 code implementation • Findings (ACL) 2022 • Weerayut Buaphet, Can Udomcharoenchaikit, Peerat Limkonchotiwat, Attapol Rutherford, Sarana Nutanong

Our work, to the best of our knowledge, presents the largest non-English N-NER dataset and the first non-English one with fine-grained classes.

Language Modelling • Named Entity Recognition • +3

Domain Adaptation of Thai Word Segmentation Models using Stacked Ensemble

1 code implementation • EMNLP 2020 • Peerat Limkonchotiwat, Wannaphong Phatthiyaphaibun, Raheem Sarwar, Ekapol Chuangsuwanich, Sarana Nutanong

Like many Natural Language Processing tasks, Thai word segmentation is domain-dependent.

Ranked #1 on Thai Word Segmentation on WS160 (using extra training data)

Domain Adaptation • Ensemble Learning • +2

Robust Fragment-Based Framework for Cross-lingual Sentence Retrieval

no code implementations • Findings (EMNLP) 2021 • Nattapol Trijakwanich, Peerat Limkonchotiwat, Raheem Sarwar, Wannaphong Phatthiyaphaibun, Ekapol Chuangsuwanich, Sarana Nutanong

Cross-lingual Sentence Retrieval (CLSR) aims at retrieving parallel sentence pairs that are translations of each other from a multilingual set of comparable documents.

Machine Translation • Retrieval • +3

CL-ReLKT: Cross-lingual Language Knowledge Transfer for Multilingual Retrieval Question Answering

1 code implementation • Findings (NAACL) 2022 • Peerat Limkonchotiwat, Wuttikorn Ponwitayarat, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong

A common approach to CL-ReQA is to create a multilingual sentence embedding space such that question-answer pairs across different languages are close to each other.

Language Modelling • Question Answering • +6

WangchanLion and WangchanX MRC Eval

1 code implementation • 24 Mar 2024 • Wannaphong Phatthiyaphaibun, Surapon Nonesung, Patomporn Payoungkhamdee, Peerat Limkonchotiwat, Can Udomcharoenchaikit, Jitkapat Sawatphol, Chompakorn Chaksangchaichot, Ekapol Chuangsuwanich, Sarana Nutanong

Our model is based on SEA-LION and a collection of instruction following datasets.

Instruction Following • Machine Reading Comprehension

An Efficient Self-Supervised Cross-View Training For Sentence Embedding

1 code implementation • 6 Nov 2023 • Peerat Limkonchotiwat, Wuttikorn Ponwitayarat, Lalita Lowphansirikul, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong

In this paper, we propose a framework called Self-supervised Cross-View Training (SCT) to narrow the performance gap between large and small PLMs.

Contrastive Learning • Language Modelling • +5

Typo-Robust Representation Learning for Dense Retrieval

1 code implementation • 17 Jun 2023 • Panuthep Tasawong, Wuttikorn Ponwitayarat, Peerat Limkonchotiwat, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong

One of the main challenges of dense retrieval in real-world settings is the handling of queries containing misspelled words.

Information Retrieval • Representation Learning • +1

Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs

no code implementations • CVPR 2023 • Pattaramanee Arsomngern, Sarana Nutanong, Supasorn Suwajanakorn

We also achieve comparable results to SOTA methods trained on scene scans on four tasks in NYUv2, SUNRGB-D, indoor ADE20k, and indoor/outdoor COCO, despite using lightweight CAD models or pseudo data.

Scene Understanding

Thai Wav2Vec2.0 with CommonVoice V8

1 code implementation • 9 Aug 2022 • Wannaphong Phatthiyaphaibun, Chompakorn Chaksangchaichot, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong

However, most of these ASR models are available in English; only a minority of the models are available in Thai.

Automatic Speech Recognition (ASR) • +1

WangchanBERTa: Pretraining transformer-based Thai Language Models

2 code implementations • 24 Jan 2021 • Lalita Lowphansirikul, Charin Polpanumas, Nawat Jantrakulchai, Sarana Nutanong

However, for a relatively low-resource language such as Thai, the choices of models are limited to training a BERT-based model based on a much smaller dataset or finetuning multi-lingual models, both of which yield suboptimal downstream performance.

Language Modelling • Sentence • +2

scb-mt-en-th-2020: A Large English-Thai Parallel Corpus

no code implementations • 7 Jul 2020 • Lalita Lowphansirikul, Charin Polpanumas, Attapol T. Rutherford, Sarana Nutanong

The primary objective of our work is to build a large-scale English-Thai dataset for machine translation.

Machine Translation • Sentence • +1
