Search Results for author: Jan Christian Blaise Cruz

Found 13 papers, 5 papers with code

Data Processing Matters: SRPH-Konvergen AI’s Machine Translation System for WMT’21

no code implementations • WMT (EMNLP) 2021 • Lintang Sutawika, Jan Christian Blaise Cruz

In this paper, we describe the submission of the joint Samsung Research Philippines-Konvergen AI team for the WMT’21 Large Scale Multilingual Translation Task - Small Track 2.

Machine Translation, Translation

Samsung R&D Institute Philippines at WMT 2023

no code implementations • 25 Oct 2023 • Jan Christian Blaise Cruz

In this paper, we describe the constrained MT systems submitted by Samsung R&D Institute Philippines to the WMT 2023 General Translation Task for two directions: en$\rightarrow$he and he$\rightarrow$en.

Translation

Multilingual Large Language Models Are Not (Yet) Code-Switchers

no code implementations • 23 May 2023 • Ruochen Zhang, Samuel Cahyawijaya, Jan Christian Blaise Cruz, Genta Indra Winata, Alham Fikri Aji

Multilingual Large Language Models (LLMs) have recently shown great capabilities in a wide range of tasks, exhibiting state-of-the-art performance through zero-shot or few-shot prompting methods.

Benchmarking, Language Identification, +2

Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages

no code implementations • 23 Mar 2023 • Zheng-Xin Yong, Ruochen Zhang, Jessica Zosa Forde, Skyler Wang, Arjun Subramonian, Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Yin Lin Tan, Long Phan, Rowena Garcia, Thamar Solorio, Alham Fikri Aji

While code-mixing is a common linguistic practice in many parts of the world, collecting high-quality and low-cost code-mixed data remains a challenge for natural language processing (NLP) research.

Towards Automatic Construction of Filipino WordNet: Word Sense Induction and Synset Induction Using Sentence Embeddings

no code implementations • 7 Apr 2022 • Dan John Velasco, Axel Alba, Trisha Gail Pelagio, Bryce Anthony Ramirez, Unisse Chua, Briane Paul Samson, Jan Christian Blaise Cruz, Charibeth Cheng

The resulting sense inventory and synonym sets can be used in automatically creating a wordnet.

Language Modelling, Sentence, +3

Using Synthetic Data for Conversational Response Generation in Low-resource Settings

no code implementations • 6 Apr 2022 • Gabriel Louis Tan, Adrian Paule Ty, Schuyler Ng, Denzel Adrian Co, Jan Christian Blaise Cruz, Charibeth Cheng

Lastly, we published the first Filipino conversational response generator capable of generating responses related to the previous 3 responses.

Conversational Response Generation, Data Augmentation, +1

Data Processing Matters: SRPH-Konvergen AI's Machine Translation System for WMT'21

no code implementations • 20 Nov 2021 • Lintang Sutawika, Jan Christian Blaise Cruz

In this paper, we describe the submission of the joint Samsung Research Philippines-Konvergen AI team for the WMT'21 Large Scale Multilingual Translation Task - Small Track 2.

Machine Translation, Translation

Improving Large-scale Language Models and Resources for Filipino

no code implementations • LREC 2022 • Jan Christian Blaise Cruz, Charibeth Cheng

In this paper, we improve on existing language resources for the low-resource Filipino language in two ways.

Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets

1 code implementation • 22 Oct 2020 • Jan Christian Blaise Cruz, Jose Kristian Resabal, James Lin, Dan John Velasco, Charibeth Cheng

Lastly, we perform analyses on transfer learning techniques to shed light on their true performance when operating in low-data domains through the use of degradation tests.

Benchmarking, Natural Language Inference, +2

Establishing Baselines for Text Classification in Low-Resource Languages

1 code implementation • 5 May 2020 • Jan Christian Blaise Cruz, Charibeth Cheng

We analyze our pretrained model's degradation speeds and look towards the use of this method for comparing models aimed at operating within the low-resource setting.

General Classification, Multilabel Text Classification, +2

Simplifying Paragraph-level Question Generation via Transformer Language Models

4 code implementations • 3 May 2020 • Luis Enrico Lopez, Diane Kathryn Cruz, Jan Christian Blaise Cruz, Charibeth Cheng

Question generation (QG) is a natural language generation task where a model is trained to ask questions corresponding to some input text.

Language Modelling, Question Generation, +3

Localization of Fake News Detection via Multitask Transfer Learning

1 code implementation • LREC 2020 • Jan Christian Blaise Cruz, Julianne Agatha Tan, Charibeth Cheng

Second, we benchmark Transfer Learning (TL) techniques and show that they can be used to train robust fake news classifiers from little data, achieving 91% accuracy on our fake news dataset, reducing the error by 14% compared to established few-shot baselines.

Fake News Detection, Language Modelling, +1

Evaluating Language Model Finetuning Techniques for Low-resource Languages

2 code implementations • 30 Jun 2019 • Jan Christian Blaise Cruz, Charibeth Cheng

Unlike mainstream languages (such as English and French), low-resource languages often suffer from a lack of expert-annotated corpora and benchmark resources that make it hard to apply state-of-the-art techniques directly.

Language Modelling
