no code implementations • 27 Nov 2024 • Geoffrey Tyndall, Kurniawati Azizah, Dipta Tanaya, Ayu Purwarianti, Dessi Puji Lestari, Sakriani Sakti
Continual learning for automatic speech recognition (ASR) systems poses a challenge, especially with the need to avoid catastrophic forgetting while maintaining performance on previously learned tasks.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
1 code implementation • 14 Nov 2024 • Mohammad Rifqi Farhansyah, Muhammad Zuhdi Fikri Johari, Afinzaki Amiral, Ayu Purwarianti, Kumara Ari Yuana, Derry Tanti Wijaya
However, most of these efforts have been focused on creating manual resources thus difficult to scale to more languages.
Optical Character Recognition Optical Character Recognition (OCR)
1 code implementation • 16 Oct 2024 • Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David Anugraha, Rifki Afina Putri, Yutong Wang, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, Afifa Amriani, Anar Rzayev, Anirban Das, Ashmari Pramodya, Aulia Adila, Bryan Wilie, Candy Olivia Mawalim, Ching Lam Cheng, Daud Abolade, Emmanuele Chersoni, Enrico Santus, Fariz Ikhwantri, Garry Kuwanto, Hanyang Zhao, Haryo Akbarianto Wibowo, Holy Lovenia, Jan Christian Blaise Cruz, Jan Wira Gotama Putra, Junho Myung, Lucky Susanto, Maria Angelica Riera Machin, Marina Zhukova, Michael Anugraha, Muhammad Farid Adilazuarda, Natasha Santosa, Peerat Limkonchotiwat, Raj Dabre, Rio Alexander Audino, Samuel Cahyawijaya, Shi-Xiong Zhang, Stephanie Yulia Salim, Yi Zhou, Yinxuan Gui, David Ifeoluwa Adelani, En-Shiun Annie Lee, Shogo Okada, Ayu Purwarianti, Alham Fikri Aji, Taro Watanabe, Derry Tanti Wijaya, Alice Oh, Chong-Wah Ngo
This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark to date.
no code implementations • 11 Oct 2024 • Aulia Adila, Dessi Lestari, Ayu Purwarianti, Dipta Tanaya, Kurniawati Azizah, Sakriani Sakti
Currently, Indonesian data is dominated by read, formal, and clean speech, leading to a scarcity of Indonesian data with other speech variabilities.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
1 code implementation • 23 Sep 2024 • Patrick Amadeus Irawan, Genta Indra Winata, Samuel Cahyawijaya, Ayu Purwarianti
Natural Language Explanation (NLE) aims to elucidate the decision-making process by providing detailed, human-friendly explanations in natural language.
2 code implementations • 14 Jun 2024 • Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, JENNIFER SANTOSO, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Railey Montalan, Ryan Ignatius, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze Gao, Patrick Amadeus, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, Ivan Halim Parmonangan, Maria Khelli, Wenyu Zhang, Lucky Susanto, Reynard Adha Ryanda, Sonny Lazuardi Hermawan, Dan John Velasco, Muhammad Dehan Al Kautsar, Willy Fitra Hendria, Yasmin Moslem, Noah Flynn, Muhammad Farid Adilazuarda, Haochen Li, Johanes Lee, R. Damanhuri, Shuo Sun, Muhammad Reza Qorib, Amirbek Djanibekov, Wei Qi Leong, Quyet V. Do, Niklas Muennighoff, Tanrada Pansuwan, Ilham Firdausi Putra, Yan Xu, Ngee Chia Tai, Ayu Purwarianti, Sebastian Ruder, William Tjhi, Peerat Limkonchotiwat, Alham Fikri Aji, Sedrick Keh, Genta Indra Winata, Ruochen Zhang, Fajri Koto, Zheng-Xin Yong, Samuel Cahyawijaya
Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1, 300 indigenous languages and a population of 671 million people.
1 code implementation • 13 Jun 2024 • Zayd Muhammad Kawakibi Zuhri, Muhammad Farid Adilazuarda, Ayu Purwarianti, Alham Fikri Aji
Auto-regressive inference of transformers benefit greatly from Key-Value (KV) caching, but can lead to major memory bottlenecks as model size, batch size, and sequence length grow at scale.
no code implementations • 9 Apr 2024 • Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Rifki Afina Putri, Emmanuel Dave, Jhonson Lee, Nuur Shadieq, Wawan Cenggoro, Salsabil Maulana Akbar, Muhammad Ihza Mahendra, Dea Annisayanti Putri, Bryan Wilie, Genta Indra Winata, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung
To bridge this quality gap, we introduce Cendol, a collection of Indonesian LLMs encompassing both decoder-only and encoder-decoder architectures across a range of model sizes.
no code implementations • 21 Feb 2024 • Ryandito Diandaru, Lucky Susanto, Zilu Tang, Ayu Purwarianti, Derry Wijaya
Large Language Models (LLMs) demonstrate strong machine translation capabilities on languages they are trained on.
no code implementations • 11 Jan 2024 • Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Alham Fikri Aji, Genta Indra Winata, Ayu Purwarianti
Pretrained language models (PLMs) have become remarkably adept at task and language generalization.
no code implementations • 21 Nov 2023 • Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Genta Indra Winata, Pascale Fung, Ayu Purwarianti
Significant progress has been made on Indonesian NLP.
no code implementations • 21 Nov 2023 • Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Ayu Purwarianti
We expose the limitation of modular multilingual language models (MLMs) in multilingual inference scenarios with unknown languages.
1 code implementation • 3 Nov 2023 • Randy Zakya Suchrady, Ayu Purwarianti
Therefore, this research aims to implement the multitask learning and prompting approach in aspect-based sentiment analysis for Bahasa Indonesia using generative pre-trained language models.
Aspect-Based Sentiment Analysis Aspect Sentiment Triplet Extraction +3
1 code implementation • 2 Nov 2023 • Lucky Susanto, Ryandito Diandaru, Adila Krisnadhi, Ayu Purwarianti, Derry Wijaya
Neural machine translation (NMT) for low-resource local languages in Indonesia faces significant challenges, including the need for a representative benchmark and limited data availability.
1 code implementation • 2 Nov 2023 • Muhammad Dehan Al Kautsar, Rahmah Khoirussyifa' Nurdini, Samuel Cahyawijaya, Genta Indra Winata, Ayu Purwarianti
Task-oriented dialogue (ToD) systems have been mostly created for high-resource languages, such as English and Chinese.
1 code implementation • 15 Oct 2023 • Ni Putu Intan Maharani, Yoga Yustiawan, Fauzy Caesar Rochim, Ayu Purwarianti
In this paper, we construct an Indonesian self-supervised financial corpus, Indonesian financial sentiment analysis dataset, Indonesian financial topic classification dataset, and release a family of BERT models for financial NLP.
1 code implementation • 12 Oct 2023 • Muhammad Razif Rizqullah, Ayu Purwarianti, Alham Fikri Aji
This concludes Chat GPT is unsuitable for question answering task in religious domain especially for Islamic religion.
no code implementations • 12 Oct 2023 • Ni Putu Intan Maharani, Ayu Purwarianti, Alham Fikri Aji
Clickbait spoiling aims to generate a short text to satisfy the curiosity induced by a clickbait post.
1 code implementation • 19 Sep 2023 • Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
1 code implementation • 19 Dec 2022 • Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, JENNIFER SANTOSO, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
1 code implementation • 15 Dec 2022 • Muhammad Anwari Leksono, Ayu Purwarianti
Recent predictions are carried out with models trained on sequence with fixed splice site location which eliminates possibility of multiple splice sites existence in single sequence.
no code implementations • 21 Jul 2022 • Samuel Cahyawijaya, Alham Fikri Aji, Holy Lovenia, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Fajri Koto, David Moeljadi, Karissa Vincentio, Ade Romadhony, Ayu Purwarianti
At the center of the underlying issues that halt Indonesian natural language processing (NLP) research advancement, we find data scarcity.
no code implementations • 1 Jun 2022 • Holy Lovenia, Hiroki Tanaka, Sakriani Sakti, Ayu Purwarianti, Satoshi Nakamura
Research about brain activities involving spoken word production is considerably underdeveloped because of the undiscovered characteristics of speech artifacts, which contaminate electroencephalogram (EEG) signals and prevent the inspection of the underlying cognitive processes.
1 code implementation • 21 Jan 2022 • Vinsen Marselino Andreas, Genta Indra Winata, Ayu Purwarianti
The recent development of language models has shown promising results by achieving state-of-the-art performance on various natural language tasks by fine-tuning pretrained models.
2 code implementations • EMNLP 2021 • Samuel Cahyawijaya, Genta Indra Winata, Bryan Wilie, Karissa Vincentio, Xiaohong Li, Adhiguna Kuncoro, Sebastian Ruder, Zhi Yuan Lim, Syafri Bahar, Masayu Leylia Khodra, Ayu Purwarianti, Pascale Fung
Natural language generation (NLG) benchmarks provide an important avenue to measure progress and develop better NLG systems.
1 code implementation • 29 Sep 2020 • Ferdiant Joshua Muis, Ayu Purwarianti
Research in automatic question generator (AQG) has been conducted for more than 10 years, mainly focused on factoid question.
no code implementations • 15 Sep 2020 • Miftahul Mahfuzh, Sidik Soleman, Ayu Purwarianti
Since JRNN in general requires a large amount of data as the training examples and creating those examples is expensive, we used a data augmentation method to increase the number of training examples.
no code implementations • 13 Sep 2020 • K. M. Shahih, Ayu Purwarianti
This paper describes combinations of word vector representation and character vector representation in English-Indonesian neural machine translation (NMT).
no code implementations • 12 Sep 2020 • Ramos Janoah Hasudungan, Ayu Purwarianti
In this paper, we employ neural network to do relation detection between two named entities for Indonesian Language.
no code implementations • 12 Sep 2020 • Ayu Purwarianti, Ida Ayu Putu Ari Crisdayanti
Meanwhile, in the problem scope of Indonesian sentiment analysis, phrases that express the sentiment of a document might not appear in the first or last part of the document that can lead to incorrect sentiment classification.
1 code implementation • 12 Sep 2020 • Ilham Firdausi Putra, Ayu Purwarianti
Using the feature-based approach, we observe its performance on various data sizes and total added English data.
1 code implementation • 11 Sep 2020 • Turfa Auliarachman, Ayu Purwarianti
In this paper, we propose a new coreference resolution system for Indonesian text with mention pair method that uses deep neural network to learn the relations of the two mentions.
3 code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Bryan Wilie, Karissa Vincentio, Genta Indra Winata, Samuel Cahyawijaya, Xiaohong Li, Zhi Yuan Lim, Sidik Soleman, Rahmad Mahendra, Pascale Fung, Syafri Bahar, Ayu Purwarianti
Although Indonesian is known to be the fourth most frequently used language over the internet, the research progress on this language in the natural language processing (NLP) is slow-moving due to a lack of available resources.
no code implementations • 11 Sep 2020 • Devin Hoesen, Ayu Purwarianti
The second is the usage of part of speech (POS) tag embedding input layer.
no code implementations • 12 May 2015 • Edwin Lunando, Ayu Purwarianti
Sarcasm is considered one of the most difficult problem in sentiment analysis.