no code implementations • COLING 2022 • Fajri Koto, Timothy Baldwin, Jey Han Lau
Summaries, keyphrases, and titles are different ways of concisely capturing the content of a document.
1 code implementation • CSRR (ACL) 2022 • Fajri Koto, Timothy Baldwin, Jey Han Lau
Story comprehension that involves complex causal and temporal relations is a critical task in NLP, but previous studies have focused predominantly on English, leaving open the question of how the findings generalize to other languages, such as Indonesian.
no code implementations • COLING (CODI, CRAC) 2022 • Andrew Shen, Fajri Koto, Jey Han Lau, Timothy Baldwin
We propose a novel unconstrained bottom-up approach to rhetorical discourse parsing based on sequence labelling over adjacent pairs of discourse units (DUs), building on the framework of Koto et al. (2021).
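As a rough illustration of that bottom-up formulation, the sketch below greedily merges the highest-scoring adjacent pair of discourse units until a single tree remains; the `score_adjacent_pair` function is a hypothetical stand-in for the trained sequence-labelling model, not the method of the paper.

```python
# Minimal sketch of bottom-up parsing by scoring adjacent discourse units (DUs)
# and greedily merging the best-scoring pair. The scorer below is a toy
# placeholder for a trained pairwise labelling model (assumption, not the paper).

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A span of the document: either a single DU or a merged subtree."""
    text: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def score_adjacent_pair(left: Node, right: Node) -> float:
    """Placeholder scorer: in practice a model labels each adjacent pair."""
    return -abs(len(left.text) - len(right.text))  # toy heuristic only


def bottom_up_parse(dus: List[str]) -> Node:
    """Greedily merge the highest-scoring adjacent pair until one tree remains."""
    nodes = [Node(text=du) for du in dus]
    while len(nodes) > 1:
        scores = [score_adjacent_pair(a, b) for a, b in zip(nodes, nodes[1:])]
        i = max(range(len(scores)), key=scores.__getitem__)
        merged = Node(text=nodes[i].text + " " + nodes[i + 1].text,
                      left=nodes[i], right=nodes[i + 1])
        nodes[i:i + 2] = [merged]
    return nodes[0]


if __name__ == "__main__":
    tree = bottom_up_parse(["The company reported a loss,",
                            "which surprised analysts,",
                            "but its stock rose."])
    print(tree.left is not None, tree.right is not None)
```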
no code implementations • ALTA 2021 • Fajri Koto, Biaoyan Fang
In this paper, we investigate the utility of modern pretrained language models for evidence grading in the medical literature, based on the ALTA 2021 shared task.
no code implementations • ECNLP (ACL) 2022 • Fajri Koto, Jey Han Lau, Timothy Baldwin
For any e-commerce service, persuasive, faithful, and informative product descriptions can attract shoppers and improve sales.
1 code implementation • 19 Sep 2023 • Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
1 code implementation • 15 Sep 2023 • Chen Cecilia Liu, Fajri Koto, Timothy Baldwin, Iryna Gurevych
Large language models (LLMs) are highly adept at question answering and reasoning tasks, but when reasoning in situational context, human expectations vary depending on the relevant cultural common ground.
no code implementations • 30 Aug 2023 • Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Hector Xuguang Ren, Preslav Nakov, Timothy Baldwin, Eric Xing
We release two open versions of the model -- the foundation Jais model, and an instruction-tuned Jais-chat variant -- with the aim of promoting research on Arabic LLMs.
1 code implementation • 15 Jun 2023 • Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, Timothy Baldwin
As the capabilities of large language models (LLMs) continue to advance, evaluating their performance becomes increasingly crucial and challenging.
1 code implementation • 24 May 2023 • Haonan Li, Fajri Koto, Minghao Wu, Alham Fikri Aji, Timothy Baldwin
Instruction tuning has shown great promise in the field of natural language processing.
1 code implementation • 19 Dec 2022 • Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources.
no code implementations • 21 Jul 2022 • Samuel Cahyawijaya, Alham Fikri Aji, Holy Lovenia, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Fajri Koto, David Moeljadi, Karissa Vincentio, Ade Romadhony, Ayu Purwarianti
At the center of the issues holding back progress in Indonesian natural language processing (NLP) research is data scarcity.
2 code implementations • 31 May 2022 • Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder
In this work, we focus on developing resources for languages in Indonesia.
no code implementations • ACL 2022 • Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder
NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects.
1 code implementation • EMNLP 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We present IndoBERTweet, the first large-scale pretrained model for Indonesian Twitter that is trained by extending a monolingually-trained Indonesian BERT model with additive domain-specific vocabulary.
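A minimal sketch of the general recipe for extending a pretrained model with additive domain-specific vocabulary, using Hugging Face transformers; the checkpoint name and the domain tokens below are illustrative assumptions, and the new embeddings are left at the library's default initialization rather than the paper's own strategy.

```python
# Sketch: add domain-specific tokens to an existing tokenizer and resize the
# model's embedding matrix, then continue pretraining on in-domain text.
# Checkpoint name and token list are assumptions for illustration only.

from transformers import AutoModelForMaskedLM, AutoTokenizer

base_checkpoint = "indolem/indobert-base-uncased"  # assumed Indonesian BERT
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForMaskedLM.from_pretrained(base_checkpoint)

# Hypothetical Twitter-domain tokens mined from an Indonesian tweet corpus.
domain_tokens = ["wkwk", "gaes", "mager", "@USER", "HTTPURL"]

num_added = tokenizer.add_tokens(domain_tokens)
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} domain tokens; continue pretraining on tweets next.")
```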
1 code implementation • Findings (ACL) 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall).
1 code implementation • NAACL 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks.
1 code implementation • EACL 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We introduce a top-down approach to discourse parsing that is conceptually simpler than its predecessors (Kobayashi et al., 2020; Zhang et al., 2020).
Ranked #7 on Discourse Parsing on RST-DT (Standard Parseval (Span) metric)
2 code implementations • 27 Nov 2020 • Fajri Koto, Timothy Baldwin, Jey Han Lau
In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).
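To make the precision/recall framing of focus and coverage concrete, here is a minimal sketch using plain token overlap; this is an illustrative simplification, not the scoring method used in the paper.

```python
# Sketch of FFCI's focus/coverage framing: focus = precision of summary content
# against the reference, coverage = recall. Token overlap is used here purely
# to illustrate the framing, not as the paper's scoring method.

def _tokens(text: str) -> list:
    return text.lower().split()

def focus(summary: str, reference: str) -> float:
    """Fraction of summary tokens that also appear in the reference (precision)."""
    summ, ref = _tokens(summary), set(_tokens(reference))
    return sum(t in ref for t in summ) / max(len(summ), 1)

def coverage(summary: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the summary (recall)."""
    summ, ref = set(_tokens(summary)), _tokens(reference)
    return sum(t in summ for t in ref) / max(len(ref), 1)

if __name__ == "__main__":
    ref = "the central bank raised interest rates to curb inflation"
    hyp = "the bank raised rates"
    print(f"focus={focus(hyp, ref):.2f} coverage={coverage(hyp, ref):.2f}")
```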
no code implementations • COLING 2020 • Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin
Although the Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, it is under-represented in NLP research.
1 code implementation • AACL 2020 • Fajri Koto, Jey Han Lau, Timothy Baldwin
In this paper, we introduce a large-scale Indonesian summarization dataset.
1 code implementation • PACLIC 2020 • Fajri Koto, Ikhwan Koto
Although some linguists (Rusmali et al., 1985; Crouch, 2009) have attempted to define the morphology and syntax of Minangkabau, natural language processing for this language is still absent due to the scarcity of annotated resources.
1 code implementation • ALTA 2019 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions.
no code implementations • LREC 2016 • Fajri Koto
In this paper, we report our effort to construct the first-ever Indonesian corpora for chat summarization.