Search Results for author: Fajri Koto

Found 24 papers, 15 papers with code

Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian

1 code implementation CSRR (ACL) 2022 Fajri Koto, Timothy Baldwin, Jey Han Lau

Story comprehension that involves complex causal and temporal relations is a critical task in NLP, but previous studies have focused predominantly on English, leaving open the question of how the findings generalize to other languages, such as Indonesian.

Cloze Test Zero-Shot Cross-Lingual Transfer

Easy-First Bottom-Up Discourse Parsing via Sequence Labelling

no code implementations COLING (CODI, CRAC) 2022 Andrew Shen, Fajri Koto, Jey Han Lau, Timothy Baldwin

We propose a novel unconstrained bottom-up approach for rhetorical discourse parsing based on sequence labelling of adjacent pairs of discourse units (DUs), based on the framework of Koto et al. (2021).

Discourse Parsing

Handling Variance of Pretrained Language Models in Grading Evidence in the Medical Literature

no code implementations ALTA 2021 Fajri Koto, Biaoyan Fang

In this paper, we investigate the utility of modern pretrained language models for the evidence grading system in the medical literature based on the ALTA 2021 shared task.

Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings

1 code implementation15 Sep 2023 Chen Cecilia Liu, Fajri Koto, Timothy Baldwin, Iryna Gurevych

Large language models (LLMs) are highly adept at question answering and reasoning tasks, but when reasoning in situational context, human expectations vary depending on the relevant cultural common ground.

Question Answering

CMMLU: Measuring massive multitask language understanding in Chinese

1 code implementation15 Jun 2023 Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, Timothy Baldwin

As the capabilities of large language models (LLMs) continue to advance, evaluating their performance becomes increasingly crucial and challenging.

IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization

1 code implementation EMNLP 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We present IndoBERTweet, the first large-scale pretrained model for Indonesian Twitter that is trained by extending a monolingually-trained Indonesian BERT model with additive domain-specific vocabulary.

Language Modelling

Evaluating the Efficacy of Summarization Evaluation across Languages

1 code implementation Findings (ACL) 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall).

Discourse Probing of Pretrained Language Models

1 code implementation NAACL 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks.

Top-down Discourse Parsing via Sequence Labelling

1 code implementation EACL 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We introduce a top-down approach to discourse parsing that is conceptually simpler than its predecessors (Kobayashi et al., 2020; Zhang et al., 2020).

Ranked #7 on Discourse Parsing on RST-DT (Standard Parseval (Span) metric)

Discourse Parsing

FFCI: A Framework for Interpretable Automatic Evaluation of Summarization

2 code implementations27 Nov 2020 Fajri Koto, Timothy Baldwin, Jey Han Lau

In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).

Question Answering Semantic Textual Similarity +1

IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP

no code implementations COLING 2020 Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin

Although the Indonesian language is spoken by almost 200 million people and the 10th most spoken language in the world, it is under-represented in NLP research.

Benchmarking Language Modelling

Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation

1 code implementation PACLIC 2020 Fajri Koto, Ikhwan Koto

Although some linguists (Rusmali et al., 1985; Crouch, 2009) have fairly attempted to define the morphology and syntax of Minangkabau, information processing in this language is still absent due to the scarcity of the annotated resource.

Machine Translation Sentiment Analysis +2

Improved Document Modelling with a Neural Discourse Parser

1 code implementation ALTA 2019 Fajri Koto, Jey Han Lau, Timothy Baldwin

We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions.

Abstractive Text Summarization Text Generation

A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization

no code implementations LREC 2016 Fajri Koto

In this paper we report our effort to construct the first ever Indonesian corpora for chat summarization.

Cannot find the paper you are looking for? You can Submit a new open access paper.