Story comprehension that involves complex causal and temporal relations is a critical task in NLP, but previous studies have focused predominantly on English, leaving open the question of how the findings generalize to other languages, such as Indonesian.
Free legal assistance is critically under-resourced, and many of those who seek legal help have their needs unmet.
Although multi-document summarisation (MDS) of the biomedical literature is a highly valuable task that has recently attracted substantial interest, evaluation of the quality of biomedical summaries lacks consistency and transparency.
For any e-commerce service, persuasive, faithful, and informative product descriptions can attract shoppers and improve sales.
1 code implementation • 31 May 2022 • Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder
In this work, we focus on developing resources for languages in Indonesia.
Data artifacts incentivize machine learning models to learn non-transferable generalizations by taking advantage of shortcuts in the data, and there is growing evidence that data artifacts play a role for the strong results that deep learning models achieve in recent natural language processing benchmarks.
no code implementations • • Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder
NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects.
To obtain a transparent reasoning process, we introduce neuro-symbolic to perform explicit reasoning that justifies model decisions by reasoning chains.
This paper describes the submissions of the Natural Language Processing (NLP) team from the Australian Research Council Industrial Transformation Training Centre (ITTC) for Cognitive Computing in Medical Technologies to the TREC 2021 Clinical Trials Track.
Conversation disentanglement, the task to identify separate threads in conversations, is an important pre-processing step in multi-party conversational NLP applications such as conversational question answering and conversation summarization.
Most rumour detection models for social media are designed for one specific language (mostly English).
We present IndoBERTweet, the first large-scale pretrained model for Indonesian Twitter that is trained by extending a monolingually-trained Indonesian BERT model with additive domain-specific vocabulary.
Neutralisation techniques, e. g. denial of responsibility and denial of victim, are used in the narrative of climate change scepticism to justify lack of action or to promote an alternative view.
The COVID-19 pandemic has driven ever-greater demand for tools which enable efficient exploration of biomedical literature.
In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).
Although the Indonesian language is spoken by almost 200 million people and the 10th most spoken language in the world, it is under-represented in NLP research.
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e. g. a few hundred sentence pairs).
We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration, which builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search by providing a visual overview supporting exploration of a collection to identify key articles of interest.
In the literature, PQA is formulated as a retrieval problem with the goal to search for the most relevant reviews to answer a given product question.
We focus in particular on the role of data statements in ethically assessing research, but also discuss the topic of dual use, and examine the outcomes of similar debates in other scientific disciplines.
An adversarial example is an input transformed by small perturbations that machine learning models consistently misclassify.
We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions.
Motivated by this, our paper focuses on the task of rumour detection; particularly, we are interested in understanding how early we can detect them.
Community question answering (cQA) forums provide a rich source of data for facilitating non-factoid question answering over many technical domains.
In this paper, we propose a joint architecture that captures language, rhyme and meter for sonnet modelling.
We investigate the influence that document context exerts on human acceptability judgements for English sentences, via two sets of experiments.
Instructional Systems Design is the practice of creating of instructional experiences that make the acquisition of knowledge and skill more efficient, effective, and appealing.
The versatility of word embeddings for various applications is attracting researchers from various fields.
With this decoupled architecture, we decrease the number of parameters in the decoder substantially, and shorten its training time.
Automatic topic labelling is the task of generating a succinct label that summarises the theme or subject of a topic, with the intention of reducing the cognitive load of end-users when interpreting these topics.
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings.
Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document.