4 code implementations • WS 2016 • Jey Han Lau, Timothy Baldwin
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings.
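The core idea of doc2vec's PV-DM variant is that each document gets its own learnable vector, which is combined with context word vectors to predict a target word. The following is a minimal toy sketch of that idea in pure Python; the vocabulary, dimensionality, and training data are illustrative, not the actual doc2vec implementation.

```python
import math
import random

# Toy sketch of doc2vec's PV-DM idea: each document gets its own vector,
# which is averaged with context word vectors to predict a target word.
random.seed(0)
DIM = 8
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
docs = [["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "ran"]]

word_vecs = {w: [random.uniform(-0.5, 0.5) for _ in range(DIM)] for w in vocab}
doc_vecs = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in docs]
out_vecs = {w: [0.0] * DIM for w in vocab}  # softmax output weights

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def train_step(doc_id, context, target, lr=0.1):
    # Average the document vector together with the context word vectors.
    vecs = [doc_vecs[doc_id]] + [word_vecs[w] for w in context]
    h = [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]
    probs = softmax([sum(h[i] * out_vecs[w][i] for i in range(DIM))
                     for w in vocab])
    # Cross-entropy gradient: update output weights and all input vectors,
    # so the document vector is learned alongside the word vectors.
    grad_h = [0.0] * DIM
    for j, w in enumerate(vocab):
        err = probs[j] - (1.0 if w == target else 0.0)
        for i in range(DIM):
            grad_h[i] += err * out_vecs[w][i]
            out_vecs[w][i] -= lr * err * h[i]
    for v in vecs:
        for i in range(DIM):
            v[i] -= lr * grad_h[i] / len(vecs)

for _ in range(50):
    for d, doc in enumerate(docs):
        for t in range(1, len(doc) - 1):
            train_step(d, [doc[t - 1], doc[t + 1]], doc[t])

print(len(doc_vecs), len(doc_vecs[0]))  # one learned vector per document
```

After training, `doc_vecs` holds one embedding per document, which is the document-level representation doc2vec is designed to produce.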
1 code implementation • COLING 2016 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topics generated by topic models are typically represented as a list of terms.
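Concretely, a topic is usually displayed as its top-N most probable terms. The sketch below illustrates this representation; the word probabilities are made up for illustration.

```python
# A topic from a topic model is typically shown as its top-N most
# probable terms. These word probabilities are invented for illustration.
topic = {"stock": 0.08, "market": 0.07, "investor": 0.05, "trade": 0.04,
         "price": 0.04, "share": 0.03, "fund": 0.03, "bank": 0.02}

def top_terms(word_probs, n=5):
    """Return the n highest-probability terms, the usual topic representation."""
    return [w for w, _ in sorted(word_probs.items(), key=lambda kv: -kv[1])[:n]]

print(top_terms(topic))
```

Topic labelling replaces this term list with a compact label (e.g. "stock market") that is easier for end-users to interpret.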
1 code implementation • ACL 2017 • Jey Han Lau, Timothy Baldwin, Trevor Cohn
Language models are typically applied at the sentence level, without access to the broader document context.
2 code implementations • 31 May 2022 • Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder
In this work, we focus on developing resources for languages in Indonesia.
1 code implementation • ACL 2018 • Jey Han Lau, Trevor Cohn, Timothy Baldwin, Julian Brooke, Adam Hammond
In this paper, we propose a joint architecture that captures language, rhyme and meter for sonnet modelling.
1 code implementation • EMNLP 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We present IndoBERTweet, the first large-scale pretrained model for Indonesian Twitter that is trained by extending a monolingually-trained Indonesian BERT model with additive domain-specific vocabulary.
1 code implementation • AACL 2020 • Fajri Koto, Jey Han Lau, Timothy Baldwin
In this paper, we introduce a large-scale Indonesian summarization dataset.
1 code implementation • EMNLP (MRL) 2021 • Takashi Wada, Tomoharu Iwata, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e.g., a few hundred sentence pairs).
1 code implementation • ACL 2022 • Shiquan Yang, Rui Zhang, Sarah Erfani, Jey Han Lau
To obtain a transparent reasoning process, we introduce a neuro-symbolic approach that performs explicit reasoning, justifying model decisions via reasoning chains.
2 code implementations • sdp (COLING) 2022 • Yulia Otmakhova, Hung Thinh Truong, Timothy Baldwin, Trevor Cohn, Karin Verspoor, Jey Han Lau
In this paper we report on our submission to the Multidocument Summarisation for Literature Review (MSLR) shared task.
1 code implementation • IJCNLP 2017 • Jey Han Lau, Lianhua Chi, Khoi-Nguyen Tran, Trevor Cohn
We propose an end-to-end neural network to predict the geolocation of a tweet.
1 code implementation • NAACL 2022 • Lin Tian, Xiuzhen Zhang, Jey Han Lau
Social media rumours, a form of misinformation, can mislead the public and cause significant economic and social disruption.
1 code implementation • 3 Mar 2022 • Miao Li, Jianzhong Qi, Jey Han Lau
We present PeerSum, a new MDS dataset using peer reviews of scientific publications.
1 code implementation • 2 May 2023 • Miao Li, Eduard Hovy, Jey Han Lau
We present PeerSum, a novel dataset for generating meta-reviews of scientific papers.
1 code implementation • ALTA 2019 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions.
1 code implementation • EACL 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We introduce a top-down approach to discourse parsing that is conceptually simpler than its predecessors (Kobayashi et al., 2020; Zhang et al., 2020).
Ranked #7 on Discourse Parsing on RST-DT (Standard Parseval (Span) metric)
1 code implementation • NAACL 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks.
1 code implementation • 2 Apr 2020 • Jey Han Lau, Carlos S. Armendariz, Shalom Lappin, Matthew Purver, Chang Shu
We study the influence of context on sentence acceptability.
1 code implementation • 26 May 2023 • Rongxin Zhu, Jianzhong Qi, Jey Han Lau
A series of datasets and models have been proposed for the summarization of well-formatted documents such as news articles.
1 code implementation • ACL 2018 • Jean-Philippe Bernardy, Shalom Lappin, Jey Han Lau
We investigate the influence that document context exerts on human acceptability judgements for English sentences, via two sets of experiments.
1 code implementation • COLING 2022 • Takashi Wada, Timothy Baldwin, Yuji Matsumoto, Jey Han Lau
We propose a new unsupervised method for lexical substitution using pre-trained language models.
1 code implementation • 1 Nov 2023 • Takashi Wada, Timothy Baldwin, Jey Han Lau
We propose a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models.
1 code implementation • 6 Oct 2022 • Thinh Hung Truong, Yulia Otmakhova, Timothy Baldwin, Trevor Cohn, Jey Han Lau, Karin Verspoor
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
1 code implementation • 12 Mar 2023 • Miao Li, Jianzhong Qi, Jey Han Lau
We propose HGSUM, an MDS model that extends an encoder-decoder architecture, to incorporate a heterogeneous graph to represent different semantic units (e.g., words and sentences) of the documents.
2 code implementations • 27 Nov 2020 • Fajri Koto, Timothy Baldwin, Jey Han Lau
In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).
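To make the precision/recall framing of focus and coverage concrete, here is a deliberately simplified sketch that scores both as token overlap between a generated summary and a reference. The actual FFCI framework uses stronger model-based scoring; plain word overlap is used here only for illustration.

```python
# Simplified illustration of "focus" (precision) and "coverage" (recall)
# as token overlap between a generated summary and a reference summary.
# FFCI itself uses stronger model-based metrics; word overlap is used
# here only to make the precision/recall framing concrete.
def tokenize(text):
    return text.lower().split()

def focus(summary, reference):
    """Fraction of summary tokens also in the reference (precision)."""
    ref = set(tokenize(reference))
    toks = tokenize(summary)
    return sum(t in ref for t in toks) / len(toks)

def coverage(summary, reference):
    """Fraction of reference tokens also in the summary (recall)."""
    summ = set(tokenize(summary))
    toks = tokenize(reference)
    return sum(t in summ for t in toks) / len(toks)

reference = "the government announced new climate targets"
summary = "the government announced targets"
print(round(focus(summary, reference), 2),
      round(coverage(summary, reference), 2))  # → 1.0 0.67
```

Here every summary token appears in the reference (perfect focus), but the summary misses some reference content (partial coverage), mirroring the precision/recall distinction in the framework.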
1 code implementation • Findings (ACL) 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall).
1 code implementation • CSRR (ACL) 2022 • Fajri Koto, Timothy Baldwin, Jey Han Lau
Story comprehension that involves complex causal and temporal relations is a critical task in NLP, but previous studies have focused predominantly on English, leaving open the question of how the findings generalize to other languages, such as Indonesian.
1 code implementation • 13 Mar 2023 • Lin Tian, Xiuzhen Zhang, Jey Han Lau
State-sponsored trolls are the main actors of influence campaigns on social media, and automatic troll detection is important to combat misinformation at scale.
1 code implementation • 9 Jul 2020 • Shiwei Zhang, Xiuzhen Zhang, Jey Han Lau, Jeffrey Chan, Cecile Paris
In the literature, product question answering (PQA) is formulated as a retrieval problem, with the goal of searching for the most relevant reviews to answer a given product question.
1 code implementation • 15 Mar 2023 • Zhuohan Xie, Miao Li, Trevor Cohn, Jey Han Lau
Numerous evaluation metrics have been developed for natural language generation tasks, but their effectiveness in evaluating stories is limited, as they are not tailored to assess intricate aspects of storytelling such as fluency and interestingness.
1 code implementation • 2 Jun 2023 • Takashi Wada, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau
We propose an unsupervised approach to paraphrasing multiword expressions (MWEs) in context.
no code implementations • 1 Jun 2018 • Khoi-Nguyen Tran, Jey Han Lau, Danish Contractor, Utkarsh Gupta, Bikram Sengupta, Christopher J. Butler, Mukesh Mohania
Instructional Systems Design is the practice of creating instructional experiences that make the acquisition of knowledge and skill more efficient, effective, and appealing.
no code implementations • 11 Apr 2018 • Maryam Fanaeepour, Adam Makarucha, Jey Han Lau
The versatility of word embeddings for various applications is attracting researchers from various fields.
no code implementations • CONLL 2017 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topic models jointly learn topics and document-level topic distribution.
no code implementations • EMNLP 2018 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topic coherence is increasingly being used to evaluate topic models and filter topics for end-user applications.
no code implementations • EACL 2017 • Ionut Sorodoc, Jey Han Lau, Nikolaos Aletras, Timothy Baldwin
Automatic topic labelling is the task of generating a succinct label that summarises the theme or subject of a topic, with the intention of reducing the cognitive load of end-users when interpreting these topics.
no code implementations • WS 2018 • Steven Xu, Andrew Bennett, Doris Hoogeveen, Jey Han Lau, Timothy Baldwin
Community question answering (cQA) forums provide a rich source of data for facilitating non-factoid question answering over many technical domains.
no code implementations • WS 2017 • Ying Xu, Jey Han Lau, Timothy Baldwin, Trevor Cohn
With this decoupled architecture, we decrease the number of parameters in the decoder substantially, and shorten its training time.
no code implementations • TACL 2014 • Marco Lui, Jey Han Lau, Timothy Baldwin
Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document.
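A classic content-based approach to language identification builds per-language character n-gram profiles and scores a document against each. The sketch below is a minimal character-trigram identifier; the training texts and scoring scheme are illustrative only, not the method of the paper.

```python
from collections import Counter

# Minimal character-trigram language identifier: score a document against
# per-language trigram profiles built from tiny example texts.
# Training texts and the scoring scheme are illustrative only.
def trigrams(text):
    text = f"  {text.lower()}  "  # pad so word edges form trigrams too
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

training = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt über den faulen hund",
}
profiles = {lang: trigrams(text) for lang, text in training.items()}

def identify(document):
    doc = trigrams(document)
    # Score by summed multiset overlap of trigram counts per language.
    scores = {lang: sum(min(c, prof[g]) for g, c in doc.items())
              for lang, prof in profiles.items()}
    return max(scores, key=scores.get)

print(identify("the dog and the fox"))      # → en
print(identify("der hund und der fuchs"))   # → de
```

Real systems train such profiles on large corpora over many languages, but the overlap-scoring principle is the same.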
no code implementations • NAACL 2019 • Kaimin Zhou, Chang Shu, Binyang Li, Jey Han Lau
Motivated by this, our paper focuses on the task of rumour detection; particularly, we are interested in understanding how early we can detect them.
no code implementations • ALTA 2019 • Zhuohan Xie, Jey Han Lau, Trevor Cohn
In this paper, we adapt Deep-speare, a joint neural network model for English sonnets, to Chinese poetry.
no code implementations • 22 Jan 2020 • Ying Xu, Xu Zhong, Antonio Jose Jimeno Yepes, Jey Han Lau
An adversarial example is an input transformed by small perturbations that machine learning models consistently misclassify.
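For text, a typical small perturbation is a character-level edit that a human barely notices. The sketch below swaps one pair of adjacent characters inside a word; the attack strategy and sentence are illustrative only, and a real attack would search for the perturbation that actually flips a target model's prediction.

```python
import random

# Toy illustration of the kind of small input perturbation used in
# adversarial attacks on text classifiers: swap one pair of adjacent
# characters inside a word, keeping the first and last letters fixed.
def swap_adjacent(word, i):
    """Swap the characters at positions i and i+1."""
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def perturb_sentence(sentence, rng):
    words = sentence.split()
    # Only perturb words long enough to swap interior characters.
    candidates = [k for k, w in enumerate(words) if len(w) >= 4]
    k = rng.choice(candidates)
    i = rng.randrange(1, len(words[k]) - 2)
    words[k] = swap_adjacent(words[k], i)
    return " ".join(words)

rng = random.Random(0)
original = "this movie was absolutely wonderful"
print(perturb_sentence(original, rng))
```

The perturbed sentence keeps exactly the same characters and length, yet such tiny edits are often enough to make a model consistently misclassify.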
no code implementations • 30 Apr 2020 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
The world is facing the challenge of the climate crisis.
no code implementations • ACL 2020 • Kobi Leins, Jey Han Lau, Timothy Baldwin
We focus in particular on the role of data statements in ethically assessing research, but also discuss the topic of dual use, and examine the outcomes of similar debates in other scientific disciplines.
no code implementations • 18 Aug 2020 • Karin Verspoor, Simon Šuster, Yulia Otmakhova, Shevon Mendis, Zenan Zhai, Biaoyan Fang, Jey Han Lau, Timothy Baldwin, Antonio Jimeno Yepes, David Martinez
We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration. It builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search by providing a visual overview that supports exploration of a collection to identify key articles of interest.
no code implementations • TACL 2020 • Jey Han Lau, Carlos Armendariz, Shalom Lappin, Matthew Purver, Chang Shu
We study the influence of context on sentence acceptability.
no code implementations • COLING 2020 • Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin
Although the Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, it is under-represented in NLP research.
no code implementations • NAACL 2021 • Ying Xu, Xu Zhong, Antonio Jimeno Yepes, Jey Han Lau
We introduce a grey-box adversarial attack and defence framework for sentiment classification.
no code implementations • 25 May 2021 • Simon Šuster, Karin Verspoor, Timothy Baldwin, Jey Han Lau, Antonio Jimeno Yepes, David Martinez, Yulia Otmakhova
The COVID-19 pandemic has driven ever-greater demand for tools which enable efficient exploration of biomedical literature.
no code implementations • NAACL 2021 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Neutralisation techniques, e.g., denial of responsibility and denial of victim, are used in the narrative of climate change scepticism to justify lack of action or to promote an alternative view.
no code implementations • 30 Jul 2021 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
There is consensus in the scientific community about human-induced climate change.
no code implementations • 27 Sep 2021 • Lin Tian, Xiuzhen Zhang, Jey Han Lau
Most rumour detection models for social media are designed for one specific language (mostly English).
no code implementations • EMNLP (NLLP) 2021 • Meladel Mistica, Jey Han Lau, Brayden Merrifield, Kate Fazio, Timothy Baldwin
Free legal assistance is critically under-resourced, and many of those who seek legal help have their needs unmet.
no code implementations • ALTA 2021 • Zhuohan Xie, Trevor Cohn, Jey Han Lau
GPT-2 has frequently been adopted in story generation models for its powerful generative capability.
no code implementations • ALTA 2021 • Rongxin Zhu, Jey Han Lau, Jianzhong Qi
Conversation disentanglement, the task to identify separate threads in conversations, is an important pre-processing step in multi-party conversational NLP applications such as conversational question answering and conversation summarization.
no code implementations • 16 Feb 2022 • Thinh Hung Truong, Yulia Otmakhova, Rahmad Mahendra, Timothy Baldwin, Jey Han Lau, Trevor Cohn, Lawrence Cavedon, Damiano Spina, Karin Verspoor
This paper describes the submissions of the Natural Language Processing (NLP) team from the Australian Research Council Industrial Transformation Training Centre (ITTC) for Cognitive Computing in Medical Technologies to the TREC 2021 Clinical Trials Track.
no code implementations • ACL 2022 • Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder
NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects.
no code implementations • ECNLP (ACL) 2022 • Fajri Koto, Jey Han Lau, Timothy Baldwin
For any e-commerce service, persuasive, faithful, and informative product descriptions can attract shoppers and improve sales.
no code implementations • ACL 2022 • Yulia Otmakhova, Karin Verspoor, Timothy Baldwin, Jey Han Lau
Although multi-document summarisation (MDS) of the biomedical literature is a highly valuable task that has recently attracted substantial interest, evaluation of the quality of biomedical summaries lacks consistency and transparency.
no code implementations • 20 May 2022 • Shiquan Yang, Xinting Huang, Jey Han Lau, Sarah Erfani
Data artifacts incentivize machine learning models to learn non-transferable generalizations by exploiting shortcuts in the data, and there is growing evidence that such artifacts contribute to the strong results that deep learning models achieve on recent natural language processing benchmarks.
no code implementations • NAACL (SIGTYP) 2022 • Yulia Otmakhova, Karin Verspoor, Jey Han Lau
Though there has recently been increased interest in how pre-trained language models encode different linguistic features, there is still a lack of systematic comparison between languages with different morphology and syntax.
no code implementations • COLING 2022 • Fajri Koto, Timothy Baldwin, Jey Han Lau
Summaries, keyphrases, and titles are different ways of concisely capturing the content of a document.
no code implementations • COLING (CODI, CRAC) 2022 • Andrew Shen, Fajri Koto, Jey Han Lau, Timothy Baldwin
We propose a novel unconstrained bottom-up approach to rhetorical discourse parsing that performs sequence labelling over adjacent pairs of discourse units (DUs), building on the framework of Koto et al. (2021).
no code implementations • 24 Jan 2023 • Zhuohan Xie, Trevor Cohn, Jey Han Lau
To enhance the quality of generated stories, recent story generation models have been investigating the utilization of higher-level attributes like plots or commonsense knowledge.
1 code implementation • 13 Feb 2024 • Lin Tian, Xiuzhen Zhang, Jey Han Lau
We apply causal mediation analysis to explain the decision-making process of neural models for rumour detection on Twitter.
no code implementations • 28 Feb 2024 • Miao Li, Jey Han Lau, Eduard Hovy
Modern natural language generation systems built on LLMs can generate plausible summaries of multiple documents; however, it is unclear whether these models truly possess the ability to consolidate information when generating summaries, especially for source documents containing opinionated information.