Search Results for author: Jey Han Lau

Found 78 papers, 35 papers with code

Automatic Detection and Language Identification of Multilingual Documents

no code implementations TACL 2014 Marco Lui, Jey Han Lau, Timothy Baldwin

Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document.

Language Identification Machine Translation

An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

4 code implementations WS 2016 Jey Han Lau, Timothy Baldwin

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings.

Document Embedding Word Embeddings

Multimodal Topic Labelling

no code implementations EACL 2017 Ionut Sorodoc, Jey Han Lau, Nikolaos Aletras, Timothy Baldwin

Automatic topic labelling is the task of generating a succinct label that summarises the theme or subject of a topic, with the intention of reducing the cognitive load of end-users when interpreting these topics.

Topic Models

Topically Driven Neural Language Model

1 code implementation ACL 2017 Jey Han Lau, Timothy Baldwin, Trevor Cohn

Language models are typically applied at the sentence level, without access to the broader document context.

Language Modelling Sentence

Evaluating Word Embedding Hyper-Parameters for Similarity and Analogy Tasks

no code implementations 11 Apr 2018 Maryam Fanaeepour, Adam Makarucha, Jey Han Lau

The versatility of word embeddings for various applications is attracting researchers from various fields.

Word Embeddings

Document Chunking and Learning Objective Generation for Instruction Design

no code implementations 1 Jun 2018 Khoi-Nguyen Tran, Jey Han Lau, Danish Contractor, Utkarsh Gupta, Bikram Sengupta, Christopher J. Butler, Mukesh Mohania

Instructional Systems Design is the practice of creating instructional experiences that make the acquisition of knowledge and skill more efficient, effective, and appealing.

Chunking Descriptive

The Influence of Context on Sentence Acceptability Judgements

1 code implementation ACL 2018 Jean-Philippe Bernardy, Shalom Lappin, Jey Han Lau

We investigate the influence that document context exerts on human acceptability judgements for English sentences, via two sets of experiments.

Language Modelling Machine Translation +2

Topic Intrusion for Automatic Topic Model Evaluation

no code implementations EMNLP 2018 Shraey Bhatia, Jey Han Lau, Timothy Baldwin

Topic coherence is increasingly being used to evaluate topic models and filter topics for end-user applications.

Information Retrieval Topic Models

From Shakespeare to Li-Bai: Adapting a Sonnet Model to Chinese Poetry

no code implementations ALTA 2019 Zhuohan Xie, Jey Han Lau, Trevor Cohn

In this paper, we adapt Deep-speare, a joint neural network model for English sonnets, to Chinese poetry.

Early Rumour Detection

no code implementations NAACL 2019 Kaimin Zhou, Chang Shu, Binyang Li, Jey Han Lau

Motivated by this, our paper focuses on the task of rumour detection; particularly, we are interested in understanding how early we can detect them.

Rumour Detection

Improved Document Modelling with a Neural Discourse Parser

1 code implementation ALTA 2019 Fajri Koto, Jey Han Lau, Timothy Baldwin

We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions.

Abstractive Text Summarization Text Generation

Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP

no code implementations 22 Jan 2020 Ying Xu, Xu Zhong, Antonio Jose Jimeno Yepes, Jey Han Lau

An adversarial example is an input transformed by small perturbations that machine learning models consistently misclassify.

Sentence

Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis?

no code implementations ACL 2020 Kobi Leins, Jey Han Lau, Timothy Baldwin

We focus in particular on the role of data statements in ethically assessing research, but also discuss the topic of dual use, and examine the outcomes of similar debates in other scientific disciplines.

Less is More: Rejecting Unreliable Reviews for Product Question Answering

1 code implementation 9 Jul 2020 Shiwei Zhang, Xiuzhen Zhang, Jey Han Lau, Jeffrey Chan, Cecile Paris

In the literature, PQA is formulated as a retrieval problem with the goal to search for the most relevant reviews to answer a given product question.

Community Question Answering Conformal Prediction +1

COVID-SEE: Scientific Evidence Explorer for COVID-19 Related Research

no code implementations 18 Aug 2020 Karin Verspoor, Simon Šuster, Yulia Otmakhova, Shevon Mendis, Zenan Zhai, Biaoyan Fang, Jey Han Lau, Timothy Baldwin, Antonio Jimeno Yepes, David Martinez

We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration, which builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search by providing a visual overview supporting exploration of a collection to identify key articles of interest.

IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP

no code implementations COLING 2020 Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin

Although the Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, it is under-represented in NLP research.

Benchmarking Language Modelling

FFCI: A Framework for Interpretable Automatic Evaluation of Summarization

2 code implementations 27 Nov 2020 Fajri Koto, Timothy Baldwin, Jey Han Lau

In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).

Question Answering Semantic Textual Similarity +2

Top-down Discourse Parsing via Sequence Labelling

1 code implementation EACL 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We introduce a top-down approach to discourse parsing that is conceptually simpler than its predecessors (Kobayashi et al., 2020; Zhang et al., 2020).

Ranked #7 on Discourse Parsing on RST-DT (Standard Parseval (Span) metric)

Discourse Parsing

Discourse Probing of Pretrained Language Models

1 code implementation NAACL 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks.

Sentence

Automatic Classification of Neutralization Techniques in the Narrative of Climate Change Scepticism

no code implementations NAACL 2021 Shraey Bhatia, Jey Han Lau, Timothy Baldwin

Neutralisation techniques, e.g. denial of responsibility and denial of victim, are used in the narrative of climate change scepticism to justify lack of action or to promote an alternative view.

Evaluating the Efficacy of Summarization Evaluation across Languages

1 code implementation Findings (ACL) 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall).

IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization

1 code implementation EMNLP 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We present IndoBERTweet, the first large-scale pretrained model for Indonesian Twitter that is trained by extending a monolingually-trained Indonesian BERT model with additive domain-specific vocabulary.

Language Modelling

Findings on Conversation Disentanglement

no code implementations ALTA 2021 Rongxin Zhu, Jey Han Lau, Jianzhong Qi

Conversation disentanglement, the task to identify separate threads in conversations, is an important pre-processing step in multi-party conversational NLP applications such as conversational question answering and conversation summarization.

Conversational Question Answering Conversation Disentanglement +2

ITTC @ TREC 2021 Clinical Trials Track

no code implementations 16 Feb 2022 Thinh Hung Truong, Yulia Otmakhova, Rahmad Mahendra, Timothy Baldwin, Jey Han Lau, Trevor Cohn, Lawrence Cavedon, Damiano Spina, Karin Verspoor

This paper describes the submissions of the Natural Language Processing (NLP) team from the Australian Research Council Industrial Transformation Training Centre (ITTC) for Cognitive Computing in Medical Technologies to the TREC 2021 Clinical Trials Track.

Retrieval

An Interpretable Neuro-Symbolic Reasoning Framework for Task-Oriented Dialogue Generation

1 code implementation ACL 2022 Shiquan Yang, Rui Zhang, Sarah Erfani, Jey Han Lau

To obtain a transparent reasoning process, we introduce a neuro-symbolic framework to perform explicit reasoning that justifies model decisions by reasoning chains.

Dialogue Generation Task-Oriented Dialogue Systems +1

Robust Task-Oriented Dialogue Generation with Contrastive Pre-training and Adversarial Filtering

no code implementations 20 May 2022 Shiquan Yang, Xinting Huang, Jey Han Lau, Sarah Erfani

Data artifacts incentivize machine learning models to learn non-transferable generalizations by taking advantage of shortcuts in the data, and there is growing evidence that data artifacts play a role for the strong results that deep learning models achieve in recent natural language processing benchmarks.

Contrastive Learning Dialogue Generation

The Next Chapter: A Study of Large Language Models in Storytelling

no code implementations 24 Jan 2023 Zhuohan Xie, Trevor Cohn, Jey Han Lau

To enhance the quality of generated stories, recent story generation models have been investigating the utilization of higher-level attributes like plots or commonsense knowledge.

Story Generation World Knowledge

Compressed Heterogeneous Graph for Abstractive Multi-Document Summarization

1 code implementation 12 Mar 2023 Miao Li, Jianzhong Qi, Jey Han Lau

We propose HGSUM, an MDS model that extends an encoder-decoder architecture to incorporate a heterogeneous graph representing different semantic units (e.g., words and sentences) of the documents.

Document Summarization Graph Similarity +1

MetaTroll: Few-shot Detection of State-Sponsored Trolls with Transformer Adapters

1 code implementation 13 Mar 2023 Lin Tian, Xiuzhen Zhang, Jey Han Lau

State-sponsored trolls are the main actors of influence campaigns on social media and automatic troll detection is important to combat misinformation at scale.

Few-Shot Text Classification Meta-Learning +2

DeltaScore: Fine-Grained Story Evaluation with Perturbations

1 code implementation 15 Mar 2023 Zhuohan Xie, Miao Li, Trevor Cohn, Jey Han Lau

Numerous evaluation metrics have been developed for natural language generation tasks, but their effectiveness in evaluating stories is limited as they are not specifically tailored to assess intricate aspects of storytelling, such as fluency and interestingness.

Language Modelling Story Generation

Annotating and Detecting Fine-grained Factual Errors for Dialogue Summarization

1 code implementation 26 May 2023 Rongxin Zhu, Jianzhong Qi, Jey Han Lau

A series of datasets and models have been proposed for summaries generated for well-formatted documents such as news articles.

Multi-Label Classification Sentence

Unsupervised Paraphrasing of Multiword Expressions

1 code implementation 2 Jun 2023 Takashi Wada, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau

We propose an unsupervised approach to paraphrasing multiword expressions (MWEs) in context.

Text Similarity

Unsupervised Lexical Simplification with Context Augmentation

1 code implementation 1 Nov 2023 Takashi Wada, Timothy Baldwin, Jey Han Lau

We propose a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models.

Lexical Simplification

CMA-R: Causal Mediation Analysis for Explaining Rumour Detection

1 code implementation 13 Feb 2024 Lin Tian, Xiuzhen Zhang, Jey Han Lau

We apply causal mediation analysis to explain the decision-making process of neural models for rumour detection on Twitter.

Decision Making Rumour Detection

Exploring Multi-Document Information Consolidation for Scientific Sentiment Summarization

no code implementations 28 Feb 2024 Miao Li, Jey Han Lau, Eduard Hovy

Modern natural language generation systems with LLMs exhibit the capability to generate a plausible summary of multiple documents; however, it is uncertain if models truly possess the ability of information consolidation to generate summaries, especially on those source documents with opinionated information.

Review Generation Text Generation

Cross-linguistic Comparison of Linguistic Feature Encoding in BERT Models for Typologically Different Languages

no code implementations NAACL (SIGTYP) 2022 Yulia Otmakhova, Karin Verspoor, Jey Han Lau

Though recently there has been an increased interest in how pre-trained language models encode different linguistic features, there is still a lack of systematic comparison between languages with different morphology and syntax.

DUCK: Rumour Detection on Social Media by Modelling User and Comment Propagation Networks

1 code implementation NAACL 2022 Lin Tian, Xiuzhen Zhang, Jey Han Lau

Social media rumours, a form of misinformation, can mislead the public and cause significant economic and social disruption.

Graph Attention Misinformation +1

Easy-First Bottom-Up Discourse Parsing via Sequence Labelling

no code implementations COLING (CODI, CRAC) 2022 Andrew Shen, Fajri Koto, Jey Han Lau, Timothy Baldwin

We propose a novel unconstrained bottom-up approach for rhetorical discourse parsing based on sequence labelling of adjacent pairs of discourse units (DUs), based on the framework of Koto et al. (2021).

Discourse Parsing

Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian

1 code implementation CSRR (ACL) 2022 Fajri Koto, Timothy Baldwin, Jey Han Lau

Story comprehension that involves complex causal and temporal relations is a critical task in NLP, but previous studies have focused predominantly on English, leaving open the question of how the findings generalize to other languages, such as Indonesian.

Cloze Test Sentence +1

The patient is more dead than alive: exploring the current state of the multi-document summarisation of the biomedical literature

no code implementations ACL 2022 Yulia Otmakhova, Karin Verspoor, Timothy Baldwin, Jey Han Lau

Although multi-document summarisation (MDS) of the biomedical literature is a highly valuable task that has recently attracted substantial interest, evaluation of the quality of biomedical summaries lacks consistency and transparency.

Semi-automatic Triage of Requests for Free Legal Assistance

no code implementations EMNLP (NLLP) 2021 Meladel Mistica, Jey Han Lau, Brayden Merrifield, Kate Fazio, Timothy Baldwin

Free legal assistance is critically under-resourced, and many of those who seek legal help have their needs unmet.

Fairness
