4 code implementations • WS 2016 • Jey Han Lau, Timothy Baldwin
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings.
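The core idea of doc2vec's PV-DM variant is that each document gets its own learnable vector, which is combined with context word vectors to predict a target word. The following is a minimal toy sketch of that idea in pure Python; the vocabulary, dimensionality, and training data are illustrative, not the actual doc2vec implementation.

```python
import math
import random

# Toy sketch of doc2vec's PV-DM idea: each document gets its own vector,
# which is averaged with context word vectors to predict a target word.
random.seed(0)
DIM = 8
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
docs = [["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "ran"]]

word_vecs = {w: [random.uniform(-0.5, 0.5) for _ in range(DIM)] for w in vocab}
doc_vecs = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in docs]
out_vecs = {w: [0.0] * DIM for w in vocab}  # softmax output weights

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def train_step(doc_id, context, target, lr=0.1):
    # Average the document vector together with the context word vectors.
    vecs = [doc_vecs[doc_id]] + [word_vecs[w] for w in context]
    h = [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]
    probs = softmax([sum(h[i] * out_vecs[w][i] for i in range(DIM))
                     for w in vocab])
    # Cross-entropy gradient: update output weights and all input vectors,
    # so the document vector is learned alongside the word vectors.
    grad_h = [0.0] * DIM
    for j, w in enumerate(vocab):
        err = probs[j] - (1.0 if w == target else 0.0)
        for i in range(DIM):
            grad_h[i] += err * out_vecs[w][i]
            out_vecs[w][i] -= lr * err * h[i]
    for v in vecs:
        for i in range(DIM):
            v[i] -= lr * grad_h[i] / len(vecs)

for _ in range(50):
    for d, doc in enumerate(docs):
        for t in range(1, len(doc) - 1):
            train_step(d, [doc[t - 1], doc[t + 1]], doc[t])

print(len(doc_vecs), len(doc_vecs[0]))  # one learned vector per document
```

After training, `doc_vecs` holds one embedding per document, which is the document-level representation doc2vec is designed to produce.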
1 code implementation • COLING 2016 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topics generated by topic models are typically represented as a list of terms.
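Concretely, a topic is usually displayed as its top-N most probable terms. The sketch below illustrates this representation; the word probabilities are made up for illustration.

```python
# A topic from a topic model is typically shown as its top-N most
# probable terms. These word probabilities are invented for illustration.
topic = {"stock": 0.08, "market": 0.07, "investor": 0.05, "trade": 0.04,
         "price": 0.04, "share": 0.03, "fund": 0.03, "bank": 0.02}

def top_terms(word_probs, n=5):
    """Return the n highest-probability terms, the usual topic representation."""
    return [w for w, _ in sorted(word_probs.items(), key=lambda kv: -kv[1])[:n]]

print(top_terms(topic))
```

Topic labelling replaces this term list with a compact label (e.g. "stock market") that is easier for end-users to interpret.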
1 code implementation • ACL 2017 • Jey Han Lau, Timothy Baldwin, Trevor Cohn
Language models are typically applied at the sentence level, without access to the broader document context.
2 code implementations • 31 May 2022 • Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder
In this work, we focus on developing resources for languages in Indonesia.
1 code implementation • ACL 2018 • Jey Han Lau, Trevor Cohn, Timothy Baldwin, Julian Brooke, Adam Hammond
In this paper, we propose a joint architecture that captures language, rhyme and meter for sonnet modelling.
1 code implementation • EMNLP 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We present IndoBERTweet, the first large-scale pretrained model for Indonesian Twitter that is trained by extending a monolingually-trained Indonesian BERT model with additive domain-specific vocabulary.
1 code implementation • AACL 2020 • Fajri Koto, Jey Han Lau, Timothy Baldwin
In this paper, we introduce a large-scale Indonesian summarization dataset.
1 code implementation • EMNLP (MRL) 2021 • Takashi Wada, Tomoharu Iwata, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e.g., a few hundred sentence pairs).
1 code implementation • ACL 2022 • Shiquan Yang, Rui Zhang, Sarah Erfani, Jey Han Lau
To obtain a transparent reasoning process, we introduce a neuro-symbolic approach that performs explicit reasoning, justifying model decisions via reasoning chains.
2 code implementations • sdp (COLING) 2022 • Yulia Otmakhova, Hung Thinh Truong, Timothy Baldwin, Trevor Cohn, Karin Verspoor, Jey Han Lau
In this paper we report on our submission to the Multidocument Summarisation for Literature Review (MSLR) shared task.
1 code implementation • IJCNLP 2017 • Jey Han Lau, Lianhua Chi, Khoi-Nguyen Tran, Trevor Cohn
We propose an end-to-end neural network to predict the geolocation of a tweet.
1 code implementation • NAACL 2022 • Lin Tian, Xiuzhen Zhang, Jey Han Lau
Social media rumours, a form of misinformation, can mislead the public and cause significant economic and social disruption.
1 code implementation • 3 Mar 2022 • Miao Li, Jianzhong Qi, Jey Han Lau
We present PeerSum, a new MDS dataset using peer reviews of scientific publications.
1 code implementation • 2 May 2023 • Miao Li, Eduard Hovy, Jey Han Lau
We present PeerSum, a novel dataset for generating meta-reviews of scientific papers.
1 code implementation • ALTA 2019 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions.
1 code implementation • EACL 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We introduce a top-down approach to discourse parsing that is conceptually simpler than its predecessors (Kobayashi et al., 2020; Zhang et al., 2020).
Ranked #7 on Discourse Parsing on RST-DT (Standard Parseval (Span) metric)
1 code implementation • NAACL 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks.
1 code implementation • 2 Apr 2020 • Jey Han Lau, Carlos S. Armendariz, Shalom Lappin, Matthew Purver, Chang Shu
We study the influence of context on sentence acceptability.
1 code implementation • 26 May 2023 • Rongxin Zhu, Jianzhong Qi, Jey Han Lau
A series of datasets and models have been proposed for the summarization of well-formatted documents such as news articles.
1 code implementation • ACL 2018 • Jean-Philippe Bernardy, Shalom Lappin, Jey Han Lau
We investigate the influence that document context exerts on human acceptability judgements for English sentences, via two sets of experiments.
1 code implementation • COLING 2022 • Takashi Wada, Timothy Baldwin, Yuji Matsumoto, Jey Han Lau
We propose a new unsupervised method for lexical substitution using pre-trained language models.
1 code implementation • 1 Nov 2023 • Takashi Wada, Timothy Baldwin, Jey Han Lau
We propose a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models.
1 code implementation • 6 Oct 2022 • Thinh Hung Truong, Yulia Otmakhova, Timothy Baldwin, Trevor Cohn, Jey Han Lau, Karin Verspoor
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
1 code implementation • 12 Mar 2023 • Miao Li, Jianzhong Qi, Jey Han Lau
We propose HGSUM, an MDS model that extends an encoder-decoder architecture, to incorporate a heterogeneous graph to represent different semantic units (e.g., words and sentences) of the documents.
2 code implementations • 27 Nov 2020 • Fajri Koto, Timothy Baldwin, Jey Han Lau
In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).
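To make the precision/recall framing of focus and coverage concrete, here is a deliberately simplified sketch that scores both as token overlap between a generated summary and a reference. The actual FFCI framework uses stronger model-based scoring; plain word overlap is used here only for illustration.

```python
# Simplified illustration of "focus" (precision) and "coverage" (recall)
# as token overlap between a generated summary and a reference summary.
# FFCI itself uses stronger model-based metrics; word overlap is used
# here only to make the precision/recall framing concrete.
def tokenize(text):
    return text.lower().split()

def focus(summary, reference):
    """Fraction of summary tokens also in the reference (precision)."""
    ref = set(tokenize(reference))
    toks = tokenize(summary)
    return sum(t in ref for t in toks) / len(toks)

def coverage(summary, reference):
    """Fraction of reference tokens also in the summary (recall)."""
    summ = set(tokenize(summary))
    toks = tokenize(reference)
    return sum(t in summ for t in toks) / len(toks)

reference = "the government announced new climate targets"
summary = "the government announced targets"
print(round(focus(summary, reference), 2),
      round(coverage(summary, reference), 2))  # → 1.0 0.67
```

Here every summary token appears in the reference (perfect focus), but the summary misses some reference content (partial coverage), mirroring the precision/recall distinction in the framework.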
1 code implementation • Findings (ACL) 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall).
1 code implementation • CSRR (ACL) 2022 • Fajri Koto, Timothy Baldwin, Jey Han Lau
Story comprehension that involves complex causal and temporal relations is a critical task in NLP, but previous studies have focused predominantly on English, leaving open the question of how the findings generalize to other languages, such as Indonesian.
1 code implementation • 13 Mar 2023 • Lin Tian, Xiuzhen Zhang, Jey Han Lau
State-sponsored trolls are the main actors of influence campaigns on social media, and automatic troll detection is important to combat misinformation at scale.
1 code implementation • 9 Jul 2020 • Shiwei Zhang, Xiuzhen Zhang, Jey Han Lau, Jeffrey Chan, Cecile Paris
In the literature, product question answering (PQA) is formulated as a retrieval problem, with the goal of searching for the most relevant reviews to answer a given product question.
1 code implementation • 15 Mar 2023 • Zhuohan Xie, Miao Li, Trevor Cohn, Jey Han Lau
Numerous evaluation metrics have been developed for natural language generation tasks, but their effectiveness in evaluating stories is limited, as they are not tailored to assess intricate aspects of storytelling such as fluency and interestingness.
1 code implementation • 2 Jun 2023 • Takashi Wada, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau
We propose an unsupervised approach to paraphrasing multiword expressions (MWEs) in context.
no code implementations • 1 Jun 2018 • Khoi-Nguyen Tran, Jey Han Lau, Danish Contractor, Utkarsh Gupta, Bikram Sengupta, Christopher J. Butler, Mukesh Mohania
Instructional Systems Design is the practice of creating instructional experiences that make the acquisition of knowledge and skill more efficient, effective, and appealing.
no code implementations • 11 Apr 2018 • Maryam Fanaeepour, Adam Makarucha, Jey Han Lau
The versatility of word embeddings for various applications is attracting researchers from various fields.
no code implementations • CONLL 2017 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topic models jointly learn topics and document-level topic distribution.
no code implementations • EMNLP 2018 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topic coherence is increasingly being used to evaluate topic models and filter topics for end-user applications.
no code implementations • EACL 2017 • Ionut Sorodoc, Jey Han Lau, Nikolaos Aletras, Timothy Baldwin
Automatic topic labelling is the task of generating a succinct label that summarises the theme or subject of a topic, with the intention of reducing the cognitive load of end-users when interpreting these topics.
no code implementations • WS 2018 • Steven Xu, Andrew Bennett, Doris Hoogeveen, Jey Han Lau, Timothy Baldwin
Community question answering (cQA) forums provide a rich source of data for facilitating non-factoid question answering over many technical domains.
no code implementations • WS 2017 • Ying Xu, Jey Han Lau, Timothy Baldwin, Trevor Cohn
With this decoupled architecture, we decrease the number of parameters in the decoder substantially, and shorten its training time.
no code implementations • TACL 2014 • Marco Lui, Jey Han Lau, Timothy Baldwin
Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document.
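A classic content-based approach to language identification builds per-language character n-gram profiles and scores a document against each. The sketch below is a minimal character-trigram identifier; the training texts and scoring scheme are illustrative only, not the method of the paper.

```python
from collections import Counter

# Minimal character-trigram language identifier: score a document against
# per-language trigram profiles built from tiny example texts.
# Training texts and the scoring scheme are illustrative only.
def trigrams(text):
    text = f"  {text.lower()}  "  # pad so word edges form trigrams too
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

training = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt über den faulen hund",
}
profiles = {lang: trigrams(text) for lang, text in training.items()}

def identify(document):
    doc = trigrams(document)
    # Score by summed multiset overlap of trigram counts per language.
    scores = {lang: sum(min(c, prof[g]) for g, c in doc.items())
              for lang, prof in profiles.items()}
    return max(scores, key=scores.get)

print(identify("the dog and the fox"))      # → en
print(identify("der hund und der fuchs"))   # → de
```

Real systems train such profiles on large corpora over many languages, but the overlap-scoring principle is the same.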
no code implementations • NAACL 2019 • Kaimin Zhou, Chang Shu, Binyang Li, Jey Han Lau
Motivated by this, our paper focuses on the task of rumour detection; particularly, we are interested in understanding how early we can detect them.
no code implementations • ALTA 2019 • Zhuohan Xie, Jey Han Lau, Trevor Cohn
In this paper, we adapt Deep-speare, a joint neural network model for English sonnets, to Chinese poetry.
no code implementations • 22 Jan 2020 • Ying Xu, Xu Zhong, Antonio Jose Jimeno Yepes, Jey Han Lau
An adversarial example is an input transformed by small perturbations that machine learning models consistently misclassify.
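For text, a typical small perturbation is a character-level edit that a human barely notices. The sketch below swaps one pair of adjacent characters inside a word; the attack strategy and sentence are illustrative only, and a real attack would search for the perturbation that actually flips a target model's prediction.

```python
import random

# Toy illustration of the kind of small input perturbation used in
# adversarial attacks on text classifiers: swap one pair of adjacent
# characters inside a word, keeping the first and last letters fixed.
def swap_adjacent(word, i):
    """Swap the characters at positions i and i+1."""
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def perturb_sentence(sentence, rng):
    words = sentence.split()
    # Only perturb words long enough to swap interior characters.
    candidates = [k for k, w in enumerate(words) if len(w) >= 4]
    k = rng.choice(candidates)
    i = rng.randrange(1, len(words[k]) - 2)
    words[k] = swap_adjacent(words[k], i)
    return " ".join(words)

rng = random.Random(0)
original = "this movie was absolutely wonderful"
print(perturb_sentence(original, rng))
```

The perturbed sentence keeps exactly the same characters and length, yet such tiny edits are often enough to make a model consistently misclassify.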
no code implementations • 30 Apr 2020 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
The world is facing the challenge of the climate crisis.
no code implementations • ACL 2020 • Kobi Leins, Jey Han Lau, Timothy Baldwin
We focus in particular on the role of data statements in ethically assessing research, but also discuss the topic of dual use, and examine the outcomes of similar debates in other scientific disciplines.
no code implementations • 18 Aug 2020 • Karin Verspoor, Simon Šuster, Yulia Otmakhova, Shevon Mendis, Zenan Zhai, Biaoyan Fang, Jey Han Lau, Timothy Baldwin, Antonio Jimeno Yepes, David Martinez
We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration. It builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search by providing a visual overview that supports exploration of a collection to identify key articles of interest.
no code implementations • TACL 2020 • Jey Han Lau, Carlos Armendariz, Shalom Lappin, Matthew Purver, Chang Shu
We study the influence of context on sentence acceptability.
no code implementations • COLING 2020 • Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin
Although the Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, it is under-represented in NLP research.
no code implementations • NAACL 2021 • Ying Xu, Xu Zhong, Antonio Jimeno Yepes, Jey Han Lau
We introduce a grey-box adversarial attack and defence framework for sentiment classification.
no code implementations • 25 May 2021 • Simon Šuster, Karin Verspoor, Timothy Baldwin, Jey Han Lau, Antonio Jimeno Yepes, David Martinez, Yulia Otmakhova
The COVID-19 pandemic has driven ever-greater demand for tools which enable efficient exploration of biomedical literature.
no code implementations • NAACL 2021 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Neutralisation techniques, e.g., denial of responsibility and denial of victim, are used in the narrative of climate change scepticism to justify lack of action or to promote an alternative view.
no code implementations • 30 Jul 2021 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
There is consensus in the scientific community about human-induced climate change.
no code implementations • 27 Sep 2021 • Lin Tian, Xiuzhen Zhang, Jey Han Lau
Most rumour detection models for social media are designed for one specific language (mostly English).
no code implementations • EMNLP (NLLP) 2021 • Meladel Mistica, Jey Han Lau, Brayden Merrifield, Kate Fazio, Timothy Baldwin
Free legal assistance is critically under-resourced, and many of those who seek legal help have their needs unmet.
no code implementations • ALTA 2021 • Zhuohan Xie, Trevor Cohn, Jey Han Lau
GPT-2 has frequently been adopted in story generation models for its powerful generative capability.
no code implementations • ALTA 2021 • Rongxin Zhu, Jey Han Lau, Jianzhong Qi
Conversation disentanglement, the task to identify separate threads in conversations, is an important pre-processing step in multi-party conversational NLP applications such as conversational question answering and conversation summarization.
no code implementations • 16 Feb 2022 • Thinh Hung Truong, Yulia Otmakhova, Rahmad Mahendra, Timothy Baldwin, Jey Han Lau, Trevor Cohn, Lawrence Cavedon, Damiano Spina, Karin Verspoor
This paper describes the submissions of the Natural Language Processing (NLP) team from the Australian Research Council Industrial Transformation Training Centre (ITTC) for Cognitive Computing in Medical Technologies to the TREC 2021 Clinical Trials Track.
no code implementations • ACL 2022 • Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder
NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects.
no code implementations • ECNLP (ACL) 2022 • Fajri Koto, Jey Han Lau, Timothy Baldwin
For any e-commerce service, persuasive, faithful, and informative product descriptions can attract shoppers and improve sales.
no code implementations • ACL 2022 • Yulia Otmakhova, Karin Verspoor, Timothy Baldwin, Jey Han Lau
Although multi-document summarisation (MDS) of the biomedical literature is a highly valuable task that has recently attracted substantial interest, evaluation of the quality of biomedical summaries lacks consistency and transparency.
no code implementations • 20 May 2022 • Shiquan Yang, Xinting Huang, Jey Han Lau, Sarah Erfani
Data artifacts incentivize machine learning models to learn non-transferable generalizations by exploiting shortcuts in the data, and there is growing evidence that such artifacts contribute to the strong results that deep learning models achieve on recent natural language processing benchmarks.
no code implementations • NAACL (SIGTYP) 2022 • Yulia Otmakhova, Karin Verspoor, Jey Han Lau
Though there has recently been increased interest in how pre-trained language models encode different linguistic features, there is still a lack of systematic comparison between languages with different morphology and syntax.
no code implementations • COLING 2022 • Fajri Koto, Timothy Baldwin, Jey Han Lau
Summaries, keyphrases, and titles are different ways of concisely capturing the content of a document.
no code implementations • COLING (CODI, CRAC) 2022 • Andrew Shen, Fajri Koto, Jey Han Lau, Timothy Baldwin
We propose a novel unconstrained bottom-up approach to rhetorical discourse parsing that performs sequence labelling over adjacent pairs of discourse units (DUs), building on the framework of Koto et al. (2021).
no code implementations • 24 Jan 2023 • Zhuohan Xie, Trevor Cohn, Jey Han Lau
To enhance the quality of generated stories, recent story generation models have been investigating the utilization of higher-level attributes like plots or commonsense knowledge.
1 code implementation • 13 Feb 2024 • Lin Tian, Xiuzhen Zhang, Jey Han Lau
We apply causal mediation analysis to explain the decision-making process of neural models for rumour detection on Twitter.
no code implementations • 28 Feb 2024 • Miao Li, Jey Han Lau, Eduard Hovy
Modern natural language generation systems built on LLMs can generate plausible summaries of multiple documents; however, it is unclear whether these models truly possess the ability to consolidate information when generating summaries, especially for source documents containing opinionated information.