Search Results for author: Chenghua Lin

Found 135 papers, 75 papers with code

Development of a Benchmark Corpus to Support Entity Recognition in Job Descriptions

no code implementations LREC 2022 Thomas Green, Diana Maynard, Chenghua Lin

We present the development of a benchmark suite consisting of an annotation schema, training corpus and baseline model for Entity Recognition (ER) in job descriptions, published under a Creative Commons license.

Recommendation Systems

CM-Gen: A Neural Framework for Chinese Metaphor Generation with Explicit Context Modelling

1 code implementation COLING 2022 Yucheng Li, Chenghua Lin, Frank Guerin

The metaphor identification module is able to perform a self-training procedure, which discovers novel metaphors from a large-scale unlabeled corpus for NM generation.
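The self-training procedure mentioned above follows a generic pattern: train on labeled data, pseudo-label the unlabeled corpus, keep only high-confidence predictions, and retrain. The sketch below is a toy, framework-agnostic illustration of that loop (the `train`/`predict` callables and the confidence threshold are assumptions for illustration, not the paper's actual components):

```python
def self_train(labeled, unlabeled, train, predict, threshold=0.9, rounds=3):
    """Generic self-training: repeatedly pseudo-label high-confidence
    unlabeled examples and retrain on the expanded labeled set."""
    model = train(labeled)
    for _ in range(rounds):
        confident, remaining = [], []
        for x in unlabeled:
            label, confidence = predict(model, x)
            if confidence >= threshold:
                confident.append((x, label))   # accept pseudo-label
            else:
                remaining.append(x)            # defer to a later round
        if not confident:
            break                              # nothing new learned; stop
        labeled = labeled + confident
        unlabeled = remaining
        model = train(labeled)
    return model, labeled
```

In the paper's setting, `predict` would be the neural metaphor identification module scoring candidate sentences from the unlabeled corpus.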

DocMMIR: A Framework for Document Multi-modal Information Retrieval

no code implementations25 May 2025 Zirui Li, Siwei Wu, Xingyu Wang, Yi Zhou, Yizhi Li, Chenghua Lin

The rapid advancement of unsupervised representation learning and large-scale pre-trained vision-language models has significantly improved cross-modal retrieval tasks.

Cross-Modal Retrieval Information Retrieval +2

ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation

no code implementations2 Apr 2025 Xiao Wang, Daniil Larionov, Siwei Wu, Yiqi Liu, Steffen Eger, Nafise Sadat Moosavi, Chenghua Lin

In this work, we introduce ContrastScore, a contrastive evaluation metric designed to enable higher-quality, less biased, and more efficient assessment of generated text.

Machine Translation Text Generation

Natural Language Generation

no code implementations20 Mar 2025 Emiel van Miltenburg, Chenghua Lin

The term Natural Language Generation (NLG), in its broadest definition, refers to the study of systems that verbalize some form of information through natural language.

Image Captioning Image to text +3

LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models

no code implementations23 Jan 2025 Yizheng Sun, Yanze Xin, Hao Li, Jingyuan Sun, Chenghua Lin, Riza Batista-Navarro

Multi-modal Large Language Models (MLLMs) have achieved remarkable success by integrating visual and textual modalities.

Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning

1 code implementation22 Jan 2025 Bohao Yang, Yingji Zhang, Dong Liu, André Freitas, Chenghua Lin

While multimodal large language models (MLLMs) enable direct visual processing, they face limitations in handling scientific tables due to fixed input image resolutions and insufficient numerical reasoning capabilities.

Benchmarking

Leveraging Large Language Models for Zero-shot Lay Summarisation in Biomedicine and Beyond

no code implementations9 Jan 2025 Tomas Goldsack, Carolina Scarton, Chenghua Lin

In this work, we explore the application of Large Language Models to zero-shot Lay Summarisation.

Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks

1 code implementation5 Jan 2025 Yang Wang, Chenghua Lin

Recent advancements in natural language processing have highlighted the vulnerability of deep learning models to adversarial attacks.

Adversarial Robustness Benchmarking +4

Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment

1 code implementation30 Dec 2024 Jianfei Zhang, Jun Bai, Bei Li, Yanmeng Wang, Rumei Li, Chenghua Lin, Wenge Rong

Aligning Large Language Models (LLMs) with general human preferences has proved crucial in improving the interaction quality between LLMs and humans.

Text Generation

Observing Micromotives and Macrobehavior of Large Language Models

no code implementations10 Dec 2024 Yuyang Cheng, Xingwei Qu, Tomas Goldsack, Chenghua Lin, Chung-Chi Chen

Thomas C. Schelling, awarded the 2005 Nobel Memorial Prize in Economic Sciences, pointed out that ``individuals' decisions (micromotives), while often personal and localized, can lead to societal outcomes (macrobehavior) that are far more complex and different from what the individuals intended.''

CAST: Corpus-Aware Self-similarity Enhanced Topic modelling

no code implementations19 Oct 2024 Yanan Ma, Chenghao Xiao, Chenhan Yuan, Sabine N van der Veer, Lamiece Hassan, Chenghua Lin, Goran Nenadic

Experiments on news benchmark datasets and one Twitter dataset demonstrate the method's superiority in generating coherent, diverse topics, and handling noisy data, outperforming strong baselines.

Contrastive Learning Diversity +1

Can MLLMs Understand the Deep Implication Behind Chinese Images?

1 code implementation17 Oct 2024 Chenhao Zhang, Xi Feng, Yuelin Bai, Xinrun Du, Jinchang Hou, Kaixin Deng, Guangzeng Han, Qinrui Li, Bingli Wang, Jiaheng Liu, Xingwei Qu, Yifei Zhang, Qixuan Zhao, Yiming Liang, Ziqiang Liu, Feiteng Fang, Min Yang, Wenhao Huang, Chenghua Lin, Ge Zhang, Shiwen Ni

To fill the gap, we introduce the Chinese Image Implication understanding Benchmark, CII-Bench, which aims to assess the higher-order perception and understanding capabilities of MLLMs for Chinese images.

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

1 code implementation17 Oct 2024 Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J. H. Liu

In our work, to investigate the reasoning patterns of o1, we compare o1 with existing Test-time Compute methods (BoN, Step-wise BoN, Agent Workflow, and Self-Refine) by using OpenAI's GPT-4o as a backbone on general reasoning benchmarks in three domains (i.e., math, coding, commonsense reasoning).

Math
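Of the test-time compute methods compared above, Best-of-N (BoN) is the simplest: sample several candidate answers and let a scoring function (e.g. a reward model or verifier) pick the winner. A minimal sketch, with the generator and scorer as illustrative stand-ins:

```python
import random

def best_of_n(generate, score, n=8, seed=0):
    """Best-of-N (BoN) test-time compute: draw n candidate answers and
    return the one the scoring function ranks highest."""
    rng = random.Random(seed)                    # reproducible sampling
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy usage: candidates are numeric guesses; the scorer rewards closeness
# to a target of 42 (a deterministic stand-in for a reward model).
pool = iter([1, 5, 42, 7, 41, 43, 2, 9])
best = best_of_n(lambda rng: next(pool), lambda c: -abs(c - 42), n=8)
# best is 42, the candidate with the highest score
```

Step-wise BoN applies the same selection at each reasoning step rather than once over whole answers.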

Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference

no code implementations10 Oct 2024 William Thorne, Ambrose Robinson, Bohua Peng, Chenghua Lin, Diana Maynard

This research contributes: (1) A methodology for increasing question difficulty using PPO and synthetic data; (2) Empirical evidence of the method's effectiveness, including human evaluation; (3) An in-depth error analysis and study of emergent phenomena; and (4) An open-source codebase and set of three llama-2-chat adapters for reproducibility and adaptation.

Machine Reading Comprehension Question Answering +3

On the Rigour of Scientific Writing: Criteria, Analysis, and Insights

no code implementations7 Oct 2024 Joseph James, Chenghao Xiao, Yucheng Li, Chenghua Lin

Rigour is crucial for scientific research as it ensures the reproducibility and validity of results and findings.

Keyword Extraction

From Facts to Insights: A Study on the Generation and Evaluation of Analytical Reports for Deciphering Earnings Calls

no code implementations1 Oct 2024 Tomas Goldsack, Yang Wang, Chenghua Lin, Chung-Chi Chen

This paper explores the use of Large Language Models (LLMs) in the generation and evaluation of analytical reports derived from Earnings Calls (ECs).

Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking

1 code implementation24 Sep 2024 Jun Bai, Zhuofan Chen, Zhenzi Li, Hanhua Hong, Jianfei Zhang, Chen Li, Chenghua Lin, Wenge Rong

As a promising alternative to human intuition and brute-force fine-tuning, Transferability Estimation (TE) has emerged as an effective approach to model selection.

Model Selection Sentence +1

With Ears to See and Eyes to Hear: Sound Symbolism Experiments with Multimodal Large Language Models

1 code implementation23 Sep 2024 Tyler Loakman, Yucheng Li, Chenghua Lin

To investigate this, we analyse the ability of VLMs and LLMs to demonstrate sound symbolism (i.e., to recognise a non-arbitrary link between sounds and concepts) as well as their ability to "hear" via the interplay of the language and vision modules of open and closed-source multimodal models.

OmniBench: Towards The Future of Universal Omni-Language Models

1 code implementation23 Sep 2024 Yizhi Li, Ge Zhang, Yinghao Ma, Ruibin Yuan, Kang Zhu, Hangyu Guo, Yiming Liang, Jiaheng Liu, Zekun Wang, Jian Yang, Siwei Wu, Xingwei Qu, Jinjie Shi, Xinyue Zhang, Zhenzhu Yang, Xiangzhou Wang, Zhaoxiang Zhang, Zachary Liu, Emmanouil Benetos, Wenhao Huang, Chenghua Lin

Recent advancements in multimodal large language models (MLLMs) have focused on integrating multiple modalities, yet their ability to simultaneously process and reason across different inputs remains underexplored.

Instruction Following

LIME: Less Is More for MLLM Evaluation

2 code implementations10 Sep 2024 King Zhu, Qianbo Zang, Shian Jia, Siwei Wu, Feiteng Fang, Yizhi Li, Shawn Gavin, Tuney Zheng, Jiawei Guo, Bo Li, HaoNing Wu, Xingwei Qu, Jian Yang, Zachary Liu, Xiang Yue, J. H. Liu, Chenghua Lin, Min Yang, Shiwen Ni, Wenhao Huang, Ge Zhang

However, many of these benchmarks include overly simple or uninformative samples, complicating the effective distinction of different MLLMs' performance.

Image Captioning Question Answering +1

Overview of the BioLaySumm 2024 Shared Task on the Lay Summarization of Biomedical Research Articles

no code implementations16 Aug 2024 Tomas Goldsack, Carolina Scarton, Matthew Shardlow, Chenghua Lin

This paper presents the setup and results of the second edition of the BioLaySumm shared task on the Lay Summarisation of Biomedical Research Articles, hosted at the BioNLP Workshop at ACL 2024.

Lay Summarization

I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

1 code implementation15 Aug 2024 Yiming Liang, Ge Zhang, Xingwei Qu, Tianyu Zheng, Jiawei Guo, Xinrun Du, Zhenzhu Yang, Jiaheng Liu, Chenghua Lin, Lei Ma, Wenhao Huang, Jiajun Zhang

Large Language Models (LLMs) have achieved significant advancements, however, the common learning paradigm treats LLMs as passive information repositories, neglecting their potential for active learning and alignment.

Active Learning Code Generation

Overview of the NLPCC 2024 Shared Task on Chinese Metaphor Generation

no code implementations8 Aug 2024 Xingwei Qu, Ge Zhang, Siwei Wu, Yizhi Li, Chenghua Lin

The goal of this shared task is to generate Chinese metaphors using machine learning techniques and to effectively identify the basic components of metaphorical sentences.

Sentence

PFME: A Modular Approach for Fine-grained Hallucination Detection and Editing of Large Language Models

no code implementations29 Jun 2024 Kunquan Deng, Zeyu Huang, Chen Li, Chenghua Lin, Min Gao, Wenge Rong

In editing tasks, PFME further improves the FActScore on the FActScore-Alpaca13B and FActScore-ChatGPT datasets, by 16.2pp and 4.6pp respectively.

Hallucination Sentence

BioMNER: A Dataset for Biomedical Method Entity Recognition

no code implementations28 Jun 2024 Chen Tang, Bohao Yang, Kun Zhao, Bo Lv, Chenghao Xiao, Frank Guerin, Chenghua Lin

Named entity recognition (NER) stands as a fundamental and pivotal task within the realm of Natural Language Processing.

Information Retrieval named-entity-recognition +2

X-ray Made Simple: Lay Radiology Report Generation and Robust Evaluation

1 code implementation25 Jun 2024 Kun Zhao, Chenghao Xiao, Sixing Yan, Haoteng Tang, William K. Cheung, Noura Al Moubayed, Liang Zhan, Chenghua Lin

We show that training on the layman's terms dataset encourages models to focus on the semantics of the reports, as opposed to overfitting to learning the report templates.

Fairness

Crafting Customisable Characters with LLMs: Introducing SimsChat, a Persona-Driven Role-Playing Agent Framework

1 code implementation25 Jun 2024 Bohao Yang, Dong Liu, Chenghao Xiao, Kun Zhao, Chen Tang, Chao Li, Lin Yuan, Guang Yang, Lanxiao Huang, Chenghua Lin

Large Language Models (LLMs) demonstrate remarkable ability to comprehend instructions and generate human-like text, enabling sophisticated agent simulation beyond basic behavior replication.

MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

no code implementations19 Jun 2024 Shun Wang, Ge Zhang, Han Wu, Tyler Loakman, Wenhao Huang, Chenghua Lin

Machine Translation (MT) has developed rapidly since the release of Large Language Models and current MT evaluation is performed through comparison with reference human translations or by predicting quality scores from human-labeled data.

Machine Translation Translation

ATLAS: Improving Lay Summarisation with Attribute-based Control

no code implementations9 Jun 2024 Zhihao Zhang, Tomas Goldsack, Carolina Scarton, Chenghua Lin

Lay summarisation aims to produce summaries of scientific articles that are comprehensible to non-expert audiences.

Attribute

SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation

1 code implementation24 May 2024 Kun Zhao, Bohao Yang, Chen Tang, Chenghua Lin, Liang Zhan

Our approach introduces several techniques: (1) Contrastive learning to differentiate between robust and non-robust response embeddings; (2) A novel metric for semantic sensitivity that combines embedding cosine distances with similarity learned through neural networks; and (3) A strategy for incorporating the evaluation results from both the SLM and LLMs.

Contrastive Learning Dialogue Evaluation
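The metric-combination idea in (2) can be sketched with plain vectors: blend the raw cosine similarity of two embeddings with a score produced by a learned similarity model. This is a toy illustration only; the equal weighting `alpha` and the interface are assumptions, not the paper's exact formulation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def combined_score(ctx_emb, resp_emb, learned_sim, alpha=0.5):
    """Blend raw embedding cosine similarity with a similarity score
    learned by a neural network (here passed in as a number)."""
    return alpha * cosine(ctx_emb, resp_emb) + (1 - alpha) * learned_sim
```

In practice the embeddings would come from the SLM and `learned_sim` from the trained sensitivity head.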

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

1 code implementation28 Apr 2024 Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints.

In-Context Learning Music Generation

ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking Explanations

no code implementations26 Apr 2024 Tyler Loakman, Chenghua Lin

This paper presents a partial reproduction of Generating Fact Checking Explanations by Atanasova et al. (2020) as part of the ReproHum element of the ReproNLP shared task to reproduce the findings of NLP research regarding human evaluation.

Fact Checking

Emphasising Structured Information: Integrating Abstract Meaning Representation into LLMs for Enhanced Open-Domain Dialogue Evaluation

1 code implementation1 Apr 2024 Bohao Yang, Kun Zhao, Chen Tang, Dong Liu, Liang Zhan, Chenghua Lin

Trainable evaluation metrics, typically trained with true positive and randomly selected negative responses, tend to assign higher scores to responses that share greater content similarity with a given context.

Abstract Meaning Representation Dialogue Evaluation +2

Train & Constrain: Phonologically Informed Tongue-Twister Generation from Topics and Paraphrases

no code implementations20 Mar 2024 Tyler Loakman, Chen Tang, Chenghua Lin

Previous work in phonologically and phonetically grounded language generation has mainly focused on domains such as puns and poetry.

Language Modeling Language Modelling +1

TEGEE: Task dEfinition Guided Expert Ensembling for Generalizable and Few-shot Learning

no code implementations7 Mar 2024 Xingwei Qu, Yiming Liang, Yucheng Wang, Tianyu Zheng, Tommy Yue, Xingyuan Bu, Lei Ma, Stephen W. Huang, Jiajun Zhang, Yinan Shi, Chenghua Lin, Jie Fu, Ge Zhang

Our framework employs a dual 3B model approach, with each model assigned a distinct role: one focuses on task definition extraction, while the other handles learning from demonstrations.

Continual Learning Definition Extraction +3

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models

no code implementations20 Feb 2024 Yizhi Li, Ge Zhang, Xingwei Qu, Jiali Li, Zhaoqun Li, Zekun Wang, Hao Li, Ruibin Yuan, Yinghao Ma, Kai Zhang, Wangchunshu Zhou, Yiming Liang, Lei Zhang, Lei Ma, Jiajun Zhang, Zuowen Li, Stephen W. Huang, Chenghua Lin, Jie Fu

The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following.

Instruction Following

CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation

2 code implementations20 Feb 2024 Yujie Shao, Xinrong Yao, Xingwei Qu, Chenghua Lin, Shi Wang, Stephen W. Huang, Ge Zhang, Jie Fu

These models are able to generate creative and fluent metaphor sentences more frequently induced by selected samples from our dataset, demonstrating the value of our corpus for Chinese metaphor research.

Pixel Sentence Representation Learning

2 code implementations13 Feb 2024 Chenghao Xiao, Zhuoxu Huang, Danlu Chen, G Thomas Hudson, Yizhi Li, Haoran Duan, Chenghua Lin, Jie Fu, Jungong Han, Noura Al Moubayed

To our knowledge, this is the first representation learning method devoid of traditional language models for understanding sentence and document semantics, marking a stride closer to human-like textual comprehension.

Natural Language Inference Representation Learning +3

Evaluating Large Language Models for Generalization and Robustness via Data Compression

1 code implementation1 Feb 2024 Yucheng Li, Yunhao Guo, Frank Guerin, Chenghua Lin

We measure: 1) the compression performance on the testing period as a measure of generalization on unseen data; and 2) the performance gap between the training and testing period as a measure of robustness.

Data Compression
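The two measurements described above can be sketched concretely: compute bits per character after compression as the generalization score, and the train/test gap as the robustness score. In this toy version gzip stands in for the LLM-based compressor the paper evaluates:

```python
import gzip

def bits_per_char(text: str) -> float:
    """Compression performance as bits per character; lower means the
    compressor predicts the text better (gzip is a stand-in for an LLM)."""
    raw = text.encode("utf-8")
    return 8 * len(gzip.compress(raw)) / len(raw)

def generalization_gap(train_texts, test_texts) -> float:
    """Gap between average compression on training-period and
    testing-period texts: a toy proxy for robustness to unseen data."""
    avg = lambda xs: sum(bits_per_char(t) for t in xs) / len(xs)
    return avg(test_texts) - avg(train_texts)
```

A larger positive gap suggests the compressor (model) relies on patterns specific to the training period.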

Finding Challenging Metaphors that Confuse Pretrained Language Models

no code implementations29 Jan 2024 Yucheng Li, Frank Guerin, Chenghua Lin

In this paper, we test various NLP models on the VUA metaphor dataset and quantify to what extent metaphors affect models' performance on various downstream tasks.

Machine Translation

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

1 code implementation24 Jan 2024 Siwei Wu, Yizhi Li, Kang Zhu, Ge Zhang, Yiming Liang, Kaijing Ma, Chenghao Xiao, Haoran Zhang, Bohao Yang, Wenhu Chen, Wenhao Huang, Noura Al Moubayed, Jie Fu, Chenghua Lin

We further annotate the image-text pairs with two-level subset-subcategory hierarchy annotations to facilitate a more comprehensive evaluation of the baselines.

Benchmarking Image Captioning +3

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

1 code implementation22 Jan 2024 Ge Zhang, Xinrun Du, Bei Chen, Yiming Liang, Tongxu Luo, Tianyu Zheng, Kang Zhu, Yuyang Cheng, Chunpu Xu, Shuyue Guo, Haoran Zhang, Xingwei Qu, Junjie Wang, Ruibin Yuan, Yizhi Li, Zekun Wang, Yudong Liu, Yu-Hsuan Tsai, Fengji Zhang, Chenghua Lin, Wenhao Huang, Jie Fu

We introduce CMMMU, a new Chinese Massive Multi-discipline Multimodal Understanding benchmark designed to evaluate LMMs on tasks demanding college-level subject knowledge and deliberate reasoning in a Chinese context.

Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation

1 code implementation12 Jan 2024 Tianyu Zheng, Shuyue Guo, Xingwei Qu, Jiawei Guo, Xinrun Du, Qi Jia, Chenghua Lin, Wenhao Huang, Jie Fu, Ge Zhang

In this paper, we introduce Kun, a novel approach for creating high-quality instruction-tuning datasets for large language models (LLMs) without relying on manual annotations.

Instruction Following Translation

Language Model as an Annotator: Unsupervised Context-aware Quality Phrase Generation

no code implementations28 Dec 2023 Zhihao Zhang, Yuan Zuo, Chenghua Lin, Junjie Wu

Finally, we merge the quality phrases from both the Annotator and Generator as the final predictions, considering their complementary nature and distinct characteristics.

Informativeness Language Modeling +2

LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction

1 code implementation19 Dec 2023 Yucheng Li, Frank Guerin, Chenghua Lin

LatestEval avoids data contamination by only using texts published within a recent time window, ensuring no overlap with the training corpora of pre-trained language models.

Language Model Evaluation Language Modeling +2

A Cross-Attention Augmented Model for Event-Triggered Context-Aware Story Generation

1 code implementation19 Nov 2023 Chen Tang, Tyler Loakman, Chenghua Lin

These results underscore the effectiveness of our model in leveraging context and event features to improve the quality of generated narratives.

Story Generation

LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores

no code implementations16 Nov 2023 Yiqi Liu, Nafise Sadat Moosavi, Chenghua Lin

Automatic evaluation of generated textual content presents an ongoing challenge within the field of NLP.

Language Modeling Language Modelling

The Iron(ic) Melting Pot: Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation

no code implementations9 Nov 2023 Tyler Loakman, Aaron Maladry, Chenghua Lin

Human evaluation is often considered to be the gold standard method of evaluating a Natural Language Generation system.

Text Generation

An Open Source Data Contamination Report for Large Language Models

1 code implementation26 Oct 2023 Yucheng Li, Frank Guerin, Chenghua Lin

We also introduce an open-source pipeline that enables the community to perform contamination analysis on customised data and models.

HellaSwag Language Modeling +4

Enhancing Biomedical Lay Summarisation with External Knowledge Graphs

1 code implementation24 Oct 2023 Tomas Goldsack, Zhihao Zhang, Chen Tang, Carolina Scarton, Chenghua Lin

Previous approaches for automatic lay summarisation are exclusively reliant on the source article that, given it is written for a technical audience (e.g., researchers), is unlikely to explicitly define all technical concepts or state all of the background information that is relevant for a lay audience.

Decoder Knowledge Graphs

Length is a Curse and a Blessing for Document-level Semantics

1 code implementation24 Oct 2023 Chenghao Xiao, Yizhi Li, G Thomas Hudson, Chenghua Lin, Noura Al Moubayed

In recent years, contrastive learning (CL) has been extensively utilized to recover sentence and document-level encoding capability from pre-trained language models.

Contrastive Learning Information Retrieval +3

Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers

1 code implementation24 Oct 2023 Chen Tang, Shun Wang, Tomas Goldsack, Chenghua Lin

Abstracts derived from biomedical literature possess distinct domain-specific characteristics, including specialised writing styles and biomedical terminologies, which necessitate a deep understanding of the related literature.

Compressing Context to Enhance Inference Efficiency of Large Language Models

2 code implementations9 Oct 2023 Yucheng Li, Bo Dong, Chenghua Lin, Frank Guerin

This paper proposes a method called Selective Context that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make the input more compact.

Question Answering Response Generation
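The identify-and-prune idea behind Selective Context can be sketched with a unigram self-information score in place of the causal language model the method actually uses: rank tokens by -log p(token) and drop the least informative ones. The function name and ranking details here are illustrative assumptions:

```python
import math
from collections import Counter

def prune_redundancy(text: str, keep_ratio: float = 0.7) -> str:
    """Toy redundancy pruning: score each token by unigram
    self-information -log p(token) estimated from the text itself
    (Selective Context uses a causal LM), then keep only the most
    informative fraction of tokens, preserving their original order."""
    tokens = text.split()
    counts = Counter(tokens)
    total = len(tokens)
    info = {w: -math.log(c / total) for w, c in counts.items()}
    # Rank token positions from most to least informative.
    ranked = sorted(range(len(tokens)), key=lambda i: info[tokens[i]], reverse=True)
    keep = set(ranked[: max(1, int(keep_ratio * len(tokens)))])
    return " ".join(t for i, t in enumerate(tokens) if i in keep)
```

Frequent filler tokens carry low self-information and are pruned first, compacting the context before it is fed to the LLM.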

Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles

no code implementations29 Sep 2023 Tomas Goldsack, Zheheng Luo, Qianqian Xie, Carolina Scarton, Matthew Shardlow, Sophia Ananiadou, Chenghua Lin

This paper presents the results of the shared task on Lay Summarisation of Biomedical Research Articles (BioLaySumm), hosted at the BioNLP Workshop at ACL 2023.

Lay Summarization

Effective Distillation of Table-based Reasoning Ability from LLMs

1 code implementation22 Sep 2023 Bohao Yang, Chen Tang, Kun Zhao, Chenghao Xiao, Chenghua Lin

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks.

Table-to-Text Generation

Audio Contrastive based Fine-tuning

no code implementations21 Sep 2023 Yang Wang, Qibin Liang, Chenghao Xiao, Yizhi Li, Noura Al Moubayed, Chenghua Lin

Audio classification plays a crucial role in speech and sound processing tasks with a wide range of applications.

Audio Classification Contrastive Learning

Improving Medical Dialogue Generation with Abstract Meaning Representations

1 code implementation19 Sep 2023 Bohao Yang, Chen Tang, Chenghua Lin

In this paper, we propose a novel framework that models dialogues between patients and healthcare professionals using AMR graphs, where the neural networks incorporate textual and graphical knowledge with a dual attention mechanism.

Dialogue Generation

On the Effectiveness of Speech Self-supervised Learning for Music

no code implementations11 Jul 2023 Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu

Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech.

Information Retrieval Music Information Retrieval +2

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

1 code implementation29 Jun 2023 Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi Li, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wei Xue, Yike Guo

We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal.

Automatic Lyrics Transcription Language Modeling +4

Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation

1 code implementation28 Jun 2023 Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

Further analysis also shows that our representation learning framework can fill the semantic gap by coagulating representations of both text and graph knowledge.

Dialogue Generation Graph Attention +3

TwistList: Resources and Baselines for Tongue Twister Generation

1 code implementation6 Jun 2023 Tyler Loakman, Chen Tang, Chenghua Lin

Previous work in phonetically-grounded language generation has mainly focused on domains such as lyrics and poetry.

Text Generation

Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information

1 code implementation26 May 2023 Kun Zhao, Bohao Yang, Chenghua Lin, Wenge Rong, Aline Villavicencio, Xiaohui Cui

The long-standing one-to-many issue of the open-domain dialogues poses significant challenges for automatic evaluation methods, i.e., there may be multiple suitable responses which differ in semantics for a given conversational context.

Semantic Similarity Semantic Textual Similarity +1

Metaphor Detection via Explicit Basic Meanings Modelling

1 code implementation26 May 2023 Yucheng Li, Shun Wang, Chenghua Lin, Frank Guerin

One noticeable trend in metaphor detection is the embrace of linguistic theories such as the metaphor identification procedure (MIP) for model architecture design.

Sentence

Interactive Natural Language Processing

no code implementations22 May 2023 Zekun Wang, Ge Zhang, Kexin Yang, Ning Shi, Wangchunshu Zhou, Shaochun Hao, Guangzheng Xiong, Yizhi Li, Mong Yuan Sim, Xiuying Chen, Qingqing Zhu, Zhenzhu Yang, Adam Nik, Qi Liu, Chenghua Lin, Shi Wang, Ruibo Liu, Wenhu Chen, Ke Xu, Dayiheng Liu, Yike Guo, Jie Fu

Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP, aimed at addressing limitations in existing frameworks while aligning with the ultimate goals of artificial intelligence.

Decision Making

Chinese Open Instruction Generalist: A Preliminary Release

2 code implementations17 Apr 2023 Ge Zhang, Yemin Shi, Ruibo Liu, Ruibin Yuan, Yizhi Li, Siwei Dong, Yu Shu, Zhaoqun Li, Zekun Wang, Chenghua Lin, Wenhao Huang, Jie Fu

Instruction tuning is widely recognized as a key technique for building generalist language models, which has attracted the attention of researchers and the public with the release of InstructGPT (Ouyang et al., 2022) and ChatGPT (https://chat.openai.com/).

Requirement Formalisation using Natural Language Processing and Machine Learning: A Systematic Review

no code implementations18 Mar 2023 Shekoufeh Kolahdouz-Rahimi, Kevin Lano, Chenghua Lin

We found that heuristic NLP approaches are the most common NLP techniques used for automatic RF, primarily operating on structured and semi-structured data.

Systematic Literature Review

Metaphor Detection with Effective Context Denoising

1 code implementation11 Feb 2023 Shun Wang, Yucheng Li, Chenghua Lin, Loïc Barrault, Frank Guerin

We propose a novel RoBERTa-based model, RoPPT, which introduces a target-oriented parse tree structure in metaphor detection.

Denoising

FrameBERT: Conceptual Metaphor Detection with Frame Embedding Learning

1 code implementation9 Feb 2023 Yucheng Li, Shun Wang, Chenghua Lin, Frank Guerin, Loïc Barrault

In this paper, we propose FrameBERT, a RoBERTa-based model that can explicitly learn and incorporate FrameNet Embeddings for concept-level metaphor detection.

The Secret of Metaphor on Expressing Stronger Emotion

1 code implementation30 Jan 2023 Yucheng Li, Frank Guerin, Chenghua Lin

Metaphors are proven to have stronger emotional impact than literal expressions.

Specificity

CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation

1 code implementation1 Jan 2023 Ge Zhang, Yizhi Li, Yaoyao Wu, Linyuan Zhang, Chenghua Lin, Jiayi Geng, Shi Wang, Jie Fu

As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpus, especially for languages with insufficient resources such as Chinese.

Sentence

Routine Outcome Monitoring in Psychotherapy Treatment using Sentiment-Topic Modelling Approach

no code implementations8 Dec 2022 Noor Fazilla Abd Yusof, Chenghua Lin

While outcome monitoring tends to improve therapy outcomes, the current method presents many challenges, e.g., the time and financial burden of administering questionnaires, and of scoring and analysing the results.

HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models

1 code implementation5 Nov 2022 Yizhi Li, Ge Zhang, Bohao Yang, Chenghua Lin, Shi Wang, Anton Ragni, Jie Fu

In addition to verifying the existence of regional bias in LMs, we find that the biases on regional groups can be strongly influenced by the geographical clustering of the groups.

Fairness

Improving Variational Autoencoders with Density Gap-based Regularization

1 code implementation1 Nov 2022 Jianfei Zhang, Jun Bai, Chenghua Lin, Yanmeng Wang, Wenge Rong

There are effective ways proposed to prevent posterior collapse in VAEs, but we observe that they in essence make trade-offs between posterior collapse and the hole problem, i.e., the mismatch between the aggregated posterior distribution and the prior distribution.

Language Modeling Language Modelling +1

Terminology-aware Medical Dialogue Generation

1 code implementation27 Oct 2022 Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

In this paper, we propose a novel framework to improve medical dialogue generation by considering features centered on domain-specific terminology.

Dialogue Generation

EtriCA: Event-Triggered Context-Aware Story Generation Augmented by Cross Attention

1 code implementation22 Oct 2022 Chen Tang, Chenghua Lin, Henglin Huang, Frank Guerin, Zhihao Zhang

One of the key challenges of automatic story generation is how to generate a long narrative that can maintain fluency, relevance, and coherence.

Story Generation

Improving Chinese Story Generation via Awareness of Syntactic Dependencies and Semantics

1 code implementation19 Oct 2022 Henglin Huang, Chen Tang, Tyler Loakman, Frank Guerin, Chenghua Lin

In spite of the success of prior works with the application of pre-trained models, current neural models for Chinese stories still struggle to generate high-quality long text narratives.

Denoising Representation Learning +1

NGEP: A Graph-based Event Planning Framework for Story Generation

1 code implementation19 Oct 2022 Chen Tang, Zhihao Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

To improve the performance of long text generation, recent studies have leveraged automatically planned event structures (i.e., storylines) to guide story generation.

Hallucination Story Generation

Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature

1 code implementation18 Oct 2022 Tomas Goldsack, Zhihao Zhang, Chenghua Lin, Carolina Scarton

Lay summarisation aims to jointly summarise and simplify a given text, thus making its content more comprehensible to non-experts.

Lay Summarization

PUF-Phenotype: A Robust and Noise-Resilient Approach to Aid Intra-Group-based Authentication with DRAM-PUFs Using Machine Learning

no code implementations11 Jul 2022 Owen Millwood, Jack Miskelly, Bohao Yang, Prosanta Gope, Elif Kavun, Chenghua Lin

As the demand for highly secure and dependable lightweight systems increases in the modern world, Physically Unclonable Functions (PUFs) continue to promise a lightweight alternative to high-cost encryption techniques and secure key storage.

Denoising

Nominal Metaphor Generation with Multitask Learning

1 code implementation10 Jun 2022 Yucheng Li, Chenghua Lin, Frank Guerin

Metaphor generation is a challenging task which can impact many downstream tasks such as improving user satisfaction with dialogue systems and story generation.

Story Generation

TranSHER: Translating Knowledge Graph Embedding with Hyper-Ellipsoidal Restriction

1 code implementation27 Apr 2022 Yizhi Li, Wei Fan, Chao Liu, Chenghua Lin, Jiang Qian

However, such a method strictly restricts entities to hyper-ellipsoid surfaces, which limits the optimization of the entity distribution and leads to suboptimal performance in knowledge graph completion.

Knowledge Graph Embedding Link Prediction +2

Recent Advances in Neural Text Generation: A Task-Agnostic Survey

no code implementations6 Mar 2022 Chen Tang, Frank Guerin, Chenghua Lin

In recent years, considerable research has been dedicated to the application of neural models in the field of natural language generation (NLG).

Survey Text Generation

Tell Me How to Survey: Literature Review Made Simple with Automatic Reading Path Generation

1 code implementation12 Oct 2021 Jiayuan Ding, Tong Xiang, Zijing Ou, Wangyang Zuo, Ruihui Zhao, Chenghua Lin, Yefeng Zheng, Bang Liu

In this paper, we introduce a new task named Reading Path Generation (RPG) which aims at automatically producing a path of papers to read for a given query.

Survey

On the Latent Holes of VAEs for Text Generation

no code implementations7 Oct 2021 Ruizhe Li, Xutan Peng, Chenghua Lin

In this paper, we provide the first focused study on the discontinuities (a.k.a. holes) in the latent space of VAEs for text generation.

Decoder Text Generation

Extractive and Abstractive Sentence Labelling of Sentiment-bearing Topics

no code implementations29 Aug 2021 Mohamad Hardyman Barawi, Chenghua Lin, Advaith Siddharthan, Yinbin Liu

Our experimental results on three real-world datasets show that both the extractive and abstractive approaches outperform four strong baselines in terms of facilitating topic understanding and interpretation.

Descriptive Sentence +1

Affective Decoding for Empathetic Response Generation

1 code implementation INLG (ACL) 2021 Chengkun Zeng, Guanyi Chen, Chenghua Lin, Ruizhe Li, Zhigang Chen

Understanding a speaker's feelings and producing appropriate, emotionally connected responses is a key communicative skill for empathetic dialogue systems.

Empathetic Response Generation Response Generation

Guiding the Growth: Difficulty-Controllable Question Generation through Step-by-Step Rewriting

no code implementations ACL 2021 Yi Cheng, SiYao Li, Bang Liu, Ruihui Zhao, Sujian Li, Chenghua Lin, Yefeng Zheng

This paper explores the task of Difficulty-Controllable Question Generation (DCQG), which aims at generating questions with required difficulty levels.

Question Answering Question Generation +1

Combining Pre-trained Word Embeddings and Linguistic Features for Sequential Metaphor Identification

no code implementations7 Apr 2021 Rui Mao, Chenghua Lin, Frank Guerin

The pre-trained word embeddings GloVe, ELMo and BERT have individually shown good performance on sequential metaphor identification.

Word Embeddings

Interpreting Verbal Metaphors by Paraphrasing

no code implementations7 Apr 2021 Rui Mao, Chenghua Lin, Frank Guerin

Metaphorical expressions are difficult linguistic phenomena, challenging diverse Natural Language Processing tasks.

Machine Translation Translation

Summarising Historical Text in Modern Languages

1 code implementation EACL 2021 Xutan Peng, Yi Zheng, Chenghua Lin, Advaith Siddharthan

We introduce the task of historical text summarisation, where documents in historical forms of a language are summarised in the corresponding modern language.

Cross-Lingual Transfer Transfer Learning

Generating Descriptions for Sequential Images with Local-Object Attention and Global Semantic Context Modelling

no code implementations2 Dec 2020 Jing Su, Chenghua Lin, Mian Zhou, Qingyun Dai, Haoyu Lv

In this paper, we propose an end-to-end CNN-LSTM model for generating descriptions for sequential images with a local-object attention mechanism.

A Text Reassembling Approach to Natural Language Generation

no code implementations16 May 2020 Xiao Li, Kees Van Deemter, Chenghua Lin

Recent years have seen a number of proposals for performing Natural Language Generation (NLG) based in large part on statistical techniques.

Text Generation

Fast and Scalable Dialogue State Tracking with Explicit Modular Decomposition

no code implementations NAACL 2021 Dingmin Wang, Chenghua Lin, Qi Liu, Kam-Fai Wong

We present a fast and scalable architecture called Explicit Modular Decomposition (EMD), in which we incorporate both classification-based and extraction-based methods and design four modules (for classification and sequence labelling) to jointly extract dialogue states.

Classification Dialogue State Tracking +3

Understanding Linearity of Cross-Lingual Word Embedding Mappings

1 code implementation2 Apr 2020 Xutan Peng, Mark Stevenson, Chenghua Lin, Chen Li

The technique of Cross-Lingual Word Embedding (CLWE) plays a fundamental role in tackling Natural Language Processing challenges for low-resource languages.

Representation Learning Word Embeddings
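The linearity of CLWE mappings studied here can be illustrated with a toy sketch (not the paper's code): if seed-dictionary embedding matrices X and Y are related by an orthogonal linear map, that map is recovered in closed form by orthogonal Procrustes via an SVD.

```python
import numpy as np

# Toy source/target embedding matrices for a seed dictionary (one row per word pair).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                        # source-language vectors
W_true = np.linalg.qr(rng.normal(size=(4, 4)))[0]   # hidden orthogonal map
Y = X @ W_true                                      # target-language vectors

# Orthogonal Procrustes: W = argmin ||XW - Y||_F s.t. W orthogonal,
# solved in closed form from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

print(np.allclose(X @ W, Y))  # the linear map is recovered exactly
```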

A Stable Variational Autoencoder for Text Modelling

1 code implementation WS 2019 Ruizhe Li, Xiao Li, Chenghua Lin, Matthew Collinson, Rui Mao

Variational Autoencoder (VAE) is a powerful method for learning representations of high-dimensional data.

Generating Quantified Descriptions of Abstract Visual Scenes

no code implementations WS 2019 Guanyi Chen, Kees Van Deemter, Chenghua Lin

Quantified expressions have always taken up a central position in formal theories of meaning and language use.

Position Text Generation

QTUNA: A Corpus for Understanding How Speakers Use Quantification

1 code implementation WS 2019 Guanyi Chen, Kees Van Deemter, Silvia Pagliaro, Louk Smalbil, Chenghua Lin

To inform these algorithms, we conducted a series of elicitation experiments in which human speakers were asked to perform a linguistic task that invites the use of quantified expressions.

Text Generation

Deep Ensemble Learning for News Stance Detection

no code implementations13 Sep 2019 Wenjun Liao, Chenghua Lin

The second approach is based on word embeddings: a word2vec model is introduced, and two document similarity algorithms are implemented, namely word2vec cosine similarity and Word Mover's Distance (WMD).

Ensemble Learning Fact Checking +1
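The word2vec cosine-similarity approach described in this abstract can be sketched with a toy example (the embedding table and documents here are illustrative stand-ins, not the paper's data): each document is represented as the average of its word vectors, and two documents are compared by cosine similarity.

```python
import numpy as np

def doc_vector(tokens, embeddings):
    """Average the word vectors of a document's in-vocabulary tokens."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embedding table standing in for a trained word2vec model.
emb = {
    "markets": np.array([0.9, 0.1, 0.0]),
    "stocks":  np.array([0.8, 0.2, 0.1]),
    "fall":    np.array([0.1, 0.9, 0.0]),
    "rise":    np.array([0.2, 0.8, 0.1]),
}

headline = doc_vector(["markets", "fall"], emb)
body = doc_vector(["stocks", "rise"], emb)
print(round(cosine_similarity(headline, body), 3))  # → 0.99
```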

Latent Space Factorisation and Manipulation via Matrix Subspace Projection

2 code implementations ICML 2020 Xiao Li, Chenghua Lin, Ruizhe Li, Chaozheng Wang, Frank Guerin

We demonstrate the utility of our method for attribute manipulation in autoencoders trained across varied domains, using both human evaluation and automated methods.

Ranked #7 on Image Generation on CelebA 256x256 (FID metric)

Attribute Face Generation +1

End-to-End Sequential Metaphor Identification Inspired by Linguistic Theories

1 code implementation ACL 2019 Rui Mao, Chenghua Lin, Frank Guerin

End-to-end training with Deep Neural Networks (DNN) is a currently popular method for metaphor identification.

SimpleNLG-ZH: a Linguistic Realisation Engine for Mandarin

1 code implementation WS 2018 Guanyi Chen, Kees Van Deemter, Chenghua Lin

We introduce SimpleNLG-ZH, a realisation engine for Mandarin that follows the software design paradigm of SimpleNLG (Gatt and Reiter, 2009).

Morphological Inflection Text Generation

Statistical NLG for Generating the Content and Form of Referring Expressions

no code implementations WS 2018 Xiao Li, Kees Van Deemter, Chenghua Lin

This paper argues that a new generic approach to statistical NLG can be made to perform Referring Expression Generation (REG) successfully.

Attribute Form +3

Modelling Pro-drop with the Rational Speech Acts Model

no code implementations WS 2018 Guanyi Chen, Kees Van Deemter, Chenghua Lin

We extend the classic Referring Expressions Generation task by considering zero pronouns in "pro-drop" languages such as Chinese, modelling their use by means of the Bayesian Rational Speech Acts model (Frank and Goodman, 2012).

Coreference Resolution Machine Translation +1
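The Rational Speech Acts model invoked in this abstract can be sketched in a few lines (a standard textbook instance of Frank and Goodman's 2012 model, not the paper's pro-drop extension): a literal listener conditions a prior on literal truth, a pragmatic speaker soft-maximizes informativeness, and a pragmatic listener inverts the speaker, deriving the scalar implicature that "some" suggests "not all".

```python
import numpy as np

# Truth-conditional lexicon: rows = utterances, cols = world states.
utterances = ["some", "all"]
meanings = ["partial", "total"]
truth = np.array([
    [1.0, 1.0],  # "some" is literally true of both states
    [0.0, 1.0],  # "all" is only true of the total state
])

def normalize(m):
    return m / m.sum(axis=-1, keepdims=True)

# Literal listener L0: condition a uniform prior over states on literal truth.
L0 = normalize(truth)

# Pragmatic speaker S1: soft-max over utterances by listener accuracy.
alpha = 1.0
S1 = normalize(np.exp(alpha * np.log(L0.T + 1e-12)))

# Pragmatic listener L1: Bayesian inversion of the speaker.
L1 = normalize(S1.T)

# Scalar implicature: hearing "some", L1 favours the partial state.
print(L1[0])  # → [0.75 0.25]
```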

Word Embedding and WordNet Based Metaphor Identification and Interpretation

no code implementations ACL 2018 Rui Mao, Chenghua Lin, Frank Guerin

Metaphoric expressions are widespread in natural language, posing a significant challenge for various natural language processing tasks such as Machine Translation.

Decision Making Machine Translation +4

Analysing the Causes of Depressed Mood from Depression Vulnerable Individuals

no code implementations WS 2017 Noor Fazilla Abd Yusof, Chenghua Lin, Frank Guerin

We develop a computational model to discover the potential causes of depression by analysing the topics in user-generated text.
