Search Results for author: Leyang Cui

Found 42 papers, 31 papers with code

Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal

no code implementations 2 Mar 2024 Jianheng Huang, Leyang Cui, Ante Wang, Chengyi Yang, Xinting Liao, Linfeng Song, Junfeng Yao, Jinsong Su

When conducting continual learning based on a publicly released LLM checkpoint, the original training data may not be available.

Continual Learning In-Context Learning

GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers

1 code implementation 29 Feb 2024 Qintong Li, Leyang Cui, Xueliang Zhao, Lingpeng Kong, Wei Bi

Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks.

GSM8K Math +1

Retrieval is Accurate Generation

no code implementations 27 Feb 2024 Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi

To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement.

Language Modelling Retrieval +1

Knowledge Verification to Nip Hallucination in the Bud

1 code implementation 19 Jan 2024 Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi

While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination.

Hallucination World Knowledge

Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models

1 code implementation 16 Jan 2024 Shuming Shi, Enbo Zhao, Deng Cai, Leyang Cui, Xinting Huang, Huayang Li

We present Inferflow, an efficient and highly configurable inference engine for large language models (LLMs).


Alleviating Hallucinations of Large Language Models through Induced Hallucinations

2 code implementations 25 Dec 2023 Yue Zhang, Leyang Cui, Wei Bi, Shuming Shi

Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and FActScore, demonstrate that our proposed ICD methods can effectively enhance the factuality of LLMs across various model sizes and families.

Hallucination Hallucination Evaluation

Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs

1 code implementation 16 Nov 2023 Sen Yang, Xin Li, Leyang Cui, Lidong Bing, Wai Lam

Though prompting LLMs with various reasoning structures produces reasoning proofs along with answers, these proofs are not ensured to be causal and reliable due to the inherent defects of LLMs.


A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis

no code implementations 31 Oct 2023 Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lei Wang, Lingqiao Liu, Leyang Cui, Zhaopeng Tu, Longyue Wang, Luping Zhou

This work conducts an evaluation of GPT-4V's multimodal capability for medical image analysis, with a focus on three representative tasks of radiology report generation, medical visual question answering, and medical visual grounding.

Descriptive Medical Visual Question Answering +3

Collaborative Evaluation: Exploring the Synergy of Large Language Models and Humans for Open-ended Generation Evaluation

1 code implementation 30 Oct 2023 Qintong Li, Leyang Cui, Lingpeng Kong, Wei Bi

To explore the synergy between humans and LLM-based evaluators and address the challenges of existing inconsistent evaluation criteria in open-ended NLG tasks, we propose a Collaborative Evaluation pipeline CoEval, involving the design of a checklist of task-specific criteria and the detailed evaluation of texts, in which LLM generates initial ideation, and then humans engage in scrutiny.

Text Generation

RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation

1 code implementation 11 Oct 2023 Yue Zhang, Leyang Cui, Enbo Zhao, Wei Bi, Shuming Shi

In this paper, we introduce RobustGEC, a benchmark designed to evaluate the context robustness of GEC systems.

Grammatical Error Correction Sentence

Non-autoregressive Text Editing with Copy-aware Latent Alignments

1 code implementation 11 Oct 2023 Yu Zhang, Yue Zhang, Leyang Cui, Guohong Fu

In this work, we propose a novel non-autoregressive text editing method to circumvent the above issues, by modeling the edit process with latent CTC alignments.

Management Sentence +1

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

1 code implementation 3 Sep 2023 Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi

While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.

Hallucination World Knowledge

Automated Action Model Acquisition from Narrative Texts

no code implementations 17 Jul 2023 RuiQi Li, Leyang Cui, Songtuan Lin, Patrik Haslum

Action models, which take the form of precondition/effect axioms, facilitate causal and motivational connections between actions for AI agents.

Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling

no code implementations 16 Jul 2023 Longyue Wang, Zefeng Du, Donghuai Liu, Deng Cai, Dian Yu, Haiyun Jiang, Yan Wang, Leyang Cui, Shuming Shi, Zhaopeng Tu

Modeling discourse, the linguistic phenomena that go beyond individual sentences, is a fundamental yet challenging aspect of natural language processing (NLP).

Language Modelling Sentence

Enhancing Grammatical Error Correction Systems with Explanations

1 code implementation 25 May 2023 Yuejiao Fei, Leyang Cui, Sen Yang, Wai Lam, Zhenzhong Lan, Shuming Shi

Grammatical error correction systems improve written communication by detecting and correcting language mistakes.

Grammatical Error Correction

Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance

no code implementations 22 May 2023 Yue Zhang, Leyang Cui, Deng Cai, Xinting Huang, Tao Fang, Wei Bi

Proprietary Large Language Models (LLMs), such as ChatGPT, have garnered significant attention due to their exceptional capabilities in handling a diverse range of tasks.

Instruction Following

Deepfake Text Detection in the Wild

1 code implementation 22 May 2023 Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang

In practical scenarios, the detector faces texts from various domains or LLMs without knowing their sources.

Face Swapping Story Generation +1

LogiCoT: Logical Chain-of-Thought Instruction-Tuning

1 code implementation 20 May 2023 Hanmeng Liu, Zhiyang Teng, Leyang Cui, Chaoli Zhang, Qiji Zhou, Yue Zhang

LogiCoT serves as an instruction set for teaching models logical reasoning and eliciting general reasoning skills.

Logical Reasoning Text Generation

EDeR: A Dataset for Exploring Dependency Relations Between Events

1 code implementation 4 Apr 2023 RuiQi Li, Patrik Haslum, Leyang Cui

We argue that an important type of relation not explored in NLP or IR research to date is that of an event being an argument - required or optional - of another event.

Event Extraction Information Retrieval +3

Cross-domain Generalization for AMR Parsing

1 code implementation 22 Oct 2022 Xuefeng Bai, Seng Yang, Leyang Cui, Linfeng Song, Yue Zhang

Based on our observation, we investigate two approaches to reduce the domain distribution divergence of text and AMR features, respectively.

AMR Parsing Domain Generalization

Multi-Granularity Optimization for Non-Autoregressive Translation

1 code implementation 20 Oct 2022 Yafu Li, Leyang Cui, Yongjing Yin, Yue Zhang

Despite low latency, non-autoregressive machine translation (NAT) suffers severe performance deterioration due to the naive independence assumption.

Machine Translation Translation

Effidit: Your AI Writing Assistant

no code implementations 3 Aug 2022 Shuming Shi, Enbo Zhao, Duyu Tang, Yan Wang, Piji Li, Wei Bi, Haiyun Jiang, Guoping Huang, Leyang Cui, Xinting Huang, Cong Zhou, Yong Dai, Dongyang Ma

In Effidit, we significantly expand the capacities of a writing assistant by providing functions in five categories: text completion, error checking, text polishing, keywords to sentences (K2S), and cloud input methods (cloud IME).

Keywords to Sentences Retrieval +3

Towards Robust Online Dialogue Response Generation

no code implementations 7 Mar 2022 Leyang Cui, Fandong Meng, Yijin Liu, Jie Zhou, Yue Zhang

Although pre-trained sequence-to-sequence models have achieved great success in dialogue response generation, chatbots still suffer from generating inconsistent responses in real-world practice, especially in multi-turn settings.

Chatbot Re-Ranking +1

Do Prompts Solve NLP Tasks Using Natural Language?

no code implementations 2 Mar 2022 Sen Yang, Yunchen Zhang, Leyang Cui, Yue Zhang

Thanks to advances in large pre-trained language models, prompt-based fine-tuning has been shown to be effective on a variety of downstream tasks.

Investigating Non-local Features for Neural Constituency Parsing

1 code implementation ACL 2022 Leyang Cui, Sen Yang, Yue Zhang

Besides, our method achieves state-of-the-art BERT-based performance on PTB (95.92 F1) and strong performance on CTB (92.31 F1).

Constituency Parsing

Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation

1 code implementation EMNLP 2021 Leyang Cui, Yu Wu, Shujie Liu, Yue Zhang

To deal with this problem, instead of introducing knowledge base as the input, we force the model to learn a better semantic representation by predicting the information in the knowledge base, only based on the input context.

Dialogue Generation Retrieval

Template-Based Named Entity Recognition Using BART

1 code implementation Findings (ACL) 2021 Leyang Cui, Yu Wu, Jian Liu, Sen Yang, Yue Zhang

To address the issue, we propose a template-based method for NER, treating NER as a language model ranking problem in a sequence-to-sequence framework, where original sentences and statement templates filled by candidate named entity span are regarded as the source sequence and the target sequence, respectively.

Few-shot NER Language Modelling +2

Uni-Encoder: A Fast and Accurate Response Selection Paradigm for Generation-Based Dialogue Systems

1 code implementation 2 Jun 2021 Chiyu Song, Hongliang He, Haofei Yu, Pengfei Fang, Leyang Cui, Zhenzhong Lan

The current state-of-the-art ranking methods mainly use an encoding paradigm called Cross-Encoder, which separately encodes each context-candidate pair and ranks the candidates according to their fitness scores.

Computational Efficiency Conversational Response Selection

Natural Language Inference in Context -- Investigating Contextual Reasoning over Long Texts

1 code implementation 10 Nov 2020 Hanmeng Liu, Leyang Cui, Jian Liu, Yue Zhang

Natural language inference (NLI) is a fundamental NLP task, investigating the entailment relationship between two texts.

Logical Reasoning Natural Language Inference +1

Does Chinese BERT Encode Word Structure?

1 code implementation COLING 2020 Yile Wang, Leyang Cui, Yue Zhang

Contextualized representations give significantly improved results for a wide range of NLP tasks.

Chunking Natural Language Inference +2

What Have We Achieved on Text Summarization?

1 code implementation EMNLP 2020 Dandan Huang, Leyang Cui, Sen Yang, Guangsheng Bao, Kun Wang, Jun Xie, Yue Zhang

Deep learning has led to significant improvement in text summarization with various methods investigated and improved ROUGE scores reported over the years.

Text Summarization

On Commonsense Cues in BERT for Solving Commonsense Tasks

no code implementations Findings (ACL) 2021 Leyang Cui, Sijie Cheng, Yu Wu, Yue Zhang

We quantitatively investigate the presence of structural commonsense cues in BERT when solving commonsense tasks, and the importance of such cues for the model prediction.

Sentiment Analysis Sentiment Classification

LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning

2 code implementations 16 Jul 2020 Jian Liu, Leyang Cui, Hanmeng Liu, Dandan Huang, Yile Wang, Yue Zhang

Machine reading is a fundamental task for testing the capability of natural language understanding, which is closely related to human cognition in many aspects.

Logical Reasoning Machine Reading Comprehension +1

MuTual: A Dataset for Multi-Turn Dialogue Reasoning

1 code implementation ACL 2020 Leyang Cui, Yu Wu, Shujie Liu, Yue Zhang, Ming Zhou

Non-task oriented dialogue systems have achieved great success in recent years due to largely accessible conversation data and the development of deep learning techniques.

Task-Oriented Dialogue Systems

Evaluating Commonsense in Pre-trained Language Models

1 code implementation 27 Nov 2019 Xuhui Zhou, Yue Zhang, Leyang Cui, Dandan Huang

However, relatively little work has been done investigating commonsense knowledge contained in contextualized representations, which is crucial for human question answering and reading comprehension.

Language Modelling Question Answering +1

How Can BERT Help Lexical Semantics Tasks?

no code implementations 7 Nov 2019 Yile Wang, Leyang Cui, Yue Zhang

Contextualized embeddings such as BERT can serve as strong input representations to NLP tasks, outperforming their static embedding counterparts such as skip-gram, CBOW and GloVe.

Sentence Word Embeddings
