Search Results for author: Leyang Cui

Found 48 papers, 36 papers with code

ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

no code implementations 22 Feb 2025 Shulin Huang, Linyi Yang, Yan Song, Shuang Chen, Leyang Cui, Ziyu Wan, Qingcheng Zeng, Ying Wen, Kun Shao, Weinan Zhang, Jun Wang, Yue Zhang

Evaluating large language models (LLMs) poses significant challenges, particularly due to issues of data contamination and the leakage of correct answers.

Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability

no code implementations 15 Oct 2024 Tsz Ting Chung, Leyang Cui, Lemao Liu, Xinting Huang, Shuming Shi, Dit-yan Yeung

Large Language Models (LLMs) have demonstrated impressive capabilities in a wide range of natural language processing tasks when leveraging in-context learning.

In-Context Learning

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

2 code implementations 11 Sep 2024 Yu Zhang, Songlin Yang, Ruijie Zhu, Yue Zhang, Leyang Cui, Yiqiao Wang, Bolun Wang, Freda Shi, Bailin Wang, Wei Bi, Peng Zhou, Guohong Fu

Linear attention Transformers and their gated variants, celebrated for enabling parallel training and efficient recurrent inference, still fall short in recall-intensive tasks compared to traditional Transformers and demand significant resources for training from scratch.
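
For intuition, here is a minimal sketch of the gated linear-attention recurrence this line of work builds on; it is a toy recurrent form with illustrative names, not the paper's exact gated-slot-attention formulation.

    import torch

    def gated_linear_attention(q, k, v, g):
        # q, k: (T, d_k); v: (T, d_v); g: (T,) forget gates in (0, 1).
        # The state S accumulates outer products k_t v_t^T and is decayed
        # by the gate, replacing the quadratic attention matrix.
        T, d_v = v.shape
        S = torch.zeros(k.shape[1], d_v)
        outputs = []
        for t in range(T):
            S = g[t] * S + torch.outer(k[t], v[t])  # gated state update
            outputs.append(q[t] @ S)                # readout o_t = q_t S
        return torch.stack(outputs)

    o = gated_linear_attention(torch.randn(8, 16), torch.randn(8, 16),
                               torch.randn(8, 32), torch.sigmoid(torch.randn(8)))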

Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning

no code implementations 25 Jun 2024 Sen Yang, Leyang Cui, Deng Cai, Xinting Huang, Shuming Shi, Wai Lam

Iterative preference learning, though yielding superior performance, requires online annotated preference labels.

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

no code implementations 24 Jun 2024 Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi

Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications.

Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text

1 code implementation 21 May 2024 Yafu Li, Zhilin Wang, Leyang Cui, Wei Bi, Shuming Shi, Yue Zhang

To this end, we propose a novel detection framework, paraphrased text span detection (PTD), aiming to identify paraphrased text spans within a text.
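
Framed this way, PTD reduces to token-level classification over the document. A hedged sketch follows; the roberta-base checkpoint and the two-label scheme (original vs. paraphrased) are illustrative assumptions, not the released detector.

    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    # label 0 = human-written, label 1 = LLM-paraphrased (head is untrained here)
    model = AutoModelForTokenClassification.from_pretrained("roberta-base", num_labels=2)

    text = "The findings were unchanged. The outcomes stayed identical overall."
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits                      # (1, seq_len, 2)
    labels = logits.argmax(-1)[0].tolist()
    flagged = [tokenizer.decode([tid]) for tid, lab
               in zip(enc.input_ids[0].tolist(), labels) if lab == 1]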

Diversity Text Detection

Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal

1 code implementation 2 Mar 2024 Jianheng Huang, Leyang Cui, Ante Wang, Chengyi Yang, Xinting Liao, Linfeng Song, Junfeng Yao, Jinsong Su

When conducting continual learning from a publicly released LLM checkpoint, the original training data may be unavailable.
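
Reading the title literally, "self-synthesized rehearsal" means sampling pseudo-instances from the checkpoint itself to stand in for the missing data. A loose sketch under that reading (gpt2 and the prompt are stand-ins, not the paper's pipeline):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Sample a rehearsal candidate from the base checkpoint, since the
    # original training data is assumed unavailable; such instances would
    # then be mixed into the new task's fine-tuning set.
    ids = tok("Instruction:", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40, do_sample=True, top_p=0.9,
                         pad_token_id=tok.eos_token_id)
    rehearsal_instance = tok.decode(out[0], skip_special_tokens=True)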

Continual Learning In-Context Learning

Retrieval is Accurate Generation

1 code implementation 27 Feb 2024 Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi

To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement.

Language Modeling Language Modelling +2

Knowledge Verification to Nip Hallucination in the Bud

1 code implementation 19 Jan 2024 Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi

While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination.

Hallucination World Knowledge

Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models

1 code implementation 16 Jan 2024 Shuming Shi, Enbo Zhao, Deng Cai, Leyang Cui, Xinting Huang, Huayang Li

We present Inferflow, an efficient and highly configurable inference engine for large language models (LLMs).

Quantization

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

2 code implementations 25 Dec 2023 Yue Zhang, Leyang Cui, Wei Bi, Shuming Shi

Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and FActScore, demonstrate that our proposed ICD methods can effectively enhance the factuality of LLMs across various model sizes and families.
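
ICD here is induce-then-contrast decoding: a copy of the model is deliberately pushed toward hallucination, and its next-token logits are subtracted from the original model's at decoding time. A minimal sketch of one common contrastive combination (the weighting rule and alpha are assumptions):

    import torch

    def icd_next_token_logits(base_logits: torch.Tensor,
                              induced_logits: torch.Tensor,
                              alpha: float = 1.0) -> torch.Tensor:
        # Upweight tokens the factual model prefers and the
        # hallucination-induced model does not.
        return (1 + alpha) * base_logits - alpha * induced_logits

Decoding then proceeds as usual (greedy or sampling) over the adjusted logits.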

Hallucination Hallucination Evaluation +1

A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis

no code implementations 31 Oct 2023 Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lei Wang, Lingqiao Liu, Leyang Cui, Zhaopeng Tu, Longyue Wang, Luping Zhou

This work conducts an evaluation of GPT-4V's multimodal capability for medical image analysis, focusing on three representative tasks: radiology report generation, medical visual question answering, and medical visual grounding.

Descriptive Medical Image Analysis +4

Exploring the Reliability of Large Language Models as Customized Evaluators for Diverse NLP Tasks

1 code implementation 30 Oct 2023 Qintong Li, Leyang Cui, Lingpeng Kong, Wei Bi

Previous work adopts large language models (LLMs) as evaluators to evaluate natural language processing (NLP) tasks.

Fairness Math +1

RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation

1 code implementation 11 Oct 2023 Yue Zhang, Leyang Cui, Enbo Zhao, Wei Bi, Shuming Shi

In this paper, we introduce RobustGEC, a benchmark designed to evaluate the context robustness of GEC systems.

Grammatical Error Correction Sentence

Non-autoregressive Text Editing with Copy-aware Latent Alignments

1 code implementation 11 Oct 2023 Yu Zhang, Yue Zhang, Leyang Cui, Guohong Fu

In this work, we propose a novel non-autoregressive text editing method to circumvent the above issues, by modeling the edit process with latent CTC alignments.
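
CTC marginalizes over all monotonic alignments between encoder positions and the target edit/copy symbols, which is what lets the editor stay non-autoregressive. A toy call to PyTorch's built-in CTC loss (the shapes and the 5-symbol vocabulary are made up for illustration):

    import torch
    import torch.nn.functional as F

    T, B, C = 12, 2, 5                                   # positions, batch, symbols
    log_probs = F.log_softmax(torch.randn(T, B, C), -1)  # (T, batch, classes)
    targets = torch.randint(1, C, (B, 6))                # target symbol sequences
    input_lengths = torch.full((B,), T, dtype=torch.long)
    target_lengths = torch.full((B,), 6, dtype=torch.long)
    loss = F.ctc_loss(log_probs, targets, input_lengths,
                      target_lengths, blank=0)           # id 0 reserved for blank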

Management Sentence +1

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

1 code implementation 3 Sep 2023 Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi

While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.

Hallucination World Knowledge

Automated Action Model Acquisition from Narrative Texts

no code implementations 17 Jul 2023 RuiQi Li, Leyang Cui, Songtuan Lin, Patrik Haslum

Action models, which take the form of precondition/effect axioms, facilitate causal and motivational connections between actions for AI agents.
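
Concretely, a precondition/effect axiom is the STRIPS-style schema sketched below; the knight/sword example is purely illustrative, not drawn from the paper's corpus.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Action:
        name: str
        preconditions: frozenset   # must hold before execution
        add_effects: frozenset     # facts made true
        delete_effects: frozenset  # facts made false

        def applicable(self, state: set) -> bool:
            return self.preconditions <= state

        def apply(self, state: set) -> set:
            return (state - self.delete_effects) | self.add_effects

    # e.g. an action acquired from "the knight picked up the sword"
    pick_up = Action("pick-up(knight, sword)",
                     frozenset({"at(knight, l)", "at(sword, l)"}),
                     frozenset({"has(knight, sword)"}),
                     frozenset({"at(sword, l)"}))
    state = {"at(knight, l)", "at(sword, l)"}
    assert pick_up.applicable(state)
    state = pick_up.apply(state)   # {"at(knight, l)", "has(knight, sword)"}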

Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling

1 code implementation 16 Jul 2023 Longyue Wang, Zefeng Du, Donghuai Liu, Deng Cai, Dian Yu, Haiyun Jiang, Yan Wang, Leyang Cui, Shuming Shi, Zhaopeng Tu

Modeling discourse -- the linguistic phenomena that go beyond individual sentences -- is a fundamental yet challenging aspect of natural language processing (NLP).

Diagnostic Language Modelling +1

Enhancing Grammatical Error Correction Systems with Explanations

1 code implementation 25 May 2023 Yuejiao Fei, Leyang Cui, Sen Yang, Wai Lam, Zhenzhong Lan, Shuming Shi

Grammatical error correction systems improve written communication by detecting and correcting language mistakes.

Grammatical Error Correction

Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance

no code implementations 22 May 2023 Yue Zhang, Leyang Cui, Deng Cai, Xinting Huang, Tao Fang, Wei Bi

Proprietary Large Language Models (LLMs), such as ChatGPT, have garnered significant attention due to their exceptional capabilities in handling a diverse range of tasks.

Instruction Following

LogiCoT: Logical Chain-of-Thought Instruction-Tuning

1 code implementation 20 May 2023 Hanmeng Liu, Zhiyang Teng, Leyang Cui, Chaoli Zhang, Qiji Zhou, Yue Zhang

LogiCoT serves as an instruction set for teaching models logical reasoning and eliciting general reasoning skills.

Logical Reasoning Text Generation

EDeR: A Dataset for Exploring Dependency Relations Between Events

1 code implementation 4 Apr 2023 RuiQi Li, Patrik Haslum, Leyang Cui

We argue that an important type of relation not explored in NLP or IR research to date is that of an event being an argument -- required or optional -- of another event.

Event Extraction Information Retrieval +3

Cross-domain Generalization for AMR Parsing

1 code implementation 22 Oct 2022 Xuefeng Bai, Sen Yang, Leyang Cui, Linfeng Song, Yue Zhang

Based on our observation, we investigate two approaches to reduce the domain distribution divergence of text and AMR features, respectively.

Abstract Meaning Representation AMR Parsing +1

Multi-Granularity Optimization for Non-Autoregressive Translation

1 code implementation 20 Oct 2022 Yafu Li, Leyang Cui, Yongjing Yin, Yue Zhang

Despite low latency, non-autoregressive machine translation (NAT) suffers severe performance deterioration due to the naive independence assumption.
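
Concretely, the independence assumption factorizes the target distribution as $p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid x)$ rather than the autoregressive $p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x)$, so every target token is predicted without seeing its neighbors; the paper's multi-granularity optimization targets the resulting inconsistencies.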

Machine Translation Translation

Effidit: Your AI Writing Assistant

no code implementations 3 Aug 2022 Shuming Shi, Enbo Zhao, Duyu Tang, Yan Wang, Piji Li, Wei Bi, Haiyun Jiang, Guoping Huang, Leyang Cui, Xinting Huang, Cong Zhou, Yong Dai, Dongyang Ma

In Effidit, we significantly expand the capacities of a writing assistant by providing functions in five categories: text completion, error checking, text polishing, keywords to sentences (K2S), and cloud input methods (cloud IME).

Keywords to Sentences Retrieval +3

Towards Robust Online Dialogue Response Generation

no code implementations 7 Mar 2022 Leyang Cui, Fandong Meng, Yijin Liu, Jie Zhou, Yue Zhang

Although pre-trained sequence-to-sequence models have achieved great success in dialogue response generation, chatbots still suffer from generating inconsistent responses in real-world practice, especially in multi-turn settings.

Chatbot Re-Ranking +1

Do Prompts Solve NLP Tasks Using Natural Language?

no code implementations 2 Mar 2022 Sen Yang, Yunchen Zhang, Leyang Cui, Yue Zhang

Thanks to the advanced improvement of large pre-trained language models, prompt-based fine-tuning is shown to be effective on a variety of downstream tasks.

Investigating Non-local Features for Neural Constituency Parsing

1 code implementation ACL 2022 Leyang Cui, Sen Yang, Yue Zhang

In addition, our method achieves state-of-the-art BERT-based performance on PTB (95.92 F1) and strong performance on CTB (92.31 F1).

Constituency Parsing

Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation

1 code implementation EMNLP 2021 Leyang Cui, Yu Wu, Shujie Liu, Yue Zhang

To deal with this problem, instead of introducing a knowledge base as the input, we force the model to learn a better semantic representation by predicting the information in the knowledge base based only on the input context.

Dialogue Generation Retrieval

Template-Based Named Entity Recognition Using BART

1 code implementation Findings (ACL) 2021 Leyang Cui, Yu Wu, Jian Liu, Sen Yang, Yue Zhang

To address the issue, we propose a template-based method for NER, treating NER as a language model ranking problem in a sequence-to-sequence framework, where original sentences and statement templates filled by candidate named entity spans are regarded as the source sequence and the target sequence, respectively.
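
The ranking step can be sketched by scoring each filled template with BART's sequence likelihood; the "X is a Y entity." template follows the method description above, while the checkpoint and label set are illustrative assumptions.

    import torch
    from transformers import BartTokenizer, BartForConditionalGeneration

    tok = BartTokenizer.from_pretrained("facebook/bart-large")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large").eval()

    def template_score(sentence: str, span: str, entity_type: str) -> float:
        target = f"{span} is a {entity_type} entity."
        src = tok(sentence, return_tensors="pt").input_ids
        tgt = tok(target, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(input_ids=src, labels=tgt).loss  # per-token NLL
        return -loss.item()                               # higher = better fit

    sent = "Leyang Cui works on natural language processing."
    best = max(["person", "location", "organization"],
               key=lambda t: template_score(sent, "Leyang Cui", t))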

Few-shot NER Language Modeling +3

Uni-Encoder: A Fast and Accurate Response Selection Paradigm for Generation-Based Dialogue Systems

1 code implementation 2 Jun 2021 Chiyu Song, Hongliang He, Haofei Yu, Pengfei Fang, Leyang Cui, Zhenzhong Lan

The current state-of-the-art ranking methods mainly use an encoding paradigm called Cross-Encoder, which separately encodes each context-candidate pair and ranks the candidates according to their fitness scores.
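
For contrast with Uni-Encoder, the Cross-Encoder paradigm described here runs one forward pass per context-candidate pair, re-encoding the context each time; a schematic sketch with an untrained scoring head (model choice assumed):

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    scorer = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=1)

    def rank(context, candidates):
        scores = []
        for cand in candidates:                  # one pass per pair: the cost
            enc = tok(context, cand, return_tensors="pt", truncation=True)
            with torch.no_grad():
                scores.append(scorer(**enc).logits.item())
        return [c for _, c in sorted(zip(scores, candidates), reverse=True)]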

Computational Efficiency Conversational Response Selection

Natural Language Inference in Context -- Investigating Contextual Reasoning over Long Texts

1 code implementation 10 Nov 2020 Hanmeng Liu, Leyang Cui, Jian Liu, Yue Zhang

Natural language inference (NLI) is a fundamental NLP task, investigating the entailment relationship between two texts.

Logical Reasoning Natural Language Inference +1

Does Chinese BERT Encode Word Structure?

1 code implementation COLING 2020 Yile Wang, Leyang Cui, Yue Zhang

Contextualized representations give significantly improved results for a wide range of NLP tasks.

Chunking Natural Language Inference +2

What Have We Achieved on Text Summarization?

1 code implementation EMNLP 2020 Dandan Huang, Leyang Cui, Sen Yang, Guangsheng Bao, Kun Wang, Jun Xie, Yue Zhang

Deep learning has led to significant improvement in text summarization with various methods investigated and improved ROUGE scores reported over the years.

Text Summarization

On Commonsense Cues in BERT for Solving Commonsense Tasks

no code implementations Findings (ACL) 2021 Leyang Cui, Sijie Cheng, Yu Wu, Yue Zhang

We quantitatively investigate the presence of structural commonsense cues in BERT when solving commonsense tasks, and the importance of such cues for the model prediction.

Sentiment Analysis Sentiment Classification

LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning

2 code implementations 16 Jul 2020 Jian Liu, Leyang Cui, Hanmeng Liu, Dandan Huang, Yile Wang, Yue Zhang

Machine reading is a fundamental task for testing the capability of natural language understanding, which is closely related to human cognition in many aspects.

Logical Reasoning Machine Reading Comprehension +1

MuTual: A Dataset for Multi-Turn Dialogue Reasoning

1 code implementation ACL 2020 Leyang Cui, Yu Wu, Shujie Liu, Yue Zhang, Ming Zhou

Non-task-oriented dialogue systems have achieved great success in recent years due to largely accessible conversation data and the development of deep learning techniques.

Task-Oriented Dialogue Systems

Evaluating Commonsense in Pre-trained Language Models

1 code implementation 27 Nov 2019 Xuhui Zhou, Yue Zhang, Leyang Cui, Dandan Huang

However, relatively little work has been done investigating commonsense knowledge contained in contextualized representations, which is crucial for human question answering and reading comprehension.

Language Modeling Language Modelling +2

How Can BERT Help Lexical Semantics Tasks?

no code implementations 7 Nov 2019 Yile Wang, Leyang Cui, Yue Zhang

Contextualized embeddings such as BERT can serve as strong input representations for NLP tasks, outperforming their static embedding counterparts such as skip-gram, CBOW, and GloVe.
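
The contrast with static embeddings is that a contextualized model assigns the same word different vectors in different contexts; a quick illustration (the checkpoint choice is incidental):

    import torch
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased").eval()

    def embed(sentence):
        enc = tok(sentence, return_tensors="pt")
        with torch.no_grad():
            return bert(**enc).last_hidden_state[0]  # one vector per wordpiece

    # "bank" receives different vectors here, unlike skip-gram/CBOW/GloVe,
    # which would give it a single shared vector.
    a = embed("He sat by the river bank.")
    b = embed("She deposited cash at the bank.")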

Sentence Word Embeddings
