Search Results for author: Kaijie Zhu

Found 10 papers, 6 papers with code

DyVal 2: Dynamic Evaluation of Large Language Models by Meta Probing Agents

no code implementations • 21 Feb 2024 • Kaijie Zhu, Jindong Wang, Qinlin Zhao, Ruochen Xu, Xing Xie

Our multifaceted analysis demonstrated the strong correlation between the basic abilities and an implicit Matthew effect on model size, i.e., larger models possess stronger correlations among these abilities.

Data Augmentation

The Good, The Bad, and Why: Unveiling Emotions in Generative AI

no code implementations • 18 Dec 2023 • Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

Through extensive experiments involving language and multi-modal models on semantic understanding, logical reasoning, and generation tasks, we demonstrate that both textual and visual EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it.
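For intuition, here is a minimal sketch of the EmotionPrompt side of the idea: an emotional stimulus is simply appended to an otherwise unchanged task prompt. The stimulus strings follow the style reported in the papers; the helper name is ours, not the authors' code.

```python
# Minimal sketch of the EmotionPrompt idea: append an emotional stimulus
# to an otherwise unchanged task prompt. Stimulus strings follow the style
# reported in the papers; the helper name is illustrative.

EMOTION_PROMPTS = [
    "This is very important to my career.",
    "You'd better be sure.",
]

def with_emotion_prompt(task_prompt: str, stimulus: str) -> str:
    """Return the task prompt with an emotional stimulus appended."""
    return f"{task_prompt} {stimulus}"

base = "Determine whether the following movie review is positive or negative: ..."
print(with_emotion_prompt(base, EMOTION_PROMPTS[0]))
```

EmotionAttack works in the opposite direction, injecting emotionally distracting text to degrade performance; the mechanics are the same prompt-level string manipulation.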

Logical Reasoning

PromptBench: A Unified Library for Evaluation of Large Language Models

1 code implementation • 13 Dec 2023 • Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie

The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks.
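The library exposes datasets, models, prompt templates, and evaluation utilities behind one interface. Below is a sketch of the evaluation loop in the style of the project's documented quickstart; the class and method names (DatasetLoader, LLMModel, InputProcess, OutputProcess, Eval) are taken from the README and may differ across versions.

```python
import promptbench as pb

# Load a classification dataset and a model through the unified interface.
dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10, temperature=0.0001)

prompt = "Classify the sentence as positive or negative: {content}"

def proj_func(pred):
    # Map the raw generation to a class id (0 = negative, 1 = positive).
    return {"negative": 0, "positive": 1}.get(pred, -1)

preds, labels = [], []
for data in dataset:
    input_text = pb.InputProcess.basic_format(prompt, data)  # fill the template
    raw_pred = model(input_text)
    preds.append(pb.OutputProcess.cls(raw_pred, proj_func))
    labels.append(data["label"])

print("accuracy:", pb.Eval.compute_cls_accuracy(preds, labels))
```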

Prompt Engineering

CompeteAI: Understanding the Competition Behaviors in Large Language Model-based Agents

no code implementations • 26 Oct 2023 • Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie

Large language models (LLMs) have been widely used as agents to complete different tasks, such as personal assistance or event planning.

Language Modelling · Large Language Model

DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks

1 code implementation • 29 Sep 2023 • Kaijie Zhu, Jiaao Chen, Jindong Wang, Neil Zhenqiang Gong, Diyi Yang, Xing Xie

Moreover, DyVal-generated samples are not only evaluation sets, but also helpful data for fine-tuning to improve the performance of LLMs on existing benchmarks.
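DyVal builds reasoning problems from randomly generated computation graphs with controllable complexity, so every evaluation run sees fresh samples with known ground truth. The toy sketch below uses a chain-shaped arithmetic graph to illustrate the idea; it is our simplification, not the authors' implementation.

```python
import operator
import random

# Toy illustration of DyVal-style dynamic sample generation: build a small
# random arithmetic expression step by step, then render it as a fresh
# evaluation question with a known ground-truth answer.

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def generate_sample(depth=3, seed=None):
    rng = random.Random(seed)
    values = [rng.randint(1, 9) for _ in range(depth + 1)]
    expr, result = str(values[0]), values[0]
    for v in values[1:]:
        op = rng.choice(list(OPS))       # pick an operator for this node
        expr = f"({expr} {op} {v})"      # grow the expression
        result = OPS[op](result, v)      # track the ground-truth answer
    return f"Compute the value of {expr}.", result

question, answer = generate_sample(depth=4, seed=0)
print(question, "->", answer)
```

Because `depth` (and, in the real system, the graph topology) is a knob, difficulty can be scaled up as models improve, mitigating benchmark contamination.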

Logical Reasoning

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

1 code implementation • ICCV 2023 • Kaijie Zhu, Jindong Wang, Xixu Hu, Xing Xie, Ge Yang

The core idea of RiFT is to exploit the redundant capacity for robustness by fine-tuning the adversarially trained model on its non-robust-critical module.
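A rough PyTorch sketch of the recipe under stated assumptions: rank modules by how much robust accuracy drops when their weights are perturbed (random noise here stands in for the paper's worst-case perturbation), then fine-tune only the least critical module on clean data. `robust_acc` is a user-supplied callable (e.g., accuracy under a PGD attack), and the paper additionally interpolates between the original and fine-tuned weights, a step omitted here.

```python
import torch

def module_robust_criticality(model, robust_acc, noise_scale=0.1):
    """Drop in robust accuracy when each module's own weights are perturbed.

    Random noise is a cheap stand-in for the paper's worst-case perturbation.
    """
    baseline = robust_acc(model)
    drops = {}
    for name, module in model.named_modules():
        params = list(module.parameters(recurse=False))
        if not params:
            continue
        originals = [p.data.clone() for p in params]
        for p in params:
            p.data += noise_scale * torch.randn_like(p)
        drops[name] = baseline - robust_acc(model)
        for p, orig in zip(params, originals):
            p.data.copy_(orig)  # restore the unperturbed weights
    return drops

def rift_finetune(model, clean_loader, robust_acc, lr=1e-3, epochs=1):
    """Fine-tune only the least robust-critical module on clean data."""
    drops = module_robust_criticality(model, robust_acc)
    target = min(drops, key=drops.get)  # smallest drop = least critical
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith(target + ".")
    opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
```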

Adversarial Robustness

Large Language Models Understand and Can be Enhanced by Emotional Stimuli

no code implementations • 14 Jul 2023 • Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

In addition to deterministic tasks that can be automatically evaluated with existing metrics, we conducted a human study with 106 participants to assess the quality of generative tasks under both vanilla and emotional prompts.

Emotional Intelligence · Informativeness

A Survey on Evaluation of Large Language Models

1 code implementation • 6 Jul 2023 • Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications.

Ethics
