Search Results for author: Kaijie Zhu

Found 10 papers, 6 papers with code

DyVal 2: Dynamic Evaluation of Large Language Models by Meta Probing Agents

no code implementations • 21 Feb 2024 • Kaijie Zhu, Jindong Wang, Qinlin Zhao, Ruochen Xu, Xing Xie

Our multifaceted analysis demonstrated the strong correlation between the basic abilities and an implicit Matthew effect on model size, i.e., larger models possess stronger correlations among these abilities.

Data Augmentation

The Good, The Bad, and Why: Unveiling Emotions in Generative AI

no code implementations • 18 Dec 2023 • Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

Through extensive experiments involving language and multi-modal models on semantic understanding, logical reasoning, and generation tasks, we demonstrate that both textual and visual EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it.
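For intuition, here is a minimal sketch of the EmotionPrompt side of the idea: an emotional stimulus is simply appended to an otherwise unchanged task prompt. The stimulus strings follow the style reported in the papers; the helper name is ours, not the authors' code.

```python
# Minimal sketch of the EmotionPrompt idea: append an emotional stimulus
# to an otherwise unchanged task prompt. Stimulus strings follow the style
# reported in the papers; the helper name is illustrative.

EMOTION_PROMPTS = [
    "This is very important to my career.",
    "You'd better be sure.",
]

def with_emotion_prompt(task_prompt: str, stimulus: str) -> str:
    """Return the task prompt with an emotional stimulus appended."""
    return f"{task_prompt} {stimulus}"

base = "Determine whether the following movie review is positive or negative: ..."
print(with_emotion_prompt(base, EMOTION_PROMPTS[0]))
```

EmotionAttack works in the opposite direction, injecting emotionally distracting text to degrade performance; the mechanics are the same prompt-level string manipulation.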

Logical Reasoning

PromptBench: A Unified Library for Evaluation of Large Language Models

1 code implementation • 13 Dec 2023 • Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie

The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks.
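The library exposes datasets, models, prompt templates, and evaluation utilities behind one interface. Below is a sketch of the evaluation loop in the style of the project's documented quickstart; the class and method names (DatasetLoader, LLMModel, InputProcess, OutputProcess, Eval) are taken from the README and may differ across versions.

```python
import promptbench as pb

# Load a classification dataset and a model through the unified interface.
dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10, temperature=0.0001)

prompt = "Classify the sentence as positive or negative: {content}"

def proj_func(pred):
    # Map the raw generation to a class id (0 = negative, 1 = positive).
    return {"negative": 0, "positive": 1}.get(pred, -1)

preds, labels = [], []
for data in dataset:
    input_text = pb.InputProcess.basic_format(prompt, data)  # fill the template
    raw_pred = model(input_text)
    preds.append(pb.OutputProcess.cls(raw_pred, proj_func))
    labels.append(data["label"])

print("accuracy:", pb.Eval.compute_cls_accuracy(preds, labels))
```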

Prompt Engineering

CompeteAI: Understanding the Competition Behaviors in Large Language Model-based Agents

no code implementations • 26 Oct 2023 • Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie

Large language models (LLMs) have been widely used as agents to complete different tasks, such as personal assistance or event planning.

Language Modelling · Large Language Model

DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks

1 code implementation • 29 Sep 2023 • Kaijie Zhu, Jiaao Chen, Jindong Wang, Neil Zhenqiang Gong, Diyi Yang, Xing Xie

Moreover, DyVal-generated samples are not only evaluation sets, but also helpful data for fine-tuning to improve the performance of LLMs on existing benchmarks.
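DyVal builds reasoning problems from randomly generated computation graphs with controllable complexity, so every evaluation run sees fresh samples with known ground truth. The toy sketch below uses a chain-shaped arithmetic graph to illustrate the idea; it is our simplification, not the authors' implementation.

```python
import operator
import random

# Toy illustration of DyVal-style dynamic sample generation: build a small
# random arithmetic expression step by step, then render it as a fresh
# evaluation question with a known ground-truth answer.

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def generate_sample(depth=3, seed=None):
    rng = random.Random(seed)
    values = [rng.randint(1, 9) for _ in range(depth + 1)]
    expr, result = str(values[0]), values[0]
    for v in values[1:]:
        op = rng.choice(list(OPS))       # pick an operator for this node
        expr = f"({expr} {op} {v})"      # grow the expression
        result = OPS[op](result, v)      # track the ground-truth answer
    return f"Compute the value of {expr}.", result

question, answer = generate_sample(depth=4, seed=0)
print(question, "->", answer)
```

Because `depth` (and, in the real system, the graph topology) is a knob, difficulty can be scaled up as models improve, mitigating benchmark contamination.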

Logical Reasoning

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

1 code implementation • ICCV 2023 • Kaijie Zhu, Jindong Wang, Xixu Hu, Xing Xie, Ge Yang

The core idea of RiFT is to exploit the redundant capacity for robustness by fine-tuning the adversarially trained model on its non-robust-critical module.
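A rough PyTorch sketch of the recipe under stated assumptions: rank modules by how much robust accuracy drops when their weights are perturbed (random noise here stands in for the paper's worst-case perturbation), then fine-tune only the least critical module on clean data. `robust_acc` is a user-supplied callable (e.g., accuracy under a PGD attack), and the paper additionally interpolates between the original and fine-tuned weights, a step omitted here.

```python
import torch

def module_robust_criticality(model, robust_acc, noise_scale=0.1):
    """Drop in robust accuracy when each module's own weights are perturbed.

    Random noise is a cheap stand-in for the paper's worst-case perturbation.
    """
    baseline = robust_acc(model)
    drops = {}
    for name, module in model.named_modules():
        params = list(module.parameters(recurse=False))
        if not params:
            continue
        originals = [p.data.clone() for p in params]
        for p in params:
            p.data += noise_scale * torch.randn_like(p)
        drops[name] = baseline - robust_acc(model)
        for p, orig in zip(params, originals):
            p.data.copy_(orig)  # restore the unperturbed weights
    return drops

def rift_finetune(model, clean_loader, robust_acc, lr=1e-3, epochs=1):
    """Fine-tune only the least robust-critical module on clean data."""
    drops = module_robust_criticality(model, robust_acc)
    target = min(drops, key=drops.get)  # smallest drop = least critical
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith(target + ".")
    opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
```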

Adversarial Robustness

Large Language Models Understand and Can be Enhanced by Emotional Stimuli

no code implementations • 14 Jul 2023 • Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

In addition to deterministic tasks that can be automatically evaluated with existing metrics, we conducted a human study with 106 participants to assess the quality of generative tasks under both vanilla and emotional prompts.

Emotional Intelligence · Informativeness

A Survey on Evaluation of Large Language Models

1 code implementation • 6 Jul 2023 • Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications.

Ethics
