EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models

11 Dec 2023 · Samuel J. Paech

We introduce EQ-Bench, a novel benchmark designed to evaluate aspects of emotional intelligence in Large Language Models (LLMs). We assess the ability of LLMs to understand complex emotions and social interactions by asking them to predict the intensity of emotional states of characters in a dialogue. The benchmark is able to discriminate effectively between a wide range of models. We find that EQ-Bench correlates strongly with comprehensive multi-domain benchmarks like MMLU (Hendrycks et al., 2020) (r=0.97), indicating that we may be capturing similar aspects of broad intelligence. Our benchmark produces highly repeatable results using a set of 60 English-language questions. We also provide open-source code for an automated benchmarking pipeline at https://github.com/EQ-bench/EQ-Bench and a leaderboard at https://eqbench.com
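The r=0.97 figure above is a Pearson correlation coefficient computed across models' scores on two benchmarks. As a minimal sketch of that statistic (not the authors' pipeline), the helper below computes r for two equal-length score lists. The function name `pearson_r` and the toy inputs are assumptions made for illustration; the numbers are invented and are not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Toy, made-up values purely to exercise the function (not from the paper).
toy_a = [1.0, 2.0, 3.0, 4.0]
toy_b = [2.0, 4.0, 6.0, 8.0]
print(round(pearson_r(toy_a, toy_b), 6))
```

To compare two real benchmarks, one would pass in the per-model scores from each, listed in the same model order.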


Datasets


Introduced in the Paper:

EQ-Bench

Used in the Paper:

MMLU HellaSwag TruthfulQA MT-Bench

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Emotional Intelligence | EQ-Bench | OpenAI gpt-4-0613 | EQ-Bench Score | 62.52 | # 1 |
| Emotional Intelligence | EQ-Bench | migtissera/SynthIA-70B-v1.5 | EQ-Bench Score | 54.83 | # 2 |
| Emotional Intelligence | EQ-Bench | OpenAI gpt-4-0314 | EQ-Bench Score | 53.39 | # 3 |
| Emotional Intelligence | EQ-Bench | Qwen/Qwen-72B-Chat | EQ-Bench Score | 52.44 | # 4 |
| Emotional Intelligence | EQ-Bench | Anthropic Claude2 | EQ-Bench Score | 52.14 | # 5 |
| Emotional Intelligence | EQ-Bench | meta-llama/Llama-2-70b-chat-hf | EQ-Bench Score | 51.56 | # 6 |
| Emotional Intelligence | EQ-Bench | 01-ai/Yi-34B-Chat | EQ-Bench Score | 51.03 | # 7 |
| Emotional Intelligence | EQ-Bench | OpenAI gpt-3.5-0613 | EQ-Bench Score | 49.17 | # 8 |
| Emotional Intelligence | EQ-Bench | OpenAI gpt-3.5-turbo-0301 | EQ-Bench Score | 47.61 | # 9 |
| Emotional Intelligence | EQ-Bench | Open-Orca/Mistral-7B-OpenOrca | EQ-Bench Score | 44.40 | # 10 |
| Emotional Intelligence | EQ-Bench | Qwen/Qwen-14B-Chat | EQ-Bench Score | 43.76 | # 11 |
| Emotional Intelligence | EQ-Bench | OpenAI text-davinci-003 | EQ-Bench Score | 43.73 | # 12 |
| Emotional Intelligence | EQ-Bench | Intel/neural-chat-7b-v3-1 | EQ-Bench Score | 43.61 | # 13 |
| Emotional Intelligence | EQ-Bench | OpenAI text-davinci-002 | EQ-Bench Score | 39.44 | # 14 |
| Emotional Intelligence | EQ-Bench | openchat/openchat 3.5 | EQ-Bench Score | 37.08 | # 15 |
| Emotional Intelligence | EQ-Bench | lmsys/vicuna-33b-v1.3 | EQ-Bench Score | 36.52 | # 16 |
| Emotional Intelligence | EQ-Bench | meta-llama/Llama-2-13b-chat-hf | EQ-Bench Score | 33.02 | # 17 |
| Emotional Intelligence | EQ-Bench | lmsys/vicuna-13b-v1.1 | EQ-Bench Score | 32.85 | # 18 |
| Emotional Intelligence | EQ-Bench | meta-llama/Llama-2-7b-chat-hf | EQ-Bench Score | 25.43 | # 19 |
| Emotional Intelligence | EQ-Bench | Koala 13B | EQ-Bench Score | 24.92 | # 20 |
| Emotional Intelligence | EQ-Bench | lmsys/vicuna-7b-v1.1 | EQ-Bench Score | 22.24 | # 21 |
| Emotional Intelligence | EQ-Bench | OpenAI text-davinci-001 | EQ-Bench Score | 15.19 | # 22 |
| Emotional Intelligence | EQ-Bench | OpenAI ADA | EQ-Bench Score | 2.25 | # 23 |

Methods