EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models

11 Dec 2023 · Samuel J. Paech

We introduce EQ-Bench, a novel benchmark designed to evaluate aspects of emotional intelligence in Large Language Models (LLMs). We assess the ability of LLMs to understand complex emotions and social interactions by asking them to predict the intensity of emotional states of characters in a dialogue. The benchmark is able to discriminate effectively between a wide range of models. We find that EQ-Bench correlates strongly with comprehensive multi-domain benchmarks like MMLU (Hendrycks et al., 2020) (r=0.97), indicating that we may be capturing similar aspects of broad intelligence. Our benchmark produces highly repeatable results using a set of 60 English-language questions. We also provide open-source code for an automated benchmarking pipeline at https://github.com/EQ-bench/EQ-Bench and a leaderboard at https://eqbench.com
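The correlation reported above (r=0.97) is a Pearson coefficient between per-model scores on two benchmarks. As a minimal sketch of how such a figure can be computed, the snippet below implements the standard Pearson formula in plain Python. The function name `pearson_r` and the toy score lists are assumptions made for illustration; they are not taken from the paper or its pipeline, and the numbers are placeholders rather than real benchmark results.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder numbers for illustration only; NOT real benchmark results.
toy_scores_a = [10.0, 25.0, 40.0, 55.0]
toy_scores_b = [30.0, 45.0, 58.0, 75.0]

r = pearson_r(toy_scores_a, toy_scores_b)
print(f"r = {r:.2f}")
```

In practice each list would hold one score per model, in the same model order, for the two benchmarks being compared. A value near 1 means the two benchmarks rank and space the models almost identically.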

Datasets


Task                    Dataset                 Model                           Metric Name     Metric Value  Global Rank
Emotional Intelligence  Emotional Intelligence  OpenAI gpt-4-0613               EQ-Bench Score  62.52         # 1
Emotional Intelligence  Emotional Intelligence  OpenAI ADA                      EQ-Bench Score  2.25          # 23
Emotional Intelligence  Emotional Intelligence  OpenAI text-davinci-001         EQ-Bench Score  15.19         # 22
Emotional Intelligence  Emotional Intelligence  lmsys/vicuna-7b-v1.1            EQ-Bench Score  22.24         # 21
Emotional Intelligence  Emotional Intelligence  Koala 13B                       EQ-Bench Score  24.92         # 20
Emotional Intelligence  Emotional Intelligence  meta-llama/Llama-2-7b-chat-hf   EQ-Bench Score  25.43         # 19
Emotional Intelligence  Emotional Intelligence  lmsys/vicuna-13b-v1.1           EQ-Bench Score  32.85         # 18
Emotional Intelligence  Emotional Intelligence  meta-llama/Llama-2-13b-chat-hf  EQ-Bench Score  33.02         # 17
Emotional Intelligence  Emotional Intelligence  lmsys/vicuna-33b-v1.3           EQ-Bench Score  36.52         # 16
Emotional Intelligence  Emotional Intelligence  openchat/openchat 3.5           EQ-Bench Score  37.08         # 15
Emotional Intelligence  Emotional Intelligence  OpenAI text-davinci-002         EQ-Bench Score  39.44         # 14
Emotional Intelligence  Emotional Intelligence  Intel/neural-chat-7b-v3-1       EQ-Bench Score  43.61         # 13
Emotional Intelligence  Emotional Intelligence  OpenAI text-davinci-003         EQ-Bench Score  43.73         # 12
Emotional Intelligence  Emotional Intelligence  Qwen/Qwen-14B-Chat              EQ-Bench Score  43.76         # 11
Emotional Intelligence  Emotional Intelligence  Open-Orca/Mistral-7B-OpenOrca   EQ-Bench Score  44.40         # 10
Emotional Intelligence  Emotional Intelligence  OpenAI gpt-3.5-turbo-0301       EQ-Bench Score  47.61         # 9
Emotional Intelligence  Emotional Intelligence  OpenAI gpt-3.5-0613             EQ-Bench Score  49.17         # 8
Emotional Intelligence  Emotional Intelligence  01-ai/Yi-34B-Chat               EQ-Bench Score  51.03         # 7
Emotional Intelligence  Emotional Intelligence  meta-llama/Llama-2-70b-chat-hf  EQ-Bench Score  51.56         # 6
Emotional Intelligence  Emotional Intelligence  Anthropic Claude2               EQ-Bench Score  52.14         # 5
Emotional Intelligence  Emotional Intelligence  Qwen/Qwen-72B-Chat              EQ-Bench Score  52.44         # 4
Emotional Intelligence  Emotional Intelligence  migtissera/SynthIA-70B-v1.5     EQ-Bench Score  54.83         # 2
Emotional Intelligence  Emotional Intelligence  OpenAI gpt-4-0314               EQ-Bench Score  53.39         # 3
