Decoding Stumpers: Large Language Models vs. Human Problem-Solvers

25 Oct 2023 · Alon Goldstein, Miriam Havin, Roi Reichart, Ariel Goldstein ·

This paper investigates the problem-solving capabilities of Large Language Models (LLMs) by evaluating their performance on stumpers, unique single-step intuition problems that pose challenges for human solvers but are easily verifiable. We compare the performance of four state-of-the-art LLMs (Davinci-2, Davinci-3, GPT-3.5-Turbo, GPT-4) to human participants. Our findings reveal that the new-generation LLMs excel in solving stumpers and surpass human performance. However, humans exhibit superior skills in verifying solutions to the same problems. This research enhances our understanding of LLMs' cognitive abilities and provides insights for enhancing their problem-solving potential across various domains.

PDF Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

Datasets

Add Datasets introduced or used in this paper

Results from the Paper

Add Remove

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods

Add Remove

Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Dropout • Fixed Factorized Attention • GELU • GPT-3 • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Strided Attention • Weight Decay

Edit Social Preview

Decoding Stumpers: Large Language Models vs. Human Problem-Solvers

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit Add Remove

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Add Remove

Methods

Add Remove