TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Question Answering	GraphQuestions	ChatGPT	Accuracy	53.1	# 1
Question Answering	KQA Pro	ChatGPT	Accuracy	47.93	# 1
Knowledge Base Question Answering	WebQuestionsSP	ChatGPT	Accuracy	83.7	# 1
Question Answering	WebQuestionsSP	ChatGPT	Accuracy	83.7	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/evaluation-of-chatgpt-as-a-question-answering/question-answering-on-graphquestions)](https://paperswithcode.com/sota/question-answering-on-graphquestions?p=evaluation-of-chatgpt-as-a-question-answering)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/evaluation-of-chatgpt-as-a-question-answering/question-answering-on-kqa-pro)](https://paperswithcode.com/sota/question-answering-on-kqa-pro?p=evaluation-of-chatgpt-as-a-question-answering)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/evaluation-of-chatgpt-as-a-question-answering/knowledge-base-question-answering-on-1)](https://paperswithcode.com/sota/knowledge-base-question-answering-on-1?p=evaluation-of-chatgpt-as-a-question-answering)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/evaluation-of-chatgpt-as-a-question-answering/question-answering-on-webquestionssp)](https://paperswithcode.com/sota/question-answering-on-webquestionssp?p=evaluation-of-chatgpt-as-a-question-answering)`

Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family

14 Mar 2023 · Yiming Tan, Dehai Min, Yu Li, Wenbo Li, Nan Hu, Yongrui Chen, Guilin Qi

ChatGPT is a powerful large language model (LLM) that covers knowledge resources such as Wikipedia and supports natural language question answering using its own knowledge. Therefore, there is growing interest in exploring whether ChatGPT can replace traditional knowledge-based question answering (KBQA) models. Although some works have analyzed the question answering performance of ChatGPT, there is still a lack of large-scale, comprehensive testing on various types of complex questions to analyze the limitations of the model. In this paper, we present a framework that follows the black-box testing specifications of CheckList proposed by Ribeiro et al. We evaluate ChatGPT and its family of LLMs on eight real-world KB-based complex question answering datasets, which include six English datasets and two multilingual datasets. The total number of test cases is approximately 190,000. In addition to the GPT family of LLMs, we also evaluate the well-known FLAN-T5 to identify commonalities between the GPT family and other LLMs. The dataset and code are available at https://github.com/tan92hl/Complex-Question-Answering-Evaluation-of-GPT-family.git
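
The abstract does not spell out how answers are scored, so the following is only a minimal sketch of how a black-box accuracy evaluation over tagged test cases might look. It assumes a simple rule: an output counts as correct if it contains any gold answer as a case-insensitive substring. The names `TestCase`, `is_correct`, and `accuracy_by_tag`, and the tag labels, are hypothetical and are not taken from the paper or its repository.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TestCase:
    """One black-box test case (hypothetical structure, for illustration only)."""
    question: str
    gold_answers: set[str]  # accepted answer strings
    tag: str                # question-type label, e.g. "multi-hop"


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores trivial differences."""
    return " ".join(text.lower().split())


def is_correct(model_output: str, gold_answers: set[str]) -> bool:
    """Assumed matching rule: any non-empty gold answer appears in the output."""
    out = normalize(model_output)
    return any(normalize(g) in out for g in gold_answers if normalize(g))


def accuracy_by_tag(cases: list[TestCase], outputs: list[str]) -> dict[str, float]:
    """Return accuracy (in percent) for each tag, treating the model as a black box."""
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    for case, output in zip(cases, outputs):
        totals[case.tag] = totals.get(case.tag, 0) + 1
        if is_correct(output, case.gold_answers):
            hits[case.tag] = hits.get(case.tag, 0) + 1
    return {tag: 100.0 * hits.get(tag, 0) / n for tag, n in totals.items()}


if __name__ == "__main__":
    cases = [
        TestCase("Capital of France?", {"Paris"}, "single-hop"),
        TestCase("Year of the first Moon landing?", {"1969"}, "multi-hop"),
        TestCase("Capital of Germany?", {"Berlin"}, "single-hop"),
    ]
    outputs = ["The answer is Paris.", "I am not sure.", "It is Munich."]
    print(accuracy_by_tag(cases, outputs))
```

Grouping by tag mirrors the idea of reporting results per question type rather than as one aggregate number; a stricter rule (exact match, alias lists) could be substituted in `is_correct` without changing the rest.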

Code

tan92hl/complex-question-answering-… official

Tasks

Knowledge Base Question Answering

Language Modelling

Large Language Model

Natural Language Understanding

Question Answering

Semantic Parsing

Datasets

WebQuestions • HELM • WebQuestionsSP • MKQA • GrailQA • KQA Pro • GraphQuestions

Results from the Paper

Ranked #1 on Knowledge Base Question Answering on WebQuestionsSP (Accuracy metric)

Methods

Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Discriminative Fine-Tuning • Dropout • Fixed Factorized Attention • Flan-T5 • GELU • GPT • GPT-3 • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Strided Attention • Test • Weight Decay
