TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Sentence Completion	HellaSwag	DeBERTa-Large 304M (classification-based)	Accuracy	95.6	# 2
Sentence Completion	HellaSwag	DeBERTa-Large 304M	Accuracy	94.7	# 5
Question Answering	PIQA	DeBERTa-Large 304M (classification-based)	Accuracy	85.9	# 5
Question Answering	PIQA	DeBERTa-Large 304M	Accuracy	87.4	# 3
Question Answering	SIQA	DeBERTa-Large 304M (classification-based)	Accuracy	79.9	# 5
Question Answering	SIQA	DeBERTa-Large 304M	Accuracy	80.2	# 4


Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering

29 Oct 2022 · Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria

We propose a simple refactoring of multi-choice question answering (MCQA) tasks as a series of binary classifications. The MCQA task is generally performed by scoring each (question, answer) pair, normalizing the scores over all the pairs, and then selecting the answer from the pair that yields the highest score. For n answer choices, this is equivalent to an n-class classification setup where only one class (the true answer) is correct. We instead show that classifying (question, true answer) as positive instances and (question, false answer) as negative instances is significantly more effective across various models and datasets. We show the efficacy of our proposed approach in different tasks: abductive reasoning, commonsense question answering, science question answering, and sentence completion. Our DeBERTa binary classification model reaches the top or close to the top performance on public leaderboards for these tasks. The source code of the proposed approach is available at https://github.com/declare-lab/TEAM.
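The contrast between the two training setups described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`multiclass_loss`, `binary_loss`, `predict`) are hypothetical, the `scores` list stands in for whatever raw score a model assigns to each (question, answer) pair, and the inference rule (pick the pair with the highest score) is assumed rather than taken from the abstract.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def multiclass_loss(scores, correct_index):
    """n-class setup: cross-entropy over a softmax that normalizes
    the scores across all answer choices of one question."""
    max_score = max(scores)
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_norm - scores[correct_index]


def binary_loss(scores, correct_index):
    """Binary setup (assumed form): each (question, answer) pair is an
    independent instance, positive for the true answer and negative
    for every false answer, with no normalization across choices."""
    total = 0.0
    for i, s in enumerate(scores):
        label = 1.0 if i == correct_index else 0.0
        # Binary cross-entropy on a raw score: softplus(s) - label * s
        total += softplus(s) - label * s
    return total


def predict(scores):
    """Assumed inference rule: choose the highest-scoring pair."""
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    scores = [0.1, 2.0, -1.0]  # hypothetical scores for three answer choices
    print(multiclass_loss(scores, correct_index=1))
    print(binary_loss(scores, correct_index=1))
    print(predict(scores))
```

In the n-class loss, raising the score of one choice necessarily lowers the normalized probability of the others, whereas in the binary form each pair contributes its own term and is judged on its own.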


Code


declare-lab/team official

Tasks


Binary Classification

Question Answering

Science Question Answering

Sentence

Sentence Completion

Datasets

HellaSwag

PIQA

CommonsenseQA

SWAG

QASC

CosmosQA

SIQA

CICERO

CICEROv2

Results from the Paper


Ranked #2 on Sentence Completion on HellaSwag



Methods


DeBERTa • Disentangled Attention Mechanism
