Recent advances in zero-shot and few-shot learning have shown promise for a range of research and practical applications. However, this fast-growing area lacks standardized evaluation suites for non-English languages, hindering progress outside the Anglo-centric paradigm. To address this gap, we propose TAPE (Text Attack and Perturbation Evaluation), a novel benchmark of six complex NLU tasks for Russian, covering multi-hop reasoning, ethical concepts, logic, and commonsense knowledge. TAPE's design focuses on systematic zero-shot and few-shot NLU evaluation: (i) linguistically motivated adversarial attacks and perturbations for analyzing robustness, and (ii) subpopulations for nuanced interpretation. A detailed analysis of the autoregressive baselines indicates that simple spelling-based perturbations degrade performance the most, while paraphrasing the input has a far smaller effect. At the same time, the results demonstrate a significant gap between the neural and human baselines on most tasks. We publicly release TAPE (tape-benchmark.com) to foster research on robust LMs that can generalize to new tasks when little to no supervision is available.
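
A minimal illustration of the kind of spelling-based perturbation the robustness finding refers to is sketched below. The function name and logic are hypothetical, not TAPE's actual implementation; they only show how character-level noise can be injected while keeping the text readable.

```python
import random

# Illustrative sketch, not TAPE's actual perturbation code: randomly swap
# adjacent in-word characters to simulate simple spelling noise (typos).
def swap_adjacent_chars(text: str, rate: float = 0.1, seed: int = 0) -> str:
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# str.isalpha() is Unicode-aware, so this works for Cyrillic input as well.
print(swap_adjacent_chars("Который по счёту город Москва среди крупнейших городов Европы?"))
```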

Results from the Paper


Task                Dataset              Model            Metric Name  Metric Value  Global Rank
Question Answering  CheGeKa              Human benchmark  Accuracy     64.5          # 1
Question Answering  CheGeKa              RuGPT-3 Small    Accuracy     0.0           # 2
Question Answering  CheGeKa              RuGPT-3 Medium   Accuracy     0.0           # 2
Question Answering  CheGeKa              RuGPT-3 Large    Accuracy     0.0           # 2
Ethics              Ethics               RuGPT-3 Large    Accuracy     68.6          # 1
Ethics              Ethics               RuGPT-3 Medium   Accuracy     68.3          # 2
Ethics              Ethics               RuGPT-3 Small    Accuracy     55.5          # 3
Ethics              Ethics               Human benchmark  Accuracy     52.9          # 4
Ethics              Ethics (per ethics)  Human benchmark  Accuracy     67.6          # 1
Ethics              Ethics (per ethics)  RuGPT-3 Small    Accuracy     60.9          # 2
Ethics              Ethics (per ethics)  RuGPT-3 Large    Accuracy     44.9          # 3
Ethics              Ethics (per ethics)  RuGPT-3 Medium   Accuracy     44.1          # 4
Question Answering  MultiQ               Human benchmark  Accuracy     91.0          # 1
Question Answering  MultiQ               RuGPT-3 Small    Accuracy     0.0           # 2
Question Answering  MultiQ               RuGPT-3 Medium   Accuracy     0.0           # 2
Question Answering  MultiQ               RuGPT-3 Large    Accuracy     0.0           # 2
Question Answering  RuOpenBookQA         Human benchmark  Accuracy     86.5          # 1
Question Answering  RuOpenBookQA         RuGPT-3 Small    Accuracy     57.9          # 2
Question Answering  RuOpenBookQA         RuGPT-3 Medium   Accuracy     57.2          # 3
Question Answering  RuOpenBookQA         RuGPT-3 Large    Accuracy     55.5          # 4
Logical Reasoning   RuWorldTree          Human benchmark  Accuracy     83.7          # 1
Logical Reasoning   RuWorldTree          RuGPT-3 Large    Accuracy     40.7          # 2
Logical Reasoning   RuWorldTree          RuGPT-3 Medium   Accuracy     38.0          # 3
Logical Reasoning   RuWorldTree          RuGPT-3 Small    Accuracy     34.0          # 4
Logical Reasoning   Winograd Automatic   Human benchmark  Accuracy     87.0          # 1
Logical Reasoning   Winograd Automatic   RuGPT-3 Small    Accuracy     57.9          # 2
Logical Reasoning   Winograd Automatic   RuGPT-3 Medium   Accuracy     57.2          # 3
Logical Reasoning   Winograd Automatic   RuGPT-3 Large    Accuracy     55.5          # 4
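
For the multiple-choice tasks above (e.g. RuOpenBookQA, RuWorldTree, Winograd Automatic), accuracy for an autoregressive baseline is commonly obtained by log-likelihood scoring of each answer option. The sketch below is a minimal illustration of that technique, assuming the publicly available ai-forever/rugpt3small_based_on_gpt2 checkpoint; it is not TAPE's evaluation harness, and the prompt format is hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint; TAPE's actual evaluation setup may differ.
MODEL = "ai-forever/rugpt3small_based_on_gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def sequence_logprob(text: str) -> float:
    """Sum of per-token log-probabilities of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position t predict token t+1, so shift targets by one.
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

def pick_answer(question: str, options: list[str]) -> str:
    """Zero-shot multiple choice: the highest-likelihood continuation wins."""
    return max(options, key=lambda opt: sequence_logprob(f"{question} {opt}"))
```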

Methods


No methods listed for this paper.