UQuAD1.0: Development of an Urdu Question Answering Training Data for Machine Reading Comprehension

2 Nov 2021 · Samreen Kazi, Shakeel Khoja

In recent years, low-resource Machine Reading Comprehension (MRC) has made significant progress, with models achieving remarkable performance on datasets in various languages. However, none of these models has been customized for the Urdu language. This work explores the semi-automated creation of the Urdu Question Answering Dataset (UQuAD1.0) by combining machine-translated SQuAD with human-generated samples derived from Wikipedia articles and Urdu RC worksheets from Cambridge O-level books. UQuAD1.0 is a large-scale Urdu dataset intended for extractive machine reading comprehension tasks, consisting of 49k question-answer pairs in question, passage, and answer format. In UQuAD1.0, 45,000 QA pairs were generated by machine translation of the original SQuAD1.0 and approximately 4,000 pairs via crowdsourcing. In this study, we used two types of MRC models: a rule-based baseline and advanced Transformer-based models. We found that the latter outperform the former; thus, we concentrate solely on Transformer-based architectures. Using XLM-RoBERTa and multilingual BERT, we obtain F1 scores of 0.66 and 0.63, respectively.
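The abstract reports F1 scores, and the results table also lists Exact Match, but neither says how the metrics were computed. The sketch below assumes the SQuAD-style definitions that are common for extractive MRC: Exact Match compares the whole answer string, and F1 is the harmonic mean of token-level precision and recall. Whitespace tokenization and the function names `exact_match` and `token_f1` are assumptions made for illustration; the official SQuAD script additionally strips English articles and punctuation, which is omitted here.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if both answers have the same whitespace-separated tokens, else 0.0."""
    return float(prediction.split() == reference.split())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # If either answer is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "علامہ محمد اقبال"
    prediction = "محمد اقبال"
    print(exact_match(prediction, reference))
    print(round(token_f1(prediction, reference), 2))
```

A prediction that recovers only part of the reference span scores 0 on Exact Match but can still earn partial credit on F1, which is one reason the two metrics can diverge for the same model.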

Code

No code implementations yet.

Tasks

Machine Reading Comprehension

Machine Translation

Question Answering

Translation

Datasets

Introduced in the Paper:

UQuAD

Used in the Paper:

SQuAD

Results from the Paper

Ranked #1 on Machine Reading Comprehension on UQuAD

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Machine Reading Comprehension	UQuAD	BERT	Exact Match	66%	#1
Machine Reading Comprehension	UQuAD	XLM-RoBERTa	Exact Match	36%	#2
Machine Reading Comprehension	UQuAD	XLM-RoBERTa	F1	66%	#1

Methods

Adam • Attention Dropout • BERT • Dense Connections • Dropout • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Weight Decay • WordPiece
