Speech

Spoken-SQuAD

Introduced by Li et al. in Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension

In SpokenSQuAD, the document is in spoken form, the input question is in the form of text and the answer to each question is always a span in the document. The following procedures were used to generate spoken documents from the original SQuAD dataset. First, the Google text-to-speech system was used to generate the spoken version of the articles in SQuAD. Then CMU Sphinx was sued to generate the corresponding ASR transcriptions. The SQuAD training set was used to generate the training set of Spoken SQuAD, and SQuAD development set was used to generate the testing set for Spoken SQuAD. If the answer of a question did not exist in the ASR transcriptions of the associated article, the question-answer pair was removed from the dataset because these examples are too difficult for listening comprehension machine at this stage.

Source: https://github.com/chiahsuan156/Spoken-SQuAD

Homepage

Benchmarks

Add a new result Link an existing benchmark

Trend	Task	Dataset Variant	Best Model	Paper	Code
	Spoken Language Understanding	Spoken-SQuAD	ALBERT

Papers

Paper	Code	Results	Date	Stars

Timers and Such

ASR-GLUE

ODSQA

Source: https://github.com/chiahsuan156/Spoken-SQuAD.

Usage

License

Unknown

Modalities

Speech

Spoken-SQuAD

Benchmarks Edit Add a new result Link an existing benchmark

Papers

Dataset Loaders Edit Add Remove

Tasks Edit

Similar Datasets

VlogQA

Timers and Such

ASR-GLUE

ODSQA

Usage

License Edit

Modalities Edit

Languages Edit

Benchmarks

Add a new result Link an existing benchmark

Dataset Loaders

Add Remove

Tasks

License

Modalities

Languages