MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering

27 Mar 2022  ·  Ankit Pal, Logesh Kumar Umapathi, Malaikannan Sankarasubbu

This paper introduces MedMCQA, a new large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. More than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects are collected with an average token length of 12.77 and high topical diversity. Each sample contains a question, correct answer(s), and other options, and requires deeper language understanding, as it tests 10+ reasoning abilities of a model across a wide range of medical subjects & topics. A detailed explanation of the solution, along with the above information, is provided in this study.
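
The sample layout described in the abstract can be sketched as a small record type. This is a minimal illustration, not the schema of the released files: the class name `McqaSample`, all of its field names, and the example item itself are hypothetical and were written only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McqaSample:
    """One multiple-choice item. All field names are illustrative assumptions."""

    question: str
    options: tuple[str, ...]           # candidate answers shown to the model
    correct_indices: tuple[int, ...]   # positions in `options` that are correct
    explanation: str = ""              # explanation of the solution, if any
    subject: str = ""                  # medical subject label
    topic: str = ""                    # finer-grained topic label

    def __post_init__(self) -> None:
        # Reject records whose answer key does not point at a real option.
        if not self.correct_indices:
            raise ValueError("at least one correct option is required")
        for index in self.correct_indices:
            if not 0 <= index < len(self.options):
                raise ValueError(f"correct index {index} is out of range")

    def is_correct(self, predicted_index: int) -> bool:
        """Return True if the predicted option is one of the correct ones."""
        return predicted_index in self.correct_indices


# A made-up item for illustration only; it is not taken from MedMCQA.
sample = McqaSample(
    question="Which vitamin deficiency causes scurvy?",
    options=("Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"),
    correct_indices=(2,),
    explanation="Scurvy results from a lack of vitamin C (ascorbic acid).",
    subject="Biochemistry",
    topic="Vitamins",
)
```

Storing the answer key as a tuple of indices keeps room for the "correct answer(s)" wording of the abstract, while a single-answer item simply uses a one-element tuple.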

Datasets


Introduced in the Paper:

MedMCQA
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Multiple Choice Question Answering (MCQA) | MedMCQA | PubMedBERT (Gu et al., 2022) | Dev Set (Acc-%) | 0.40 | #11 |
| | | | Test Set (Acc-%) | 0.41 | #7 |
| Multiple Choice Question Answering (MCQA) | MedMCQA | BERT-Base (Devlin et al., 2019) | Dev Set (Acc-%) | 0.35 | #14 |
| | | | Test Set (Acc-%) | 0.33 | #10 |
| Multiple Choice Question Answering (MCQA) | MedMCQA | BioBERT (Lee et al., 2020) | Dev Set (Acc-%) | 0.38 | #13 |
| | | | Test Set (Acc-%) | 0.37 | #9 |
| Multiple Choice Question Answering (MCQA) | MedMCQA | SciBERT (Beltagy et al., 2019) | Dev Set (Acc-%) | 0.39 | #12 |
| | | | Test Set (Acc-%) | 0.39 | #8 |
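
The metric values in the results above appear to be fractions of correctly answered questions (0.41 would correspond to 41%), despite the "Acc-%" label. A minimal sketch of that metric follows, assuming one predicted option index and one gold index per question; the function name `accuracy` and the toy inputs are illustrative and are not the authors' evaluation script.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of questions whose predicted option index matches the gold index."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy inputs for illustration: 3 of 5 predictions match the gold indices.
score = accuracy([0, 2, 1, 3, 2], [0, 2, 3, 3, 1])
print(f"{score:.2f} ({score:.0%})")  # prints "0.60 (60%)"
```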
