MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering

27 Mar 2022 · Ankit Pal, Logesh Kumar Umapathi, Malaikannan Sankarasubbu

This paper introduces MedMCQA, a new large-scale Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. It collects more than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects, with an average token length of 12.77 and high topical diversity. Each sample contains a question, the correct answer(s), and other options, and requires deeper language understanding, as it tests 10+ reasoning abilities of a model across a wide range of medical subjects & topics. A detailed explanation of the solution is provided in this study along with the above information.
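
To make the sample structure described above concrete, here is a minimal Python sketch of one record. The class name `MCQSample`, its field names, and the helper `format_prompt` are hypothetical and chosen only for illustration; they are not the dataset's actual schema. The example question is invented and is not taken from MedMCQA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MCQSample:
    """Hypothetical container for one multiple-choice sample (not the official schema)."""
    question: str
    options: List[str]       # all answer options, in display order
    correct: List[int]       # indices into `options`; more than one may be correct
    explanation: str = ""    # free-text explanation of the solution
    subject: str = ""        # e.g. one of the 21 medical subjects
    topic: str = ""          # finer-grained healthcare topic

    def is_correct(self, predicted: int) -> bool:
        """Return True if the predicted option index is among the correct ones."""
        return predicted in self.correct


def format_prompt(sample: MCQSample) -> str:
    """Render the question followed by lettered options (A, B, C, ...)."""
    lines = [sample.question]
    for i, option in enumerate(sample.options):
        lines.append(f"{chr(ord('A') + i)}. {option}")
    return "\n".join(lines)


# Invented illustrative record, not a real MedMCQA item.
example = MCQSample(
    question="Deficiency of which vitamin causes scurvy?",
    options=["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
    correct=[2],
    explanation="Scurvy results from a lack of vitamin C (ascorbic acid).",
    subject="Biochemistry",
    topic="Vitamins",
)

print(format_prompt(example))
```

Storing the correct answers as a list of indices is one way to accommodate the "correct answer(s)" wording; a single-answer variant could use a plain integer instead.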

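| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Multiple Choice Question Answering (MCQA) | MedMCQA | PubmedBERT (Gu et al., 2022) | Dev Set (Acc-%) | 0.40 | # 11 |
| Multiple Choice Question Answering (MCQA) | MedMCQA | PubmedBERT (Gu et al., 2022) | Test Set (Acc-%) | 0.41 | # 7 |
| Multiple Choice Question Answering (MCQA) | MedMCQA | BERT-Base (Devlin et al., 2019) | Dev Set (Acc-%) | 0.35 | # 14 |
| Multiple Choice Question Answering (MCQA) | MedMCQA | BERT-Base (Devlin et al., 2019) | Test Set (Acc-%) | 0.33 | # 10 |
| Multiple Choice Question Answering (MCQA) | MedMCQA | BioBERT (Lee et al., 2020) | Dev Set (Acc-%) | 0.38 | # 13 |
| Multiple Choice Question Answering (MCQA) | MedMCQA | BioBERT (Lee et al., 2020) | Test Set (Acc-%) | 0.37 | # 9 |
| Multiple Choice Question Answering (MCQA) | MedMCQA | SciBERT (Beltagy et al., 2019) | Dev Set (Acc-%) | 0.39 | # 12 |
| Multiple Choice Question Answering (MCQA) | MedMCQA | SciBERT (Beltagy et al., 2019) | Test Set (Acc-%) | 0.39 | # 8 |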
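
The metric reported above is accuracy. As a rough sketch of how such a number could be computed, the function below takes the fraction of questions whose predicted option matches the gold option. The function name `accuracy` and the toy inputs are assumptions for illustration, not the authors' evaluation code. The reported values (for example 0.40) appear to be fractions rather than percentages despite the "Acc-%" label, so the sketch returns a fraction.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of positions where the predicted option index equals the gold index."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return matches / len(gold)


# Toy inputs (invented): five questions, three answered correctly.
toy_predictions = [0, 1, 2, 3, 0]
toy_gold = [0, 1, 2, 0, 1]
toy_score = accuracy(toy_predictions, toy_gold)
print(f"accuracy = {toy_score:.2f}")
```

With four options per question, uniform random guessing would be expected to score about 0.25 under this definition, which gives a reference point for reading the table.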


