We provide the BCOPA-CE test set. Its correct and wrong alternatives have a balanced token distribution, and it is harder to solve without distinguishing cause from effect.
Construction
- For each premise of the 500 samples in the COPA test set, we manually write one event that is a plausible answer to the opposite question type of the original sample.
- This yields 500 triplets of <premise, cause, effect>.
- We construct 1000 samples by asking two different questions (cause or effect) about each triplet.
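
The last step can be sketched in Python as follows. This is a minimal illustration, not the script used to build the released files. The field names (`premise`, `choice1`, `choice2`, `question`, `label`) follow the common COPA layout and are assumptions here, as are the fixed ordering of the two alternatives and the assumption that the cause and effect events serve as each other's wrong alternative. The example triplet is invented for illustration and is not taken from the dataset.

```python
from typing import Dict, List


def build_samples(triplets: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Expand each <premise, cause, effect> triplet into two samples.

    Both samples share the same two alternatives; only the question type
    and the index of the correct alternative differ.
    """
    samples = []
    for t in triplets:
        # label is the 0-based index of the correct alternative (assumed convention)
        for question, label in (("cause", 0), ("effect", 1)):
            samples.append({
                "premise": t["premise"],
                "choice1": t["cause"],   # correct when question == "cause"
                "choice2": t["effect"],  # correct when question == "effect"
                "question": question,
                "label": label,
            })
    return samples


# Hypothetical triplet, for illustration only.
triplets = [{
    "premise": "The man broke his toe.",
    "cause": "He dropped a hammer on his foot.",
    "effect": "He went to the hospital.",
}]
samples = build_samples(triplets)
```

Under these assumptions every event appears once as a correct alternative and once as a wrong one, so 500 triplets give 1000 samples and the tokens in the alternatives alone carry no information about the label.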