SciGen is a challenge dataset for reasoning-aware data-to-text generation. It consists of tables from scientific articles and their corresponding descriptions. SciGen has two distinguishing properties: (1) the tables mostly contain numerical values, and (2) the corresponding descriptions require arithmetic reasoning. SciGen is therefore the first dataset that assesses the arithmetic reasoning capabilities of generation models on complex input structures, i.e., tables from scientific articles. SciGen opens new avenues for future research in reasoning-aware text generation and evaluation.
The dataset consists of 1.3K table-description pairs, with an average of 53 cells per table.
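To make "arithmetic reasoning" concrete, the following minimal Python sketch shows the kind of computation a description may depend on: comparing two numeric cells and stating the difference. The table, its values, and the helper names (`column_values`, `describe_gain`) are invented for illustration; they are not taken from SciGen and do not reflect its actual file format.

```python
# Toy table invented for illustration; it is NOT a SciGen instance,
# and this dict layout is an assumption, not the dataset's real schema.
table = {
    "columns": ["Model", "BLEU", "METEOR"],
    "rows": [
        ["Baseline", 20.0, 30.0],
        ["Proposed", 23.0, 31.0],
    ],
}


def column_values(table, column):
    """Map each row label (first cell) to its value in the given column."""
    idx = table["columns"].index(column)
    return {row[0]: row[idx] for row in table["rows"]}


def describe_gain(table, column, system, reference):
    """Produce a one-sentence comparison that requires a subtraction."""
    values = column_values(table, column)
    gain = values[system] - values[reference]
    if gain == 0:
        return f"{system} matches {reference} on {column}."
    verb = "outperforms" if gain > 0 else "trails"
    return f"{system} {verb} {reference} by {abs(gain):.1f} {column} points."


description = describe_gain(table, "BLEU", "Proposed", "Baseline")
print(description)  # Proposed outperforms Baseline by 3.0 BLEU points.
```

The sentence is not a copy of any cell: the value 3.0 appears nowhere in the table and has to be derived. That is the sense in which a description requires reasoning over the table rather than surface-level verbalization of its contents.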