A large-scale English paraphrase dataset that surpasses prior work in both quantity and quality.
11 PAPERS • NO BENCHMARKS YET
A human-curated Chinese Reading Comprehension dataset on Opinion. The questions in ReCO are opinion-based queries issued to a commercial search engine. The passages are provided by crowdworkers who extract supporting snippets from the retrieved documents.
6 PAPERS • NO BENCHMARKS YET
A maintained database that tracks ICLR submissions and reviews, augmented with author profiles and higher-level textual features.
1 PAPER • NO BENCHMARKS YET
NAIST COVID is a multilingual dataset of social media posts related to COVID-19, consisting of microblogs in English and Japanese from Twitter and in Chinese from Weibo. The data cover microblogs posted from January 20, 2020, to March 24, 2020.
The NTPairs dataset consists of pairs of news articles and their corresponding tweets published by eight media outlets in 2018. The eight outlets were selected for diversity: they employ different editing styles for news sharing and differ in publishing channels and political leaning.