A set of approximately 100K podcast episodes, consisting of raw audio files with accompanying ASR transcripts. This represents over 47,000 hours of transcribed audio and is an order of magnitude larger than previous speech-to-text corpora.
3 PAPERS • NO BENCHMARKS YET
Collected through a platform on which users rank 77 candies and sweets. Over 2,000 users submitted over 44,000 grades, resulting in a rating matrix with 28% coverage.
2 PAPERS • NO BENCHMARKS YET
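The coverage figure quoted for the candy-ranking dataset can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it plugs in the rounded lower bounds from the description (2,000 users, 44,000 grades, 77 items) rather than the exact counts, which the description does not give, and the function name `matrix_coverage` is a hypothetical helper.

```python
def matrix_coverage(n_users: int, n_items: int, n_ratings: int) -> float:
    """Return the fraction of user-item cells that contain a rating."""
    if n_users <= 0 or n_items <= 0:
        raise ValueError("n_users and n_items must be positive")
    return n_ratings / (n_users * n_items)


# Assumption: rounded lower bounds taken from the dataset description,
# not the exact user and rating counts.
approx = matrix_coverage(n_users=2000, n_items=77, n_ratings=44000)
print(f"{approx:.1%}")  # prints 28.6%
```

With these rounded inputs the estimate lands close to the reported 28%; the exact value depends on the true user and grade counts.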
TG-ReDial is a topic-guided conversational recommendation dataset for research on conversational/interactive recommender systems.
23 PAPERS • NO BENCHMARKS YET
We ran 21 recommender systems on three datasets (BeerAdvocate, LibraryThing, and MovieLens 1M). The output of these recommenders was evaluated using the rec_eval tool. We also tested for statistically significant improvements using a permutation test. The output of both tools can be found in data.
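The description does not say which permutation test variant was used. As a minimal sketch, assuming a paired (sign-flip) randomization test on per-user metric differences between two recommenders, the idea can be illustrated as follows. The function name `paired_permutation_test` and its parameters are hypothetical and are not part of rec_eval.

```python
import random
from statistics import mean


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided paired permutation test on the mean per-user difference.

    scores_a and scores_b hold one metric value per user for each system,
    in the same user order. Returns an estimated p-value.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same users")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the two system labels are exchangeable
        # per user, so each difference keeps or flips its sign at random.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

A small p-value suggests that the observed difference between the two systems is unlikely to arise from randomly swapping their per-user scores.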
The WeChat dataset for fake news detection contains more than 20k news items, each labelled as fake or not.
7 PAPERS • 1 BENCHMARK
The Yelp Dataset is a collection of real-world data on businesses, reviews, and user interactions, intended for academic research, teaching, and learning. It contains 6,990,280 user reviews; information on 150,346 businesses in 11 metropolitan areas; 200,100 pictures; 908,915 tips provided by 1,987,897 users; more than 1.2 million business attributes, such as hours, parking availability, and ambiance; and aggregated historical check-in data for each of 131,930 businesses.
71 PAPERS • 22 BENCHMARKS