19 dataset results for Recommendation Systems AND Texts

MIND

The MIcrosoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It was collected from anonymized behavior logs of the Microsoft News website. MIND is intended to serve as a benchmark dataset for news recommendation and to facilitate research on news recommendation and recommender systems.

130 PAPERS • 1 BENCHMARK

ReDial

ReDial (Recommendation Dialogues) is an annotated dataset of dialogues, where users recommend movies to each other. The dataset consists of over 10,000 conversations centered around the theme of providing movie recommendations.

91 PAPERS • 2 BENCHMARKS

Douban

Douban (Douban Conversation Corpus)

We release the Douban Conversation Corpus, comprising a training set, a development set, and a test set for retrieval-based chatbots. The statistics of the Douban Conversation Corpus are shown in the following table.

77 PAPERS • 4 BENCHMARKS

MemeTracker

The MemeTracker corpus contains articles from mainstream media and blogs published from August 1 to October 31, 2008, with about 1 million documents per day. It has 10,967 hyperlink cascades among 600 media sites.

37 PAPERS • NO BENCHMARKS YET

Amazon Product Data

This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014.

33 PAPERS • 6 BENCHMARKS

DuRecDial

A human-to-human Chinese dialog dataset (about 10k dialogs, 156k utterances), which contains multiple sequential dialogs for every pair of a recommendation seeker (user) and a recommender (bot).

27 PAPERS • NO BENCHMARKS YET

Learning to Rank Challenge (Yahoo! Learning to Rank Challenge)

The Yahoo! Learning to Rank Challenge dataset consists of 709,877 documents encoded in 700 features and sampled from query logs of the Yahoo! search engine, spanning 29,921 queries.

24 PAPERS • NO BENCHMARKS YET

TG-ReDial

TG-ReDial is a topic-guided conversational recommendation dataset for research on conversational/interactive recommender systems.

23 PAPERS • NO BENCHMARKS YET

MMD (Multimodal Dialogs)

The MMD (MultiModal Dialogs) dataset is a dataset for multimodal domain-aware conversations. It consists of over 150K conversation sessions between shoppers and sales agents, annotated by a group of in-house annotators using a semi-automated, manually intensive, iterative process.

18 PAPERS • NO BENCHMARKS YET

WeChat

The WeChat dataset for fake news detection contains more than 20k news articles, each labelled as fake or not.

7 PAPERS • 1 BENCHMARK

CITE

CITE is a crowd-sourced resource for multimodal discourse: this resource characterises inferences in image-text contexts in the domain of cooking recipes in the form of coherence relations.

6 PAPERS • 1 BENCHMARK

Coached Conversational Preference Elicitation

Coached Conversational Preference Elicitation is a dataset consisting of 502 English dialogs with 12,000 annotated utterances between a user and an assistant discussing movie preferences in natural language. It was collected using a Wizard-of-Oz methodology between two paid crowd-workers, where one worker plays the role of an 'assistant', while the other plays the role of a 'user'.

5 PAPERS • NO BENCHMARKS YET

Tripadvisor Restaurant Reviews

Dataset of restaurant reviews from TripAdvisor that includes images and texts uploaded in reviews by users. Reviews in six different cities are included: Gijón (Spain), Barcelona (Spain), Madrid (Spain), New York City (USA), Paris (France) and London (United Kingdom). In the original publication, the following task is proposed: Can we explain, using the existing image or text from a different user, why a given restaurant was recommended to a certain user?

3 PAPERS • 6 BENCHMARKS

Wikidata-14M

Wikidata-14M is a recommender system dataset for recommending items to Wikidata editors. It consists of 220,000 editors responsible for 14 million interactions with 4 million items.

2 PAPERS • NO BENCHMARKS YET

xMIND

xMIND (A Multilingual Dataset for Cross-lingual News Recommendation)

xMIND is an open, large-scale multilingual news dataset for multi- and cross-lingual news recommendation. xMIND is derived from the English MIND dataset using open-source neural machine translation (i.e., NLLB 3.3B).

2 PAPERS • NO BENCHMARKS YET

E-ReDial (Explainable Recommendation Dialogues)

E-ReDial is a conversational recommender system dataset with high-quality explanations. It consists of 756 dialogues with 12,003 utterances, each with 15.9 turns on average. 2,058 high-quality explanations are included, each with 79.2 tokens on average.

1 PAPER • NO BENCHMARKS YET

Google Local review

Google Local review (Google Local Data)

This dataset contains review information from Google Maps (ratings, text, images, etc.), business metadata (address, geographical info, descriptions, category information, price, open hours, and miscellaneous info), and links (related businesses) up to Sep 2021 in the United States.

1 PAPER • NO BENCHMARKS YET

MerRec (MerRec Recommendation Dataset)

A large-scale C2C marketplace e-commerce dataset.

1 PAPER • NO BENCHMARKS YET

X-Wines (A Wine Dataset for Recommender Systems and Machine Learning)

X-Wines is a consistent wine dataset containing 100,646 instances and 21 million real evaluations carried out by users. Data were collected on the open Web in 2022 and pre-processed for wider free use. The evaluations are ratings on a 1–5 scale, given over a period of 10 years (2012–2021) for wines produced in 62 different countries.

0 PAPERS • NO BENCHMARKS YET
