TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Recommendation Systems	MovieLens 10M	SGD MF	RMSE	0.772	# 7
Recommendation Systems	MovieLens 10M	Bayesian SVD++	RMSE	0.7563	# 3
Recommendation Systems	MovieLens 10M	U-RBM	RMSE	0.823	# 13
Recommendation Systems	MovieLens 10M	Bayesian timeSVD++ flipped	RMSE	0.7485	# 1
Recommendation Systems	MovieLens 10M	Bayesian timeSVD++	RMSE	0.7523	# 2

On the Difficulty of Evaluating Baselines: A Study on Recommender Systems

4 May 2019 · Steffen Rendle, Li Zhang, Yehuda Koren

Numerical evaluations with comparisons to baselines play a central role when judging research in recommender systems. In this paper, we show that running baselines properly is difficult. We demonstrate this issue on two extensively studied datasets. First, we show that results for baselines that have been used in numerous publications over the past five years for the MovieLens 10M benchmark are suboptimal. With a careful setup of a vanilla matrix factorization baseline, we are not only able to improve upon the reported results for this baseline but even outperform the reported results of any newly proposed method. Second, we recap the tremendous effort that was required by the community to obtain high-quality results for simple methods on the Netflix Prize. Our results indicate that empirical findings in research papers are questionable unless they were obtained on standardized benchmarks where baselines have been tuned extensively by the research community.
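
For readers unfamiliar with the terms in the results table, the following is a minimal sketch of what an "SGD MF" baseline and the RMSE metric look like. It is an illustration only, not the authors' setup: the function names `rmse` and `train_mf`, the toy ratings, and every hyperparameter value (`dim`, `lr`, `reg`, `epochs`) are assumptions chosen for readability and are not taken from the paper.

```python
import math
import random


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared / len(targets))


def train_mf(ratings, n_users, n_items, dim=4, lr=0.02, reg=0.02,
             epochs=200, seed=0):
    """Fit a biased matrix factorization model with plain SGD.

    ratings: list of (user, item, rating) triples with 0-based ids.
    All hyperparameter defaults are illustrative, not tuned values.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = [0.0] * n_users  # user biases
    bi = [0.0] * n_items  # item biases
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(a * b for a, b in zip(P[u], Q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            # Gradient step on the L2-regularized squared error.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for k in range(dim):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return predict


# Hypothetical toy data: (user, item, rating).
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
       (1, 2, 1.0), (2, 1, 2.0), (2, 2, 1.0)]
truth = [r for _, _, r in toy]

predict = train_mf(toy, n_users=3, n_items=3)
train_rmse = rmse([predict(u, i) for u, i, _ in toy], truth)

# Reference point: always predicting the global mean rating.
mean_rating = sum(truth) / len(truth)
baseline_rmse = rmse([mean_rating] * len(truth), truth)
```

The sketch measures error on the training ratings only, which is enough to show the mechanics. A real benchmark comparison would report RMSE on held-out data, and the abstract's point is that the outcome depends heavily on how carefully choices such as the ones hard-coded here are tuned.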

Code

srendle/libfm (1,481 GitHub stars)

tohtsky/myFM

Tasks

Collaborative Filtering

Recommendation Systems

Datasets

MovieLens

Netflix Prize

Results from the Paper

Ranked #1 on Recommendation Systems on MovieLens 10M
