Yelp Review Rating Prediction: Machine Learning and Deep Learning Models

12 Dec 2020 · Zefang Liu

We predict restaurant ratings from Yelp reviews using the Yelp Open Dataset. We present the data distribution and build a balanced training dataset. Two vectorizers are compared for feature engineering. Four machine learning models are implemented: Naive Bayes, Logistic Regression, Random Forest, and Linear Support Vector Machine. Four transformer-based models are also applied: BERT, DistilBERT, RoBERTa, and XLNet. Accuracy, weighted F1 score, and the confusion matrix are used for model evaluation. XLNet achieves 70% accuracy on 5-star classification, compared with 64% for Logistic Regression.
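To make the evaluation setup concrete, the following is a minimal sketch in plain Python of two steps the abstract mentions: building a balanced training set and computing accuracy, weighted F1, and a confusion matrix. The abstract does not say how the balanced set was built, so per-class downsampling is an assumption here, and the function names (`balance_by_rating`, `confusion_matrix`, `accuracy`, `weighted_f1`) are hypothetical helpers for illustration, not the paper's code.

```python
import random
from collections import defaultdict


def balance_by_rating(reviews, per_class, seed=0):
    """Downsample (text, stars) pairs to `per_class` items per star rating.

    Assumption: balancing by random downsampling; the paper may differ.
    Raises ValueError if any rating has fewer than `per_class` reviews.
    """
    rng = random.Random(seed)
    by_star = defaultdict(list)
    for text, stars in reviews:
        by_star[stars].append((text, stars))
    balanced = []
    for stars in sorted(by_star):
        balanced.extend(rng.sample(by_star[stars], per_class))
    rng.shuffle(balanced)
    return balanced


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred, labels):
    """Per-class F1 averaged with weights proportional to class support."""
    cm = confusion_matrix(y_true, y_pred, labels)
    total = len(y_true)
    score = 0.0
    for i in range(len(labels)):
        tp = cm[i][i]
        support = sum(cm[i])                    # true items of class i
        predicted = sum(row[i] for row in cm)   # items predicted as class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * support / total
    return score
```

On a balanced test set every class has the same support, so the weighted F1 reduces to the plain (macro) average of the per-class F1 scores.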


