SST (Stanford Sentiment Treebank)

Introduced by Socher et al. in "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank"

The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.

Each phrase is labeled as negative, somewhat negative, neutral, somewhat positive, or positive. The corpus with all 5 labels is referred to as SST-5 or SST fine-grained. For binary classification experiments on full sentences, neutral sentences are discarded and the remaining labels are merged (negative or somewhat negative vs. somewhat positive or positive); this version is referred to as SST-2 or SST binary.
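
The relationship between the five-way and binary label schemes described above can be sketched in a few lines of Python. This is only an illustration: the integer encoding (0 to 4, ordered from negative to positive) and the helper names `fine_to_binary` and `to_sst2` are assumptions made for this example, not part of the official release or of any particular library.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed encoding (hypothetical, for illustration only):
# index 0..4, ordered from most negative to most positive.
FINE_LABELS = [
    "negative",
    "somewhat negative",
    "neutral",
    "somewhat positive",
    "positive",
]
NEUTRAL = FINE_LABELS.index("neutral")


def fine_to_binary(label: int) -> Optional[int]:
    """Map a five-way label to a binary one (0 = negative, 1 = positive).

    Returns None for neutral, which the binary setup discards.
    """
    if not 0 <= label < len(FINE_LABELS):
        raise ValueError(f"label must be in 0..{len(FINE_LABELS) - 1}, got {label}")
    if label == NEUTRAL:
        return None
    return 0 if label < NEUTRAL else 1


def to_sst2(examples: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Convert (sentence, fine label) pairs to binary pairs, dropping neutral ones."""
    converted = []
    for sentence, label in examples:
        binary = fine_to_binary(label)
        if binary is not None:
            converted.append((sentence, binary))
    return converted
```

For example, calling `to_sst2` on three made-up sentences with fine-grained labels 4, 2 and 1 would keep the first and third (as positive and negative respectively) and drop the neutral one.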

Benchmarks

Task	Dataset Variant	Best Model
Sentiment Analysis	SST-2 Binary classification	T5-11B
Sentiment Analysis	SST-5 Fine-grained classification	Heinsen Routing + RoBERTa Large
Text Classification	SST-2	DeBERTa
Text Classification	SST2	distilbert-base-uncased-finetuned-sst-2-english
Few-Shot Text Classification	SST-5	SetFit + OCD
Explanation Fidelity Evaluation	SST2	GCN
Explanation Fidelity Evaluation	SST-5	GCN
Out-of-Distribution Detection	SST	2-Layered GRU
Few-Shot Learning	SST-2 Binary classification	DART