UPFD Dataset | Papers With Code

For benchmarking, please refer to its variants [UPFD-POL](https://paperswithcode.com/dataset/upfd-pol) and [UPFD-GOS](https://paperswithcode.com/dataset/upfd-gos).

The dataset has been integrated with [PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric/blob/master/examples/upfd.py) (PyG) and [Deep Graph Library](https://github.com/dmlc/dgl/blob/master/python/dgl/data/fakenews.py) (DGL). You can load the dataset after installing the latest version of PyG or DGL.

The UPFD dataset includes two sets of tree-structured graphs curated for evaluating binary graph classification, graph anomaly detection, and fake/real news detection tasks. The dataset is released as a [PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric) dataset object, so you can load the data and run various GNN models directly in PyG.
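
As an illustration of loading the data, here is a minimal sketch that assumes the `UPFD` class from `torch_geometric.datasets` and its `root`, `name`, `feature`, and `split` arguments. The `load_upfd` helper and the constant tuples are hypothetical names introduced only for this example; the first call downloads the data into `root`.

```python
# Hypothetical helper around PyG's UPFD dataset class; not part of the UPFD release.
VALID_NAMES = ("politifact", "gossipcop")
VALID_FEATURES = ("profile", "spacy", "bert", "content")
VALID_SPLITS = ("train", "val", "test")


def load_upfd(root, name="politifact", feature="bert", split="train"):
    """Return one split of UPFD as a PyG dataset object."""
    if name not in VALID_NAMES:
        raise ValueError(f"name must be one of {VALID_NAMES}, got {name!r}")
    if feature not in VALID_FEATURES:
        raise ValueError(f"feature must be one of {VALID_FEATURES}, got {feature!r}")
    if split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {VALID_SPLITS}, got {split!r}")

    # Imported lazily so the argument checks above work without PyG installed.
    from torch_geometric.datasets import UPFD

    return UPFD(root, name, feature, split)
```

With PyG installed, `load_upfd("data/UPFD", "politifact", "bert", "train")` should return a dataset that can be passed to `torch_geometric.loader.DataLoader` for mini-batch training.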

The dataset includes fake and real news propagation (retweet) networks on Twitter, built according to fact-check information from [Politifact](https://www.politifact.com/) and [Gossipcop](https://www.gossipcop.com/).
The news retweet graphs were originally extracted by [FakeNewsNet](https://github.com/KaiDMML/FakeNewsNet).
Each graph is a hierarchical tree-structured graph where the root node represents the news and the remaining nodes are Twitter users who retweeted it.
A user node has an edge to the news node if the user retweeted the news tweet. Two user nodes have an edge if one user retweeted the news tweet from the other user.
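
The structure described above can be sketched in plain Python. This is an illustration only, not the authors' extraction code: the `build_propagation_tree` function, the `(user, source)` record format, and the parent-to-child edge direction are all assumptions made for the example.

```python
def build_propagation_tree(news_id, retweets):
    """Build an edge list for one news item.

    retweets: list of (user, source) pairs, where source is either news_id
    (the user retweeted the news tweet directly) or another user (the user
    retweeted it from that user). Hypothetical input format.
    """
    node_index = {news_id: 0}  # the news item is always the root, node 0
    edges = []
    for user, source in retweets:
        for name in (source, user):
            if name not in node_index:
                node_index[name] = len(node_index)
        # Assumed direction: from the retweeted node to the retweeting user.
        edges.append((node_index[source], node_index[user]))
    return node_index, edges


# Toy example with made-up users.
node_index, edges = build_propagation_tree(
    "news_1",
    [("alice", "news_1"), ("bob", "news_1"), ("carol", "alice")],
)
print(edges)  # [(0, 1), (0, 2), (1, 3)]
```

Because every user appears once and has exactly one parent, the result is a tree with one fewer edge than nodes.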

We crawled nearly 20 million historical tweets from users who participated in fake news propagation in FakeNewsNet to generate node features in the dataset.
We incorporate four node feature types in the dataset. The 768-dimensional `bert` and 300-dimensional `spacy` features are encoded using pretrained [BERT](https://github.com/hanxiao/bert-as-service) and [spaCy](https://spacy.io/models/en#en_core_web_lg) word2vec, respectively.
The 10-dimensional `profile` feature is obtained from a Twitter account's profile.
You can refer to [profile_feature.py](https://github.com/safe-graph/GNN-FakeNews/blob/master/utils/profile_feature.py) for profile feature extraction.
The 310-dimensional `content` feature is composed of a 300-dimensional user comment word2vec (spaCy) embedding plus a 10-dimensional profile feature.
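
To make the feature dimensions concrete, the following sketch records the four sizes listed above and composes a `content` vector from its two parts. The `make_content_feature` helper and the concatenation order (embedding first, profile second) are assumptions for illustration; the source only states that the two parts are combined.

```python
# Dimensions as stated in the dataset description.
FEATURE_DIMS = {"bert": 768, "spacy": 300, "profile": 10, "content": 310}


def make_content_feature(spacy_vec, profile_vec):
    """Concatenate a 300-d spaCy embedding and a 10-d profile feature.

    Hypothetical helper; the ordering of the two parts is an assumption.
    """
    if len(spacy_vec) != FEATURE_DIMS["spacy"]:
        raise ValueError("expected a 300-dimensional spaCy embedding")
    if len(profile_vec) != FEATURE_DIMS["profile"]:
        raise ValueError("expected a 10-dimensional profile feature")
    return list(spacy_vec) + list(profile_vec)


# Placeholder values stand in for real embeddings.
content = make_content_feature([0.0] * 300, [1.0] * 10)
print(len(content))  # 310
```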

The dataset statistics are shown below:

| Data  | #Graphs  | #Fake News| #Total Nodes  | #Total Edges  | #Avg. Nodes per Graph  |
|-------|--------|--------|--------|--------|--------|
| Politifact | 314   |   157    |  41,054  | 40,740 |  131 |
| Gossipcop |  5464  |   2732   |  314,262  | 308,798  |  58  |
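
The numbers in the table can be cross-checked with a little arithmetic. The sketch below assumes nothing beyond the table itself: since every graph is a tree, total edges should equal total nodes minus the number of graphs, and the average is total nodes divided by graphs. The `STATS` dictionary and `summarize` function are names made up for this check.

```python
# Values copied from the statistics table.
STATS = {
    "Politifact": {"graphs": 314, "fake": 157, "nodes": 41054, "edges": 40740},
    "Gossipcop": {"graphs": 5464, "fake": 2732, "nodes": 314262, "edges": 308798},
}


def summarize(s):
    return {
        # A tree with n nodes has n - 1 edges, so summing over all graphs
        # gives edges = nodes - graphs.
        "tree_consistent": s["edges"] == s["nodes"] - s["graphs"],
        "avg_nodes": round(s["nodes"] / s["graphs"]),
        "fake_ratio": s["fake"] / s["graphs"],
    }


for name, s in STATS.items():
    print(name, summarize(s))
```

Both subsets pass the tree check, are exactly half fake news, and reproduce the rounded averages of 131 and 58 nodes per graph.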

Please refer to the [paper](https://arxiv.org/pdf/2104.12259.pdf) for more details about the UPFD dataset.

Due to Twitter policy, we cannot publicly release the crawled users' historical tweets.
To get the corresponding Twitter user information, you can refer to the news lists under `\data` in our [GitHub repo](https://github.com/safe-graph/GNN-FakeNews) and map the news IDs to [FakeNewsNet](https://github.com/KaiDMML/FakeNewsNet).
Then, you can crawl the user information by following the instructions on FakeNewsNet.
In the UPFD project, we use [Tweepy](https://www.tweepy.org/) and the [Twitter Developer API](https://developer.twitter.com/en) to get the user information.

---

| Task | Dataset Variant | Best Model |
|-------|--------|--------|
| Graph Classification | UPFD-POL | HGFND |
| Graph Classification | UPFD-GOS | UPFD-SAGE |

UPFD (User Preference-aware Fake News Detection)


Similar Datasets

Project CodeNet

UPFD-GOS

UPFD-POL

