graph2vec: Learning Distributed Representations of Graphs

Recent work on representation learning for graph-structured data predominantly focuses on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks, such as graph classification and clustering, require representing entire graphs as fixed-length feature vectors. Because substructure-level approaches are not equipped to learn such representations, graph kernels remain the most effective way of obtaining them. Graph kernels, however, rely on handcrafted features (e.g., shortest paths and graphlets) and are therefore hampered by problems such as poor generalization. To address this limitation, we propose a neural embedding framework named graph2vec that learns data-driven distributed representations of arbitrary-sized graphs. graph2vec's embeddings are learnt in an unsupervised manner and are task agnostic. Hence, they can be used for any downstream task, such as graph classification, clustering, and even seeding supervised representation learning approaches. Our experiments on several benchmark and large real-world datasets show that graph2vec achieves significant improvements in classification and clustering accuracy over substructure representation learning approaches and is competitive with state-of-the-art graph kernels.
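The abstract does not spell out the mechanism. In the full paper, graph2vec treats each graph as a "document" whose "words" are rooted subgraphs obtained by Weisfeiler-Lehman (WL) relabeling, and then trains a doc2vec-style skip-gram model over those documents. The sketch below illustrates only the first step, turning a graph into a bag of WL subgraph labels. It is a minimal illustration and not the authors' code: the function name `wl_subgraph_words`, the dictionary-based graph format, and the use of truncated MD5 hashes to compress labels are all assumptions made for this example.

```python
import hashlib
from collections import Counter


def wl_subgraph_words(adjacency, labels, iterations=2):
    """Return a multiset of WL labels ("words") describing one graph.

    adjacency: dict mapping node -> list of neighbouring nodes
    labels:    dict mapping node -> initial node label (a string)
    Hypothetical helper written for illustration only.
    """
    current = dict(labels)
    # Iteration 0: each node's own label is a rooted subgraph of depth 0.
    words = Counter(current.values())
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # Combine the node's label with the sorted labels of its neighbours,
            # so the result does not depend on node ordering.
            neighbour_part = ",".join(sorted(current[n] for n in neighbours))
            signature = current[node] + "|" + neighbour_part
            # Compress the signature to a short label (assumed hashing scheme).
            updated[node] = hashlib.md5(signature.encode()).hexdigest()[:8]
        current = updated
        words.update(current.values())
    return words


# Two isomorphic three-node paths with different node names, and a triangle.
path_a = {0: [1], 1: [0, 2], 2: [1]}
path_b = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

words_path_a = wl_subgraph_words(path_a, {n: "C" for n in path_a})
words_path_b = wl_subgraph_words(path_b, {n: "C" for n in path_b})
words_triangle = wl_subgraph_words(triangle, {n: "C" for n in triangle})
```

Isomorphic graphs yield identical bags of words, while structurally different graphs diverge after the first relabeling round. Feeding such bags to a document-embedding model would then give one fixed-length vector per graph, which is the representation the abstract describes.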

| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Malware Detection | Android Malware Dataset | graph2vec | Accuracy | 99.03 | #1 |
| Malware Clustering | Android Malware Dataset | graph2vec | ARI | 56.28 | #1 |
| Graph Classification | MUTAG | graph2vec | Accuracy | 83.15% ± 9.25% | #66 |
| Graph Classification | NCI1 | graph2vec | Accuracy | 73.22% ± 1.81% | #45 |
| Graph Classification | NCI109 | graph2vec | Accuracy | 74.26 | #20 |
| Graph Classification | PROTEINS | graph2vec | Accuracy | 73.3% ± 2.05% | #75 |
| Graph Classification | PTC | graph2vec | Accuracy | 60.17% ± 6.86% | #33 |

