How can we enhance the node features acquired from Pretrained Models (PMs) to better suit downstream graph learning tasks?
Safe deployment of graph neural networks (GNNs) under distribution shift requires models to provide accurate confidence indicators (CI).
Based on these insights, we propose a new model, Interpretable Graph Sparsification (IGS), which enhances graph classification performance by up to 5.1% with 55.0% fewer edges.
We ground the practical implications of this work through granular analysis on five real-world datasets with varying global homophily levels, demonstrating that (a) GNNs can fail to generalize to test nodes that deviate from the global homophily of a graph, and (b) high local homophily does not necessarily confer high performance for a node.
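The distinction between global and local homophily can be made concrete with a small sketch. It assumes an undirected edge list and one label per node; the function names and the toy graph are illustrative and are not taken from the paper.

```python
from collections import defaultdict


def edge_homophily(edges, labels):
    """Global homophily: fraction of edges whose endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def local_homophily(edges, labels):
    """Local homophily: per-node fraction of neighbors sharing the node's label."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return {
        node: sum(labels[n] == labels[node] for n in nbrs) / len(nbrs)
        for node, nbrs in neighbors.items()
    }


# Toy path graph (hypothetical data): nodes 0-2 have label "a", nodes 3-4 label "b".
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}

global_h = edge_homophily(edges, labels)
local_h = local_homophily(edges, labels)
```

In this toy graph, nodes 2 and 3 sit on the class boundary, so their local homophily is lower than the global value. Nodes like these deviate from the global homophily of the graph.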
Particularly, in prevalent GNN frameworks (e.g., DGL and PyTorch-Geometric), the target edges (i.e., the edges being predicted) consistently exist as message passing edges in the graph during training.
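A framework-independent way to check for this overlap is sketched below. It assumes edges are plain node pairs treated as undirected. The helper names are hypothetical and are not part of DGL or PyTorch-Geometric, and dropping the overlapping edges is shown only as one possible response, not as the authors' method.

```python
def leaked_target_edges(message_passing_edges, target_edges):
    """Return the target edges that also appear among the message passing edges."""
    mp = {frozenset(e) for e in message_passing_edges}
    return [e for e in target_edges if frozenset(e) in mp]


def drop_target_edges(message_passing_edges, target_edges):
    """Return the message passing edges with every target edge removed."""
    targets = {frozenset(e) for e in target_edges}
    return [e for e in message_passing_edges if frozenset(e) not in targets]


# Hypothetical toy data: (2, 1) is a target edge that is also present as (1, 2).
graph_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
target_edges = [(2, 1), (0, 2)]

leaked = leaked_target_edges(graph_edges, target_edges)
cleaned = drop_target_edges(graph_edges, target_edges)
```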
We identify a distribution shift between small and large graphs in the eigenvalues of the normalized Laplacian/adjacency matrix, indicating a difference in the global node connectivity, which is found to be correlated with the node closeness centrality.
Backed by our theoretical analysis, instead of maximizing the recovery of cross-instance node dependencies (which has been considered the key to closing the performance gap between model aggregation and centralized training), our framework leverages randomized assignment of nodes or super-nodes (i.e., collections of original nodes) to partition the training graph such that it improves data uniformity and minimizes the discrepancy of gradient and loss function across instances.
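A minimal sketch of randomized node assignment follows. It assumes a uniform shuffle followed by round-robin assignment; the function name, the seed handling, and the balancing rule are assumptions, and the framework's actual assignment scheme may differ.

```python
import random


def random_partition(nodes, num_parts, seed=0):
    """Assign nodes (or super-nodes) to partitions uniformly at random."""
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    # Round-robin over the shuffled order keeps partition sizes within one of each other.
    return [shuffled[i::num_parts] for i in range(num_parts)]


parts = random_partition(range(10), num_parts=3)
```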
Advances in the expressivity of pretrained models have increased interest in the design of adaptation protocols which enable safe and effective transfer learning.
Overall, our work carefully studies the effectiveness of popular scoring functions in realistic settings and helps to better understand their limitations.
Network alignment, or the task of finding corresponding nodes in different networks, is an important problem formulation in many application domains.
Overall, our work rigorously contextualizes, both empirically and theoretically, the effects of data-centric properties on augmentation strategies and learning paradigms for graph SSL.
While directly fine-tuning (FT) large-scale, pretrained models on task-specific data is well known to induce strong in-distribution task performance, recent works have demonstrated that different adaptation protocols, such as linear probing (LP) prior to FT, can improve out-of-distribution generalization.
We study the task of node classification for graph neural networks (GNNs) and establish a connection between group fairness, as measured by statistical parity and equal opportunity, and local assortativity, i.e., the tendency of linked nodes to have similar attributes.
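The two fairness measures named here can be sketched as follows. The sketch assumes binary predictions, binary labels, and a binary sensitive attribute, and uses the common absolute-difference form of each metric; the variable names and toy data are illustrative only.

```python
def statistical_parity_diff(preds, groups):
    """|P(pred=1 | group=0) - P(pred=1 | group=1)|."""
    def positive_rate(g):
        members = [p for p, s in zip(preds, groups) if s == g]
        return sum(members) / len(members)
    return abs(positive_rate(0) - positive_rate(1))


def equal_opportunity_diff(preds, labels, groups):
    """|P(pred=1 | label=1, group=0) - P(pred=1 | label=1, group=1)|."""
    def true_positive_rate(g):
        positives = [p for p, y, s in zip(preds, labels, groups) if y == 1 and s == g]
        return sum(positives) / len(positives)
    return abs(true_positive_rate(0) - true_positive_rate(1))


# Hypothetical predictions for eight nodes, four per sensitive group.
preds = [1, 1, 0, 0, 1, 0, 0, 0]
labels = [1, 0, 1, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

sp = statistical_parity_diff(preds, groups)
eo = equal_opportunity_diff(preds, labels, groups)
```

In this toy example the two groups receive positive predictions at different rates, but their true positive rates match. This shows that the two criteria can disagree.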
Recent works try to improve scalability via graph summarization, i.e., they learn embeddings on a smaller summary graph, and then restore the node embeddings of the original graph.
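The restore step can be illustrated with the simplest possible rule, in which every original node inherits the embedding of its supernode. This copy rule, the function name, and the toy mapping are assumptions made for illustration; the works referred to may use more refined restoration schemes.

```python
def restore_embeddings(node_to_supernode, supernode_embeddings):
    """Give each original node a copy of its supernode's embedding."""
    return {
        node: list(supernode_embeddings[supernode])
        for node, supernode in node_to_supernode.items()
    }


# Hypothetical summary: nodes 0 and 1 are merged into supernode "A", node 2 into "B".
node_to_supernode = {0: "A", 1: "A", 2: "B"}
supernode_embeddings = {"A": [1.0, 0.0], "B": [0.0, 1.0]}

restored = restore_embeddings(node_to_supernode, supernode_embeddings)
```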
Understanding the training dynamics of deep neural networks (DNNs) is important as it can lead to improved training efficiency and task performance.
Unsupervised graph representation learning is critical to a wide range of applications where labels may be scarce or expensive to procure.
AdaMEL models the attribute importance that is used to match entities through an attribute-level self-attention mechanism, and leverages the massive unlabeled data from new data sources through domain adaptation to make it generic and data-source agnostic.
Using the recent population augmentation graph-based analysis of self-supervised learning, we show theoretically that the success of GCL with popular augmentations is bounded by the graph edit distance between different classes.
We bridge two research directions on graph neural networks (GNNs) by formalizing the relation between heterophily of node labels (i.e., connected nodes tend to have dissimilar labels) and the robustness of GNNs to adversarial attacks.
While most network embedding techniques model the relative positions of nodes in a network, recently there has been significant interest in structural embeddings that model node role equivalences, irrespective of their distances to any specific nodes.
Thus, link prediction methods, which often rely on proximity-preserving embeddings or heuristic notions of node similarity, face a vast search space, with many pairs that are in close proximity, but that should not be linked.
We are the first to take a unified perspective to jointly explain the oversmoothing and heterophily problems at the node level.
Catastrophic forgetting undermines the effectiveness of deep neural networks (DNNs) in scenarios such as continual learning and lifelong learning.
While most network embedding techniques model the proximity between nodes in a network, recently there has been significant interest in structural embeddings that are based on node equivalences, a notion rooted in sociology: equivalences or positions are collections of nodes that have similar roles (i.e., similar functions, ties or interactions with nodes in other positions), irrespective of their distance or reachability in the network.
While previous work on dynamic modeling and embedding has focused on representing a stream of timestamped edges using a time-series of graphs based on a specific time-scale (e.g., 1 month), we propose the notion of an $\epsilon$-graph time-series that uses a fixed number of edges for each graph, and show its superiority over the time-scale representation used in previous work.
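The idea of cutting the edge stream by edge count rather than by time can be sketched as follows. The sketch assumes edges are `(source, target, timestamp)` tuples and that the final graph may hold fewer than $\epsilon$ edges; both the function name and the handling of the remainder are assumptions.

```python
def epsilon_graph_series(edge_stream, eps):
    """Split a timestamped edge stream into consecutive graphs of eps edges each."""
    ordered = sorted(edge_stream, key=lambda edge: edge[2])
    return [ordered[i:i + eps] for i in range(0, len(ordered), eps)]


# Hypothetical stream: bursty activity early on, then a long quiet gap.
stream = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4), ("d", "e", 90)]

series = epsilon_graph_series(stream, eps=2)
```

A fixed time-scale would pack the first four edges into one graph and leave later windows nearly empty. The edge-count split keeps every graph, except possibly the last, at the same size.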
We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty.
Ranked #2 on Link Prediction on CoDEx Large
In this paper, we propose a framework, called G-CREWE (Graph CompREssion With Embedding) to solve the network alignment problem.
Frequent pattern mining is a key area of study that gives insights into the structure and dynamics of evolving networks, such as social or road networks.
We investigate the representation power of graph neural networks in the semi-supervised node classification task under heterophily or low homophily, i.e., in networks where connected nodes may have different class labels and dissimilar features.
A significant effort has been made to train neural networks that replicate algorithmic reasoning, but they often fail to learn the abstract concepts underlying these algorithms.
Network alignment, the process of finding correspondences between nodes in different graphs, has many scientific and industrial applications.
We first conduct an evaluation under the standard closed-world assumption (CWA), in which predicted triples not already in the knowledge graph are considered false, and show that existing calibration techniques are effective for KGE under this common but narrow assumption.
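Under the closed-world assumption, the ground-truth label of a predicted triple is simply its membership in the knowledge graph. The sketch below shows that labeling step and a crude confidence-versus-accuracy comparison. The entity and relation identifiers and the confidence scores are made up, and the gap computed here is not one of the calibration techniques evaluated in the work.

```python
def cwa_labels(predicted_triples, known_triples):
    """Label a predicted triple True only if it already exists in the knowledge graph."""
    known = set(known_triples)
    return [triple in known for triple in predicted_triples]


# Hypothetical knowledge graph and scored predictions.
kg = [("Q1", "capital_of", "Q2"), ("Q3", "born_in", "Q2")]
predictions = [
    (("Q1", "capital_of", "Q2"), 0.9),
    (("Q3", "capital_of", "Q2"), 0.6),
]

labels = cwa_labels([triple for triple, _ in predictions], kg)
mean_confidence = sum(conf for _, conf in predictions) / len(predictions)
accuracy = sum(labels) / len(labels)
# A well-calibrated model would have mean confidence close to accuracy.
calibration_gap = abs(mean_confidence - accuracy)
```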
We apply our rules to three large KGs (NELL, DBpedia, and Yago), and tasks such as compression, various types of error detection, and identification of incomplete information.
Unfortunately, recent work has sometimes confused the notion of structural roles and communities (based on proximity) leading to misleading or incorrect claims about the capabilities of network embedding methods.
Identity stitching, the task of identifying and matching various online references (e.g., sessions over different devices and timespans) to the same user in real-world web services, is crucial for personalization and recommendations.
Motivated by the computational and storage challenges that dense embeddings pose, we introduce the problem of latent network summarization that aims to learn a compact, latent representation of the graph structure with dimensionality that is independent of the input graph size (i.e., #nodes and #edges), while retaining the ability to derive node representations on the fly.
Contrary to baseline methods, which generally learn explicit graph representations by solely using an adjacency matrix, t-PINE uses a multi-view information graph to learn explicit and implicit node representations: the adjacency matrix represents the first view, and a nearest-neighbor adjacency, computed over the node features, is the second view. It learns these representations using the Canonical Polyadic (a.k.a.
Representation learning algorithms aim to preserve local and global network structure by identifying node neighborhood notions.
Understanding the existence of patterns and trends in this data could be useful to a variety of stakeholders, particularly as Detroit emerges from Chapter 9 bankruptcy, but the patterns in such data are often complex and multivariate and the city lacks dedicated resources for detailed analysis of this data.
While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly.
Often, we can answer such questions and label nodes in a network based on the labels of their neighbors and appropriate assumptions of homophily ("birds of a feather flock together") or heterophily ("opposites attract").
This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs.
Having such features will enable a wealth of graph mining tasks, including clustering, outlier detection, visualization, etc.