While VC-dimension yields only trivial generalisation error bounds in this setting as well, we show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for stochastic block models.
Here, we study the problem of *causal generalization* -- generalizing from observational to interventional distributions -- in forecasting.
Despite the ubiquity of kernel-based clustering, surprisingly few statistical guarantees exist beyond settings that impose strong structural assumptions on the data-generating process.
This paper focuses on semi-supervised learning on graphs, and explains the above observations through the lens of Neural Tangent Kernels (NTKs).
Using the proposed graph distance, we present two clustering algorithms and show that they achieve state-of-the-art results.
Hypothesis testing for graphs has been an important tool in applied research fields for more than two decades, yet it remains a challenging problem, as one often needs to draw inferences from few replicates of large graphs.
We address the classical problem of hierarchical clustering, but in a framework where one does not have access to a representation of the objects or their pairwise similarities.
Given a population of $m$ graphs from each model, we derive minimax separation rates for the problem of testing $P=Q$ against $d(P, Q)>\rho$.
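For reference, "minimax separation rate" can be read here in the standard sense (the notation below is illustrative, not taken from the paper): for a test $\phi$ mapping the two samples to $\{0,1\}$, the minimax risk at separation $\rho$ is
$$\mathcal{R}(\rho) \;=\; \inf_{\phi}\Big[\, \sup_{P=Q} \mathbb{E}[\phi] \;+\; \sup_{d(P,Q)>\rho} \mathbb{E}[1-\phi] \,\Big],$$
and the separation rate is the smallest $\rho$ at which $\mathcal{R}(\rho)$ can be driven below a prescribed level $\alpha$.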
We consider a two-sample hypothesis testing problem, where the distributions are defined on the space of undirected graphs, and one has access to only one observation from each model.
We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points.
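As a minimal illustration of this setting (the interface and names below are hypothetical, not the paper's), the only access a comparison-based algorithm gets to the underlying metric is through ordinal queries such as triplet comparisons, never the distances themselves:

```python
import numpy as np

# Hidden points: the learner never sees these coordinates or any distances.
_points = np.random.default_rng(0).normal(size=(10, 3))

def triplet_oracle(i, j, k):
    """Answer 'is point i closer to j than to k?' -- the only kind of
    query available in the comparison-based setting."""
    d_ij = np.linalg.norm(_points[i] - _points[j])
    d_ik = np.linalg.norm(_points[i] - _points[k])
    return d_ij < d_ik

print(triplet_oracle(0, 1, 2))  # e.g. True if point 0 is closer to 1 than to 2
```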
This work is motivated by two issues that arise when a hypergraph partitioning approach is used to tackle computer vision problems: (i) The uniform hypergraphs constructed for higher-order learning contain all possible edges, but most of them have negligible weight.
Hypergraph partitioning lies at the heart of a number of problems in machine learning and network sciences.
Spectral graph partitioning methods have received significant attention from both practitioners and theorists in computer science.
Motivated by multi-distribution divergences, which originate in information theory, we propose a notion of `multi-point' kernels, and study their applications.
This motivates us to study smoothed functional (SF) schemes for gradient estimation using the $q$-Gaussian distribution.
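A minimal sketch of a two-sided SF gradient estimator, assuming standard Gaussian perturbations (the $q \to 1$ limit of the $q$-Gaussian family; sampling general $q$-Gaussians would replace the perturbation draw). The function name and parameters are illustrative, not from the paper:

```python
import numpy as np

def sf_gradient_estimate(f, x, beta=0.1, n_samples=100, rng=None):
    """Two-sided smoothed functional (SF) gradient estimate of f at x.

    Uses Gaussian smoothing perturbations as a stand-in for the
    q-Gaussian perturbations studied in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)  # smoothing perturbation
        grad += (f(x + beta * u) - f(x - beta * u)) / (2.0 * beta) * u
    return grad / n_samples

# Example: for f(x) = sum(x^2), the true gradient is 2x.
x = np.array([1.0, -2.0])
g = sf_gradient_estimate(lambda z: float(np.sum(z**2)), x, beta=0.05, n_samples=5000)
print(g)  # close to [2.0, -4.0]
```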
In this paper, we introduce a generative maximum entropy classification method with feature selection for high-dimensional data such as text datasets.
Motivated by the importance of power-law distributions in statistical modeling, in this paper we propose the notion of power-law kernels to investigate power laws in learning problems.