Surprisingly, DeepSet outperforms transformers across a variety of distribution shifts, implying that preserving permutation invariance with respect to the input demonstrations is crucial for OOD ICL.
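To make the symmetry concrete, a DeepSet-style predictor aggregates demonstrations through permutation-invariant sum pooling (this is the standard DeepSet formulation; $\phi$ and $\rho$ denote generic encoder and readout maps, not necessarily the exact architecture used in the experiments):

$$ f(x_1, \dots, x_n) \;=\; \rho\!\Big(\sum_{i=1}^{n} \phi(x_i)\Big) \;=\; f\big(x_{\sigma(1)}, \dots, x_{\sigma(n)}\big) \quad \text{for every permutation } \sigma, $$

so reordering the in-context demonstrations cannot change the prediction.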
Under this framework, we create comprehensive datasets to benchmark (1) the state-of-the-art ML approaches for reaction prediction in the OOD setting and (2) the state-of-the-art graph OOD methods in kinetics property prediction problems.
In this paper, we develop Deep Graph Inference (DGI) -- a system for easy and efficient GNN model inference, which automatically translates the training code of a GNN model for layer-wise execution.
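As an illustration of layer-wise execution (a minimal sketch assuming a generic message-passing model; the `layer(adj, h)` interface is hypothetical and not DGI's actual API):

```python
def layerwise_inference(layers, adj, x):
    """Layer-wise GNN inference: materialize embeddings for ALL nodes
    after each layer, instead of recursively expanding every target
    node's multi-hop neighborhood (which recomputes shared neighbors).

    `layers` is a list of callables h_next = layer(adj, h); `adj` is the
    (sparse) adjacency structure and `x` the input feature matrix.
    """
    h = x
    for layer in layers:          # one full-graph propagation per layer
        h = layer(adj, h)         # e.g. h = act(adj @ h @ W) for a GCN layer
    return h                      # embeddings of all nodes from the last layer
```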
In this paper, we resolve this issue and derive the first high-probability bounds for the private stochastic method with clipping.
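For concreteness, the clipping operation analyzed in such private stochastic methods typically rescales each stochastic gradient to a norm threshold before Gaussian noise is added; the sketch below shows this generic DP-SGD-style step (the threshold `C` and noise scale `sigma` are illustrative, not the paper's exact procedure):

```python
import numpy as np

def clipped_noisy_gradient(grads, C, sigma, rng=np.random.default_rng(0)):
    """Clip each per-example gradient to norm C, average, add Gaussian noise."""
    clipped = [g * min(1.0, C / (np.linalg.norm(g) + 1e-12)) for g in grads]
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, sigma * C / len(grads), size=mean.shape)
    return mean + noise
```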
Graph neural networks (GNNs) have been widely used for representation learning on graph data.
Representation learning on graphs, also called graph embedding, has demonstrated its significant impact on a series of machine learning applications such as classification, prediction and recommendation.
AutoAttack (AA) has been the most reliable method to evaluate adversarial robustness when considerable computational resources are available.
Recently, there has been a growing surge of interest in enabling machine learning systems to generalize well to Out-of-Distribution (OOD) data.
Recently, Graph Injection Attack (GIA) has emerged as a practical attack scenario against Graph Neural Networks (GNNs), where the adversary merely injects a few malicious nodes instead of modifying existing nodes or edges as in Graph Modification Attack (GMA).
Despite recent success in using the invariance principle for out-of-distribution (OOD) generalization on Euclidean data (e.g., images), studies on graph data are still limited.
We show that stochastic acceleration can be achieved under the perturbed iterate framework (Mania et al., 2017) in asynchronous lock-free optimization, which leads to the optimal incremental gradient complexity for finite-sum objectives.
However, when tested on attacks different from the given attack simulated in training, the robustness may drop significantly (e.g., even worse than no reweighting).
Graph neural networks (GNNs) have achieved remarkable performance in many graph analytics tasks such as node classification, link prediction and graph clustering.
In convex optimization, the problem of finding near-stationary points has not been adequately studied yet, unlike other optimality measures such as the function value.
Graph neural networks (GNNs) have gained increasing popularity in many areas such as e-commerce, social networks and bio-informatics.
Graph neural networks (GNNs) have achieved breakthrough performance in graph analytics such as node classification, link prediction and graph clustering.
We study how to support elasticity, that is, the ability to dynamically adjust the parallelism (i.e., the number of GPUs), for deep neural network (DNN) training in a GPU cluster.
To assess the discrepancy between the prediction and the ground-truth in the downstream tasks for these contrastive pairs, we adapt the expected calibration error (ECE) to graph contrastive learning.
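The standard (classification) expected calibration error that such a graph-specific variant builds on can be computed as below; this is the usual binned ECE, not the authors' adaptation to contrastive pairs:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average gap between confidence and accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap     # weight by fraction of samples in bin
    return ece
```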
When a new user signs up on a website, we usually have no information about them, i.e., no interactions with items, no user profile, and no social links with other users.
The graph Laplacian regularization term is usually used in semi-supervised representation learning to provide graph structure information for a model $f(X)$.
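Concretely, with graph Laplacian $L = D - A$, the regularizer typically takes the familiar quadratic form (a standard formulation, written here for reference):

$$ \Omega(f) \;=\; \operatorname{tr}\!\big(f(X)^{\top} L\, f(X)\big) \;=\; \tfrac{1}{2}\sum_{i,j} A_{ij}\,\big\| f(X)_{i} - f(X)_{j} \big\|_{2}^{2}, $$

which penalizes predictions that differ across connected nodes.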
This paper aims to provide a theoretical framework to understand GNNs, specifically, spectral graph convolutional networks and graph attention networks, from a graph signal denoising perspective.
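A common way to state the denoising view (a standard formulation in this line of work, not necessarily the paper's exact notation) is that one propagation step approximately solves

$$ \min_{F}\; \|F - X\|_{F}^{2} + c\, \operatorname{tr}\!\big(F^{\top} L F\big), $$

whose closed-form minimizer $F^{\star} = (I + cL)^{-1} X$ smooths the input signal $X$ over the graph.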
Specifically, instead of tackling the original objective directly, we construct a shifted objective function that has the same minimizer as the original objective and encodes both the smoothness and strong convexity of the original objective in an interpolation condition.
A good parallelization strategy can significantly improve the efficiency or reduce the cost for the distributed training of deep neural networks (DNNs).
In this paper, we propose self-enhanced GNN (SEG), which improves the quality of the input data using the outputs of existing GNN models for better performance on semi-supervised node classification.
Edit-distance-based string similarity search has many applications such as spell correction, data de-duplication, and sequence alignment.
In particular, at the high compression ratio end, HSQ provides a low per-iteration communication cost of $O(\log d)$, which is favorable for federated learning.
In this paper, we present a new angle to analyze the quantization error, which decomposes the quantization error into norm error and direction error.
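One illustrative way to carry out such a decomposition (a sketch of the general idea; the exact definitions in the paper may differ) is to compare the norms of the original and quantized vectors separately from the angle between them:

```python
import numpy as np

def decompose_quantization_error(g, g_hat):
    """Split the error between g and its quantized version g_hat into
    a norm component and a direction (angular) component."""
    norm_g, norm_hat = np.linalg.norm(g), np.linalg.norm(g_hat)
    norm_error = abs(norm_hat - norm_g)          # magnitude mismatch
    cos = g @ g_hat / (norm_g * norm_hat + 1e-12)
    direction_error = 1.0 - cos                  # 0 when directions align
    return norm_error, direction_error
```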
Then we explain the good performance of ip-NSW as matching the norm bias of the MIPS problem: large-norm items have large in-degrees in the ip-NSW proximity graph, and a walk on the graph spends the majority of its computation on these items, thus effectively avoiding unnecessary computation on small-norm items.
Stochastic Gradient Descent (SGD) with Nesterov's momentum is a widely used optimizer in deep learning, which is observed to have excellent generalization performance.
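For reference, one common form of the SGD-with-Nesterov-momentum update (the look-ahead variant popularized in deep learning frameworks; the hyperparameters here are illustrative) is:

```python
def nesterov_sgd_step(w, v, grad_fn, lr=0.1, momentum=0.9):
    """One SGD step with Nesterov momentum.

    The gradient is evaluated at the look-ahead point w + momentum * v,
    which distinguishes Nesterov momentum from heavy-ball momentum.
    """
    g = grad_fn(w + momentum * v)   # look-ahead gradient
    v = momentum * v - lr * g       # update velocity
    return w + v, v                 # new parameters and velocity
```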
Collaborative filtering, a widely-used recommendation technique, predicts a user's preference by aggregating the ratings from similar users.
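A minimal sketch of this user-based aggregation (generic collaborative filtering, not a specific method from the paper): a user's predicted rating for an item is a similarity-weighted average of the ratings given by other users.

```python
import numpy as np

def predict_rating(R, sims, user, item):
    """Predict R[user, item] from other users' ratings, weighted by similarity.

    R    : ratings matrix with 0 for missing entries
    sims : precomputed user-user similarity matrix (e.g. cosine similarity)
    """
    raters = np.where(R[:, item] > 0)[0]          # users who rated this item
    raters = raters[raters != user]
    if len(raters) == 0:
        return R[R > 0].mean()                    # fall back to the global mean
    w = sims[user, raters]
    return float(w @ R[raters, item] / (np.abs(w).sum() + 1e-12))
```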
Recently, locality sensitive hashing (LSH) was shown to be effective for MIPS and several algorithms including $L_2$-ALSH, Sign-ALSH and Simple-LSH have been proposed.
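As an illustration, Simple-LSH reduces MIPS to angular similarity search by appending one extra coordinate so that inner products are preserved after the transform; the sketch below follows the commonly described construction (sign random projections shown for concreteness):

```python
import numpy as np

def simple_lsh_transform(items, queries):
    """Simple-LSH asymmetric transform for MIPS.

    Items are scaled so that the maximum norm is <= 1 and padded with
    sqrt(1 - ||x||^2); queries are normalized and padded with 0. Then
    <P(x), Q(q)> is proportional to <x, q>, so MIPS becomes maximum
    cosine similarity, which LSH for angular distance can handle.
    """
    items = items / np.linalg.norm(items, axis=1).max()
    pad_x = np.sqrt(np.maximum(0.0, 1.0 - (items ** 2).sum(axis=1)))
    P = np.hstack([items, pad_x[:, None]])
    queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    Q = np.hstack([queries, np.zeros((len(queries), 1))])
    return P, Q

def simhash(vectors, n_bits=16, rng=np.random.default_rng(0)):
    """Sign random projection: each bit is the sign of a random projection."""
    planes = rng.standard_normal((vectors.shape[1], n_bits))
    return (vectors @ planes > 0).astype(np.uint8)
```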
The heavy-tailed distributions of corrupted outliers and of the singular values of all channels in low-level vision have proven to be effective priors for many applications such as background modeling, photometric stereo, and image alignment.
This paper proposes an accelerated proximal stochastic variance reduced gradient (ASVRG) method, in which we design a simple and effective momentum acceleration trick.
Neyshabur and Srebro proposed Simple-LSH, which is the state-of-the-art hashing method for maximum inner product search (MIPS) with a performance guarantee.
Recent years have witnessed exciting progress in the study of stochastic variance reduced gradient methods (e.g., SVRG, SAGA), their accelerated variants (e.g., Katyusha), and their extensions in many different settings (e.g., online, sparse, asynchronous, distributed).
In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD).
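For background, the original SVRG loop that such variants modify looks roughly as follows (a generic textbook sketch, not the proposed VR-SGD variant itself):

```python
import numpy as np

def svrg(grad_i, n, w0, lr=0.01, epochs=10, inner_steps=None,
         rng=np.random.default_rng(0)):
    """Standard SVRG: each epoch takes a full-gradient snapshot, then runs
    inner steps with variance-reduced stochastic gradients.

    grad_i(w, i) returns the gradient of the i-th component function at w.
    """
    inner_steps = inner_steps or 2 * n
    w_snap = np.array(w0, dtype=float)
    for _ in range(epochs):
        full_grad = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        w = w_snap.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            # variance-reduced gradient: unbiased, with shrinking variance
            g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w = w - lr * g
        w_snap = w            # take the snapshot at the last inner iterate
    return w_snap
```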
In order to achieve sufficient decrease in stochastic optimization, we design a new sufficient-decrease criterion, which yields, as a byproduct, sufficient-decrease versions of stochastic variance reduction algorithms such as SVRG-SD and SAGA-SD.
In this paper, we propose an accelerated first-order method for geodesically convex optimization, which generalizes Nesterov's standard accelerated method from Euclidean space to nonlinear Riemannian spaces.
Besides having a low per-iteration complexity as existing stochastic ADMM methods, ASVRG-ADMM improves the convergence rate on general convex problems from $O(1/T)$ to $O(1/T^2)$.
Recently, research on accelerated stochastic gradient descent methods (e.g., SVRG) has made exciting progress (e.g., linear convergence for strongly convex problems).
In order to achieve sufficient decrease in stochastic optimization, we design a new sufficient-decrease criterion, which yields, as a byproduct, sufficient-decrease versions of variance reduction algorithms such as SVRG-SD and SAGA-SD.
In this paper, we first define two tractable Schatten quasi-norms, i.e., the Frobenius/nuclear hybrid and bi-nuclear quasi-norms, and then prove that they are in essence the Schatten-2/3 and Schatten-1/2 quasi-norms, respectively, which leads to the design of very efficient algorithms that only need to update two much smaller factor matrices.
In this paper, we rigorously prove that for any $p, p_1, p_2 > 0$ satisfying $1/p = 1/p_1 + 1/p_2$, the Schatten-$p$ quasi-norm of any matrix is equivalent to minimizing the product of the Schatten-$p_1$ norm (or quasi-norm) and the Schatten-$p_2$ norm (or quasi-norm) of its two factor matrices.
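In symbols, the stated equivalence reads (our rendering of the claim in the sentence above; the factorization is written as $X = UV^{\top}$ for concreteness):

$$ \|X\|_{S_p} \;=\; \min_{X = U V^{\top}} \; \|U\|_{S_{p_1}} \, \|V\|_{S_{p_2}}, \qquad \frac{1}{p} = \frac{1}{p_1} + \frac{1}{p_2}, \quad p, p_1, p_2 > 0, $$

where $\|\cdot\|_{S_p}$ denotes the Schatten-$p$ norm (a quasi-norm for $p < 1$) and the minimum is over all factorizations of $X$.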
We first establish the equivalence between the Schatten-$p$ norm ($0 < p < \infty$) of a low multi-linear rank tensor and that of its core tensor.
Then the Schatten-1 norm of the core tensor is used to replace that of the whole tensor, which leads to a much smaller-scale matrix SNM problem.
In this paper, we propose a scalable, provable structured low-rank matrix factorization method to recover low-rank and sparse matrices from missing and grossly corrupted data, i.e., robust matrix completion (RMC) problems, or incomplete and grossly corrupted measurements, i.e., compressive principal component pursuit (CPCP) problems.
To address these problems, we first propose a parallel trace norm regularized tensor decomposition method, and formulate it as a convex optimization problem.
We further investigate the evolution of user-level sentiments and latent feature vectors in an online framework and devise an efficient online algorithm to sequentially update the clustering of tweets, users and features with newly arrived data.