Recent years have witnessed intensive research interests on training deep neural networks (DNNs) more efficiently by quantization-based compression methods, which facilitate DNNs training in two ways: (1) activations are quantized to shrink the memory consumption, and (2) gradients are quantized to decrease the communication cost.
This paper aims to reduce the monetary cost for serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer accesses to spare GPUs at a much cheaper price than regular instances but may be preempted by the cloud at any time.
Learnable embedding vector is one of the most important applications in machine learning, and is widely used in various database-related domains.
To further accelerate Scalable GNNs inference in this inductive setting, we propose an online propagation framework and two novel node-adaptive propagation methods that can customize the optimal propagation depth for each node based on its topological information and thereby avoid redundant feature propagation.
1 code implementation • 23 Sep 2023 • Hailin Zhang, Yujing Wang, Qi Chen, Ruiheng Chang, Ting Zhang, Ziming Miao, Yingyan Hou, Yang Ding, Xupeng Miao, Haonan Wang, Bochen Pang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Xing Xie, Mao Yang, Bin Cui
We empirically show that our model achieves better performance on the commonly used academic benchmarks MSMARCO Passage and Natural Questions, with comparable serving latency to dense retrieval solutions.
The distributed data analytic system -- Spark is a common choice for processing massive volumes of heterogeneous data, while it is challenging to tune its parameters to achieve high performance.
Graph Active Learning (GAL), which aims to find the most informative nodes in graphs for annotation to maximize the Graph Neural Networks (GNNs) performance, has attracted many research efforts but remains non-trivial challenges.
To address this issue, we propose to learn a new powerful graph representation space by directly labeling nodes' diverse local structures for GNN-to-MLP distillation.
By swapping the context embeddings between nodes and edges and measuring the agreement in the embedding space, we enable the mutual detection of node and edge anomalies.
Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models.
To remove class spurious feature caused by distribution shifts, we propose Individual Graph Information Bottleneck (I-GIB) which discards irrelevant information by minimizing the mutual information between the input graph and its embeddings.
The key intuition behind our approach is to utilize the semantic mapping between the minor modifications on the input text and the affected regions on the output image.
Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, experimental design, and database knob tuning.
We first present an empirical analysis on the problems and opportunities of training MoE models, which motivates us to overcome the routing imbalance and fluctuation problems by a dynamic expert management and device placement mechanism.
To tune different components for DBMS, a coordinating mechanism is needed to make the multiple agents cognizant of each other.
Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially the Transformer models.
A wide spectrum of design and decision problems, including parameter tuning, A/B testing and drug design, intrinsically are instances of black-box optimization.
When applying transfer learning to accelerate the tuning process, we notice two domain-specific challenges: 1) most previous work focus on transferring tuning history, while expert knowledge from Spark engineers is of great potential to improve the tuning performance but is not well studied so far; 2) history tasks should be carefully utilized, where using dissimilar ones lead to a deteriorated performance in production.
To tackle this issue and further enhance the ensemble performance, we propose DivBO, a diversity-aware framework to inject explicit search of diversity into the CASH problems.
Transformer models have achieved state-of-the-art performance on various domains of applications and gradually becomes the foundations of the advanced large deep learning (DL) models.
Generating images from graph-structured inputs, such as scene graphs, is uniquely challenging due to the difficulty of aligning nodes and connections in graphs with objects and their relations in images.
In recent years, many efforts have been made on distributed GNN training, and an array of training algorithms and systems have been proposed.
Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with great transferability, which achieves promising accuracy for zero-shot classification.
Ranked #4 on Training-free 3D Point Cloud Classification on ScanObjectNN (using extra training data)
This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.
Vertical federated learning (VFL) is an emerging paradigm that allows different parties (e. g., organizations or enterprises) to collaboratively build machine learning models with privacy protection.
End-to-end AutoML has attracted intensive interests from both academia and industry which automatically searches for ML pipelines in a space induced by feature engineering, algorithm/model selection, and hyper-parameter tuning.
Graph neural networks (GNNs) have been intensively applied to various graph-based applications.
First, GNNs can learn higher-order structural information by stacking more layers but can not deal with large depth due to the over-smoothing issue.
First, to address the functionality of VFL models, we propose the federated source layers to unite the data from different parties.
Graph neural networks (GNNs) have achieved great success in many graph-based applications.
Ranked #9 on Node Property Prediction on ogbn-mag
Graph Neural Networks (GNNs) have achieved great success in various graph mining tasks. However, drastic performance degradation is always observed when a GNN is stacked with many layers.
With the extensive applications of machine learning models, automatic hyperparameter optimization (HPO) has become increasingly important.
The extensive experiments show that our approach considerably boosts BO by designing a promising and compact search space instead of using the entire space, and outperforms the state-of-the-arts on a wide range of benchmarks, including machine learning and deep learning tuning tasks, and neural architecture search.
Prompt Learning has recently gained great popularity in bridging the gap between pretraining tasks and various downstream tasks.
To tackle this problem, Automated Machine Learning (AutoML) is introduced to automatically search for the proper candidates for different parts of deep recommender systems.
We introduce ZOOMER, a system deployed at Taobao, the largest e-commerce platform in China, for training and serving GNN-based recommendations over web-scale graphs.
Graph Neural Networks (GNNs) have achieved great success in various tasks, but their performance highly relies on a large number of labeled nodes, which typically requires considerable human effort.
Through deconstructing the message passing mechanism, PasCa presents a novel Scalable Graph Neural Architecture Paradigm (SGAP), together with a general architecture design space consisting of 150k different designs.
The ever-growing demand and complexity of machine learning are putting pressure on hyper-parameter tuning systems: while the evaluation cost of models continues to increase, the scalability of state-of-the-arts starts to become a crucial bottleneck.
Then it diversifies the experts and continues to train the MoE with a novel Dense-to-Sparse gate (DTS-Gate).
For example, the distributed K-core decomposition algorithm can scale to a large graph with 136 billion edges without losing correctness with our divide-and-conquer technique.
Embedding models have been an effective learning paradigm for high-dimensional data.
On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D.
Ranked #3 on Open Vocabulary Object Detection on STPLS3D
Message passing is the core of most graph models such as Graph Convolutional Network (GCN) and Label Propagation (LP), which usually require a large number of clean labeled data to smooth out the neighborhood over the graph.
Recent works reveal that feature or label smoothing lies at the core of Graph Neural Networks (GNNs).
Designing neural architectures requires immense manual efforts.
1 code implementation • 13 Sep 2021 • Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, Bin Cui
Therefore, we propose a new metric P-Error to evaluate the performance of CardEst methods, which overcomes the limitation of Q-Error and is able to reflect the overall end-to-end performance of CardEst methods.
Graph neural networks (GNNs) have recently achieved state-of-the-art performance in many graph-based applications.
Based on the experimental results, we answer the following two essential questions: (1) what actually leads to the compromised performance of deep GNNs; (2) when we need and how to build deep GNNs.
Data selection methods, such as active learning and core-set selection, are useful tools for improving the data efficiency of deep learning models on large-scale datasets.
Unfortunately, many real-world networks are sparse in terms of both edges and labels, leading to sub-optimal performance of GNNs.
End-to-end AutoML has attracted intensive interests from both academia and industry, which automatically searches for ML pipelines in a space induced by feature engineering, algorithm/model selection, and hyper-parameter tuning.
Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, engineering, physics, and experimental design.
It is based on the idea that similar users not only have a similar taste on items, but also have similar treatment effect under recommendations.
In recent studies, neural message passing has proved to be an effective way to design graph neural networks (GNNs), which have achieved state-of-the-art performance in many graph-based tasks.
On the one hand, it utilizes UI relations and user neighborhood to capture both global and local information.
In this framework, the BO methods are used to solve the HPO problem for each ML algorithm separately, incorporating a much smaller hyperparameter space for BO methods.
To resolve this, we propose a new structure learning algorithm LEAST, which comprehensively fulfills our business requirements as it attains high accuracy, efficiency and scalability at the same time.
Instead of sampling configurations randomly in HB, BOHB samples configurations based on a BO surrogate model, which is constructed with the high-fidelity measurements only.
Despite decades of research, existing methods either over simplify the models only using independent factorization which leads to inaccurate estimates, or over complicate them by lossless conditional factorization without any independent assumption which results in slow probability computation.
With the explosive growth of online information, recommender systems play a key role to alleviate such information overload.
Sequential recommendation methods play a crucial role in modern recommender systems because of their ability to capture a user's dynamic interest from her/his historical interactions.
Finally, with the new edge sampler and random walk model abstraction, we carefully implement a scalable NRL framework called UniNet.
no code implementations • 10 Oct 2019 • Xupeng Miao, Nezihe Merve Gürel, Wentao Zhang, Zhichao Han, Bo Li, Wei Min, Xi Rao, Hansheng Ren, Yinan Shan, Yingxia Shao, Yujie Wang, Fan Wu, Hui Xue, Yaming Yang, Zitao Zhang, Yang Zhao, Shuai Zhang, Yujing Wang, Bin Cui, Ce Zhang
Despite the wide application of Graph Convolutional Network (GCN), one major limitation is that it does not benefit from the increasing depth and suffers from the oversmoothing problem.
First, we introduced RP Trees into the tasks of similarity measurement such that accuracy is improved.
Unlike the densification to fill the empty bins after they undesirably occur, our design goal is to balance the load so as to reduce the empty bins in advance.
Gradient boosting decision tree (GBDT) is a widely-used machine learning algorithm in both data analytic competitions and real-world industrial applications.
The performance of deep neural networks crucially depends on good hyperparameter configurations.