Therefore, a threshold quantization strategy with relatively small error is adopted in QCMD AdaGrad and QRDA AdaGrad to improve the signal-to-noise ratio and preserve the sparsity of the model.
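For intuition, here is a minimal Python/NumPy sketch of such a scheme, assuming a plain magnitude threshold followed by uniform quantization; the function name, threshold `tau`, and bit width are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def threshold_quantize(grad, tau=1e-3, bits=8):
    # Zero out entries below tau: this keeps the update sparse and avoids
    # spending quantization levels on near-zero noise (better SNR).
    sparse = np.where(np.abs(grad) < tau, 0.0, grad)
    scale = np.abs(sparse).max()
    if scale == 0.0:
        return sparse
    # Uniformly quantize the surviving entries to 2^(bits-1)-1 levels.
    levels = 2 ** (bits - 1) - 1
    return np.round(sparse / scale * levels) / levels * scale
```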
Motivated by this observation, we propose a principled GCL framework for imbalanced node classification (ImGCL), which automatically and adaptively balances the representations learned by GCL without knowing the labels.
no code implementations • 20 May 2022 • Bingzhe Wu, Jintang Li, Junchi Yu, Yatao Bian, Hengtong Zhang, Chaochao Chen, Chengbin Hou, Guoji Fu, Liang Chen, Tingyang Xu, Yu Rong, Xiaolin Zheng, Junzhou Huang, Ran He, Baoyuan Wu, Guangyu Sun, Peng Cui, Zibin Zheng, Zhe Liu, Peilin Zhao
Deep graph learning has achieved remarkable progress in both business and scientific areas, ranging from finance and e-commerce to drug and advanced-material discovery.
Probabilistic prediction of multivariate time series is a notoriously challenging but practical task.
To the best of our knowledge, our method is the first GNN-based bilevel optimization framework for resolving this task.
In this paper, we propose a general framework to solve the above two challenges simultaneously.
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and testing data by adapting a given model w.r.t.
Motivated by this, we propose to predict those hard-classified test samples in a looped manner to boost the model performance.
Learning set functions is becoming increasingly important in many applications, such as product recommendation and compound selection in AI-aided drug discovery.
In this survey, we provide a comprehensive review of various Graph Transformer models from the architectural design perspective.
Although these methods have made great progress, they are often limited by the recommender system's direct exposure and inactive interactions, and thus fail to mine all potential user interests.
1 code implementation • 24 Jan 2022 • Yuanfeng Ji, Lu Zhang, Jiaxiang Wu, Bingzhe Wu, Long-Kai Huang, Tingyang Xu, Yu Rong, Lanqing Li, Jie Ren, Ding Xue, Houtim Lai, Shaoyong Xu, Jing Feng, Wei Liu, Ping Luo, Shuigeng Zhou, Junzhou Huang, Peilin Zhao, Yatao Bian
AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient.
The main target of retrosynthesis is to recursively decompose desired molecules into available building blocks.
Ranked #6 on Single-step retrosynthesis on USPTO-50k
In this paper, we propose a novel sequence verification task that aims to distinguish positive video pairs performing the same action sequence from negative ones with step-level transformations but still conducting the same task.
To this end, we propose a general graph convolutional module (GCM) that can be easily plugged into existing action localization methods, including two-stage and one-stage paradigms.
Ranked #2 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.1 metric)
Graph neural networks (GNNs) have demonstrated superior performance for semi-supervised node classification on graphs, as a result of their ability to exploit node features and topological information simultaneously.
In ATS, for the first time, we design a neural scheduler that decides which meta-training tasks to use next by predicting the probability of each candidate task being sampled, and we train the scheduler to optimize the generalization capacity of the meta-model on unseen tasks.
To alleviate the action distribution shift problem in extracting RL policy from static trajectories, we propose Value Penalized Q-learning (VPQ), an uncertainty-based offline RL algorithm.
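As a hedged sketch of the idea rather than the paper's exact algorithm, an ensemble of Q-networks can supply the uncertainty estimate, and the bootstrapped target is penalized by the ensemble's disagreement; the ensemble, the discrete `max` backup, and the penalty weight `beta` below are illustrative assumptions:

```python
import torch

def penalized_q_target(q_ensemble, reward, next_obs, gamma=0.99, beta=1.0):
    # q_ensemble: list of networks mapping observations to Q-values [B, A].
    with torch.no_grad():
        qs = torch.stack([q(next_obs) for q in q_ensemble])   # [E, B, A]
        mean_q, std_q = qs.mean(dim=0), qs.std(dim=0)
        # Penalize uncertain actions so the learned policy stays close
        # to what the static dataset actually supports.
        next_v = (mean_q - beta * std_q).max(dim=-1).values   # [B]
        return reward + gamma * next_v
```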
Graph Clustering, which clusters the nodes of a graph given its collection of node features and edge connections in an unsupervised manner, has long been researched in graph learning and is essential in certain applications.
To address this, we propose a simple and efficient data augmentation strategy, local augmentation, which learns the distribution of the neighbors' node features conditioned on the central node's feature and enhances the GNN's expressive power with the generated features.
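A minimal sketch of the idea, substituting a simple Gaussian around the observed neighborhood mean for the paper's learned conditional generative model; the noise scale `sigma` and the edge-list layout are assumptions:

```python
import torch

def local_augment(x, edge_index, num_samples=1, sigma=0.1):
    # x: [N, D] node features; edge_index: [2, E] edges (src -> dst).
    src, dst = edge_index
    n, d = x.shape
    # Mean feature of each node's observed neighbors.
    mean = torch.zeros(n, d).index_add_(0, dst, x[src])
    deg = torch.zeros(n).index_add_(0, dst, torch.ones(src.shape[0]))
    mean = mean / deg.clamp(min=1).unsqueeze(-1)
    # Draw extra neighbor-like features conditioned on the local mean.
    return mean + sigma * torch.randn(num_samples, n, d)
```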
To address this, we present a neural architecture adaptation method, namely Adaptation eXpert (AdaXpert), to efficiently adjust previous architectures on the growing data.
A linear graphon factorization model works as a decoder, leveraging the latent representations to reconstruct the induced graphons (and the corresponding observed graphs).
Recent studies imply that deep neural networks are vulnerable to adversarial examples -- inputs with slight but intentional perturbations that are incorrectly classified by the network.
1 code implementation • 14 Apr 2021 • Chaoyang He, Keshav Balasubramanian, Emir Ceyani, Carl Yang, Han Xie, Lichao Sun, Lifang He, Liangwei Yang, Philip S. Yu, Yu Rong, Peilin Zhao, Junzhou Huang, Murali Annavaram, Salman Avestimehr
FedGraphNN is built on a unified formulation of graph FL and contains a wide range of datasets from different domains, popular GNN models, and FL algorithms, with secure and efficient system support.
To this end, we propose a Pareto-Frontier-aware Neural Architecture Generator (NAG) which takes an arbitrary budget as input and produces the Pareto optimal architecture for the target budget.
To address this issue, we propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization.
To find promising architectures under different budgets, existing methods may have to perform an independent search for each budget, which is very inefficient and unnecessary.
Graph Neural Networks (GNNs) draw their strength from explicitly modeling the topological information of structured data.
Specifically, AST adopts a Sparse Transformer as the generator to learn a sparse attention map for time series forecasting, and uses a discriminator to improve the prediction performance at the sequence level.
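One representative way to obtain a sparse attention map is to replace softmax with sparsemax (Martins & Astudillo, 2016), which assigns exactly zero weight outside a small support; shown here as a 1-D NumPy sketch, without claiming it is the paper's exact normalizer:

```python
import numpy as np

def sparsemax(z):
    # Project the score vector z onto the probability simplex; unlike
    # softmax, entries outside the support become exactly zero.
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum     # which sorted entries survive
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1.0) / k_z     # shared threshold
    return np.maximum(z - tau, 0.0)
```

For example, `sparsemax(np.array([2.0, 1.0, -1.0]))` returns `[1., 0., 0.]`: all attention mass collapses onto the highest-scoring position.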
no code implementations • 25 Nov 2020 • Deheng Ye, Guibin Chen, Peilin Zhao, Fuhao Qiu, Bo Yuan, Wen Zhang, Sheng Chen, Mingfei Sun, Xiaoqian Li, Siqin Li, Jing Liang, Zhenjie Lian, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang
Unlike prior attempts, we integrate the macro-strategy and the micromanagement of MOBA-game-playing into neural networks in a supervised and end-to-end manner.
Retrosynthesis is the process of recursively decomposing target molecules into available building blocks.
4 code implementations • 27 Jul 2020 • Chaoyang He, Songze Li, Jinhyun So, Xiao Zeng, Mi Zhang, Hongyi Wang, Xiaoyang Wang, Praneeth Vepakomma, Abhishek Singh, Hang Qiu, Xinghua Zhu, Jianzong Wang, Li Shen, Peilin Zhao, Yan Kang, Yang Liu, Ramesh Raskar, Qiang Yang, Murali Annavaram, Salman Avestimehr
Federated learning (FL) is a rapidly growing research field in machine learning.
Following a formal discussion of the variants of IGI, we choose a particular case study of node clustering that makes use of the graph labels and node features, with the assistance of a hierarchical graph that further characterizes the connections between different graphs.
With the proposed search strategy, our Curriculum Neural Architecture Search (CNAS) method significantly improves the search efficiency and finds better architectures than existing NAS methods.
Deep learning based medical image diagnosis has shown great potential in clinical medicine.
Recently, Neural Architecture Search (NAS) has been successfully applied to multiple artificial intelligence areas and shows better performance compared with hand-designed networks.
There are two main challenges: 1) the discrepancy of data distributions between domains; 2) the task difference between the diagnosis of typical pneumonia and COVID-19.
To alleviate the performance disturbance issue, we propose a new disturbance-immune update strategy for model updating.
However, existing MF approaches suffer from two major problems: (1) expensive computation and storage due to the centralized model-training mechanism: the centralized learners have to maintain the whole user-item rating matrix and potentially huge low-rank matrices.
To achieve this goal, we design two cross-domain attention modules to adapt context dependencies from both spatial and channel views.
Ranked #20 on Domain Adaptation on SYNTHIA-to-Cityscapes
This task, however, has two main difficulties: (i) the non-stationary price series and complex asset correlations make the learning of feature representation very hard; (ii) the practicality principle in financial markets requires controlling both transaction and risk costs.
Meanwhile, detecting rumors amid such massive information on social media is becoming an arduous challenge.
no code implementations • 20 Dec 2019 • Deheng Ye, Zhao Liu, Mingfei Sun, Bei Shi, Peilin Zhao, Hao Wu, Hongsheng Yu, Shaojie Yang, Xipeng Wu, Qingwei Guo, Qiaobo Chen, Yinyuting Yin, Hao Zhang, Tengfei Shi, Liang Wang, Qiang Fu, Wei Yang, Lanxiao Huang
We study the reinforcement learning problem of complex action control in the Multi-player Online Battle Arena (MOBA) 1v1 games.
In these problems, there are two key challenges: the query budget is often limited, and the ratio between classes is highly imbalanced.
In this paper, we seek to exploit rich labeled data from relevant domains to help the learning in the target task with unsupervised domain adaptation (UDA).
To verify the effectiveness of the proposed strategies, we apply NAT on both hand-crafted architectures and NAS based architectures.
In this paper, we explore a general Aggregated Gradient Langevin Dynamics framework (AGLD) for the Markov Chain Monte Carlo (MCMC) sampling.
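The framework builds on Langevin-type updates; a minimal stochastic gradient Langevin dynamics (SGLD) step, the basic update that aggregated-gradient variants refine, looks like the following sketch (the step-size handling is an assumption):

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    # Noisy gradient ascent on the log posterior: the injected Gaussian
    # noise turns optimization into (approximate) posterior sampling.
    noise = rng.normal(size=theta.shape)
    return theta + 0.5 * step_size * grad_log_post(theta) \
                 + np.sqrt(step_size) * noise
```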
With the massive number of possible synthetic routes in chemistry, retrosynthesis prediction remains a challenge for researchers.
Ranked #14 on Single-step retrosynthesis on USPTO-50k
ii) the $W$-distance of a specific layer to the target distribution tends to decrease along training iterations.
Automated machine learning aims to automate the whole process of machine learning, including model configuration.
Then we apply GCNs over the graph to model the relations among different proposals and learn powerful representations for action classification and localization; a minimal GCN layer is sketched below.
Ranked #4 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.1 metric)
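As referenced above, here is a minimal single GCN layer over a dense proposal-affinity matrix, illustrating the kind of relation modeling involved; the dense adjacency and the ReLU are simplifying assumptions, not the paper's exact module:

```python
import torch

def gcn_layer(x, adj, weight):
    # x: [N, D] proposal features; adj: [N, N] proposal-affinity matrix.
    a_hat = adj + torch.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = torch.diag(a_hat.sum(dim=-1).rsqrt())  # D^{-1/2}
    # Symmetrically normalized neighborhood mixing, then a linear map.
    return torch.relu(d_inv_sqrt @ a_hat @ d_inv_sqrt @ x @ weight)
```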
Decentralized Online Learning (online learning in decentralized networks) has attracted increasing attention, since it is believed to help data providers cooperatively solve their online problems without sharing their private data with a third party or other providers.
Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial exploration even in cases where similar prior tasks have been solved.
We study the joint distribution matching problem which aims at learning bidirectional mappings to match the joint distribution of two domains.
However, most deep learning methods employ feed-forward architectures, and thus the dependencies between LR and HR images are not fully exploited, leading to limited learning performance.
However, to choose the rank properly, one usually needs to run the algorithm many times with different ranks, which is clearly inefficient for large-scale datasets.
The experimental results demonstrate that, compared with classic and state-of-the-art (distributed) latent factor models, DCH achieves comparable recommendation accuracy while converging faster in offline model training and running in real time in online recommendation.
Cost-Sensitive Online Classification has drawn extensive attention in recent years; the main approach is to directly optimize, in an online fashion, two well-known cost-sensitive metrics: (i) the weighted sum of sensitivity and specificity, and (ii) the weighted misclassification cost.
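For the weighted-misclassification-cost metric, a common surrogate is a cost-weighted hinge loss optimized by online gradient descent; the costs and step size in this sketch are illustrative and do not reproduce any specific algorithm from the paper:

```python
import numpy as np

def cost_sensitive_step(w, x, y, eta=0.1, cost_pos=5.0, cost_neg=1.0):
    # y in {-1, +1}; misclassifying the rare (positive) class costs more.
    cost = cost_pos if y > 0 else cost_neg
    if y * np.dot(w, x) < 1.0:          # margin violated
        w = w + eta * cost * y * x      # cost-weighted hinge subgradient step
    return w
```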
Online learning represents an important family of machine learning algorithms, in which a learner attempts to resolve an online prediction (or any type of decision-making) task by learning a model/hypothesis from a sequence of data instances one at a time.
To address this subsequent challenge, we follow the general projection-free algorithmic framework of Online Conditional Gradient and propose an Online Compact Convex Factorization Machine (OCCFM) algorithm that eschews the projection operation with efficient linear optimization steps.
The conditional gradient algorithm has regained a surge of research interest in recent years due to its high efficiency in handling large-scale machine learning problems.
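The defining move of the conditional gradient (Frank-Wolfe) method is to replace projections with a linear optimization oracle over the feasible set; in this minimal sketch, `linear_oracle` is an assumed callable returning the feasible point minimizing the inner product with the gradient:

```python
def conditional_gradient_step(w, grad, linear_oracle, t):
    # Linear optimization over the feasible set K is often far cheaper
    # than projecting onto K (e.g., a top singular-vector computation
    # for a trace-norm ball).
    v = linear_oracle(grad)               # argmin_{v in K} <grad, v>
    gamma = 2.0 / (t + 2.0)               # standard diminishing step size
    return (1.0 - gamma) * w + gamma * v  # convex combination stays in K
```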
Multi-Task Learning (MTL) can enhance a classifier's generalization performance by learning multiple related tasks simultaneously.
In this paper, we propose a novel scheme of Online Bayesian Collaborative Topic Regression (OBCTR) which is efficient and scalable for learning from data streams.
Despite the encouraging results reported, existing online AUC maximization algorithms often adopt simple online gradient descent approaches that fail to exploit the geometrical knowledge of the data observed during the online learning process, and thus can suffer from relatively large regret.
To overcome this drawback, we present a novel framework of Budget Online Multiple Kernel Learning (BOMKL) and propose a new Sparse Passive-Aggressive learning method to perform effective budget online learning.
Unlike some existing online data stream classification techniques that are often based on first-order online learning, we propose a framework of Sparse Online Classification (SOC) for data stream classification, which includes some state-of-the-art first-order sparse online learning algorithms as special cases and allows us to derive a new effective second-order online learning algorithm for data stream classification.
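A representative first-order sparse online update of the kind such a framework generalizes: a hinge-loss gradient step followed by soft-thresholding (truncated gradient), which drives small weights exactly to zero; this is a generic sketch rather than SOC itself:

```python
import numpy as np

def sparse_online_step(w, x, y, eta=0.1, lam=0.01):
    if y * np.dot(w, x) < 1.0:          # hinge loss is active
        w = w + eta * y * x             # first-order gradient step
    # Soft-threshold: shrink all weights and zero out the small ones.
    return np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0)
```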
Stochastic Gradient Descent (SGD) is a popular optimization method which has been applied to many important machine learning tasks such as Support Vector Machines and Deep Neural Networks.
Uniform sampling of training data has been commonly used in traditional stochastic optimization algorithms such as Proximal Stochastic Gradient Descent (prox-SGD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA).
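The alternative explored in this line of work is non-uniform sampling. Here is a minimal sketch in which examples are drawn with probability proportional to per-example smoothness constants and the stochastic gradient is reweighted to stay unbiased; the proportional-to-Lipschitz rule is one standard choice, not necessarily the paper's:

```python
import numpy as np

def sample_index(lipschitz, rng):
    # Sample example i with probability p_i proportional to L_i, and return
    # the weight 1 / (n * p_i) that keeps the stochastic gradient unbiased.
    p = lipschitz / lipschitz.sum()
    i = rng.choice(lipschitz.size, p=p)
    return i, 1.0 / (lipschitz.size * p[i])
```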
In this paper, we address a new problem of active learning with expert advice, where the outcome of an instance is disclosed only when it is requested by the online learner.
This is clearly insufficient since when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights for the existing support vectors.