SBAG firstly pre-trains a multi-layer perception network to capture social bot features, and then constructs multiple graph neural networks by embedding the features to model the early propagation of posts, which is further used to detect rumors.
Moreover, this task can be used to improve visual question generation and visual question answering.
In this paper, we propose to learn a prototype encoder from relation definition in a way that is useful for relation instance classification.
The key insight of our proposal is that we employ a weight prediction strategy in the forward pass to ensure that each mini-batch uses consistent and staleness-free weights to compute the forward pass.
To this end, we introduce TaskBench to evaluate the capability of LLMs in task automation.
However, selecting the appropriate edge feature to define patient similarity and construct the graph is challenging, given that each patient is depicted by high-dimensional features from diverse sources.
To overcome the aforementioned challenges, we propose an Unified Medical Image Pre-training framework, namely UniMedI, which utilizes diagnostic reports as common semantic space to create unified representations for diverse modalities of medical images (especially for 2D and 3D images).
In this paper, for the first time, we investigate how to search in a training-free manner with the help of teacher models and devise an effective Training-free ViT (TVT) search framework.
To the best of our knowledge, we are the first to exploit the LUT structure to extract temporal information in video tasks.
It is crucial to effectively model feature interactions to improve the prediction performance of CTR models.
Ranked #1 on Click-Through Rate Prediction on Criteo
In addition, we present a new architecture of assigning independent FR modules to separate sub-networks for parallel CTR models, as opposed to the conventional method of inserting a shared FR module on top of the embedding layer.
In this paper, we study graph label noise in the context of arbitrary heterophily, with the aim of rectifying noisy labels and assigning labels to previously unlabeled nodes.
Sign-based stochastic methods have gained attention due to their ability to achieve robust performance despite using only the sign information for parameter updates.
Large language models (LLMs) have gained enormous attention from both academia and industry, due to their exceptional ability in language generation and extremely powerful generalization.
Inspired by these findings, we propose LongLLMLingua for prompt compression towards improving LLMs' perception of the key information to simultaneously address the three challenges.
Finally, we select the most fitting chains of evidence from the evidence forest and integrate them into the generated story, thereby enhancing the narrative's complexity and credibility.
The rapid evolution of the web has led to an exponential growth in content.
Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models.
Sequential recommendation demonstrates the capability to recommend items by modeling the sequential behavior of users.
To this end, we employ domain adversarial learning as a heuristic neural network initialization method, which can help the meta-learning module converge to a better optimal.
Theoretically, we prove the convergence of the meta-learning algorithm in MTEM and analyze the effectiveness of MTEM in achieving domain adaptation.
We conducted comprehensive experiments to validate the effectiveness of IMCorrect and the results demonstrate that IMCorrect is superior in completeness, utility, and efficiency, and is applicable in many recommendation unlearning scenarios.
Seeing is believing, however, the underlying mechanism of how human visual perceptions are intertwined with our cognitions is still a mystery.
In this paper, a three-machine equivalent method applicable to asymmetrical faults is proposed considering the operating wind speed and fault severity.
Order execution is a fundamental task in quantitative finance, aiming at finishing acquisition or liquidation for a number of trading orders of the specific assets.
The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world.
Due to the nature of risk management in learning applicable policies, risk-sensitive reinforcement learning (RSRL) has been realized as an important direction.
A timely detection of seizures for newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU).
Pronunciation assessment is a major challenge in the computer-aided pronunciation training system, especially at the word (phoneme)-level.
the future weights to update the DNN parameters, making the gradient-based optimizer achieve better convergence and generalization compared to the original optimizer without weight prediction.
In this work, we propose SimuLine, a simulation platform to dissect the evolution of news recommendation ecosystems and present a detailed analysis of the evolutionary process and underlying mechanisms.
In this paper, we propose DiffusionNER, which formulates the named entity recognition task as a boundary-denoising diffusion process and thus generates named entities from noisy spans.
Ranked #2 on Nested Named Entity Recognition on GENIA
However, most of the mixup methods do not consider the varying degree of learning difficulty in different stages of training and generate new samples with one hot labels, resulting in the model over confidence.
The energy consumption for training deep learning models is increasing at an alarming rate due to the growth of training data and model scale, resulting in a negative impact on carbon neutrality.
Moreover, our method is highly efficient and achieves more than 1000 times training speedup compared to the conventional DG methods with fine-tuning a pretrained model.
Ranked #1 on Domain Generalization on PACS
In contrast, though human engineers have the incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult to utilize by quantitative approaches.
Specifically, TriSIM4Rec consists of 1) a dynamic ideal low-pass graph filter to dynamically mine co-occurrence information in user-item interactions, which is implemented by incremental singular value decomposition (SVD); 2) a parameter-free attention module to capture sequential information of user interactions effectively and efficiently; and 3) an item transition matrix to store the transition probabilities of item pairs.
BFReg-NN starts from gene expression data and is capable of merging most existing biological knowledge into the model, including the regulatory relations among genes or proteins (e. g., gene regulatory networks (GRN), protein-protein interaction networks (PPI)) and the hierarchical relations among genes, proteins and pathways (e. g., several genes/proteins are contained in a pathway).
The habitual behavior is generated by using prior distribution of intention, which is goal-less; and the goal-directed behavior is generated by the posterior distribution of intention, which is conditioned on the goal.
Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
Experimental results based on the ACII Challenge 2022 dataset demonstrate the superior performance of the proposed system and the effectiveness of considering multiple relationships using hierarchical regression chain models.
Ranked #1 on Vocal Bursts Intensity Prediction on HUME-VB
In this paper, we consider an offline-to-online setting where the agent is first learned from the offline dataset and then trained online, and propose a framework called Adaptive Policy Learning for effectively taking advantage of offline and online data.
To this end, we introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task.
To facilitate the research on this problem, a new benchmark dataset named LDV-WebRTC is constructed based on a real-world online streaming system.
Correspondingly, we propose a Dual Stream deep model for Stereotypical Behaviours Detection, DS-SBD, based on the temporal trajectory of human poses and the repetition patterns of human actions.
However, the interaction signal may not be sufficient to accurately characterize user interests and the low-pass filters may ignore the useful information contained in the high-frequency component of the observed signals, resulting in suboptimal accuracy.
Ensemble methods can deliver surprising performance gains but also bring significantly higher computational costs, e. g., can be up to 2048X in large-scale ensemble tasks.
In this paper, to comprehensively enhance the performance of generative graph SSL against other GCL models on both unsupervised and supervised learning tasks, we propose the SeeGera model, which is based on the family of self-supervised variational graph auto-encoder (VGAE).
Practical networks for edge devices adopt shallow depth and small convolutional kernels to save memory and computational cost, which leads to a restricted receptive field.
In this paper, we propose Ranking Distillation one-shot NAS (RD-NAS) to enhance ranking consistency, which utilizes zero-cost proxies as the cheap teacher and adopts the margin ranking loss to distill the ranking knowledge.
Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples.
In this paper, we propose a scalable evaluation methodology (SAIH) for analyzing the AI performance trend of HPC systems with scaling the problem sizes of customized AI applications.
Many Click-Through Rate (CTR) prediction works focused on designing advanced architectures to model complex feature interactions but neglected the importance of feature representation learning, e. g., adopting a plain embedding layer for each feature, which results in sub-optimal feature representations and thus inferior CTR prediction performance.
In order to reduce the complexity of simulation of power systems including large-scale wind farms, it is critical to develop dynamic equivalent methods for wind farms which are applicable to the expected contingency analysis.
Instead of learning fixed triggers for the target classes from the training set, DT-IBA can dynamically generate new triggers for any unknown identities.
By analyzing this dataset, we find that a large improvement in summarization quality can be achieved by providing ground-truth omission labels for the summarization model to recover omission information, which demonstrates the importance of omission detection for omission mitigation in dialogue summarization.
Dynamic interaction graphs have been widely adopted to model the evolution of user-item interactions over time.
A good state representation is crucial to solving complicated reinforcement learning (RL) challenges.
In this paper, we propose a novel composable prompt-based generative framework, which could be applied to a wide range of tasks in the field of Information Extraction.
Vision transformers have shown excellent performance in computer vision tasks.
Specifically, we first pre-train an encoder-decoder framework in an automatic speech recognition (ASR) objective by using speech-to-text dataset, and then fine-tune ASR encoder on the cleft palate dataset for hypernasality estimation.
Then, our proposed CKBG method enhances this lightweight base model by bypassing the original network with ``kernel grafts'', which are extra convolutional kernels containing the prior knowledge of external pretrained image SR models.
In this paper, we aim to develop a technique that can achieve a good trade-off between privacy protection and data usability for person ReID.
Autoregressive generative models are commonly used, especially for those tasks involving sequential data.
Instead of looking at one format, it is a good solution to utilize the formats of VG and RG together to avoid these shortcomings.
Offline reinforcement learning (RL) aims at learning policies from previously collected static trajectory data without interacting with the real environment.
These features make it necessary to apply 3D parallelism, which integrates data parallelism, pipeline model parallelism and tensor model parallelism, to achieve high training efficiency.
Previous works on sentence scoring mainly adopted either causal language modeling (CLM) like GPT or masked language modeling (MLM) like BERT, which have some limitations: 1) CLM only utilizes unidirectional information for the probability estimation of a sentence without considering bidirectional context, which affects the scoring quality; 2) MLM can only estimate the probability of partial tokens at a time and thus requires multiple forward passes to estimate the probability of the whole sentence, which incurs large computation and time cost.
Considering the great performance of ensemble methods on both accuracy and generalization in supervised learning (SL), we design a robust and applicable method named Ensemble Proximal Policy Optimization (EPPO), which learns ensemble policies in an end-to-end manner.
Further, for other homophilous nodes excluded in the neighborhood, they are ignored for information aggregation.
Ranked #1 on Node Classification on pokec
However, most methods only learn a fixed representation for each feature without considering the varying importance of each feature under different contexts, resulting in inferior performance.
The most important motivation of this research is that we can use the straightforward element-wise multiplication operation to replace the image convolution in the frequency domain based on the Cross-Correlation Theorem, which obviously reduces the computation complexity.
Capturing the dynamics in user preference is crucial to better predict user future behaviors because user preferences often drift over time.
To the best of our knowledge, we are the first to make a reasonable dynamic runtime scheduler on the combination of tensor swapping and tensor recomputation without user oversight.
To this end, we propose a cross-metric multi-dimensional root cause analysis method, named CMMD, which consists of two key components: 1) relationship modeling, which utilizes graph neural network (GNN) to model the unknown complex calculation among metrics and aggregation function among dimensions from historical data; 2) root cause localization, which adopts the genetic algorithm to efficiently and effectively dive into the raw data and localize the abnormal dimension(s) once the KPI anomalies are detected.
Fine-tuning pretrained models is a common practice in domain generalization (DG) tasks.
Ranked #6 on Domain Generalization on PACS
Neural architecture search (NAS) has brought significant progress in recent image recognition tasks.
We propose VRL3, a powerful data-driven framework with a simple design for solving challenging visual deep reinforcement learning (DRL) tasks.
Continuous-depth neural networks, such as the Neural Ordinary Differential Equations (ODEs), have aroused a great deal of interest from the communities of machine learning and data science in recent years, which bridge the connection between deep neural networks and dynamical systems.
To address these issues, we propose a RL-enhanced GNN explainer, RG-Explainer, which consists of three main components: starting point selection, iterative graph generation and stopping criteria learning.
In this paper, we propose a novel generative framework for RTS data - RTSGAN to tackle the aforementioned challenges.
EmbRace introduces Sparsity-aware Hybrid Communication, which integrates AlltoAll and model parallelism into data-parallel training, so as to reduce the communication overhead of highly sparse parameters.
In this paper, to eliminate the effort for tuning the momentum-related hyperparameter, we propose a new adaptive momentum inspired by the optimal choice of the heavy ball momentum for quadratic optimization.
Distributed stochastic gradient descent (SGD) approach has been widely used in large-scale deep learning, and the gradient collective method is vital to ensure the training scalability of the distributed deep learning system.
Real-world data is often generated by some complex distribution, which can be approximated by a composition of multiple simpler distributions.
Ensemble learning, which can consistently improve the prediction performance in supervised learning, has drawn increasing attentions in reinforcement learning (RL).
How to make intelligent decisions is a central problem in machine learning and cognitive science.
Specifically, we explicitly consider the difference between the online and offline data and apply an adaptive update scheme accordingly, i. e., a pessimistic update strategy for the offline dataset and a greedy or no pessimistic update scheme for the online dataset.
A good state representation is crucial to reinforcement learning (RL) while an ideal representation is hard to learn only with signals from the RL objective.
The energy consumption of deep learning models is increasing at a breathtaking rate, which raises concerns due to potential negative effects on carbon neutrality in the context of global warming and climate change.
In this paper, we endeavor to obtain a better understanding of GCN-based CF methods via the lens of graph signal processing.
Ranked #5 on Collaborative Filtering on Gowalla
IIB significantly outperforms IRM on synthetic datasets, where the pseudo-invariant features and geometric skews occur, showing the effectiveness of proposed formulation in overcoming failure modes of IRM.
Based on the proposed quality measurement, we propose a deep Tiny Face Quality network (tinyFQnet) to learn a quality prediction function from data.
Hence, the key is to make full use of rich interaction information among streamers, users, and products.
One-bit matrix completion is an important class of positiveunlabeled (PU) learning problems where the observations consist of only positive examples, eg, in top-N recommender systems.
Graph pooling that summaries the information in a large graph into a compact form is essential in hierarchical graph representation learning.
In collaborative filtering (CF) algorithms, the optimal models are usually learned by globally minimizing the empirical risks averaged over all the observed data.
Graph neural networks (GNN) have been proven to be mature enough for handling graph-structured data on node-level graph representation learning tasks.
By monitoring the impact of varying resolution on the quality of high-dimensional video analytics features, hence the accuracy of video analytics results, the proposed end-to-end optimization framework learns the best non-myopic policy for dynamically controlling the resolution of input video streams to globally optimize energy efficiency.
In this paper, we investigate the decentralized statistical inference problem, where a network of agents cooperatively recover a (structured) vector from private noisy samples without centralized coordination.
The stability and generalization of stochastic gradient-based methods provide valuable insights into understanding the algorithmic performance of machine learning models.
In recent years, the Deep Learning Alternating Minimization (DLAM), which is actually the alternating minimization applied to the penalty form of the deep neutral networks training, has been developed as an alternative algorithm to overcome several drawbacks of Stochastic Gradient Descent (SGD) algorithms.
Many works concentrate on how to reduce language bias which makes models answer questions ignoring visual content and language context.
Reference expression comprehension (REC) aims to find the location that the phrase refer to in a given image.
Distant supervision provides a means to create a large number of weakly labeled data at low cost for relation classification.
First, we provide a finite sample bound for both classification and regression problems under Semi-DA.
In real world applications like healthcare, it is usually difficult to build a machine learning prediction model that works universally well across different institutions.
We also provide novel update rules and theoretical convergence analysis.
Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes.
It is challenging for weakly supervised object detection network to precisely predict the positions of the objects, since there are no instance-level category annotations.
In this paper, we propose a general proximal incremental aggregated gradient algorithm, which contains various existing algorithms including the basic incremental aggregated gradient method.
Rapid progress has been made in the field of reading comprehension and question answering, where several systems have achieved human parity in some simplified settings.
Ranked #8 on Question Answering on DROP Test
Traditional approaches to the task of ACE event extraction usually depend on manually annotated data, which is often laborious to create and limited in size.
This paper considers the reading comprehension task in which multiple documents are given as input.
Open-domain targeted sentiment analysis aims to detect opinion targets along with their sentiment polarities from a sentence.
Most teacher-student frameworks based on knowledge distillation (KD) depend on a strong congruent constraint on instance level.
Focusing on discriminate spatiotemporal feature learning, we propose Information Fused Temporal Transformation Network (IF-TTN) for action recognition on top of popular Temporal Segment Network (TSN) framework.
The proposed FSN can make dense predictions at frame-level for a video clip using both spatial and temporal context information.
In this paper, we consider a class of nonconvex problems with linear constraints appearing frequently in the area of image processing.
Collaborative filtering (CF) is a popular technique in today's recommender systems, and matrix approximation-based CF methods have achieved great success in both rating prediction and top-N recommendation tasks.
For objective functions satisfying a relaxed strongly convex condition, the linear convergence is established under weaker assumptions on the step size and inertial parameter than made in the existing literature.
Support vector machines (SVMs) with sparsity-inducing nonconvex penalties have received considerable attentions for the characteristics of automatic classification and variable selection.
Despite that current reading comprehension systems have achieved significant advancements, their promising performances are often obtained at the cost of making an ensemble of numerous models.
Machine reading comprehension with unanswerable questions aims to abstain from answering when no answer can be inferred.
Ranked #11 on Question Answering on SQuAD2.0 dev
LRM is a general method for real-time detectors, as it utilizes the final feature map which exists in all real-time detectors to mine hard examples.
Depthwise convolutions provide significant performance benefits owing to the reduction in both parameters and mult-adds.
However, our studies show that submatrices with different ranks could coexist in the same user-item rating matrix, so that approximations with fixed ranks cannot perfectly describe the internal structures of the rating matrix, therefore leading to inferior recommendation accuracy.
Ranked #4 on Recommendation Systems on MovieLens 10M
A newly proposed work exploits Convolutional-Deconvolutional-Convolutional (CDC) filters to upsample the predictions of 3D ConvNets, making it possible to perform per-frame action predictions and achieving promising performance in terms of temporal action localization.
S-OHEM exploits OHEM with stratified sampling, a widely-adopted sampling technique, to choose the training examples according to this influence during hard example mining, and thus enhance the performance of object detectors.
Object detection is an import task of computer vision. A variety of methods have been proposed, but methods using the weak labels still do not have a satisfactory result. In this paper, we propose a new framework that using the weakly supervised method's output as the pseudo-strong labels to train a strongly supervised model. One weakly supervised method is treated as black-box to generate class-specific bounding boxes on train dataset. A de-noise method is then applied to the noisy bounding boxes. Then the de-noised pseudo-strong labels are used to train a strongly object detection network. The whole framework is still weakly supervised because the entire process only uses the image-level labels. The experiment results on PASCAL VOC 2007 prove the validity of our framework, and we get result 43. 4% on mean average precision compared to 39. 5% of the previous best result and 34. 5% of the initial method, respectively. And this frame work is simple and distinct, and is promising to be applied to other method easily.