Thanks to HyperFD, each local task (client) can effectively leverage the learning "experience" of previous tasks without uploading raw images to the platform; meanwhile, the meta-feature extractor is continuously updated to better trade off bias and variance.
Specifically, we propose the Multi-granularity Intent Heterogeneous Session Graph, which captures the interactions between intent units of different granularities and alleviates the burden of modeling long-range dependencies.
Ranked #1 on Session-Based Recommendations on yoochoose1/4
Analytically, 1) skipping the convolutional operation prevents the features from losing diversity; and 2) the "skipped" nodes allow gradients to be passed back directly, thus mitigating the gradient-vanishing and weight-over-decaying issues.
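The gradient-vanishing argument above can be seen in a toy scalar sketch (not the paper's model, and the function names are illustrative): stacking layers multiplies the input gradient by a per-layer factor, while an identity skip path adds 1 to that factor, so the product no longer shrinks toward zero.

```python
def plain_grad(w, depth):
    """d(output)/d(input) for y_{l+1} = w * y_l stacked `depth` times."""
    g = 1.0
    for _ in range(depth):
        g *= w  # each layer multiplies the gradient by w
    return g

def residual_grad(w, depth):
    """d(output)/d(input) for y_{l+1} = y_l + w * y_l (identity skip path)."""
    g = 1.0
    for _ in range(depth):
        g *= (1.0 + w)  # the skip adds an identity term to the Jacobian
    return g

# With |w| < 1, the plain gradient decays exponentially with depth,
# while the skip-connected gradient stays bounded away from zero.
vanished = plain_grad(0.5, 32)   # on the order of 1e-10
preserved = residual_grad(0.5, 32)
```

The same effect is why "skipped" nodes in a deep GCN keep a direct backward path to early layers.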
In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire generation model.
Creating labeled training sets has become one of the major roadblocks in machine learning.
On the one hand, multi-hop-based approaches do not explicitly distinguish relevant nodes within large multi-hop neighborhoods, leading to a severe over-smoothing problem.
To address these problems, we introduce a benchmark platform, WRENCH, for thorough and standardized evaluation of WS approaches.
However, simply integrating KGs into current KG-based RS models does not necessarily guarantee improved recommendation performance, and may even weaken the holistic model capability.
Architecture performance predictors have been widely used in neural architecture search (NAS).
Pre-trained language models like BERT achieve superior performance in various NLP tasks without explicit consideration of syntactic information.
In this paper, we propose a novel and generic mechanism based on evolving attention to improve the performance of transformers.
Instead, we model their dependencies via a chain of prediction models that take previous attention maps as input to predict the attention maps of a new layer through convolutional neural networks.
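The idea of predicting a new layer's attention map from the previous layer's map via convolution can be sketched minimally as follows. This is an illustrative single-head toy with a hand-written convolution, not the paper's implementation; `conv2d_same`, `evolve_attention`, and `alpha` are assumed names and parameters.

```python
def conv2d_same(A, kernel):
    """Naive 2D convolution with zero padding (output has the same size as A)."""
    n = len(A)
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * len(A[0]) for _ in range(n)]
    for i in range(n):
        for j in range(len(A[0])):
            s = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < n and 0 <= c < len(A[0]):
                        s += kernel[di][dj] * A[r][c]
            out[i][j] = s
    return out

def evolve_attention(prev_attn, curr_attn, kernel, alpha=0.5):
    """Blend the convolved previous-layer attention map with the current
    layer's raw attention map, so attention patterns evolve across layers."""
    conv = conv2d_same(prev_attn, kernel)
    return [[alpha * conv[i][j] + (1.0 - alpha) * curr_attn[i][j]
             for j in range(len(prev_attn[0]))]
            for i in range(len(prev_attn))]
```

The design choice this illustrates: the convolution exploits local structure in the attention matrix (nearby token pairs tend to have correlated attention), while the residual-style blend keeps the current layer's own signal.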
We add the model designed by AutoADR as a sub-model into the production Ad Relevance model.
However, regarding Heterogeneous Information Networks (HINs), existing HIN-oriented GCN methods still suffer from two deficiencies: (1) they cannot flexibly explore all possible meta-paths and extract the most useful ones for a target object, which hinders both effectiveness and interpretability; (2) they often need to generate intermediate meta-path-based dense graphs, which leads to high computational complexity.
BERT is a cutting-edge language representation model pre-trained on a large corpus, which achieves superior performance on various natural language understanding tasks.
With the success of deep neural networks, Neural Architecture Search (NAS) as a way of automatic model design has attracted wide attention.
Learning text representations is crucial for text classification and other language-related tasks.
no code implementations • 10 Oct 2019 • Xupeng Miao, Nezihe Merve Gürel, Wentao Zhang, Zhichao Han, Bo Li, Wei Min, Xi Rao, Hansheng Ren, Yinan Shan, Yingxia Shao, Yujie Wang, Fan Wu, Hui Xue, Yaming Yang, Zitao Zhang, Yang Zhao, Shuai Zhang, Yujing Wang, Bin Cui, Ce Zhang
Despite the wide application of Graph Convolutional Networks (GCNs), one major limitation is that they do not benefit from increasing depth and suffer from the over-smoothing problem.