no code implementations • • Jiangang Bai, Yujing Wang, Hong Sun, Ruonan Wu, Tianmeng Yang, Pengfei Tang, Defu Cao, Mingliang Zhang1, Yunhai Tong, Yaming Yang, Jing Bai, Ruofei Zhang, Hao Sun, Wei Shen
Large-scale pre-trained language models have attracted extensive attention in the research community and have shown promising results on various natural language processing tasks.
Pseudo Labeling is a technique used to improve the performance of semi-supervised Graph Neural Networks (GNNs) by generating additional pseudo-labels based on confident predictions.
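For context, here is a minimal sketch of how such pseudo-labels might be generated for a GNN node classifier; the `model` interface, the masks, and the 0.9 confidence threshold are illustrative assumptions rather than details from the paper.

```python
# Minimal pseudo-labeling sketch for a semi-supervised node classifier (illustrative only).
import torch
import torch.nn.functional as F

def add_pseudo_labels(model, features, adj, labels, train_mask, threshold=0.9):
    """Promote confidently predicted unlabeled nodes to extra training data."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(features, adj), dim=-1)   # per-node class probabilities
    confidence, pseudo = probs.max(dim=-1)
    # Only unlabeled nodes whose top-class confidence exceeds the threshold
    new_mask = (~train_mask) & (confidence > threshold)
    augmented_labels = labels.clone()
    augmented_labels[new_mask] = pseudo[new_mask]
    return augmented_labels, train_mask | new_mask
```

The augmented labels and mask can then be fed back into the usual supervised training loop.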
To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps.
First, it is the largest multi-modal conversation dataset by number of dialogues, containing 88x more dialogues than the previous largest.
Ranked #2 on Multimodal Intent Recognition on MMDialog
The two modules can effectively utilize and enhance each other, promoting the model to learn discriminative embeddings.
By leveraging the proposed AFIE, the framework yields a stable importance evaluation for each filter regardless of whether the original model is fully trained.
Thus, in this work, we study the application of WS on binary classification tasks with positive labeling sources only.
Thanks to HyperFD, each local task (client) is able to effectively leverage the learning "experience" of previous tasks without uploading raw images to the platform; meanwhile, the meta-feature extractor is continuously learned to better trade off the bias and variance.
Specifically, we propose a Multi-granularity Intent Heterogeneous Session Graph, which captures the interactions between intent units of different granularities and alleviates the long-dependency problem.
In this paper, we conduct theoretical and experimental analysis to explore the fundamental causes of performance degradation in deep GCNs: over-smoothing and gradient vanishing have a mutually reinforcing effect that causes the performance to deteriorate more quickly in deep GCNs.
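As a toy illustration of the over-smoothing half of this argument (not code from the paper), repeatedly applying the symmetrically normalized adjacency used in GCNs drives node representations toward one another as depth grows; the random graph and feature dimensions below are arbitrary assumptions.

```python
# Toy demonstration of over-smoothing under repeated GCN-style propagation.
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.maximum(A, A.T)                       # undirected graph
A_hat = A + np.eye(n)                        # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
P = D_inv_sqrt @ A_hat @ D_inv_sqrt          # symmetric normalization used in GCNs

H = rng.standard_normal((n, 16))             # random initial node features
for depth in [1, 2, 4, 8, 16, 32]:
    H_k = np.linalg.matrix_power(P, depth) @ H
    # mean pairwise distance between node representations shrinks with depth
    dists = np.linalg.norm(H_k[:, None, :] - H_k[None, :, :], axis=-1)
    print(f"depth={depth:2d}  mean pairwise distance={dists.mean():.3f}")
```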
In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire generation model.
Creating labeled training sets has become one of the major roadblocks in machine learning.
On the one hand, multi-hop-based approaches do not explicitly distinguish relevant nodes from a large number of multi-hop neighborhoods, leading to a severe over-smoothing problem.
To address these problems, we introduce a benchmark platform, WRENCH, for thorough and standardized evaluation of WS approaches.
However, simply integrating KGs into current KG-based RS models does not necessarily guarantee improved recommendation performance and may even weaken the holistic model capability.
One of the key challenges in Neural Architecture Search (NAS) is to efficiently rank the performances of architectures.
Pre-trained language models like BERT achieve superior performance on various NLP tasks without explicit consideration of syntactic information.
In this paper, we propose a novel and generic mechanism based on evolving attention to improve the performance of transformers.
Instead, we model their dependencies via a chain of prediction models that take previous attention maps as input to predict the attention maps of a new layer through convolutional neural networks.
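A minimal, simplified sketch of that idea, assuming a standard multi-head attention layer: the module name, the 3x3 convolution, and the blending weight `alpha` are our own illustrative choices, not the paper's exact architecture.

```python
# Sketch: the attention logits of a new layer are blended with a convolutional
# prediction computed from the previous layer's attention maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvolvingAttention(nn.Module):
    def __init__(self, d_model, n_heads, alpha=0.5):
        super().__init__()
        self.n_heads, self.d_head, self.alpha = n_heads, d_model // n_heads, alpha
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # 2D convolution over the stack of per-head attention maps from the previous layer
        self.evolve = nn.Conv2d(n_heads, n_heads, kernel_size=3, padding=1)

    def forward(self, x, prev_attn=None):
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2) for z in (q, k, v))
        logits = q @ k.transpose(-2, -1) / self.d_head ** 0.5      # (b, heads, t, t)
        if prev_attn is not None:
            # blend current logits with a convolutional prediction from the previous map
            logits = (1 - self.alpha) * logits + self.alpha * self.evolve(prev_attn)
        attn = F.softmax(logits, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(y), attn   # return the map so the next layer can evolve it
```

Stacking such layers and threading `attn` from one layer into the next lets each layer refine, rather than recompute from scratch, the attention pattern.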
We add the model designed by AutoADR as a sub-model into the production Ad Relevance model.
However, regarding Heterogeneous Information Network (HIN), existing HIN-oriented GCN methods still suffer from two deficiencies: (1) they cannot flexibly explore all possible meta-paths and extract the most useful ones for a target object, which hinders both effectiveness and interpretability; (2) they often need to generate intermediate meta-path based dense graphs, which leads to high computational complexity.
BERT is a cutting-edge language representation model pre-trained on a large corpus, which achieves superior performance on various natural language understanding tasks.
With the success of deep neural networks, Neural Architecture Search (NAS) as a way of automatic model design has attracted wide attention.
Learning text representation is crucial for text classification and other language-related tasks.
no code implementations • 10 Oct 2019 • Xupeng Miao, Nezihe Merve Gürel, Wentao Zhang, Zhichao Han, Bo Li, Wei Min, Xi Rao, Hansheng Ren, Yinan Shan, Yingxia Shao, Yujie Wang, Fan Wu, Hui Xue, Yaming Yang, Zitao Zhang, Yang Zhao, Shuai Zhang, Yujing Wang, Bin Cui, Ce Zhang
Despite the wide application of the Graph Convolutional Network (GCN), one major limitation is that it does not benefit from increasing depth and suffers from the over-smoothing problem.