In this expository article, we provide a self-contained overview of the notion of convolution embedded in different theories: from the classical Fourier theory to the theory of algebraic signal processing.
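As a concrete instance of the classical Fourier side of this story, circular convolution of two discrete signals equals the inverse DFT of the pointwise product of their DFTs; the short numpy check below verifies the identity on a length-4 example.

```python
import numpy as np

# Circular convolution theorem: conv(x, h) == IDFT(DFT(x) * DFT(h)).
x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([0.5, 0.25, 0.0, 0.25])

# Direct circular convolution.
n = len(x)
direct = np.array([sum(x[k] * h[(i - k) % n] for k in range(n))
                   for i in range(n)])

# Via the DFT: pointwise multiplication in the frequency domain.
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

assert np.allclose(direct, via_fft)
```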
To support cost-effective language inference in multilingual settings, we propose AMTSS, an adaptive multi-teacher single-student distillation framework, which allows distilling knowledge from multiple teachers to a single student.
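A minimal sketch of what multi-teacher-to-single-student distillation can look like, assuming a temperature-scaled KL objective with fixed per-teacher weights; the adaptive weighting in AMTSS itself may differ, and all names below are illustrative.

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, weights, T=2.0):
    """Weighted sum of KL divergences from the student to each teacher,
    with temperature T (a generic sketch, not AMTSS's exact objective)."""
    s_log_probs = F.log_softmax(student_logits / T, dim=-1)
    loss = 0.0
    for w, t_logits in zip(weights, teacher_logits_list):
        t_probs = F.softmax(t_logits / T, dim=-1)
        loss = loss + w * F.kl_div(s_log_probs, t_probs,
                                   reduction="batchmean") * T * T
    return loss

student = torch.randn(8, 5, requires_grad=True)
teachers = [torch.randn(8, 5), torch.randn(8, 5)]
multi_teacher_kd_loss(student, teachers, weights=[0.7, 0.3]).backward()
```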
We introduce the key notion of label non-uniformity, which is derived from the Wasserstein distance between the softmax distribution of the logits and the uniform distribution.
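On an ordered discrete support, the Wasserstein-1 distance has a closed form as the summed absolute difference of CDFs; the sketch below computes label non-uniformity under that assumption (the paper's exact ground metric over classes may differ).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def label_non_uniformity(logits):
    """Wasserstein-1 distance between softmax(logits) and the uniform
    distribution, assuming classes indexed on a line so the closed form
    W1(p, q) = sum_k |CDF_p(k) - CDF_q(k)| applies."""
    p = softmax(np.asarray(logits, dtype=float))
    u = np.full_like(p, 1.0 / p.size)
    return np.abs(np.cumsum(p) - np.cumsum(u)).sum()

print(label_non_uniformity([0.0, 0.0, 0.0]))  # 0.0: maximally uncertain
print(label_non_uniformity([8.0, 0.0, 0.0]))  # large: confident prediction
```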
In graph neural networks (GNNs), both node features and labels are examples of graph signals, a key notion in graph signal processing (GSP).
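To make the notion concrete: a graph signal is a vector assigning one value per node, and GSP analyzes it in the eigenbasis of the graph Laplacian, the graph Fourier transform. A minimal numpy sketch on a 4-node path graph:

```python
import numpy as np

# A graph signal assigns one value per node; here, on a 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A      # combinatorial Laplacian
lam, U = np.linalg.eigh(L)          # eigenvectors form the Fourier basis

x = np.array([1.0, 2.0, 2.0, 1.0])  # node features as a graph signal
x_hat = U.T @ x                     # graph Fourier transform
assert np.allclose(U @ x_hat, x)    # inverse transform recovers the signal
```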
However, a graph can exhibit hyperbolic geometry in some regions and Euclidean geometry in others.
When sampling multiple signals, the correlation between the signals can be exploited to reduce the overall number of samples.
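As a toy illustration of this idea (not the sampling scheme of any specific paper): when a second signal is nearly a scaled copy of the first, a handful of joint samples suffice to estimate the scale and reconstruct it from the first signal.

```python
import numpy as np

# Toy example: x2 is a strongly correlated (scaled, noisy) copy of x1,
# so only a few samples of x2 are needed on top of x1.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
x1 = np.sin(2 * np.pi * 3 * t)
x2 = 0.8 * x1 + 0.01 * rng.normal(size=t.size)

idx = np.array([10, 50, 90, 130, 170])            # only 5 samples of x2
alpha = (x1[idx] @ x2[idx]) / (x1[idx] @ x1[idx])  # least-squares scale
x2_est = alpha * x1                                # reconstruct from x1
print(np.abs(x2_est - x2).max())                   # small residual error
```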
In this paper, we present Simplicial Graph Attention Network (SGAT), a simplicial complex approach to represent such high-order interactions by placing features from non-target nodes on the simplices.
Sampling and interpolation have been extensively studied in order to reconstruct or estimate an entire graph signal from its values on a subset of vertices, although most existing results concern continuous signals.
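A minimal discrete sketch, assuming a K-bandlimited graph signal (one lying in the span of the K lowest-frequency Laplacian eigenvectors): sampling at least K suitably placed vertices allows exact least-squares recovery in the noise-free case.

```python
import numpy as np

# Reconstruct a K-bandlimited graph signal from samples on a vertex subset S.
rng = np.random.default_rng(0)
n, K = 8, 3
A = np.zeros((n, n))
for i in range(n - 1):                 # path graph
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
_, U = np.linalg.eigh(L)

U_K = U[:, :K]                         # low-frequency basis
x = U_K @ rng.normal(size=K)           # a K-bandlimited signal

S = [0, 3, 6]                          # sampled vertices, |S| >= K
c = np.linalg.lstsq(U_K[S, :], x[S], rcond=None)[0]
assert np.allclose(U_K @ c, x)         # perfect recovery (noise-free case)
```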
In order to better understand the reasons behind model behaviors (i.e., their predictions), most recent works have exploited generative models to provide complementary explanations.
Pre-trained Language Models (PLMs) have achieved great success on Machine Reading Comprehension (MRC) over the past few years.
Experimental results demonstrate that our adapted margin cosine loss can greatly enhance the baseline models, with an absolute performance gain of 15% on average, strongly verifying the potential of tackling the language prior problem in VQA from the angle of answer feature space learning.
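For reference, a minimal sketch of a large-margin cosine loss in the CosFace style, the family such adapted losses typically build on; the scale s, margin m, and exact adaptation below are illustrative, not the paper's.

```python
import torch
import torch.nn.functional as F

def margin_cosine_loss(features, weights, labels, s=30.0, m=0.35):
    """Large-margin cosine loss (CosFace-style sketch).

    features: (B, D) sample embeddings; weights: (C, D) class vectors.
    Subtracts a margin m from the target class's cosine similarity,
    then applies a scaled cross-entropy.
    """
    cos = F.normalize(features) @ F.normalize(weights).t()  # (B, C) cosines
    one_hot = F.one_hot(labels, num_classes=cos.size(1)).float()
    logits = s * (cos - m * one_hot)
    return F.cross_entropy(logits, labels)

feats = torch.randn(4, 16, requires_grad=True)
w = torch.randn(10, 16, requires_grad=True)
y = torch.tensor([0, 3, 7, 1])
margin_cosine_loss(feats, w, y).backward()
```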
More specifically, we propose a reinforced selector to extract useful pseudo-relevance feedback (PRF) terms to enhance response candidates, and a BERT-based response ranker to rank the PRF-enhanced responses.
More concretely, we first introduce a novel graph-based iterative knowledge retrieval module, which iteratively retrieves concepts and entities related to the given question and its choices from multiple knowledge sources.
Pre-sales customer service is important to e-commerce platforms, as it helps optimize customers' buying process.
More specifically, we take advantage of a decision model to help the dialogue system decide whether to wait or answer.
Building a high-quality multi-domain dialogue system is a challenging task due to the complicated and entangled dialogue state space across domains, which seriously limits the quality of the dialogue policy and further affects the generated responses.
The arbitrator then decides whether to wait or to respond to the user directly.
Information-seeking conversation systems aim to satisfy the information needs of users through conversations.
The key idea of the proposed approach is to use a Forward Transformation to transform dense representations to sparse representations.
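The paper's exact Forward Transformation is not reproduced here; purely as an illustrative sketch, one simple way to obtain sparse codes from dense vectors is a wide linear expansion followed by a thresholded ReLU. Every name and constant below is hypothetical.

```python
import torch

# Hypothetical sketch of one way to map dense vectors to sparse codes:
# expand to a wide latent space and apply a thresholded ReLU, so most
# coordinates are zero. The paper's actual Forward Transformation may differ.
dense_dim, sparse_dim = 128, 4096
expand = torch.nn.Linear(dense_dim, sparse_dim)

dense = torch.randn(2, dense_dim)
sparse = torch.relu(expand(dense) - 1.0)  # threshold shift encourages sparsity
print((sparse == 0).float().mean())       # fraction of zeroed coordinates
```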
How to incorporate external knowledge into a neural dialogue model is critically important for dialogue systems to behave like real humans.
Positive-unlabeled (PU) learning learns a binary classifier using only positive and unlabeled examples without labeled negative examples.
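One widely used formulation is the non-negative PU risk estimator of Kiryo et al. (2017), which needs only positive scores, unlabeled scores, and an assumed class prior; a minimal sketch:

```python
import torch

def nn_pu_risk(scores_p, scores_u, prior, loss=lambda z: torch.sigmoid(-z)):
    """Non-negative PU risk estimator (nnPU-style sketch).

    scores_p: classifier outputs on positive examples;
    scores_u: outputs on unlabeled examples;
    prior:    assumed class prior pi = P(y = +1).
    """
    r_p_pos = loss(scores_p).mean()   # positive-class risk on positives
    r_p_neg = loss(-scores_p).mean()  # negative-class risk on positives
    r_u_neg = loss(-scores_u).mean()  # negative-class risk on unlabeled
    neg_part = r_u_neg - prior * r_p_neg
    return prior * r_p_pos + torch.clamp(neg_part, min=0.0)

scores_p = torch.randn(32, requires_grad=True)
scores_u = torch.randn(128, requires_grad=True)
nn_pu_risk(scores_p, scores_u, prior=0.3).backward()
```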
In this paper, we present a fast and strong neural approach for general purpose text matching applications.
Then, we devise a mechanism to identify the relevant information from the noise-prone review snippets and incorporate this information to guide the answer generation.
In view of the huge success of convolutional neural networks (CNNs) for image classification and object recognition, there have been attempts to generalize the method to general graph-structured data.
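The best-known such generalization is the graph convolutional layer of Kipf and Welling (2017), which replaces image convolution with normalized neighborhood averaging; a minimal numpy sketch:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer in the style of Kipf & Welling (2017):
    H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = np.random.randn(3, 4)                         # node features
W = np.random.randn(4, 2)                         # learnable weights
print(gcn_layer(A, X, W).shape)                   # (3, 2)
```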
Specifically, the data selector "acts" on the source domain data to find a subset for optimization of the TL model, and the performance of the TL model can provide "rewards" in turn to update the selector.
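A heavily simplified REINFORCE-style sketch of this selector/reward loop; the placeholder reward below merely stands in for "train the TL model on the selected subset and evaluate on the target domain", and all names are hypothetical.

```python
import torch

# Hypothetical sketch: the selector keeps a per-example logit, samples a
# subset of the source data, receives the transfer model's score as reward,
# and updates its logits with REINFORCE.
n_source = 100
select_logits = torch.zeros(n_source, requires_grad=True)
opt = torch.optim.Adam([select_logits], lr=0.05)

def train_tl_model_and_score(mask):
    # Placeholder reward: stands in for training the TL model on the
    # selected subset and evaluating it on target validation data.
    return mask.mean()

for step in range(10):
    probs = torch.sigmoid(select_logits)
    mask = torch.bernoulli(probs.detach())      # selector "acts"
    reward = train_tl_model_and_score(mask)     # TL model "rewards"
    log_prob = (mask * probs.clamp(1e-6).log()
                + (1 - mask) * (1 - probs).clamp(1e-6).log()).sum()
    loss = -reward * log_prob                   # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```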
Our approach extends a basic monolingual STS framework to a shared multilingual encoder pretrained on a translation task, in order to incorporate data from high-resource languages.
In the era of big data, focused analysis of diverse topics with a short response time has become an urgent demand.
Building multi-turn information-seeking conversation systems is an important and challenging research topic.
Dialogue management (DM) decides the next action of a dialogue system according to the current dialogue state, and thus plays a central role in task-oriented dialogue systems.