An efficient method to help users explore gestures is still lacking; such exploration is challenging because gestures evolve over time and correlate with speech content in complex ways.
For the graph construction, a two-stage graph diversification scheme is proposed, which makes a good trade-off between the efficiency and reachability for the search procedure that builds upon it.
Anomaly detection is widely used to distinguish system anomalies by analyzing the temporal and spatial features of wireless sensor network (WSN) data streams; it is one of the critical techniques that ensure the reliability of WSNs.
Natural language interfaces (NLIs) provide users with a convenient way to interactively analyze data through natural language queries.
To this end, we believe that local attention is crucial to strike the balance between computational efficiency and modeling capacity.
Ranked #1 on Image Generation on CelebA-HQ 1024x1024
In this paper, we show our solution to the Google Landmark Recognition 2021 Competition.
However, those models do not consider the numerical properties of numbers and cannot perform robustly on numerical reasoning tasks (e.g., math word problems and measurement estimation).
Despite being a critical communication skill, grasping humor is challenging -- a successful use of humor requires a mixture of both engaging content build-up and an appropriate vocal delivery (e.g., pauses).
Much research focuses on modeling the complex intra- and inter-modal interactions between different communication channels.
This paper presents our systems for the three Subtasks of SemEval Task4: Reading Comprehension of Abstract Meaning (ReCAM).
Ranked #1 on Reading Comprehension on ReCAM (using extra training data)
In this paper, we rely on representative prototypes, the feature centroids of classes, to address the two issues for unsupervised domain adaptation.
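The notion of a prototype as a class feature centroid can be illustrated with a minimal sketch. This is not the paper's method: the function names `class_prototypes` and `nearest_prototype`, the use of plain Python lists as features, and the Euclidean nearest-prototype assignment are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def class_prototypes(features, labels):
    """Return {class label: centroid of that class's feature vectors}.

    Hypothetical helper: a prototype is taken to be the per-class mean,
    matching the 'feature centroids of classes' wording in the abstract.
    """
    grouped = defaultdict(list)
    for vec, label in zip(features, labels):
        grouped[label].append(vec)
    prototypes = {}
    for label, vecs in grouped.items():
        dim = len(vecs[0])
        prototypes[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return prototypes


def nearest_prototype(vec, prototypes):
    """Assign a feature vector to the class whose prototype is closest.

    Assumption: Euclidean distance; the paper may use a different measure.
    """
    return min(prototypes, key=lambda label: math.dist(vec, prototypes[label]))


# Toy labeled (source-domain) features.
features = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]]
labels = [0, 0, 1, 1]
prototypes = class_prototypes(features, labels)

# An unlabeled (target-domain) feature is labeled by its nearest prototype.
predicted = nearest_prototype([9.0, 9.0], prototypes)
```

In an unsupervised domain adaptation setting, such nearest-prototype assignments are one common way to obtain pseudo-labels for unlabeled target features; whether the paper uses them this way is not stated in the sentence above.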
Ranked #7 on Image-to-Image Translation on SYNTHIA-to-Cityscapes
In this paper, we classify affine Ricci solitons associated to canonical connections, Kobayashi-Nomizu connections, perturbed canonical connections, and perturbed Kobayashi-Nomizu connections on three-dimensional Lorentzian Lie groups with some product structure.
Differential Geometry 53C40, 53C42
In this paper, we try to eliminate semantic ambiguity in skip connection operations by adding attention gates (AGs). We also use attention mechanisms to combine local features with their corresponding global dependencies, explicitly model the dependencies between channels, and use multi-scale predictive fusion to exploit global information at different scales.
Starting from the first region, the feedforward control parameters are learned simultaneously with the low-order plant model in that region; the procedure then moves to the next region until all regions have been processed.
Two case studies and interviews with domain experts demonstrate the effectiveness of GNNLens in facilitating the understanding of GNN models and their errors.
Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources.
Fusing multi-modality medical images, such as MR and PET, can provide various anatomical or functional information about the human body.
Specifically, we model the relationship between students and questions using student interactions to construct the student-interaction-question network and further present a new GNN model, called R^2GCN, which intrinsically works for the heterogeneous networks, to achieve generalizable student performance prediction in interactive online question pools.
The growing use of automated decision-making in critical applications, such as crime prediction and college admission, has raised questions about fairness in machine learning.
These optimization problems usually have complex properties, such as non-convexity and NP-hardness, which may not be addressed by the traditional convex optimization-based solutions.
Pixel-wise operations between polarimetric images are important for processing polarization information.
In the occluded region, as depth and camera motion can provide more reliable motion estimation, they can be used to instruct unsupervised learning of optical flow.
The modulation of voice properties, such as pitch, volume, and speed, is crucial for delivering a successful public speech.
The key challenge of multi-domain translation lies in simultaneously encoding both the general knowledge shared across domains and the particular knowledge distinctive to each domain in a unified model.
With the increasing popularity of online learning, a surge of e-learning platforms has emerged to facilitate education opportunities for K-12 (kindergarten to 12th grade) students, and with this, a wealth of information is being recorded in their learning logs.
Our visualization system features a channel coherence view and a sentence clustering view that together enable users to obtain a quick overview of emotion coherence and its temporal evolution.
Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings.
To leverage the data and resources, a new machine learning paradigm, called edge learning, has emerged where learning algorithms are deployed at the edge for providing fast and intelligent services to mobile users.
Inspired by the conditional integration idea from the classical control community, we propose SPI-Optimizer, an integral-Separated PI controller-based optimizer that introduces no extra hyperparameters.
We frame low-resource translation as a meta-learning problem, and we learn to adapt to low-resource languages based on multilingual high-resource language tasks.
In this paper, we present a novel deep learning-based approach to evaluate the readability of graph layouts by directly using graph images.
In this paper, we present a multi-scale Fully Convolutional Network (MSP-RFCN) to robustly detect and classify human hands under various challenging conditions.
Specifically, the former is devised to find a pair of individuals with the minimum vector angle, which means that these two individuals share the most similar search direction.
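The minimum-vector-angle search described above can be sketched as follows. This is an illustrative sketch only: the names `vector_angle` and `min_angle_pair` are hypothetical, each individual is assumed to be represented by a vector (for example its objective vector), and the brute-force scan over all pairs is an assumption rather than the paper's procedure.

```python
import math
from itertools import combinations


def vector_angle(u, v):
    """Angle in radians between two vectors.

    Assumption: a zero-length vector is treated as orthogonal to everything,
    since its direction is undefined.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return math.pi / 2
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


def min_angle_pair(vectors):
    """Return the index pair (i, j), i < j, with the smallest vector angle."""
    return min(
        combinations(range(len(vectors)), 2),
        key=lambda pair: vector_angle(vectors[pair[0]], vectors[pair[1]]),
    )


# Three toy individuals; the first and third point in nearly the same direction.
population = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.1)]
closest = min_angle_pair(population)
```

The scan costs O(n^2) angle computations for n individuals, which is usually acceptable for the population sizes typical of evolutionary algorithms.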
In this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training.
In the evolutionary computation research community, the performance of most evolutionary algorithms (EAs) depends strongly on their implemented coordinate system.
This paper proposes a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with high accuracy.
Multi-objective optimisation is regarded as one of the most promising ways for dealing with constrained optimisation problems in evolutionary optimisation.
In this work, we propose to assess classifiers in terms of normalized mutual information (NI), which is novel and well defined in a compact range for classifier evaluation.
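As a rough illustration of scoring a classifier with normalized mutual information, the sketch below computes the mutual information between true and predicted labels and divides it by the entropy of the true labels. That particular normalization is an assumption (other normalizations exist), and the function names are hypothetical; with it, the score lies in the compact range [0, 1].

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(targets, predictions):
    """Mutual information (in bits) between targets and predictions."""
    n = len(targets)
    joint = Counter(zip(targets, predictions))
    p_t = Counter(targets)
    p_y = Counter(predictions)
    mi = 0.0
    for (t, y), count in joint.items():
        p_ty = count / n
        mi += p_ty * math.log2(p_ty / ((p_t[t] / n) * (p_y[y] / n)))
    return mi


def normalized_mi(targets, predictions):
    """I(T; Y) / H(T); assumed normalization, defined as 0 when H(T) = 0."""
    h_t = entropy(targets)
    if h_t == 0.0:
        return 0.0
    return mutual_information(targets, predictions) / h_t


targets = [0, 0, 1, 1]
perfect = normalized_mi(targets, [0, 0, 1, 1])      # predictions determine targets
uninformative = normalized_mi(targets, [0, 1, 0, 1])  # predictions independent of targets
```

Note that a consistent relabeling such as `[1, 1, 0, 0]` also scores 1 under this measure, because mutual information rewards statistical dependence rather than label agreement; this is a property of the measure, not a claim about how the paper handles it.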