iBARLE consists of (1) an Appearance Variation Generation (AVG) module, which promotes visual-appearance domain generalization, and (2) a Complex Structure Mix-up (CSMix) module, which enhances generalizability w.r.t.
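The mix-up idea behind a module like CSMix can be illustrated with the standard mixup formulation: blend two samples and their labels with a ratio drawn from a Beta distribution. This is a generic sketch, not iBARLE's exact CSMix module; the `alpha` parameter and Beta-distributed blend ratio are assumptions carried over from vanilla mixup.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels with a Beta-distributed ratio."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # interpolate inputs
    y = lam * y1 + (1.0 - lam) * y2       # interpolate labels
    return x, y, lam

# Two toy "images" with one-hot labels
xa, ya = np.zeros((4, 4)), np.array([1.0, 0.0])
xb, yb = np.ones((4, 4)), np.array([0.0, 1.0])
x, y, lam = mixup(xa, ya, xb, yb)
```

The mixed label keeps total mass 1, so the blended sample trains the model to interpolate smoothly between the two classes.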
To the best of our knowledge, our work is the first to fill the gap in benchmarks and techniques for practical pedestrian trajectory prediction across different domains.
Semi-supervised domain adaptation (SSDA) is quite a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
For slow learning of graph similarity, this paper proposes a novel early-fusion approach by designing a co-attention-based feature fusion network on multilevel GNN features.
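A minimal sketch of what co-attention-based fusion of two graphs' node features might look like: compute a cross-affinity matrix between the two node-embedding sets, then let each graph's nodes attend over the other's. The function name and scaled dot-product scoring are assumptions; the paper's actual network operates on multilevel GNN features.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def co_attention_fuse(h1, h2):
    """Fuse node features of two graphs via a cross-affinity matrix.

    h1: (n1, d) node embeddings of graph 1; h2: (n2, d) of graph 2.
    Returns each graph's attended context conditioned on the other.
    """
    affinity = h1 @ h2.T / np.sqrt(h1.shape[1])  # (n1, n2) similarity scores
    a12 = softmax(affinity, axis=1)              # graph-1 nodes attend over graph 2
    a21 = softmax(affinity.T, axis=1)            # graph-2 nodes attend over graph 1
    h1_ctx = a12 @ h2                            # (n1, d) context from graph 2
    h2_ctx = a21 @ h1                            # (n2, d) context from graph 1
    return h1_ctx, h2_ctx

rng = np.random.default_rng(0)
g1, g2 = rng.normal(size=(5, 8)), rng.normal(size=(3, 8))
c1, c2 = co_attention_fuse(g1, g2)
```

Because fusion happens at the feature level (early fusion), each graph's representation is conditioned on the other graph before any similarity score is produced.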
Current Sign Language Recognition (SLR) methods usually extract features via deep neural networks and suffer from overfitting due to limited and noisy data.
In this paper, we propose a novel framework, MemREIN, which considers Memorized, Restitution, and Instance Normalization for cross-domain few-shot learning.
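The instance-normalization component can be sketched generically: each channel of each sample is normalized over its spatial positions, stripping per-instance style statistics that carry domain-specific appearance. This is plain instance normalization, not MemREIN's full memorized/restitution pipeline; the epsilon value is an assumption.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of each sample over its spatial positions.

    x: (batch, channels, height, width). Removing per-instance mean and
    variance discards style statistics, which is what makes IN useful
    for reducing cross-domain appearance shift.
    """
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
feat = rng.normal(loc=3.0, scale=2.0, size=(2, 4, 8, 8))
out = instance_norm(feat)
```

The restitution idea in such frameworks then tries to add back the discriminative part of what normalization removed, since IN can also discard task-relevant information.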
Sign language is commonly used by deaf or speech-impaired people to communicate but requires significant effort to master.
Ranked #2 on Sign Language Recognition on WLASL-2000
As texts always contain a large proportion of task-irrelevant words, accurate alignment between aspects and their sentiment descriptions is the most crucial and challenging step.
Moreover, the performance of advanced approaches degrades dramatically for past learned classes (i.e., catastrophic forgetting), due to the irregular and redundant geometric structures of 3D point cloud data.
However, it is still a challenging task, since (1) the job title and job transition (job-hopping) data are messy, containing many subjective and non-standard naming conventions for the same position (e.g., Programmer, Software Development Engineer, SDE, Implementation Engineer), (2) a large amount of title/transition information is missing, and (3) each talent seeks only a limited number of jobs, which introduces incompleteness and randomness when modeling job transition patterns.
Extensive experiments on four action datasets illustrate that the proposed CAM achieves better results for each view and also boosts multi-view performance.
Inductive and unsupervised graph learning is a critical technique for predictive or information retrieval tasks where label information is difficult to obtain.
Current adversarial adaptation methods attempt to align the cross-domain features, whereas two challenges remain unsolved: 1) the conditional distribution mismatch and 2) the bias of the decision boundary towards the source domain.
Multi-view time series classification (MVTSC) aims to improve performance by fusing the distinctive temporal information from multiple views.
Specifically, most general-purpose DA methods, which strive for global feature alignment while ignoring local geometric information, are not suitable for 3D domain alignment.
Ranked #1 on Unsupervised Domain Adaptation on PreSIL to KITTI
Multi-view action recognition aims to integrate complementary information from different views to improve classification performance.
To fill this gap, we introduce a new, large-scale EV-Action dataset in this work, which consists of RGB, depth, electromyography (EMG), and two skeleton modalities.
Ranked #4 on Multimodal Activity Recognition on EV-Action
We further generalize the framework to handle more than two modalities and missing modalities.
To solve these problems, we propose the very deep residual channel attention networks (RCAN).
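The channel-attention building block of such networks can be sketched as a squeeze-and-excitation-style gate: pool each channel over spatial positions, pass the pooled vector through a bottleneck of two projections, and rescale the channels with a sigmoid gate. Weight shapes and the reduction ratio `r` here are illustrative assumptions, not RCAN's exact configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w_down, w_up):
    """Rescale channels by statistics pooled over spatial positions.

    x: (channels, height, width). w_down: (channels//r, channels) and
    w_up: (channels, channels//r) act like the two 1x1-conv weights of
    a squeeze-and-excitation bottleneck.
    """
    s = x.mean(axis=(1, 2))                  # squeeze: global average pooling
    z = np.maximum(w_down @ s, 0.0)          # channel-downscale + ReLU
    gate = sigmoid(w_up @ z)                 # channel-upscale + sigmoid gate
    return x * gate[:, None, None]           # excite: per-channel rescaling

rng = np.random.default_rng(0)
c, r = 8, 4
x = rng.normal(size=(c, 6, 6))
w_down = rng.normal(size=(c // r, c)) * 0.1
w_up = rng.normal(size=(c, c // r)) * 0.1
y = channel_attention(x, w_down, w_up)
```

Since the gate lies in (0, 1), attention can only attenuate channels, letting the network adaptively emphasize informative channel-wise features.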
Ranked #14 on Image Super-Resolution on BSD100 - 4x upscaling