As a successful approach to self-supervised learning, contrastive learning aims to learn invariant information shared among distortions of the input sample.
To this end, we propose a methodology, specifically consistency and complementarity network (CoCoNet), which avails of strict global inter-view consistency and local cross-view complementarity preserving regularization to comprehensively learn representations from multiple views.
Although artificial intelligence (AI) has made significant progress in understanding molecules in a wide range of fields, existing models generally acquire the single cognitive ability from the single molecular modality.
Contrastive learning (CL)-based self-supervised learning models learn visual representations in a pairwise manner.
In this paper, we explore a potential visual analogue of words, i. e., semantic parts, and we integrate semantic information into the training process of MAE by proposing a Semantic-Guided Masking strategy.
Most state-of-the-art methods focus on designing temporal convolution-based models, but the limitations on modeling long-term temporal dependencies and inflexibility of temporal convolutions limit the potential of these models.
Ranked #1 on Action Segmentation on 50Salads
Vision-language models are pre-trained by aligning image-text pairs in a common space so that the models can deal with open-set visual concepts by learning semantic information from textual labels.
In addition, to cope with new attacks in real-world deployment, we propose an Active Adaption Resampling (AAR) method, which can better discover the distribution of unseen sample data and adapt the parameters of encoder.
We perform a meta learning technique to build the augmentation generator that updates its network parameters by considering the performance of the encoder.
We conduct theoretical analysis on the robustness of the proposed RLPGA and prove that the robust informative-theoretic-based loss and the local preserving module are beneficial to reduce the empirical risk of the target domain.
In this way, if a key segment has a high correlation score with the query segment, its successive segment contributes more to the prediction of the query segment.
Long-term time series forecasting is widely used in real-world applications such as financial investment, electricity management and production planning.
Explainable distances for sequence data depend on temporal alignment to tackle sequences with different lengths and local variances.
In this paper, we give an analysis of the existing representation learning framework of unsupervised domain adaptation and show that the learned feature representations of the source domain samples are with discriminability, compressibility, and transferability.
To this end, we rethink the existing multi-view learning paradigm from the information theoretical perspective and then propose a novel information theoretical framework for generalized multi-view learning.
Supervised dimensionality reduction for sequence data projects the observations in sequences onto a low-dimensional subspace to better separate different sequence classes.
Low-dimensional discriminative representations enhance machine learning methods in both performance and complexity, motivating supervised dimensionality reduction (DR) that transforms high-dimensional data to a discriminative subspace.
Multi-shot person re-identification (MsP-RID) utilizes multiple images from the same person to facilitate identification.
Many discriminant analysis methods such as LDA and HLDA actually maximize the average pairwise distances between classes, which often causes the class separation problem.