For tasks in distant domains with different class label sets, PLMs may memorize knowledge that does not transfer to the target domain and suffer from negative transfer.
The importance of weights for each task can be determined either explicitly, by learning a task-specific mask during training (e.g., parameter isolation-based approaches), or implicitly, by introducing a regularization term (e.g., regularization-based approaches).
We propose a new lexical inference task, Mental and Physical Classification (MPC), to handle commonsense reasoning in a reasoning graph.
In fact, this question was answered ten years ago, when IDRec beat MoRec by a large margin in both recommendation accuracy and efficiency.
Second, we propose a feature drift compensation method that accounts for drift in the backbone's features when learning new tasks.
In experiments on ImageNet-Subset and ImageNet-1K, we show that our method, AttnDistill, outperforms existing self-supervised knowledge distillation (SSKD) methods and achieves state-of-the-art k-NN accuracy compared with self-supervised learning (SSL) methods trained from scratch (with the ViT-S model).
Accurate semantic segmentation models typically require significant computational resources, inhibiting their use in practical applications.
The lesion synthesis framework was evaluated for lesion textures, and the synthetic lesions were used to train a lesion segmentation network to further validate the effectiveness of this framework.
Large-scale Protein Language Models (PLMs) have improved performance in protein prediction tasks, ranging from 3D structure prediction to various function predictions.
However, due to the significant imbalance between the amount of annotated data in the source and target domains, usually only the target distribution is aligned to the source domain, leading to adapting unnecessary source-specific knowledge to the target domain, i.e., biased domain adaptation.
Neural video compression has emerged as a novel paradigm that combines trainable multilayer neural networks and machine learning, achieving competitive rate-distortion (RD) performance but remaining impractical due to heavy neural architectures with large memory and computational demands.
Prompt-based fine-tuning has boosted the performance of Pre-trained Language Models (PLMs) on few-shot text classification by employing task-specific prompts.
To measure the coaxiality of twist drills with irregular surfaces accurately and efficiently, this paper proposes an optical measurement mechanism.
Thus, we further propose a unified framework that allows both translation and autoencoding capabilities in a single codec.
Aiming at a simple, neat redesign of distributed deep learning frameworks for various parallelism paradigms, we present OneFlow, a novel distributed training framework based on an SBP (split, broadcast and partial-value) abstraction and the actor model.
Furthermore, we build an encoder-decoder network based on the proposed continuous CRF graph convolution (CRFConv), in which the CRFConv embedded in the decoding layers restores the details of high-level features lost in the encoding stage, enhancing the localization ability of the network and thereby benefiting segmentation.
In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT).
We propose Local Geometry Code Learning (LGCL), a model that improves on the original DeepSDF results by learning from the local shape geometry of the full 3D shape.
Neural image compression (NIC) is a new coding paradigm where coding capabilities are captured by deep models learned from data.
Neural image compression leverages deep neural networks to outperform traditional image codecs in rate-distortion performance.
In our modified U-Net model, the result of background subtraction from other models is combined with the source image as input for learning the statistical distribution.
Railway systems require regular manual maintenance, a large part of which is dedicated to inspecting track deformation.
Modern object detection methods based on convolutional neural networks suffer from severe catastrophic forgetting when learning new classes without access to the original data.
To address this problem, we propose a self-training framework to automatically mine hard examples with pseudo-labels from unannotated videos or images.
Addressing these limitations, we formulate the problem of variable rate-distortion optimization for deep image compression, and propose modulated autoencoders (MAEs), where the representations of a shared autoencoder are adapted to the specific rate-distortion tradeoff via a modulation network.
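The modulation idea can be sketched roughly as follows (all names, shapes, and the toy linear "encoder" here are illustrative assumptions, not the paper's actual architecture): a small modulation network maps the target tradeoff λ to positive per-channel scales that adapt a shared representation to one rate-distortion operating point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared encoder: one linear layer producing 4 "channels".
W_enc = rng.normal(size=(4, 8))

def modulation(lam, W_mod):
    """Tiny modulation net: maps a rate-distortion tradeoff lam to
    per-channel scaling factors (softplus keeps the scales positive)."""
    h = W_mod @ np.array([lam, 1.0])   # affine in lam
    return np.log1p(np.exp(h))         # softplus

def encode(x, lam, W_mod):
    z = W_enc @ x                      # shared representation
    return modulation(lam, W_mod) * z  # adapt it to the target tradeoff

W_mod = rng.normal(size=(4, 2))
x = rng.normal(size=8)
z_low  = encode(x, 0.1, W_mod)   # low-rate operating point
z_high = encode(x, 10.0, W_mod)  # high-rate operating point
```

The key property is that one set of encoder weights serves every tradeoff; only the cheap modulation factors change with λ.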
Many automated processes, such as auto-piloting, rely on accurate semantic segmentation as a critical component.
In this paper, we adapt the geodesic distance-based recursive filter to the sparse data interpolation problem.
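A minimal 1D sketch of the idea (the weighting scheme and parameter names are illustrative assumptions, not the paper's exact formulation): the recursive filter propagates sparse samples along the signal, with per-step weights shrunk by guide-signal differences so that propagation stops at geodesic barriers; filtering the sample mask identically provides the normalization.

```python
import numpy as np

def _recursive_pass(x, w, reverse=False):
    # One causal sweep of the recursive filter: y[i] = (1-w)*y[i] + w*y[prev].
    y = x.astype(float).copy()
    if reverse:
        for i in range(len(x) - 2, -1, -1):
            y[i] = (1 - w[i + 1]) * y[i] + w[i + 1] * y[i + 1]
    else:
        for i in range(1, len(x)):
            y[i] = (1 - w[i]) * y[i] + w[i] * y[i - 1]
    return y

def geodesic_interp(values, mask, guide, sigma_s=20.0, sigma_r=0.2, a=0.6):
    # Geodesic step length: spatial distance plus guide difference, so large
    # jumps in the guide act as barriers that block propagation across them.
    d = 1.0 + (sigma_s / sigma_r) * np.abs(np.diff(guide, prepend=guide[:1]))
    w = a ** d
    num = values * mask           # sparse values (zero where no sample)
    den = mask.astype(float)      # validity mask, filtered identically
    for _ in range(2):            # forward/backward sweeps spread the samples
        num = _recursive_pass(_recursive_pass(num, w), w, reverse=True)
        den = _recursive_pass(_recursive_pass(den, w), w, reverse=True)
    return num / np.maximum(den, 1e-8)

guide  = np.r_[np.zeros(10), np.ones(10)]       # a step edge at index 10
mask   = np.zeros(20); mask[[0, 19]] = 1
values = np.zeros(20); values[0], values[19] = 1.0, 5.0
out = geodesic_interp(values, mask, guide)
```

Because the guide jumps at index 10, each sample fills only its own side of the edge instead of bleeding across it.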
Facial expression recognition is an active research problem in computer science.
Random data augmentation is a critical technique for avoiding overfitting when training deep neural network models.
We present experimental results showing that our quality-classified framework can accurately classify images by the type and severity of their degradations and can significantly boost the performance of state-of-the-art face detectors and recognizers on datasets containing mixed-quality images.
In addition, we propose an online clustering method based on binary k-means that is capable of clustering large photo streams on a single machine, and we show applications to spam detection and trending-photo discovery.
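A batch sketch of the binary k-means idea (the details here are assumptions; the paper's online variant processes the stream incrementally): assignment uses Hamming distance, and the centroid update is a per-bit majority vote, which keeps centroids binary so that distances could later be computed with fast bitwise popcounts.

```python
import numpy as np

def binary_kmeans(codes, k, iters=10, seed=0):
    """Cluster binary codes (n, d) with Hamming distance and binary centroids."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization keeps the seeds spread out in Hamming space.
    centroids = [codes[rng.integers(len(codes))]]
    for _ in range(k - 1):
        d = np.min([(codes != c).sum(1) for c in centroids], axis=0)
        centroids.append(codes[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(iters):
        # Hamming distance = number of differing bits.
        dists = (codes[:, None, :] != centroids[None, :, :]).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = codes[assign == j]
            if len(members):
                # Per-bit majority vote keeps the centroid binary.
                centroids[j] = (members.mean(0) >= 0.5).astype(codes.dtype)
    return assign, centroids

# Two well-separated groups of 8-bit codes.
codes = np.vstack([np.zeros((10, 8), int), np.ones((10, 8), int)])
assign, cents = binary_kmeans(codes, k=2)
```

Keeping centroids binary is the design choice that matters for scale: both the codes and the centroids stay packable into machine words.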
We present an improved Locality Preserving Projections (LPP) method, named Globality-Locality Preserving Projections (GLPP), to preserve both the global and local geometric structures of data.
We address the problem of reconstructing and analyzing surveillance videos using compressive sensing.
This paper addresses the problem of automatically recognizing linguistically significant nonmanual expressions in American Sign Language from video.