We project the freely available 3D segmentation annotations onto the 2D plane and leverage the corresponding 2D semantic maps as the supervision signal, significantly enhancing the semantic awareness of multi-view detectors.
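The projection step can be sketched as follows. This is a minimal numpy sketch assuming a pinhole camera model; the intrinsics `K`, world-to-camera extrinsics `T_cam`, and all function names are our illustrative choices, not the paper's:

```python
import numpy as np

def project_labels_to_2d(points, labels, K, T_cam, hw):
    """Project labeled 3D points into a 2D semantic map.

    points: (N, 3) 3D points in world coordinates
    labels: (N,)   integer semantic labels
    K:      (3, 3) camera intrinsics
    T_cam:  (4, 4) world-to-camera extrinsics
    hw:     (height, width) of the output map
    """
    h, w = hw
    # Transform to the camera frame using homogeneous coordinates.
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_cam @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera.
    front = cam[:, 2] > 1e-6
    cam, lab = cam[front], labels[front]
    # Perspective projection with the pinhole model.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u = uv[:, 0].round().astype(int)
    v = uv[:, 1].round().astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    sem_map = np.full((h, w), -1, dtype=np.int64)  # -1 = ignore label
    sem_map[v[inside], u[inside]] = lab[inside]
    return sem_map
```

A real pipeline would additionally resolve depth ordering when several points land on the same pixel; the sketch simply overwrites.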
However, large language models have two prominent characteristics compared to smaller models: (1) most compression algorithms require finetuning or even retraining the model after compression.
As a result, we dissect the preservation of patch-wise spatial information in CLIP and propose a local-to-global framework to obtain image tags.
In the context of autonomous driving, the significance of effective feature learning is widely acknowledged.
Unsupervised Video Object Segmentation (VOS) aims at identifying the contours of primary foreground objects in videos without any prior knowledge.
Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels.
In this paper, we revisit the hyper-parameter temperature and find that, as a single fixed value, it cannot sufficiently distill the knowledge from each sample.
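For context, here is a minimal numpy sketch of the standard single-temperature distillation objective whose limitation is pointed out above (Hinton-style softened KL); the function names are ours, and `tau` plays the role of the temperature:

```python
import numpy as np

def softmax(z, tau=1.0):
    """Temperature-scaled softmax: a higher tau flattens the distribution."""
    z = np.asarray(z, dtype=float) / tau
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(teacher_logits, student_logits, tau):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by tau^2 as in the classic distillation objective."""
    p = softmax(teacher_logits, tau)
    q = softmax(student_logits, tau)
    return tau ** 2 * np.sum(p * (np.log(p) - np.log(q)))
```

Because `tau` is shared across all samples, every sample's logits are softened by the same amount; a sample-wise temperature would instead adapt this softening per example.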
Specifically, we employ an asymmetric encoder to learn the compensating features of the RGB and the thermal images.
The key to associating the two different representations is our introduced input-dependent Query Initialization module, which efficiently generates reference points and content queries.
To tackle these problems, we propose Asymmetric Parallel Point Transformer (APPT).
One of the challenges in federated learning is the non-independent and identically distributed (non-IID) data across heterogeneous devices, which causes significant differences in local updates and degrades the performance of the central server.
On the one hand, CEL blends each token with multiple patches of different scales, providing the self-attention module itself with cross-scale features.
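One way to picture such a cross-scale embedding layer is the following illustrative numpy sketch; the patch sizes, projection dimensions, and random weights (standing in for learned ones) are all our own choices:

```python
import numpy as np

def cross_scale_embed(img, patch_sizes=(4, 8), dims=(48, 16), stride=4, seed=0):
    """Illustrative cross-scale embedding: each output token blends patches
    of several sizes centred on the same location, projected separately
    per scale and concatenated into one embedding."""
    rng = np.random.default_rng(seed)
    h, w, c = img.shape
    # One random linear projection per scale (stand-ins for learned weights).
    projs = [rng.standard_normal((p * p * c, d)) / np.sqrt(p * p * c)
             for p, d in zip(patch_sizes, dims)]
    pad = max(patch_sizes) // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)))
    tokens = []
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            parts = []
            for p, proj in zip(patch_sizes, projs):
                cy, cx = y + pad, x + pad  # centre in the padded image
                patch = padded[cy - p // 2:cy + (p + 1) // 2,
                               cx - p // 2:cx + (p + 1) // 2]
                parts.append(patch.reshape(-1) @ proj)
            tokens.append(np.concatenate(parts))  # one multi-scale token
    return np.stack(tokens)  # (h//stride * w//stride, sum(dims))
```

Allocating more channels to the small-scale projection than to the large-scale one keeps the token dimension fixed while still exposing coarse context to self-attention.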
Unfortunately, the network cannot accurately distinguish different depths from such non-discriminative visual features, resulting in unstable depth training.
To efficiently generate high-quality segmentation masks from CLIP, we propose a novel WSSS framework called CLIP-ES.
In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from maintained regions or adopt sophisticated masking strategies, we instead propose a much simpler paradigm.
Most of these methods fail to achieve realistic reconstruction when only a single image is available.
When training a teacher-student semi-supervised framework, we randomly add ground-truth samples and pseudo samples to both labeled and unlabeled frames, providing strong data augmentation for both.
Various training criteria for these auxiliary outliers are proposed based on heuristic intuitions.
In this paper, we conduct theoretical and experimental analyses to explore the fundamental causes of performance degradation in deep GCNs: over-smoothing and gradient vanishing reinforce each other, causing performance to deteriorate more quickly as GCNs deepen.
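The over-smoothing half of this interaction can be demonstrated with a toy numpy experiment. The graph and layer counts are our own, and only the linear propagation part of a GCN layer is kept (no weights or nonlinearity):

```python
import numpy as np

def propagate(features, adj, n_layers):
    """Repeatedly apply symmetric-normalized propagation A_hat @ X,
    i.e. the linear part of a GCN layer with self-loops."""
    a = adj + np.eye(len(adj))            # add self-loops
    d = a.sum(axis=1)
    a_hat = a / np.sqrt(np.outer(d, d))   # D^{-1/2} (A + I) D^{-1/2}
    x = features.copy()
    for _ in range(n_layers):
        x = a_hat @ x
    return x

# 4-node path graph with one-hot node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x0 = np.eye(4)
shallow = propagate(x0, adj, 2)
deep = propagate(x0, adj, 64)
# Per-feature spread across nodes: shrinks as depth grows,
# i.e. node representations become indistinguishable.
spread = lambda x: np.ptp(x, axis=0).mean()
print(spread(shallow), spread(deep))
```

With 64 layers the rows of the output are nearly proportional to one another, which is exactly the over-smoothing effect: the propagation operator converges to a rank-one projection.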
Estimating the MI for a subset of features is often intractable.
The effectiveness of gene expression pattern annotation relies on the quality of feature representation.
Based on our theoretical analysis, we propose to first learn the gradient field of the distance function and then learn the distance function itself.
MTVFL has the following key properties: (1) the learned vector fields are close to the gradient fields of the prediction functions; (2) within each task, the vector field is required to be as parallel as possible, and is thus expected to span a low-dimensional subspace; and (3) the vector fields from all tasks share a low-dimensional subspace.
To achieve this goal, we show that second-order smoothness measures the linearity of the function, and that the gradient field of a linear function must be a parallel vector field.
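This claim can be made concrete with a short derivation in our own notation:

```latex
\[
f(x) = a^{\top} x + b
\;\Longrightarrow\;
\nabla f(x) \equiv a
\quad\text{and}\quad
\nabla^{2} f(x) \equiv 0 ,
\]
```

so a linear function has a constant, hence parallel, gradient field. Conversely, a parallel (constant) gradient field $a$ integrates to $f(x) = a^{\top} x + b$, which is why the second-order term $\|\nabla^{2} f\|$ quantifies how far $f$ is from being linear.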