One approach to tackling this issue is to introduce a Positional Encoding (PE) of the nodes and inject it into the input layer, as in Transformers.
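As a concrete illustration (a minimal sketch, not necessarily this paper's exact pipeline), a common graph PE is the eigenvectors of the normalized graph Laplacian, concatenated to the node features before the first layer. The helper below assumes a dense adjacency matrix and ignores the sign ambiguity of eigenvectors:

```python
import torch

def add_laplacian_pe(x, adj, k=8):
    """Concatenate the k smallest non-trivial Laplacian eigenvectors to node features.

    x:   (num_nodes, feat_dim) node features
    adj: (num_nodes, num_nodes) dense, symmetric adjacency matrix
    """
    deg = adj.sum(dim=1)
    d_inv_sqrt = deg.clamp(min=1e-12).pow(-0.5)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = torch.eye(adj.size(0)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = torch.linalg.eigh(lap)   # eigenvalues in ascending order
    pe = eigvecs[:, 1:k + 1]                    # skip the trivial constant eigenvector
    return torch.cat([x, pe], dim=-1)
```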
In this paper, we unify these two families of approaches from the angle of manifold learning and propose TLDR, a dimensionality reduction method for generic input spaces that ports the simple self-supervised learning framework of Barlow Twins to a setting where it is hard or impossible to define an appropriate set of distortions by hand.
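For reference, the Barlow Twins objective drives the cross-correlation matrix of twin embeddings toward the identity. The sketch below restates that generic loss, not TLDR's full method; in TLDR the two "views" would typically come from a sample and one of its nearest neighbors rather than hand-crafted distortions, and `lambd` is an assumed trade-off weight:

```python
import torch

def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """z_a, z_b: (batch, dim) embeddings of two views of the same inputs."""
    n, d = z_a.shape
    # Standardize each embedding dimension over the batch
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.T @ z_b) / n                              # (dim, dim) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()     # same dimensions: correlation -> 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # different dims: decorrelate
    return on_diag + lambd * off_diag
```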
We present a novel 3D mapping method leveraging recent progress in neural implicit representations for 3D reconstruction.
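To make "neural implicit representation" concrete: a coordinate network maps a 3D point to a scene property such as a signed distance or occupancy. The module below is a toy sketch of that idea, not the mapping system proposed in the paper:

```python
import torch
import torch.nn as nn

class ImplicitMap(nn.Module):
    """Toy coordinate MLP: maps a 3D point to a signed-distance value."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz):        # xyz: (n, 3) world coordinates
        return self.net(xyz)       # (n, 1) signed distance (or occupancy logit)
```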
Although pre-trained language models (PLMs) have achieved remarkable improvements across a wide range of NLP tasks, they are expensive in terms of time and computational resources.
In this work, we propose FlexConv, a novel convolutional operation with which high-bandwidth convolutional kernels of learnable size can be learned at a fixed parameter cost.
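One way to realize a learnable kernel size at fixed parameter cost (a sketch under assumed design choices, not necessarily FlexConv's exact formulation) is to generate a continuous kernel with a small MLP and modulate it with a Gaussian window whose width is trainable; `FlexKernel1d`, `hidden`, and `max_size` below are illustrative names:

```python
import torch
import torch.nn as nn

class FlexKernel1d(nn.Module):
    """Sketch: continuous kernel = MLP(position) * Gaussian mask with learnable width."""
    def __init__(self, channels, hidden=32, max_size=33):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.GELU(),
                                 nn.Linear(hidden, channels))
        self.log_sigma = nn.Parameter(torch.zeros(1))  # learnable effective kernel size
        self.register_buffer("pos", torch.linspace(-1, 1, max_size).unsqueeze(-1))

    def forward(self):
        k = self.net(self.pos)                                   # (max_size, channels)
        mask = torch.exp(-self.pos ** 2 / (2 * self.log_sigma.exp() ** 2))
        # Shape (channels, 1, max_size): usable as a depthwise weight
        # via F.conv1d(x, weight, groups=channels).
        return (k * mask).T.unsqueeze(1)
```

The parameter count is fixed by the MLP, while the Gaussian window effectively truncates the kernel, so the learned width plays the role of a kernel size.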
Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage during training.
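A minimal sketch of the idea, assuming a generic `lm` that consumes embedding sequences and an `embed` lookup table (real models expose this via arguments such as `inputs_embeds`): only the prepended prompt vectors receive gradients, while every LM weight stays frozen:

```python
import torch
import torch.nn as nn

class PromptTunedLM(nn.Module):
    """Sketch: prepend trainable continuous prompts to a frozen language model."""
    def __init__(self, lm, embed, prompt_len=20, embed_dim=768):  # embed_dim is an assumption
        super().__init__()
        self.lm, self.embed = lm, embed
        for p in self.lm.parameters():   # freeze every LM weight
            p.requires_grad = False
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_ids):
        tok = self.embed(input_ids)                              # (batch, seq, dim)
        prompt = self.prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return self.lm(torch.cat([prompt, tok], dim=1))          # gradients reach only self.prompt
```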
Since scene context aids reasoning about object semantics, current work focuses on models with large capacity and large receptive fields that can fully capture the global context of an input 3D scene.
The style-based GAN (StyleGAN) architecture achieves state-of-the-art results for generating high-quality images, but it lacks explicit and precise control over camera pose.
Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks.