Graph Neural Networks with Learnable Structural and Positional Representations

One approach to this issue is to introduce a Positional Encoding (PE) for each node and inject it into the input layer, as in Transformers.

Knowledge Graphs, Recommendation Systems
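
To make the idea of injecting a node PE at the input layer concrete, here is a minimal NumPy sketch. It assumes one particular encoding, the return probabilities of short random walks, and plain concatenation as the injection step. Both choices, and the names `random_walk_pe` and `inject_pe`, are illustrative assumptions rather than details given in the summary above.

```python
import numpy as np


def random_walk_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Return an (n, k) matrix; column t-1 holds, for every node, the
    probability that a t-step random walk started there ends there."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # Inverse degree, with isolated nodes mapped to 0 instead of dividing by 0.
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    rw = adj * inv_deg[:, None]  # row-normalised transition matrix D^-1 A
    n = adj.shape[0]
    pe = np.empty((n, k))
    power = np.eye(n)
    for t in range(k):
        power = power @ rw  # (t+1)-step transition probabilities
        pe[:, t] = np.diag(power)
    return pe


def inject_pe(x: np.ndarray, pe: np.ndarray) -> np.ndarray:
    """Concatenate the PE to the node features before the first layer
    (assumed injection scheme; adding a projected PE is another option)."""
    return np.concatenate([x, pe], axis=1)


# Path graph 0 - 1 - 2 with two dummy features per node.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
x = np.ones((3, 2))
pe = random_walk_pe(adj, k=2)
h = inject_pe(x, pe)
```

On this path graph the middle node returns to itself after two steps with probability 1 and the end nodes with probability 0.5, so the encoding distinguishes nodes that identical input features would not.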

Taming Visually Guided Sound Generation

In this work, we propose a single model capable of generating visually relevant, high-fidelity sounds prompted with a set of frames from open-domain videos, running on a single GPU in less time than the sound takes to play.

Audio Generation

TLDR: Twin Learning for Dimensionality Reduction

In this paper, we unify these two families of approaches from the angle of manifold learning and propose TLDR, a dimensionality reduction method for generic input spaces that ports the simple self-supervised learning framework of Barlow Twins to a setting where it is hard or impossible to define an appropriate set of distortions by hand.

Dimensionality Reduction, Representation Learning
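
For readers unfamiliar with the Barlow Twins framework mentioned above, the following is a rough NumPy sketch of its redundancy-reduction objective: the cross-correlation matrix between the embeddings of two paired views is pushed toward the identity. The function name, the weight `lam`, and the `eps` constant are illustrative assumptions. How the pairs are formed when no hand-designed distortions exist is not shown here.

```python
import numpy as np


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray,
                      lam: float = 5e-3, eps: float = 1e-8) -> float:
    """Redundancy-reduction loss on two (n, d) batches of paired embeddings."""
    n = z_a.shape[0]
    # Standardise every embedding dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = z_a.T @ z_b / n  # (d, d) cross-correlation matrix
    diag = np.diag(c)
    on_diag = ((diag - 1.0) ** 2).sum()          # pull matching dims to 1
    off_diag = (c ** 2).sum() - (diag ** 2).sum()  # push other entries to 0
    return float(on_diag + lam * off_diag)


rng = np.random.default_rng(0)
z = rng.normal(size=(256, 4))
loss_same = barlow_twins_loss(z, z)                            # perfectly aligned pairs
loss_diff = barlow_twins_loss(z, rng.normal(size=(256, 4)))    # unrelated pairs
```

Perfectly aligned pairs give a loss close to zero, while unrelated pairs are penalised mainly through the diagonal term, which is the behaviour the objective is designed to have.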

NeuralBlox: Real-Time Neural Representation Fusion for Robust Volumetric Mapping

We present a novel 3D mapping method leveraging the recent progress in neural implicit representation for 3D reconstruction.

3D Reconstruction

Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese

Although pre-trained language models (PLMs) have achieved remarkable improvements in a wide range of NLP tasks, they are expensive in terms of time and resources.

FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes

In this work, we propose FlexConv, a novel convolutional operation that learns high-bandwidth convolutional kernels of learnable size at a fixed parameter cost.

Sequential Image Classification, Time Series

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training.

Language Modelling

Mix3D: Out-of-Context Data Augmentation for 3D Scenes

Since scene context helps reasoning about object semantics, current works focus on models with large capacity and receptive fields that can fully capture the global context of an input 3D scene.

3D Semantic Segmentation

CIPS-3D: A 3D-Aware Generator of GANs Based on Conditionally-Independent Pixel Synthesis

The style-based GAN (StyleGAN) architecture achieved state-of-the-art results for generating high-quality images, but it lacks explicit and precise control over camera poses.

Image Generation, Transfer Learning

CoAtNet: Marrying Convolution and Attention for All Data Sizes

Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks.

Image Classification

