MetaFormer is Actually What You Need for Vision

sail-sg/poolformer 22 Nov 2021

Based on this observation, we hypothesize that the general architecture of the transformers, instead of the specific token mixer module, is more essential to the model's performance.
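The MetaFormer abstraction can be sketched as a residual block whose token mixer is a pluggable argument. This is a minimal pure-Python sketch on a single-channel grid, not the authors' code. It assumes three things: normalization is omitted, `toy_mlp` is a hypothetical stand-in for the channel MLP, and the mixer follows PoolFormer-style pooling (the average over a 3x3 neighborhood, excluding padding, minus the input itself).

```python
from typing import Callable, List

Grid = List[List[float]]

def pool_mixer(x: Grid, k: int = 3) -> Grid:
    """Token mixer: mean over a k x k window (borders not padded) minus the input."""
    h, w, r = len(x), len(x[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [x[a][b]
                    for a in range(max(0, i - r), min(h, i + r + 1))
                    for b in range(max(0, j - r), min(w, j + r + 1))]
            row.append(sum(vals) / len(vals) - x[i][j])
        out.append(row)
    return out

def toy_mlp(x: Grid) -> Grid:
    """Hypothetical channel-MLP stand-in: pointwise ReLU scaled by 0.5."""
    return [[0.5 * max(0.0, v) for v in row] for row in x]

def add(a: Grid, b: Grid) -> Grid:
    """Elementwise sum used for the residual connections."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def metaformer_block(x: Grid,
                     mixer: Callable[[Grid], Grid],
                     mlp: Callable[[Grid], Grid]) -> Grid:
    # Residual token mixing, then residual channel MLP (normalization omitted).
    x = add(x, mixer(x))
    return add(x, mlp(x))
```

Because the mixer is just an argument, swapping it (for example `metaformer_block(g, lambda t: [[0.0] * len(r) for r in t], toy_mlp)` for no token mixing at all) leaves the surrounding block unchanged, which is the structural point the hypothesis is about.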

Image Classification, Semantic Segmentation

CLIPstyler: Image Style Transfer with a Single Text Condition

paper11667/clipstyler 1 Dec 2021

In order to deal with such applications, we propose a new framework that enables style transfer 'without' a style image, using only a text description of the desired style.

Style Transfer

Attention-Guided Generative Adversarial Networks for Unsupervised Image-to-Image Translation

vict0rsch/ArxivTools 28 Mar 2019

To handle this limitation, in this paper we propose a novel Attention-Guided Generative Adversarial Network (AGGAN), which can detect the most discriminative semantic object and minimize changes to unwanted parts in semantic manipulation problems without using extra data or models.
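One common way to keep unwanted parts unchanged is attention-mask compositing: the generator's content is blended into the input only where an attention mask is high. The sketch below shows that blending step on flattened pixel lists. It is an illustration of the general idea under that assumption; the mask and the generated content would come from networks that are not shown, and AGGAN's exact formulation may differ.

```python
from typing import List

def attention_compose(x: List[float], generated: List[float], attn: List[float]) -> List[float]:
    """Blend generated content into the input where the attention mask is high.

    Mask values are expected in [0, 1]: 1 means use the generated value,
    0 means keep the input pixel unchanged.
    """
    if not (len(x) == len(generated) == len(attn)):
        raise ValueError("inputs must have equal length")
    return [a * g + (1.0 - a) * v for v, g, a in zip(x, generated, attn)]
```

For example, `attention_compose([1, 2, 3], [10, 20, 30], [0, 1, 0.5])` keeps the first pixel, replaces the second, and averages the third.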

Translation, Unsupervised Image-To-Image Translation

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

DanceTrack/DanceTrack 29 Nov 2021

A typical pipeline for multi-object tracking (MOT) is to use a detector for object localization, followed by re-identification (re-ID) for object association.
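The association half of that pipeline can be sketched as matching each detection's re-ID embedding to existing track embeddings by cosine similarity. This is a minimal sketch under stated assumptions: embeddings are given (the detector and re-ID model are not shown), greedy matching stands in for the Hungarian matching many trackers use, and the `threshold` value is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]

def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    return sum(u * v for u, v in zip(a, b)) / (na * nb) if na and nb else 0.0

def associate(tracks: Dict[int, Vec], detections: List[Vec],
              threshold: float = 0.5) -> Tuple[Dict[int, int], List[int]]:
    """Greedily match detections to tracks by re-ID embedding similarity.

    Returns (track_id -> detection index, unmatched detection indices).
    """
    # Score every (track, detection) pair, best first.
    pairs = sorted(
        ((cosine(emb, det), tid, di)
         for tid, emb in tracks.items()
         for di, det in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, di in pairs:
        if score < threshold:
            break  # remaining pairs are even less similar
        if tid in matches or di in used:
            continue  # each track and detection is matched at most once
        matches[tid] = di
        used.add(di)
    unmatched = [di for di in range(len(detections)) if di not in used]
    return matches, unmatched
```

Unmatched detections would typically start new tracks. When targets look alike, their embeddings score similarly against every track, so appearance-only matching of this kind has little signal to work with.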

Multi-Object Tracking, Object Detection

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

lhoyer/DAFormer 29 Nov 2021

As the influence of recent network architectures has not been systematically studied, we first benchmark different network architectures for unsupervised domain adaptation (UDA) and then propose a novel UDA method, DAFormer, based on the benchmark results.

Semantic Segmentation, Synthetic-to-Real Translation

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

vict0rsch/PaperMemory 20 Mar 2017

Bridging the 'reality gap' that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability.

Object Localization

TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers

megviirobot/transmvsnet 29 Nov 2021

We return multi-view stereo (MVS) to its nature as a feature-matching task and therefore propose a powerful Feature Matching Transformer (FMT) that uses intra- (self-) and inter- (cross-) attention to aggregate long-range context information within and across images.
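The difference between intra- (self-) and inter- (cross-) attention is only where the keys and values come from. The sketch below is plain scaled dot-product attention in pure Python, assuming no learned query/key/value projections, a single head, and no positional encoding; it illustrates the mechanism and is not the FMT implementation.

```python
import math
from typing import List

Mat = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries: Mat, keys: Mat, values: Mat) -> Mat:
    """Scaled dot-product attention without learned projections."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of value vectors, one output channel at a time.
        out.append([sum(wi * v[c] for wi, v in zip(w, values))
                    for c in range(len(values[0]))])
    return out

def self_attention(feats: Mat) -> Mat:
    # Intra-image: tokens of one image attend to each other.
    return attention(feats, feats, feats)

def cross_attention(ref: Mat, src: Mat) -> Mat:
    # Inter-image: reference-view tokens query the features of another view.
    return attention(ref, src, src)
```

With a single source token, every reference token receives that token unchanged, since all attention weight falls on it.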

Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition

burchim/efficientconformer 31 Aug 2021

The recently proposed Conformer architecture has shown state-of-the-art performance in Automatic Speech Recognition by combining convolution with attention to model both local and global dependencies.

Automatic Speech Recognition, Language Modelling

Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

endlesssora/deceived NeurIPS 2021

Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.

Mesa: A Memory-saving Training Framework for Transformers

zhuang-group/mesa 22 Nov 2021

Specifically, Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training.
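The idea of computing with exact activations but saving only a compressed copy for backpropagation can be sketched on a toy elementwise layer y = x². This sketch assumes symmetric per-tensor 8-bit quantization and uses hypothetical function names; Mesa's actual quantization scheme and the layers it wraps may differ.

```python
from typing import List, Tuple

Saved = Tuple[List[int], float]

def quantize_int8(acts: List[float]) -> Saved:
    """Symmetric per-tensor 8-bit quantization: int codes in [-127, 127] plus a scale."""
    peak = max((abs(a) for a in acts), default=0.0)
    scale = peak / 127.0 if peak > 0 else 1.0
    codes = [max(-127, min(127, round(a / scale))) for a in acts]
    return codes, scale

def dequantize(codes: List[int], scale: float) -> List[float]:
    return [c * scale for c in codes]

def square_forward(x: List[float]) -> Tuple[List[float], Saved]:
    # Forward pass uses the exact activations...
    y = [v * v for v in x]
    # ...but only an 8-bit copy of x is kept for the backward pass.
    return y, quantize_int8(x)

def square_backward(saved: Saved, grad_y: List[float]) -> List[float]:
    # dy/dx = 2x, evaluated on the dequantized (approximate) activations.
    x_hat = dequantize(*saved)
    return [2.0 * xh * g for xh, g in zip(x_hat, grad_y)]
```

The forward output is exact; only the gradient sees quantization error, bounded here by the scale (half a quantization step, doubled by the derivative 2x).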


