We present a convolution-free approach to video classification built exclusively on self-attention over space and time.
Ranked #1 on Action Recognition on Diving-48
Tasks: Action Classification, Action Recognition, Video Question Answering, Video Understanding
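To make "self-attention over space and time" concrete, here is a minimal sketch of joint space-time self-attention. It rests on several assumptions that are not taken from the paper: non-overlapping p x p patches, a linear patch embedding (so no convolution anywhere), a single attention head, and random weights. Every patch token from every frame attends to every other token. This is one simple variant; the paper's actual attention scheme may factor space and time differently.

```python
import numpy as np

def patchify(video, p):
    """Split a (T, H, W, C) video into non-overlapping p x p patches per frame.
    Returns an array of shape (T * (H//p) * (W//p), p*p*C)."""
    T, H, W, C = video.shape
    assert H % p == 0 and W % p == 0
    x = video.reshape(T, H // p, p, W // p, p, C)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, H/p, W/p, p, p, C)
    return x.reshape(T * (H // p) * (W // p), p * p * C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def space_time_self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over all tokens at once,
    so each token can attend to any frame and any spatial position."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
T, H, W, C, p, D = 4, 16, 16, 3, 8, 32
video = rng.standard_normal((T, H, W, C))
patches = patchify(video, p)                          # (16, 192)
W_embed = rng.standard_normal((p * p * C, D)) * 0.02  # linear patch embedding, not a convolution
tokens = patches @ W_embed                            # (16, 32)
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
out, attn = space_time_self_attention(tokens, Wq, Wk, Wv)
print(out.shape, attn.shape)  # (16, 32) (16, 16)
```

The attention matrix is (T·N) x (T·N) for N patches per frame. Its cost therefore grows quadratically with clip length and resolution, which is the main practical limit of attending jointly over all tokens.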
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision.
Ranked #1 on Semantic Segmentation on ADE20K
Tasks: Image Classification, Instance Segmentation, Real-Time Object Detection, Semantic Segmentation
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.
Ranked #4 on Speech Synthesis on North American English
We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks.
Ranked #11 on Pose Estimation on COCO test-dev
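The cost argument behind replacing pointwise (1x1) convolutions can be illustrated with a sketch. A 1x1 convolution over C channels performs a C x C matrix multiply at every pixel. An input-conditioned per-channel weighting only rescales each channel, which takes C multiplies per pixel. The weighting function below is an assumption in the spirit of squeeze-and-excitation: global average pooling, a tiny MLP with reduction ratio r, then a sigmoid. The unit's actual weighting functions are not described in the listing and may differ.

```python
import numpy as np

def pointwise_conv(x, W):
    """1x1 convolution on an (h, w, C_in) map: a C_in x C_out matrix multiply
    at every spatial position, i.e. C_in * C_out multiplies per pixel."""
    return x @ W

def conditional_channel_weighting(x, W1, W2):
    """Hypothetical lightweight unit: compute one weight per channel from the
    input itself, then rescale channels (C multiplies per pixel)."""
    pooled = x.mean(axis=(0, 1))                     # (C,) global average pool
    hidden = np.maximum(pooled @ W1, 0.0)            # (C // r,) with ReLU
    weights = 1.0 / (1.0 + np.exp(-(hidden @ W2)))   # (C,) sigmoid, values in (0, 1)
    return x * weights                               # broadcast over h and w

rng = np.random.default_rng(0)
h, w, c, r = 8, 8, 64, 4
x = rng.standard_normal((h, w, c))
y_conv = pointwise_conv(x, rng.standard_normal((c, c)))
y_ccw = conditional_channel_weighting(
    x, rng.standard_normal((c, c // r)), rng.standard_normal((c // r, c)))
print(y_conv.shape, y_ccw.shape)               # (8, 8, 64) (8, 8, 64)
print("per-pixel multiplies:", c * c, "vs", c)  # per-pixel multiplies: 4096 vs 64
```

The weights themselves cost a pooling pass plus a small MLP once per feature map, which is negligible next to a C x C multiply at every pixel. Unlike a 1x1 convolution, this rescaling does not mix information across channels.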
In recent years, the use of Generative Adversarial Networks (GANs) has become very popular in generative image modeling.
Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks.
In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks.
Ranked #11 on Conditional Image Generation on ImageNet 128x128
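The following sketch shows how a self-attention layer gives an image generator long-range dependencies. Every spatial position of a feature map attends to every other position in a single step, instead of relying on a stack of local convolutions. Several details are assumptions loosely following the SAGAN layout and should be checked against the paper: query/key projections reduced to C/8 channels, 1x1 projections written as matrix multiplies, and a residual connection scaled by a learnable gamma initialized to 0.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_block(x, Wq, Wk, Wv, gamma):
    """x: (h, w, c) feature map. Row i of `attn` says where position i looks,
    which can be arbitrarily far away in the image."""
    h, w, c = x.shape
    flat = x.reshape(h * w, c)                  # N = h*w positions
    q, k, v = flat @ Wq, flat @ Wk, flat @ Wv   # 1x1 projections
    attn = softmax(q @ k.T, axis=-1)            # (N, N) attention map
    out = attn @ v                              # (N, c)
    # Residual with a learnable scale: gamma = 0 makes the block an identity at init.
    return (flat + gamma * out).reshape(h, w, c), attn

rng = np.random.default_rng(0)
h, w, c = 8, 8, 32
x = rng.standard_normal((h, w, c))
Wq = rng.standard_normal((c, c // 8)) * 0.1
Wk = rng.standard_normal((c, c // 8)) * 0.1
Wv = rng.standard_normal((c, c)) * 0.1
y0, attn = self_attention_block(x, Wq, Wk, Wv, gamma=0.0)
print(np.allclose(y0, x), attn.shape)  # True (64, 64)
```

With gamma at 0 the network starts out relying on local features only, and it can raise gamma during training to bring in non-local evidence.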
Recent advances in neural-network-based generative modeling of speech have shown great potential for speech coding.
Inspired by the common painting process of drawing a draft and revising the details, we introduce a novel feed-forward method named Laplacian Pyramid Network (LapStyle).
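The "draft, then revise details" idea maps naturally onto a Laplacian pyramid. The coarsest level is a low-resolution draft, and each detail band holds what is lost when a level is downsampled. The sketch below shows only the pyramid decomposition and its exact reconstruction. It does not show how LapStyle's networks use these levels. Two choices are simplifying assumptions: 2x2 average pooling stands in for downsampling, and nearest-neighbour repetition stands in for upsampling (classical pyramids use Gaussian filtering).

```python
import numpy as np

def downsample(img):
    """2x2 average pooling on an (H, W, C) image with even H and W."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(img):
    """Nearest-neighbour 2x upsampling."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Returns [detail_0, ..., detail_{levels-1}, coarse]: `coarse` is the
    low-resolution draft; each detail band is what upsampling fails to recover."""
    pyramid, current = [], img
    for _ in range(levels):
        smaller = downsample(current)
        pyramid.append(current - upsample(smaller))
        current = smaller
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Start from the draft and add detail bands back, finest last."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = upsample(current) + detail
    return current

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
pyr = laplacian_pyramid(img, levels=2)
print([p.shape for p in pyr])              # [(32, 32, 3), (16, 16, 3), (8, 8, 3)]
print(np.allclose(reconstruct(pyr), img))  # True
```

Because reconstruction is exact, a method can transform the cheap coarse draft first and then adjust only the detail bands at higher resolutions.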
The problem of answering questions using knowledge from pre-trained language models (LMs) and knowledge graphs (KGs) presents two challenges: given a QA context (question and answer choice), methods need to (i) identify relevant knowledge from large KGs, and (ii) perform joint reasoning over the QA context and KG.
Ranked #1 on Common Sense Reasoning on CommonsenseQA
Tasks: Common Sense Reasoning, Graph Representation Learning, Knowledge Graphs, Language Modelling, Multi-Hop Question Answering, Question Answering