Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

SysCV/pcan NeurIPS 2021

We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.

Multi-Object Tracking and Segmentation Multiple Object Track and Segmentation +1

0.55 stars / hour


BR-IDL/PaddleViT 5 Oct 2021

:robot: PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2. 0+

Image Classification Object Detection

0.54 stars / hour

Attention Mechanisms in Computer Vision: A Survey

MenghaoGuo/Awesome-Vision-Attentions 15 Nov 2021

Humans can naturally and effectively find salient regions in complex scenes.

Image Classification Image Generation +4

0.52 stars / hour

TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers

megviirobot/transmvsnet 29 Nov 2021

We analogize MVS back to its nature of a feature matching task and therefore propose a powerful Feature Matching Transformer (FMT) to leverage intra- (self-) and inter- (cross-) attention to aggregate long-range context information within and across images.

0.51 stars / hour

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

DanceTrack/DanceTrack 29 Nov 2021

A typical pipeline for multi-object tracking (MOT) is to use a detector for object localization, and following re-identification (re-ID) for object association.

Multi-Object Tracking Object Detection +1

0.48 stars / hour

MetaFormer is Actually What You Need for Vision

sail-sg/poolformer 22 Nov 2021

Based on this observation, we hypothesize that the general architecture of the transformers, instead of the specific token mixer module, is more essential to the model's performance.

Image Classification Semantic Segmentation

0.48 stars / hour

Robust High-Resolution Video Matting with Temporal Guidance

PeterL1n/RobustVideoMatting 25 Aug 2021

We introduce a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance.

Video Matting

0.46 stars / hour

Mesa: A Memory-saving Training Framework for Transformers

zhuang-group/mesa 22 Nov 2021

Specifically, Mesa uses exact activations during forward pass while storing a low-precision version of activations to reduce memory consumption during training.


0.44 stars / hour

HybVIO: Pushing the Limits of Real-time Visual-inertial Odometry

SpectacularAI/HybVIO 22 Jun 2021

We present HybVIO, a novel hybrid approach for combining filtering-based visual-inertial odometry (VIO) with optimization-based SLAM.

0.42 stars / hour

Projected GANs Converge Faster

autonomousvision/projected_gan NeurIPS 2021

Generative Adversarial Networks (GANs) produce high-quality images but are challenging to train.

Image Generation

0.41 stars / hour