LV-ViT is a vision transformer that uses token labeling as a training objective. Unlike the standard ViT objective, which computes the classification loss only on an additional trainable class token, token labeling leverages all image patch tokens to compute the training loss in a dense manner. Specifically, it reformulates image classification as multiple token-level recognition problems and assigns each patch token an individual, location-specific supervision signal generated by a machine annotator.
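The dense objective above can be sketched as a weighted sum of the usual class-token cross-entropy and an auxiliary per-token cross-entropy against the machine-annotated soft labels. This is a minimal, framework-free sketch; the function name `token_labeling_loss` and the balancing weight `beta` are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def soft_cross_entropy(logits, target_probs):
    # Cross-entropy between predicted logits and a (soft) target distribution,
    # computed with a numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(target_probs * log_probs).sum(axis=-1)

def token_labeling_loss(cls_logits, token_logits, cls_target, token_targets, beta=0.5):
    """Sketch of a token-labeling-style objective.

    cls_logits:    (C,)   prediction from the class token
    token_logits:  (N, C) per-patch-token predictions
    cls_target:    (C,)   one-hot image-level label
    token_targets: (N, C) location-specific soft labels from a machine annotator
    beta:          weight on the dense auxiliary term (assumed value)
    """
    cls_loss = soft_cross_entropy(cls_logits, cls_target)
    # Dense term: average the token-level cross-entropy over all patch tokens.
    token_loss = soft_cross_entropy(token_logits, token_targets).mean()
    return cls_loss + beta * token_loss
```

With `beta=0` the loss reduces to the standard class-token cross-entropy, which makes the role of the dense auxiliary term easy to isolate.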
Source: All Tokens Matter: Token Labeling for Training Better Vision Transformers
Task | Papers | Share
---|---|---
Image Classification | 3 | 27.27%
Efficient ViTs | 2 | 18.18%
Computational Efficiency | 1 | 9.09%
Token Reduction | 1 | 9.09%
Analogical Similarity | 1 | 9.09%
Action Recognition | 1 | 9.09%
General Classification | 1 | 9.09%
Semantic Segmentation | 1 | 9.09%