Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data.
Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate.
Model pre-training is a cornerstone of modern visual recognition systems.
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.
Censorship of Internet content in China is understood to operate through a system of intermediary liability whereby service providers are liable for the content on their platforms.
The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.
We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations.
The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies.
The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely exploited for facial image editing.