Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data.
Ranked #1 on Semantic Segmentation on NYU Depth v2
Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate.
Model pre-training is a cornerstone of modern visual recognition systems.
Ranked #1 on Image Classification on Places365-Standard (using extra training data)
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.
Ranked #1 on Domain Generalization on ImageNet-Sketch (using extra training data)
The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing.
The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.
Censorship of Internet content in China is understood to operate through a system of intermediary liability whereby service providers are liable for the content on their platforms.
The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies.
Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases.
Ranked #1 on Task-Oriented Dialogue Systems on KVRET