Prior work has studied different visual modalities in isolation and developed separate architectures for recognizing images, videos, and 3D data.
Model pre-training is a cornerstone of modern visual recognition systems.
Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate.
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.
We present an efficient method for the joint optimization of topology, materials, and lighting from multi-view image observations.
Censorship of Internet content in China is understood to operate through a system of intermediary liability whereby service providers are liable for the content on their platforms.
The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.
Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases.
The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies.