Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate.
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.
We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations.
For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without fine-tuning.
Censorship of Internet content in China is understood to operate through a system of intermediary liability whereby service providers are liable for the content on their platforms.
Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases.
The second edition of Deep Learning Interviews is home to hundreds of fully-solved problems, from a wide range of key topics in AI.
The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.
In this paper, we ask the following question: is it possible to combine the strengths of CNNs and ViTs to build a light-weight and low latency network for mobile vision tasks?