By exploiting metric space distances, our network is able to learn local features with increasing contextual scales.
#2 best model for Semantic Segmentation on ShapeNet
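As a rough illustration of what "local features with increasing contextual scales" could mean for the point-cloud entry above, the sketch below groups the neighbors of a center point by Euclidean distance at several growing radii. It is not the authors' implementation: the function names `ball_query` and `multi_scale_groups` and the default radii are hypothetical, and a real network would go on to learn features from each group.

```python
import math

def ball_query(points, center, radius):
    """Return all points within `radius` (Euclidean distance) of `center`."""
    return [p for p in points if math.dist(p, center) <= radius]

def multi_scale_groups(points, center, radii=(0.1, 0.2, 0.4)):
    """Collect one neighborhood per radius (hypothetical radii).

    Larger radii cover more context around the same center point.
    """
    return [ball_query(points, center, r) for r in radii]

# Toy 3D point set along the x-axis.
points = [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0), (0.3, 0.0, 0.0), (1.0, 0.0, 0.0)]
groups = multi_scale_groups(points, center=(0.0, 0.0, 0.0))
group_sizes = [len(g) for g in groups]
```

Each successive group contains the previous one, so features computed per group describe the same location at progressively coarser scales.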
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language.
We show that the basic classification framework alone can be used to tackle some of the most challenging tasks in image synthesis.
#9 best model for Image Generation on CIFAR-10
Many applications of machine learning require models that are human-aligned, i.e., that make decisions based on human-meaningful information about the input.
The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.
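For readers unfamiliar with the heuristic, a minimal sketch of one common variant, linear warmup, is shown below. The function name `warmup_lr`, the default base rate, and the warmup length are assumptions chosen for illustration; the entry above does not specify a schedule.

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    """Linearly ramp the learning rate up to base_lr over warmup_steps.

    `step` is the zero-based optimizer step. After the warmup period
    the rate stays at base_lr (no decay is modeled here).
    """
    if warmup_steps <= 0:
        return base_lr
    scale = min(1.0, (step + 1) / warmup_steps)
    return base_lr * scale

# The value returned at each step would be handed to the optimizer
# (for example Adam or RMSprop) before the parameter update.
schedule = [warmup_lr(s, base_lr=1.0, warmup_steps=4) for s in range(6)]
```

The small early steps limit the size of the first updates, when an adaptive optimizer's running statistics are still based on very few gradients.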
Under the new schema, the proposed method can achieve superior accuracy (WIDER FACE Val/Test -- Easy: 0.910/0.896, Medium: 0.881/0.865, Hard: 0.780/0.770; FDDB -- discontinuous: 0.973, continuous: 0.724).
#6 best model for Face Detection on FDDB