An easy-to-use and powerful NLP library with an extensive model zoo, supporting a wide range of NLP tasks from research to industrial applications, including end-to-end systems for Neural Search, Question Answering, Information Extraction and Sentiment Analysis.
The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.
Our key insight is to leverage the powerful vision-language model CLIP to supervise neural human generation in terms of 3D geometry, texture and animation.
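CLIP-based supervision is commonly expressed as a loss on the cosine similarity between CLIP's embedding of a rendered image and its embedding of a text prompt. The snippet below is a minimal sketch of that generic CLIP-guidance objective, assuming the embeddings have already been computed. The vectors used here are hypothetical placeholders rather than real CLIP outputs, and this is not necessarily the exact loss used in this work.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)

def clip_guidance_loss(image_embedding, text_embedding):
    """1 - cosine similarity: 0 when the rendering's embedding points in the
    same direction as the prompt's, 2 when it points the opposite way."""
    return 1.0 - cosine_similarity(image_embedding, text_embedding)

# Hypothetical embeddings: in practice one would come from CLIP's image
# encoder applied to a rendering, the other from its text encoder
# applied to a description such as "a tall man in a red jacket".
image_emb = [0.2, 0.9, 0.1]
text_emb = [0.1, 0.8, 0.3]
print(clip_guidance_loss(image_emb, text_emb))
```

Minimizing this loss with respect to the parameters that produced the rendering (assumed here to be reachable through a differentiable renderer) pushes the generated geometry or texture toward the text description.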
BEVerse first performs shared feature extraction and lifting to generate 4D bird's-eye-view (BEV) representations from multi-timestamp, multi-view images.
We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations.
We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.
Masked AutoEncoder (MAE) has recently led the trend in visual self-supervised learning with an elegant asymmetric encoder-decoder design that significantly improves both pre-training efficiency and fine-tuning accuracy.
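The efficiency gain comes from random patch masking: the encoder processes only the small visible subset of patches, and a lightweight decoder receives the full-length sequence with a shared mask token at every hidden position. The snippet below is a minimal pure-Python sketch of that masking step, assuming a 75% mask ratio (the default in the original MAE paper) and treating patches as opaque tokens. The function names `random_masking` and `fill_mask_tokens` are hypothetical, and the identity "encoder" is a placeholder for a real ViT.

```python
import random

def random_masking(patches, mask_ratio=0.75, seed=None):
    """Randomly split patch tokens into a visible subset and a masked subset.

    Returns (visible_patches, mask, ids_keep), where mask[i] == 1 means
    patch i is hidden from the encoder and must be reconstructed.
    """
    rng = random.Random(seed)
    n = len(patches)
    len_keep = int(n * (1 - mask_ratio))
    # Give each patch a random score; sorting by score yields a random permutation.
    noise = [rng.random() for _ in range(n)]
    ids_shuffle = sorted(range(n), key=lambda i: noise[i])
    # Keep the first len_keep shuffled indices (re-sorted only for readability).
    ids_keep = sorted(ids_shuffle[:len_keep])
    keep = set(ids_keep)
    visible = [patches[i] for i in ids_keep]
    mask = [0 if i in keep else 1 for i in range(n)]
    return visible, mask, ids_keep

def fill_mask_tokens(encoded_visible, ids_keep, n, mask_token):
    """Rebuild the full-length decoder input, placing a shared mask token
    at every position the encoder never saw."""
    full = [mask_token] * n
    for token, i in zip(encoded_visible, ids_keep):
        full[i] = token
    return full

# Usage: 16 patches (e.g. a 4x4 grid), 75% of them masked.
patches = [f"patch{i}" for i in range(16)]
visible, mask, ids_keep = random_masking(patches, seed=0)
# Placeholder encoder: a real one would run a ViT on `visible` only.
encoded = visible
decoder_input = fill_mask_tokens(encoded, ids_keep, len(patches), "[MASK]")
print(len(visible), sum(mask))  # 4 12
```

Because the encoder sees only a quarter of the tokens, its cost per image drops accordingly, which is the source of the pre-training speedup the asymmetric design is credited with.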
We evaluate our two-stream approach on inpainting tasks, where experiments show that it improves both the propagation of features within a single frame, as required for image inpainting, and their propagation from keyframes to target frames.
Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset.
Ranked #4 on Zero-Shot Text-to-Image Generation on COCO