We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
Ranked #1 on Text-to-Image Generation on COCO (using extra training data)
This paper describes the system developed by the NPU team for the 2020 Personalized Voice Trigger Challenge.
We present SymForce, a library for fast symbolic computation, code generation, and nonlinear optimization for robotics applications like computer vision, motion planning, and controls.
Toward more descriptive and distinctive caption generation, we propose using CLIP, a multimodal encoder trained on a huge number of image-text pairs from the web, to calculate multimodal similarity and use it as a reward function.
We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.
Masked AutoEncoder (MAE) has recently led the trend in visual self-supervised learning with an elegant asymmetric encoder-decoder design, which significantly improves both pre-training efficiency and fine-tuning accuracy.
Ranked #16 on Object Detection on COCO minival