The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distributions.
Ranked #4 on 6D Pose Estimation using RGB on LineMOD
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities.
Ranked #1 on Code Generation on APPS
In recent years, deep generative models have attracted increasing interest due to their capacity to model complex distributions.
We present a generative image inpainting system to complete images with free-form mask and guidance.
Ranked #3 on Image Inpainting on Places2 val
We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.
Ranked #1 on Text-to-Image Generation on COCO
We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation.
We introduce Nocturne, a new 2D driving simulator for investigating multi-agent coordination under partial observability.
Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset.
Ranked #12 on Text-to-Image Generation on COCO (using extra training data)
In this paper, we introduce HaGRID (HAnd Gesture Recognition Image Dataset), an enormous dataset for hand gesture recognition (HGR) systems.
However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans.