We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.
Ranked #1 on
Text-to-Image Generation
on COCO
Reconstructing 3D objects is an important computer vision task that has wide application in AR/VR.
The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution.
Ranked #4 on
6D Pose Estimation using RGB
on LineMOD
The modified VTE is termed as Strided Transformer Encoder (STE), which is built upon the outputs of VTE.
Ranked #1 on
3D Human Pose Estimation
on HumanEva-I
In recent years, deep generative models have attracted increasing interest due to their capacity to model complex distributions.
We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation.
Human mobility data contains rich but abundant information, which yields to the comprehensive region embeddings for cross domain tasks.
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities.
Ranked #1 on
Code Generation
on APPS
We present a generative image inpainting system to complete images with free-form mask and guidance.
Ranked #3 on
Image Inpainting
on Places2 val
We introduce OmniXAI (short for Omni eXplainable AI), an open-source Python library of eXplainable AI (XAI), which offers omni-way explainable AI capabilities and various interpretable machine learning techniques to address the pain points of understanding and interpreting the decisions made by machine learning (ML) in practice.