We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.
Ranked #1 on Text-to-Image Generation on COCO
The modified VTE is termed as Strided Transformer Encoder (STE), which is built upon the outputs of VTE.
Ranked #1 on 3D Human Pose Estimation on HumanEva-I
Human mobility data contains rich but abundant information, which yields to the comprehensive region embeddings for cross domain tasks.
The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution.
Ranked #4 on 6D Pose Estimation using RGB on LineMOD
We introduce OmniXAI (short for Omni eXplainable AI), an open-source Python library of eXplainable AI (XAI), which offers omni-way explainable AI capabilities and various interpretable machine learning techniques to address the pain points of understanding and interpreting the decisions made by machine learning (ML) in practice.
To address this issue, we introduce a novel and theoretically sound method, named Robust Temporal Feature Magnitude learning (RTFM), which trains a feature magnitude learning function to effectively recognise the positive instances, substantially improving the robustness of the MIL approach to the negative instances from abnormal videos.