Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

lucidrains/imagen-pytorch 23 May 2022

We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.

Language Modelling, Zero-Shot Text-to-Image Generation

SymForce: Symbolic Computation and Code Generation for Robotics

symforce-org/symforce 17 Apr 2022

We present SymForce, a library for fast symbolic computation, code generation, and nonlinear optimization for robotics applications like computer vision, motion planning, and controls.

Code Generation, Motion Planning

The NPU System for the 2020 Personalized Voice Trigger Challenge

PaddlePaddle/PaddleSpeech 26 Feb 2021

This paper describes the system developed by the NPU team for the 2020 Personalized Voice Trigger Challenge.

Small-Footprint Keyword Spotting, Speaker Verification

OnePose: One-Shot Object Pose Estimation without CAD Models

zju3dv/OnePose 24 May 2022

We propose a new method named OnePose for object pose estimation.

6D Pose Estimation, Graph Attention

MuJoCo: A physics engine for model-based control

deepmind/mujoco IEEE/RSJ IROS 2012

To facilitate optimal control applications and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel.

Inception Transformer

sail-sg/iformer 25 May 2022

Recent studies show that Transformers are strong at building long-range dependencies, yet weak at capturing the high frequencies that predominantly convey local information.

Image Classification

Fine-grained Image Captioning with CLIP Reward

j-min/clip-caption-reward 26 May 2022

Toward more descriptive and distinctive caption generation, we propose using CLIP, a multimodal encoder trained on a huge number of image-text pairs from the web, to calculate multimodal similarity and use it as a reward function.

Image Captioning, Image Retrieval

Ivy: Templated Deep Learning for Inter-Framework Portability

ivy-dl/ivy 4 Feb 2021

We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.

Hierarchical Text-Conditional Image Generation with CLIP Latents

lucidrains/DALLE2-pytorch 13 Apr 2022

Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style.

Conditional Image Generation, Zero-Shot Text-to-Image Generation

