Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

lucidrains/imagen-pytorch 23 May 2022

We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.

Ranked #1 on Text-to-Image Generation on COCO (using extra training data)

Language Modelling, Zero-Shot Text-to-Image Generation

8.84 stars / hour

SymForce: Symbolic Computation and Code Generation for Robotics

symforce-org/symforce 17 Apr 2022

We present SymForce, a library for fast symbolic computation, code generation, and nonlinear optimization for robotics applications like computer vision, motion planning, and controls.

Code Generation, Motion Planning

2.43 stars / hour

The NPU System for the 2020 Personalized Voice Trigger Challenge

PaddlePaddle/PaddleSpeech 26 Feb 2021

This paper describes the system developed by the NPU team for the 2020 personalized voice trigger challenge.

Small-Footprint Keyword Spotting, Speaker Verification

2.39 stars / hour

OnePose: One-Shot Object Pose Estimation without CAD Models

zju3dv/OnePose 24 May 2022

We propose a new method named OnePose for object pose estimation.

6D Pose Estimation, Graph Attention +1

2.05 stars / hour

MuJoCo: A physics engine for model-based control

deepmind/mujoco IEEE/RSJ IROS 2012

To facilitate optimal control applications and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel.

1.42 stars / hour

Inception Transformer

sail-sg/iformer 25 May 2022

Recent studies show that Transformers have a strong capability for building long-range dependencies, yet struggle to capture the high frequencies that predominantly convey local information.

Image Classification

0.74 stars / hour

Fine-grained Image Captioning with CLIP Reward

j-min/clip-caption-reward 26 May 2022

Toward more descriptive and distinctive caption generation, we propose using CLIP, a multimodal encoder trained on a huge number of image-text pairs from the web, to calculate multimodal similarity and use it as a reward function.

Image Captioning, Image Retrieval +2

0.70 stars / hour

Ivy: Templated Deep Learning for Inter-Framework Portability

ivy-dl/ivy 4 Feb 2021

We introduce Ivy, a templated Deep Learning (DL) framework that abstracts existing DL frameworks.

0.65 stars / hour

Hierarchical Text-Conditional Image Generation with CLIP Latents

lucidrains/DALLE2-pytorch 13 Apr 2022

Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style.

Ranked #5 on Text-to-Image Generation on COCO (using extra training data)

Conditional Image Generation, Zero-Shot Text-to-Image Generation

0.47 stars / hour