DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

XavierXiao/Dreambooth-Stable-Diffusion 25 Aug 2022

Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes.

Image Generation

Human Motion Diffusion Model

guytevet/motion-diffusion-model 29 Sep 2022

In this paper, we introduce Motion Diffusion Model (MDM), a carefully adapted classifier-free diffusion-based generative model for the human motion domain.

Efficient Few-Shot Learning Without Prompts

huggingface/setfit 22 Sep 2022

This simple framework requires no prompts or verbalizers, and achieves high accuracy with orders of magnitude fewer parameters than existing techniques.

Few-Shot Learning

Learning to Learn with Generative Models of Neural Network Checkpoints

wpeebles/g.pt 26 Sep 2022

We explore a data-driven approach for learning to optimize neural networks.

Robust Speech Recognition via Large-Scale Weak Supervision

openai/whisper Preprint 2022

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

Robust Speech Recognition

KILT: a Benchmark for Knowledge Intensive Language Tasks

facebookresearch/editeval NAACL 2021

We test both task-specific and general baselines, evaluating downstream performance in addition to the ability of the models to provide provenance.

Entity Linking, Fact Checking, +4

LAVIS: A Library for Language-Vision Intelligence

salesforce/lavis 15 Sep 2022

We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.

Image Captioning, Image Retrieval, +6

TVLT: Textless Vision-Language Transformer

zinengtang/tvlt 28 Sep 2022

In this work, we present the Textless Vision-Language Transformer (TVLT), where homogeneous transformer blocks take raw visual and audio inputs for vision-and-language representation learning with minimal modality-specific design, and do not use text-specific modules such as tokenization or automatic speech recognition (ASR).

Automatic Speech Recognition, Image Retrieval, +6

towhee-io/towhee 28 Jul 2022

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Image Classification, Object Detection, +2

Zero-Shot Text-Guided Object Generation with Dream Fields

shengyu-meng/dreamfields-3D CVPR 2022

Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision.

Neural Rendering

