Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

showlab/Tune-A-Video 22 Dec 2022

To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation employ large-scale text-video datasets for fine-tuning.

Style Transfer, Text-to-Video Generation

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

heatz123/naturalspeech 9 May 2022

In this paper, we answer these questions by first defining human-level quality based on the statistical significance of a subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.

Speech Synthesis, Text-To-Speech Synthesis

PyGlove: Efficiently Exchanging ML Ideas as Code

google/pyglove 3 Feb 2023

We also perform a case study of a large codebase where PyGlove led to an 80% reduction in the number of lines of code.

OpenSpike: An OpenRAM SNN Accelerator

sfmth/openspike 2 Feb 2023

This paper presents a spiking neural network (SNN) accelerator made using fully open-source EDA tools, process design kit (PDK), and memory macros synthesized using OpenRAM.

L2SR: Learning to Sample and Reconstruct for Accelerated MRI

facebookresearch/fastMRI 5 Dec 2022

Accelerated MRI aims to find a pair of samplers and reconstructors to reduce acquisition time while maintaining the reconstruction quality.

Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models

AttendAndExcite/Attend-and-Excite 31 Jan 2023

Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.

Generative Semantic Nursing

Self-improving Multiplane-to-layer Images for Novel View Synthesis

SamsungLabs/MLI 4 Oct 2022

We present a new method for lightweight novel-view synthesis that generalizes to an arbitrary forward-facing scene.

Generalizable Novel View Synthesis, Novel View Synthesis

AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities

automatic1111/stable-diffusion-webui 12 Nov 2022

In this work, we present a conceptually simple and effective method to train a strong bilingual/multilingual multimodal representation model.

Contrastive Learning, Cross-Modal Retrieval

