Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

PKU-YuanGroup/Video-LLaVA 16 Nov 2023

In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.

Language Modelling, Large Language Model +2

1,316
1.82 stars / hour

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

stability-ai/generative-models 2023

We then explore the impact of finetuning our base model on high-quality data and train a text-to-video model that is competitive with closed-source video generation.

14,348
1.76 stars / hour

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

rohitgandikota/sliders 20 Nov 2023

We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models.

Image Generation

277
1.35 stars / hour

Efficient LLM Inference on CPUs

intel/intel-extension-for-transformers 1 Nov 2023

Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks.

Quantization

1,132
1.32 stars / hour

Exponentially Faster Language Modelling

pbelcak/ultrafastbert 15 Nov 2023

Language models only really need to use an exponential fraction of their neurons for individual inferences.

Benchmarking, Language Modelling

351
1.24 stars / hour

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

yl4579/StyleTTS2 13 Jun 2023

In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style diffusion and adversarial training with large speech language models (SLMs) to achieve human-level TTS synthesis.

Speech Synthesis

2,825
1.21 stars / hour

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module

luosiallen/latent-consistency-model 9 Nov 2023

Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps.

Image Generation

2,887
1.13 stars / hour

HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

sh-lee-prml/hierspeechpp 21 Nov 2023

Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.

Speech Synthesis, Super-Resolution +2

238
1.07 stars / hour

MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer

boese0601/magicdance 18 Nov 2023

In this work, we propose MagicDance, a diffusion-based model for 2D human motion and facial expression transfer on challenging human dance videos.

Video Generation

125
0.82 stars / hour

CogVLM: Visual Expert for Pretrained Language Models

thudm/cogvlm 6 Nov 2023

We introduce CogVLM, a powerful open-source visual language foundation model.

Language Modelling, Visual Question Answering

2,445
0.82 stars / hour