Mamba: Linear-Time Sequence Modeling with Selective State Spaces

state-spaces/mamba 1 Dec 2023

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module.

Language Modelling

1,699
9.64 stars / hour

TaskWeaver: A Code-First Agent Framework

microsoft/taskweaver 29 Nov 2023

TaskWeaver provides support for rich data structures, flexible plugin usage, and dynamic plugin selection, and leverages LLM coding capabilities for complex logic.

Natural Language Understanding

1,701
3.63 stars / hour

DeepCache: Accelerating Diffusion Models for Free

horseee/deepcache 1 Dec 2023

Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities.

Denoising, Image Generation

130
1.96 stars / hour

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

facebookresearch/seamless_communication 22 Aug 2023

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages?

Automatic Speech Recognition, Speech-to-Speech Translation, +3

8,017
1.92 stars / hour

GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation

baaivision/GeoDream 29 Nov 2023

We justify that the refined 3D geometric priors aid in the 3D-aware capability of 2D diffusion priors, which in turn provides superior guidance for the refinement of 3D geometric priors.

Text to 3D

305
1.82 stars / hour

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

lkeab/gaussian-grouping 1 Dec 2023

We propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes.

Colorization, Novel View Synthesis, +1

93
1.76 stars / hour

Improving Sample Quality of Diffusion Models Using Self-Attention Guidance

lllyasviel/fooocus ICCV 2023

Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality and diversity.

Denoising, Image Generation

23,719
1.76 stars / hour

HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

sh-lee-prml/hierspeechpp 21 Nov 2023

We significantly improve the naturalness and speaker similarity of synthetic speech, even in zero-shot speech synthesis scenarios.

Speech Synthesis, Super-Resolution, +2

623
1.46 stars / hour

DiffiT: Diffusion Vision Transformers for Image Generation

nvlabs/diffit 4 Dec 2023

We also introduce latent DiffiT, which consists of a transformer model with the proposed self-attention layers, for high-resolution image generation.

Denoising, Image Generation

89
1.38 stars / hour

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

showlab/MotionDirector 12 Oct 2023

Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion.

381
1.37 stars / hour