Scattered Mixture-of-Experts Implementation

shawntan/scattermoe 13 Mar 2024

We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs.

68
0.59 stars / hour

MoAI: Mixture of All Intelligence for Large Language and Vision Models

ByungKwanLee/MoAI 12 Mar 2024

We present a new large language and vision model (LLVM), Mixture of All Intelligence (MoAI), which leverages auxiliary visual information obtained from the outputs of external segmentation, detection, SGG, and OCR models.

Scene Understanding, Visual Question Answering

135
0.57 stars / hour

TripoSR: Fast 3D Object Reconstruction from a Single Image

vast-ai-research/triposr 4 Mar 2024

This technical report introduces TripoSR, a 3D reconstruction model leveraging transformer architecture for fast feed-forward 3D generation, producing 3D mesh from a single image in under 0.5 seconds.

3D Object Reconstruction From A Single Image, 3D Reconstruction, +1

3,056
0.56 stars / hour

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

ZiqiaoPeng/SyncTalk 29 Nov 2023

A lifelike talking head requires synchronized coordination of subject identity, lip movements, facial expressions, and head poses.

Talking Face Generation, Talking Head Generation

525
0.55 stars / hour

DragAnything: Motion Control for Anything using Entity Representation

showlab/draganything 12 Mar 2024

We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation.

Object, Video Generation

173
0.55 stars / hour

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

bytedance/res-adapter 4 Mar 2024

In particular, after gaining a deep understanding of pure resolution priors, ResAdapter, trained on the general dataset, generates resolution-free images with personalized diffusion models while preserving their original style domain.

Image Generation

434
0.54 stars / hour

VideoMamba: State Space Model for Efficient Video Understanding

opengvlab/videomamba 11 Mar 2024

Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work adapts Mamba to the video domain.

Video Understanding

297
0.54 stars / hour

Unified Source-Free Domain Adaptation

tntek/source-free-domain-adaptation 12 Mar 2024

To tackle this unified SFDA problem, we propose a novel approach called Latent Causal Factors Discovery (LCFD).

Language Modelling, Source-Free Domain Adaptation, +1

55
0.53 stars / hour

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

opengvlab/video-mamba-suite 14 Mar 2024

We categorize Mamba into four roles for modeling videos, deriving a Video Mamba Suite composed of 14 models/modules, and evaluating them on 12 video understanding tasks.

Temporal Action Localization, Video Understanding

42
0.50 stars / hour

A Decade's Battle on Dataset Bias: Are We There Yet?

liuzhuang13/bias 13 Mar 2024

We revisit the "dataset classification" experiment suggested by Torralba and Efros a decade ago, in the new era with large-scale, diverse, and hopefully less biased datasets as well as more capable neural network architectures.

Memorization

58
0.49 stars / hour