Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

fudan-generative-vision/champ 21 Mar 2024

In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques.

Animated GIF Generation Image Animation +1

4,452
0.25 stars / hour

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

modelscope/swift CVPR 2024

Image diffusion models have been utilized in various tasks, such as text-to-image generation and controllable image synthesis.

Decoder Text-to-Image Generation

3,800
0.25 stars / hour

EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models

facebookresearch/efm3d 14 Jun 2024

The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data.

3D Object Detection 3D Reconstruction +2

77
0.24 stars / hour

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

ictnlp/llama-omni 10 Sep 2024

We build our model based on the latest Llama-3.1-8B-Instruct model.

2,350
0.24 stars / hour

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

modeltc/llmc 9 May 2024

In this paper, we present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.

Benchmarking Computational Efficiency +3

282
0.24 stars / hour

ControlAR: Controllable Image Generation with Autoregressive Models

hustvl/controlar 3 Oct 2024

Firstly, we explore control encoding for AR models and propose a lightweight control encoder to transform spatial inputs (e.g., canny edges or depth maps) into control tokens.

Image Generation

55
0.27 stars / hour

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

whb139426/grounded-video-llm 4 Oct 2024

Video Large Language Models (Video-LLMs) have demonstrated remarkable capabilities in coarse-grained video understanding; however, they struggle with fine-grained temporal grounding.

Dense Video Captioning Sentence +1

35
0.24 stars / hour

Autoregressive Action Sequence Learning for Robotic Manipulation

mlzxy/arp 4 Oct 2024

We propose the Chunking Causal Transformer (CCT), which extends the next-single-token prediction of causal transformers to support multi-token prediction in a single pass.

Chunking Robot Manipulation

42
0.23 stars / hour

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

zhipeixu/fakeshield 3 Oct 2024

The rapid development of generative AI is a double-edged sword, which not only facilitates content creation but also makes image manipulation easier and more difficult to detect.

Face Swapping Image Forgery Detection +2

35
0.23 stars / hour

QAEncoder: Towards Aligned Representation Learning in Question Answering System

IAAR-Shanghai/QAEncoder 30 Sep 2024

Modern QA systems entail retrieval-augmented generation (RAG) for accurate and trustworthy responses.

Document Embedding Question Answering +2

29
0.23 stars / hour