StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models

alipay/style-tokenizer 4 Sep 2024

To tackle these challenges, we introduce StyleTokenizer, a zero-shot style control image generation method that aligns style representation with text representation using a style tokenizer.

Denoising Text-to-Image Generation

35
0.33 stars / hour

Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer

kepengxu/pgtformer 21 Apr 2024

Multiple complex degradations are coupled in low-quality video faces in the real world.

Face Parsing Semantic Parsing +1

166
0.33 stars / hour

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

zhiyuanhubj/LongRecipe 31 Aug 2024

Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences.

8k

47
0.33 stars / hour

SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning

modelscope/swift 10 Aug 2024

With support for $300+$ LLMs and $50+$ MLLMs, SWIFT stands as the open-source framework that provides the most comprehensive support for fine-tuning large models.

Hallucination Optical Character Recognition +6

3,376
0.31 stars / hour

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

fudan-generative-vision/champ 21 Mar 2024

In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques.

Animated GIF Generation Image Animation +1

3,831
0.31 stars / hour

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

sakanaai/ai-scientist 12 Aug 2024

This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself, and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world's most challenging problems.

Language Modelling scientific discovery

7,341
0.29 stars / hour

MegActor-$\Sigma$: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer

megvii-research/megactor 27 Aug 2024

To address this issue, we introduce MegActor-$\Sigma$: a mixed-modal conditional diffusion transformer (DiT), which can flexibly inject audio and visual modality control signals into portrait animation.

Portrait Animation

724
0.29 stars / hour

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

360CVGroup/FancyVideo 15 Aug 2024

Then, TAR refines the correlation matrix between cross-frame textual conditions and latent features along the time dimension.

TAR Video Generation

437
0.28 stars / hour

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

LLaVA-VL/LLaVA-NeXT 10 Jul 2024

To this end, we introduce LLaVA-NeXT-Interleave, which simultaneously tackles Multi-image, Multi-frame (video), Multi-view (3D), and Multi-patch (single-image) scenarios in LMMs.

Zero-Shot Video Question Answer

2,366
0.28 stars / hour

LAR-IQA: A Lightweight, Accurate, and Robust No-Reference Image Quality Assessment Model

nasimjamshidi/lar-iqa 30 Aug 2024

Recent advancements in the field of No-Reference Image Quality Assessment (NR-IQA) using deep learning techniques demonstrate high performance across multiple open-source datasets.

Kolmogorov-Arnold Networks

28
0.26 stars / hour