MiniCPM-V: A GPT-4V Level MLLM on Your Phone

openbmb/minicpm-v 3 Aug 2024

The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone.

Hallucination, Multiple-choice

17,573 stars (2.26 stars / hour)

MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation

hkuds/minirag 12 Jan 2025

The growing demand for efficient and lightweight Retrieval-Augmented Generation (RAG) systems has highlighted significant challenges when deploying Small Language Models (SLMs) in existing RAG frameworks.

RAG, Retrieval

470 stars (1.49 stars / hour)

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

sonyresearch/micro_diffusion 22 Jul 2024

As scaling laws in generative AI push performance, they also concentrate the development of these models among actors with large computational resources.

1,133 stars (1.47 stars / hour)

Monolith: Real Time Recommendation System With Collisionless Embedding Table

bytedance/monolith 16 Sep 2022

In this paper, we present Monolith, a system tailored for online training.

8,247 stars (1.30 stars / hour)

Tensor Product Attention Is All You Need

tensorgi/t6 11 Jan 2025

Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference.

Language Modeling

246 stars (1.00 stars / hour)
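To make the memory overhead named in the Tensor Product Attention abstract concrete, here is a back-of-the-envelope sketch. It assumes a standard decoder-only Transformer that caches one key and one value vector per KV head, per layer, per token, stored in 16-bit precision. The helper `kv_cache_bytes` and the example configuration are hypothetical and illustrative; they are not taken from the paper or the tensorgi/t6 repository.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,  # 2 bytes per value for fp16/bf16
) -> int:
    """Rough KV-cache size for a decoder-only Transformer (hypothetical helper).

    Every layer caches one key vector and one value vector (the factor of 2)
    for each KV head, each token, and each sequence in the batch.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical 32-layer model with 32 KV heads of dimension 128, in fp16.
    for seq_len in (4096, 32768):
        size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=seq_len)
        print(f"seq_len={seq_len:>6}: {size / 2**30:.1f} GiB")
```

Under these assumptions the cache costs 0.5 MiB per token, i.e. 2 GiB at 4,096 tokens and 16 GiB at 32,768 tokens for a single sequence. Because the size grows linearly with both `seq_len` and `batch_size`, long inputs quickly dominate inference memory, which is the overhead the abstract refers to.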

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

magic-research/Sa2VA 7 Jan 2025

This work presents Sa2VA, the first unified model for dense grounded understanding of both images and videos.

2k, Language Modeling

794 stars (0.99 stars / hour)

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

vgenai-netflix-eyeline-research/go-with-the-flow 14 Jan 2025

The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer.

Optical Flow Estimation

483 stars (0.96 stars / hour)

UnCommon Objects in 3D

facebookresearch/uco3d 13 Jan 2025

We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset for 3D deep learning and 3D generative AI.

Object

357 stars (0.92 stars / hour)

Transformer²: Self-adaptive LLMs

SakanaAI/self-adaptive-llms 9 Jan 2025

Self-adaptive large language models (LLMs) aim to solve the challenges posed by traditional fine-tuning methods, which are often computationally intensive and static in their ability to handle diverse tasks.

735 stars (0.88 stars / hour)

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks

fudan-generative-vision/hallo3 1 Dec 2024

Existing methodologies for animating portrait images face significant challenges, particularly in handling non-frontal perspectives, rendering dynamic objects around the portrait, and generating immersive, realistic backgrounds.

Image Animation, Portrait Animation

785 stars (0.86 stars / hour)