FastVLM: Efficient Vision Encoding for Vision Language Models

apple/ml-fastvlm 17 Dec 2024

At different operational resolutions, the vision encoder of a VLM can be optimized along two axes: reducing encoding latency and minimizing the number of visual tokens passed to the LLM, thereby lowering overall latency.

3,444
6.14 stars / hour

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

jiuhaichen/blip3o 14 May 2025

Building on our innovative model design, training recipe, and datasets, we develop BLIP3-o, a suite of state-of-the-art unified multimodal models.

Image Generation

368
3.42 stars / hour

MASS: Multi-Agent Simulation Scaling for Portfolio Construction

gta0804/mass 15 May 2025

LLM-based multi-agent systems have gained significant attention for their potential in simulation and in enhancing performance.

82
1.89 stars / hour

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

LeapLabTHU/Absolute-Zero-Reasoner 6 May 2025

Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards.

Mathematical Reasoning

1,148
1.78 stars / hour

HealthBench: Evaluating Large Language Models Towards Improved Human Health

openai/simple-evals 13 May 2025

We present HealthBench, an open-source benchmark measuring the performance and safety of large language models in healthcare.

Instruction Following, Multiple-choice

3,441
1.44 stars / hour

Continuous Thought Machines

SakanaAI/continuous-thought-machines 8 May 2025

The CTM has two core innovations: (1) neuron-level temporal processing, where each neuron uses unique weight parameters to process a history of incoming signals; and (2) neural synchronization employed as a latent representation.

Computational Efficiency, Question Answering

725
1.36 stars / hour
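
The two ideas named in the Continuous Thought Machines summary can be illustrated with a toy sketch. This is a simplified reading, not the authors' implementation: it assumes each neuron applies a private linear weight vector to its own signal history, and it assumes synchronization is measured as pairwise inner products of neuron activation traces over time. The function names `neuron_level_step` and `synchronization` are hypothetical.

```python
def neuron_level_step(histories, weights):
    """Toy neuron-level temporal processing.

    histories[d] holds the recent incoming signals for neuron d, and
    weights[d] is that neuron's private weight vector (assumption: a plain
    linear map; the paper's per-neuron model may be richer).
    """
    return [
        sum(w * h for w, h in zip(weights[d], histories[d]))
        for d in range(len(histories))
    ]


def synchronization(traces):
    """Toy synchronization matrix.

    traces[d] is the activation of neuron d over time. Entry (i, j) is the
    inner product of traces i and j (assumption: one simple way to measure
    how strongly two neurons move together).
    """
    n = len(traces)
    return [
        [sum(a * b for a, b in zip(traces[i], traces[j])) for j in range(n)]
        for i in range(n)
    ]


# Two neurons, each with a history of two signals and its own weights.
histories = [[1.0, 2.0], [0.5, -1.0]]
weights = [[0.5, 0.5], [1.0, 2.0]]
outputs = neuron_level_step(histories, weights)

# Two activation traces over three time steps.
traces = [[1.0, 0.0, 1.0], [1.0, 0.0, -1.0]]
sync = synchronization(traces)
```

In this sketch the flattened `sync` matrix would play the role of the latent representation that downstream layers read from, rather than the raw activations at a single time step.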

Spherical Channels for Modeling Atomic Interactions

Open-Catalyst-Project/ocp 29 Jun 2022

We propose the Spherical Channel Network (SCN) to model atomic energies and forces.

Computational Chemistry, Graph Neural Network

1,365
1.18 stars / hour

Parallel Scaling Law for Language Models

qwenlm/parscale 15 May 2025

We apply $P$ diverse and learnable transformations to the input, execute forward passes of the model in parallel, and dynamically aggregate the $P$ outputs.

62
1.15 stars / hour
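
The procedure described in the Parallel Scaling Law summary (transform the input $P$ ways, run the same model on each copy, aggregate the $P$ outputs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the qwenlm/parscale code: the transformations are assumed to be additive shifts, the shared model is a stand-in `toy_model`, and the dynamic aggregation is assumed to be a softmax gate over the outputs. All names and parameters here are hypothetical.

```python
import math
import random


def toy_model(x):
    """Stand-in for the shared model: one fixed nonlinear map."""
    return [math.tanh(v) for v in x]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def parallel_forward(x, shifts, gate_weights):
    """Run P transformed copies of x through the same model and aggregate."""
    # Step 1: P input transformations (assumption: learnable additive shifts).
    streams = [[xi + si for xi, si in zip(x, shift)] for shift in shifts]
    # Step 2: P forward passes with shared weights. A real system would batch
    # these so they run in parallel; a loop keeps the sketch simple.
    outputs = [toy_model(s) for s in streams]
    # Step 3: dynamic aggregation. The mixing weights depend on the outputs
    # themselves (assumption: a linear score per stream, then softmax).
    scores = [
        sum(w * o for w, o in zip(gw, out))
        for gw, out in zip(gate_weights, outputs)
    ]
    mix = softmax(scores)
    aggregated = [
        sum(mix[p] * outputs[p][d] for p in range(len(outputs)))
        for d in range(len(x))
    ]
    return aggregated, mix


random.seed(0)
P, DIM = 4, 3
# Randomly initialised here; in training these would be learned parameters.
shifts = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(P)]
gate_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(P)]
aggregated, mix = parallel_forward([0.5, -0.2, 0.1], shifts, gate_weights)
```

The point of the design is that the model parameters are shared across all $P$ streams, so extra compute is spent at inference time while the added parameter count stays limited to the transformations and the gate.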

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

vita-mllm/vita-audio 6 May 2025

Specifically, we introduce a lightweight Multiple Cross-modal Token Prediction (MCTP) module that efficiently generates multiple audio tokens within a single model forward pass, which not only accelerates the inference but also significantly reduces the latency for generating the first audio in streaming scenarios.

Automatic Speech Recognition (ASR), +7 more

324
1.14 stars / hour

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

mem0ai/mem0 28 Apr 2025

Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, yet their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues.

RAG

31,146
1.09 stars / hour