Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

pku-alignment/align-anything 20 Dec 2024

In this work, we make the first attempt to fine-tune all-modality models (i.e., models whose input and output can be any modality, also called any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring their behavior aligns with human intentions.

Instruction Following

1,965
0.85 stars / hour

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

fireredteam/fireredasr 24 Jan 2025

We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements for superior performance and optimal efficiency across various applications.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

508
0.85 stars / hour

On the Emergence of Thinking in LLMs I: Searching for the Right Intuition

OpenLLMAI/OpenRLHF 10 Feb 2025

Lastly, we propose a theory as to why the RLSP search strategy is more suitable for LLMs, inspired by a remarkable result showing that CoT provably increases the computational power of LLMs, which grows with the number of steps in CoT \cite{li2024chain, merrill2023expresssive}.

Math

4,725
0.59 stars / hour

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

deepseek-ai/deepseek-vl2 13 Dec 2024

We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two major upgrades.

Chart Understanding Optical Character Recognition +4

3,631
0.64 stars / hour

LIMO: Less is More for Reasoning

gair-nlp/limo 5 Feb 2025

While conventional wisdom suggests that sophisticated reasoning tasks demand extensive training data (>100,000 examples), we demonstrate that complex mathematical reasoning abilities can be effectively elicited with surprisingly few examples.

Math Mathematical Reasoning +2

602
0.63 stars / hour

MedRAX: Medical Reasoning Agent for Chest X-ray

bowang-lab/medrax 4 Feb 2025

Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care.

AI Agent Management

398
0.62 stars / hour

Accelerating Data Processing and Benchmarking of AI Models for Pathology

mahmoodlab/trident 10 Feb 2025

Advances in foundation modeling have reshaped computational pathology.

Benchmarking

63
0.60 stars / hour

Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding

xid32/naacl_2025_twm 9 Feb 2025

To overcome these challenges, we introduce a specialized cognitive module, temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs).

Image Captioning Image-text Retrieval +5

106
0.60 stars / hour

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound

facebookresearch/audiobox-aesthetics 7 Feb 2025

The quantification of audio aesthetics remains a complex challenge in audio processing, primarily due to its subjective nature, which is influenced by human perception and cultural context.

Benchmarking

301
0.60 stars / hour

MatterGen: a generative model for inorganic materials design

microsoft/mattergen 6 Dec 2023

We further introduce adapter modules to enable fine-tuning towards any given property constraints with a labeled dataset.

model

1,127
0.59 stars / hour