MedRAX: Medical Reasoning Agent for Chest X-ray

bowang-lab/medrax 4 Feb 2025

Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care.

AI Agent Management

398
0.86 stars / hour

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

pku-alignment/align-anything 20 Dec 2024

In this work, we make the first attempt to fine-tune all-modality models (i.e., input and output with any modality, also named any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring their behavior aligns with human intentions.

Instruction Following

1,965
0.81 stars / hour

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

foundationvision/flashvideo 7 Feb 2025

DiT diffusion models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale.

Computational Efficiency, Text-to-Video Generation, +1

258
0.81 stars / hour

Enhance-A-Video: Better Generated Video for Free

NUS-HPC-AI-Lab/Enhance-A-Video 11 Feb 2025

DiT-based video generation has achieved remarkable results, but research into enhancing existing models remains relatively unexplored.

Video Generation

363
0.75 stars / hour

IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

plurai-ai/intellagent 19 Jan 2025

IntellAgent represents a paradigm shift in evaluating conversational AI.

Navigate

842
0.72 stars / hour

Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding

xid32/naacl_2025_twm 9 Feb 2025

To overcome these challenges, we introduce a specialized cognitive module, temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs).

Image Captioning, Image-text Retrieval, +5

106
0.72 stars / hour

GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting

XRIM-Lab/GS-CPR International Conference on Learning Representations (ICLR) 2025

We leverage 3D Gaussian Splatting (3DGS) as a scene representation and propose a novel test-time camera pose refinement (CPR) framework, GS-CPR.

3DGS, NeRF, +3

22
0.71 stars / hour

LIMO: Less is More for Reasoning

gair-nlp/limo 5 Feb 2025

While conventional wisdom suggests that sophisticated reasoning tasks demand extensive training data (>100,000 examples), we demonstrate that complex mathematical reasoning abilities can be effectively elicited with surprisingly few examples.

Math, Mathematical Reasoning, +2

602
0.70 stars / hour

Free-Form Image Inpainting with Gated Convolution

zuruoke/watermark-removal ICCV 2019

We present a generative image inpainting system to complete images with free-form mask and guidance.

Feature Selection, Image Inpainting, +1

2,446
0.67 stars / hour

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound

facebookresearch/audiobox-aesthetics 7 Feb 2025

The quantification of audio aesthetics remains a complex challenge in audio processing, primarily due to its subjective nature, which is influenced by human perception and cultural context.

Benchmarking

301
0.61 stars / hour