LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

PKU-YuanGroup/LLaVA-CoT 15 Nov 2024

Large language models have demonstrated substantial advancements in reasoning capabilities, particularly through inference-time scaling, as illustrated by models such as OpenAI's o1.

Logical Reasoning Multimodal Reasoning +2

1,472
0.30 stars / hour

OLMo: Accelerating the Science of Language Models

allenai/olmo 1 Feb 2024

Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.

Language Modelling

4,785
0.30 stars / hour

KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation

openspg/kag 10 Sep 2024

The recently developed retrieval-augmented generation (RAG) technology has enabled the efficient construction of domain-specific applications.

Knowledge Graphs Question Answering +2

745
0.30 stars / hour

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation

microsoft/LLM2CLIP 7 Nov 2024

CLIP is a foundational multimodal model that aligns image and text features into a shared space using contrastive learning on large-scale image-text pairs.

Contrastive Learning Image Captioning +3

347
0.30 stars / hour

Qwen2 Technical Report

qwenlm/qwen1.5 15 Jul 2024

This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.

Ranked #1 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +5

10,299
0.29 stars / hour

VecCity: A Taxonomy-guided Library for Map Entity Representation Learning

Bigscity-VecCity/VecCity 31 Oct 2024

First, existing research is fragmented, with models classified by the type of map entity, limiting the reusability of techniques across different tasks.

Representation Learning

35
0.28 stars / hour

GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond

gap-lab-cuhk-sz/gaustudio 28 Mar 2024

We present GauStudio, a novel modular framework for modeling 3D Gaussian Splatting (3DGS) to provide standardized, plug-and-play components for users to easily customize and implement a 3DGS pipeline.

Novel View Synthesis Surface Reconstruction

1,192
0.28 stars / hour

In-Context LoRA for Diffusion Transformers

ali-vilab/In-Context-LoRA 31 Oct 2024

While task-specific in terms of tuning data, our framework remains task-agnostic in architecture and pipeline, offering a powerful tool for the community and providing valuable insights for further research on product-level task-agnostic generation systems.

Image Generation

1,210
0.27 stars / hour

MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

adlith/moe-jetpack 7 Jun 2024

The sparsely activated mixture of experts (MoE) model presents a promising alternative to traditional densely activated (dense) models, enhancing both quality and computational efficiency.

Computational Efficiency

96
0.21 stars / hour

Qwen2.5-Coder Technical Report

qwenlm/qwen2.5-coder 18 Sep 2024

In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5.

Code Generation +2

3,126
0.26 stars / hour