Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

tencent/hunyuan3d-2 21 Jan 2025

This system includes two foundation components: a large-scale shape generation model, Hunyuan3D-DiT, and a large-scale texture synthesis model, Hunyuan3D-Paint.

Texture Synthesis

6,041
0.31 stars / hour

Extended Agriculture-Vision: An Extension of a Large Aerial Image Dataset for Agricultural Pattern Analysis

jingwu6/extended-agriculture-vision-dataset 4 Mar 2023

First, we generate and release an improved version of the Agriculture-Vision dataset (Chiu et al., 2020b) to include raw, full-field imagery for greater experimental flexibility.

Benchmarking Contrastive Learning +2

231
0.30 stars / hour

DeepSeek-VL: Towards Real-World Vision-Language Understanding

deepseek-ai/deepseek-vl 8 Mar 2024

The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks.

Chatbot Language Modelling +3

3,397
0.30 stars / hour

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

thunlp/ultrachat 23 May 2023

Fine-tuning on instruction data has been widely validated as an effective practice for implementing chat language models like ChatGPT.

Diversity

2,274
0.30 stars / hour

Efficient Memory Management for Large Language Model Serving with PagedAttention

vllm-project/vllm 12 Sep 2023

On top of PagedAttention, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage.

Language Modeling Language Modelling +2

37,446
0.30 stars / hour

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

stanfordnlp/axbench 28 Jan 2025

We introduce a novel weakly-supervised representational method (Rank-1 Representation Finetuning; ReFT-r1), which is competitive on both tasks while providing the interpretability advantages that prompting lacks.

Language Modeling Language Modelling

50
0.28 stars / hour

D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval

scy-x/d3still CVPR 2024

Existing methods for asymmetric image retrieval employ a rigid pairwise similarity constraint between the query network and the larger gallery network.

Image Retrieval Retrieval

46
0.28 stars / hour

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

om-ai-lab/OmAgent 24 Jun 2024

Recent advancements in Large Language Models (LLMs) have expanded their capabilities to multimodal contexts, including comprehensive video understanding.

AI Agent Video Understanding

1,663
0.27 stars / hour

Render-and-Compare: Cross-View 6 DoF Localization from Noisy Prior

choyaa/render2loc 13 Feb 2023

Despite the significant progress in 6-DoF visual localization, researchers are mostly driven by ground-level benchmarks.

Camera Pose Estimation Pose Estimation +1

44
0.27 stars / hour

Stable Flow: Vital Layers for Training-Free Image Editing

snap-research/stable-flow 21 Nov 2024

The main challenge is that, unlike the UNet-based models, DiT lacks a coarse-to-fine synthesis structure, making it unclear in which layers to perform the injection.

Text-based Image Editing

223
0.27 stars / hour