Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

igl-hkust/diffusionasshader 7 Jan 2025

Diffusion models have demonstrated impressive performance in generating high-quality videos from text prompts or images.

Video Generation

358
0.59 stars / hour

Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks

hhhuang/cag 20 Dec 2024

With the advent of large language models (LLMs) featuring significantly extended context windows, this paper proposes an alternative paradigm, cache-augmented generation (CAG), that bypasses real-time retrieval.

RAG Retrieval

783
0.59 stars / hour

DeepSeek-V3 Technical Report

deepseek-ai/deepseek-v3 27 Dec 2024

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

Language Modeling

19,498
0.58 stars / hour

WebWalker: Benchmarking LLMs in Web Traversal

alibaba-nlp/webwalker 13 Jan 2025

Extensive experimental results show that WebWalkerQA is challenging and demonstrate the effectiveness of RAG combined with WebWalker through horizontal and vertical integration in real-world scenarios.

Benchmarking Open-Domain Question Answering +2

53
0.57 stars / hour

A General Framework for Inference-time Scaling and Steering of Diffusion Models

zacharyhorvitz/fk-diffusion-steering 12 Jan 2025

For steering text-to-image models with a human preference reward, we find that FK steering a 0.8B parameter model outperforms a 2.6B parameter fine-tuned model on prompt fidelity, with faster sampling and no training.

Protein Design

38
0.53 stars / hour

Generative AI for Cel-Animation: A Survey

yunlong10/Awesome-AI4Animation 8 Jan 2025

The traditional celluloid (cel) animation production pipeline encompasses multiple essential steps, including storyboarding, layout design, keyframe animation, inbetweening, and colorization, which demand substantial manual effort, technical expertise, and significant time investment.

Colorization Layout Design +1

46
0.53 stars / hour

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

VITA-MLLM/VITA 3 Jan 2025

Recent Multimodal Large Language Models (MLLMs) have typically focused on integrating visual and textual modalities, with less emphasis placed on the role of speech in enhancing interaction.

1,934
0.50 stars / hour

Eliza: A Web3 friendly AI Agent Operating System

ai16z/eliza 12 Jan 2025

An AI agent, powered by large language models (LLMs) as its cognitive core, is an intelligent agentic system capable of autonomously controlling and determining execution paths under a user's instructions.

AI Agent RAG

12,011
0.50 stars / hour

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

ucbepic/docetl 16 Oct 2024

Our evaluation on four different unstructured document analysis tasks demonstrates that DocETL finds plans with outputs that are 25 to 80% more accurate than well-engineered baselines, addressing a critical gap in unstructured data analysis.

1,518
0.46 stars / hour

O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning

spiral-med/ophiuchus 11 Jan 2025

Building upon our previous investigations of O1 replication (Part 1: Journey Learning [Qin et al., 2024] and Part 2: Distillation [Huang et al., 2024]), this work explores the potential of inference-time scaling in large language models (LLMs) for medical reasoning tasks, ranging from diagnostic decision-making to treatment planning.

Decision Making MedQA

29
0.43 stars / hour