AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One

nvlabs/radio 10 Dec 2023

A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks.

Benchmarking, object-detection +2

162
0.89 stars / hour

Spectrally Pruned Gaussian Fields with Neural Compensation

runyiyang/sundae 1 May 2024

However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory.

41
0.88 stars / hour

WavCraft: Audio Editing and Generation with Natural Language Prompts

jinhualiang/wavcraft 14 Mar 2024

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

173
0.74 stars / hour

Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding

suzgunmirac/meta-prompting 23 Jan 2024

This collaborative prompting approach empowers a single LM to simultaneously act as a comprehensive orchestrator and a panel of diverse experts, significantly enhancing its performance across a wide array of tasks.

Checkmate In One

219
0.70 stars / hour

FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent

dcharatan/flowmap 23 Apr 2024

This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence.

Novel View Synthesis, Optical Flow Estimation +1

654
0.69 stars / hour

MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction

ml-gsai/microdreamer 30 Apr 2024

In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm for 3D generation with a multi-view score-based diffusion model.

3D Generation, 3D Reconstruction

44
0.67 stars / hour

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

apple/corenet 24 Apr 2024

Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings.

Contrastive Learning

6,125
0.66 stars / hour

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

FoundationVision/VAR 3 Apr 2024

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".

Image Generation, Language Modelling +2

3,253
0.65 stars / hour

MemGPT: Towards LLMs as Operating Systems

cpacker/memgpt 12 Oct 2023

Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis.

Management

9,122
0.64 stars / hour