Trending Research

AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One

nvlabs/radio • 10 Dec 2023

A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks.

Benchmarking, object-detection, +2

234

0.89 stars / hour

Spectrally Pruned Gaussian Fields with Neural Compensation

runyiyang/sundae • 1 May 2024

However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory.

0.88 stars / hour

WavCraft: Audio Editing and Generation with Natural Language Prompts

jinhualiang/wavcraft • 14 Mar 2024

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

222

0.74 stars / hour

PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

magic-research/PLLaVA • arXiv 2024

PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks.

Ranked #1 on Video-based Generative Performance Benchmarking on VideoInstruct

Dense Captioning, Video-based Generative Performance Benchmarking, +1

272

0.73 stars / hour

Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding

suzgunmirac/meta-prompting • 23 Jan 2024

This collaborative prompting approach empowers a single LM to simultaneously act as a comprehensive orchestrator and a panel of diverse experts, significantly enhancing its performance across a wide array of tasks.

Checkmate In One

229

0.70 stars / hour

FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent

dcharatan/flowmap • 23 Apr 2024

This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence.

Novel View Synthesis, Optical Flow Estimation, +1

679

0.69 stars / hour

MicroDreamer: Zero-shot 3D Generation in ~20 Seconds by Score-based Iterative Reconstruction

ml-gsai/microdreamer • 30 Apr 2024

In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm for 3D generation with a multi-view score-based diffusion model.

3D Generation, 3D Reconstruction

0.67 stars / hour

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

apple/corenet • 24 Apr 2024

Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings.

Contrastive Learning

6,269

0.66 stars / hour

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

FoundationVision/VAR • 3 Apr 2024

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".

Ranked #7 on Image Generation on ImageNet 256x256

Image Generation, Language Modelling, +2

3,349

0.65 stars / hour

MemGPT: Towards LLMs as Operating Systems

cpacker/memgpt • 12 Oct 2023

Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis.

Management

9,422

0.64 stars / hour
