Rewrite the Stars

ma-xu/rewrite-the-stars 29 Mar 2024

Recent studies have drawn attention to the untapped potential of the "star operation" (element-wise multiplication) in network design.
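As a minimal sketch of what an element-wise ("star") product looks like, the snippet below multiplies two feature vectors position by position. Feeding it the outputs of two linear maps is an assumption made here for illustration, not something the listing specifies; the helper names (`star`, `linear`) and the toy weights are hypothetical.

```python
def star(a, b):
    """Element-wise ("star") product of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [x * y for x, y in zip(a, b)]


def linear(weights, x):
    """Apply a bias-free linear map; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Hypothetical toy weights and input, chosen only for illustration.
W1 = [[1.0, 0.0], [0.0, 2.0]]
W2 = [[0.5, 0.5], [1.0, -1.0]]
x = [2.0, 3.0]

# Assumed usage: fuse two linear branches with the star operation.
fused = star(linear(W1, x), linear(W2, x))
print(fused)
```

Because each output element is a product of two linear functions of `x`, the result contains pairwise products of the input features, which is one way to read the "untapped potential" the excerpt refers to.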

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

mit-han-lab/qserve 7 May 2024

The key insight driving QServe is that the efficiency of LLM serving on GPUs is critically influenced by operations on low-throughput CUDA cores.

Language Modelling, Large Language Model

DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

zzzhang-jx/docres 7 May 2024

This underscores the potential of DocRes across a broader spectrum of document image restoration tasks.

Binarization, Deblurring

WavCraft: Audio Editing and Generation with Large Language Models

jinhualiang/wavcraft 14 Mar 2024

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

FeatUp: A Model-Agnostic Framework for Features at Any Resolution

mhamilton723/FeatUp 15 Mar 2024

Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime.

Depth Estimation, Depth Prediction

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

tothebeginning/pulid 24 Apr 2024

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation.

Text-to-Image Generation

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

mcgill-nlp/llm2vec 9 Apr 2024

We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB).

Contrastive Learning, Decoder

ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars

3DTopia/ThemeStation 22 Mar 2024

To this end, we design a two-stage framework that draws a concept image first, followed by a reference-informed 3D modeling stage.

3D Generation, Unity

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

assafelovic/gpt-researcher 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.


AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One

nvlabs/radio 10 Dec 2023

A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks.

Benchmarking, object-detection

