Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

FoundationVision/VAR 3 Apr 2024

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".

Image Generation Language Modelling +2

1,681
5.58 stars / hour

InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

instantstyle/instantstyle 3 Apr 2024

Tuning-free diffusion-based models have demonstrated significant potential in the realm of image personalization and customization.

Text-to-Image Generation

756
2.82 stars / hour

ReFT: Representation Finetuning for Language Models

stanfordnlp/pyreft 4 Apr 2024

LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Arithmetic Reasoning

401
2.40 stars / hour

AIOS: LLM Agent Operating System

agiresearch/aios 25 Mar 2024

Inspired by these challenges, this paper presents AIOS, an LLM agent operating system, which embeds large language model into operating systems (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI.

Language Modelling Large Language Model +1

2,183
2.30 stars / hour

AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

thudm/autowebglm 4 Apr 2024

Large language models (LLMs) have fueled many intelligent agent tasks, such as web navigation -- but most existing agents perform far from satisfying in real-world webpages due to three factors: (1) the versatility of actions on webpages, (2) HTML text exceeding model processing capacity, and (3) the complexity of decision-making due to the open-domain nature of web.

Decision Making Language Modelling +1

124
1.09 stars / hour

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

scutzzj/aniportrait 26 Mar 2024

In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image.

Face Reenactment

3,324
1.03 stars / hour

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

jasonppy/voicecraft 25 Mar 2024

We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts.

Language Modelling

6,176
0.98 stars / hour

DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing

maturk/dn-splatter 26 Mar 2024

3D Gaussian splatting, a novel differentiable rendering technique, has achieved state-of-the-art novel view synthesis results with high rendering speeds and relatively low training times.

Depth Estimation Novel View Synthesis

123
0.96 stars / hour

PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models

graphpku/pissa 3 Apr 2024

However, LoRA approximates Delta W through the product of two matrices, A, initialized with Gaussian noise, and B, initialized with zeros, while PiSSA initializes A and B with principal singular values and vectors of the original matrix W. PiSSA can better approximate the outcomes of full-parameter fine-tuning at the beginning by changing the essential parts while freezing the "noisy" parts.

Quantization

65
0.83 stars / hour

HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach

airi-institute/hairfastgan 1 Apr 2024

Our paper addresses the complex task of transferring a hairstyle from a reference image to an input photo for virtual hair try-on.

106
0.79 stars / hour