LoLCATs: On Low-Rank Linearizing of Large Language Models

hazyresearch/lolcats 14 Oct 2024

When compared with prior approaches under the same compute budgets, LoLCATs significantly improves linearizing quality, closing the gap between linearized and original Llama 3.1 70B and 405B LLMs by 77.8% and 78.1% on 5-shot MMLU.

MMLU

87
1.21 stars / hour

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

viiika/Meissonic 10 Oct 2024

Diffusion models, such as Stable Diffusion, have made significant strides in visual generation, yet their paradigm remains fundamentally different from autoregressive language models, complicating the development of unified language-vision models.

Feature Compression, Image Generation

78
1.13 stars / hour

Making Images Real Again: A Comprehensive Survey on Deep Image Composition

bcmi/libcom 28 Jun 2021

We have also contributed the first image composition toolbox: libcom https://github.com/bcmi/libcom, which assembles 10+ image composition related functions (e.g., image blending, image harmonization, object placement, shadow generation, generative composition).

Image Harmonization

477
1.10 stars / hour

Agent S: An Open Agentic Framework that Uses Computers Like a Human

simular-ai/agent-s 10 Oct 2024

We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks.

AI Agent

117
0.92 stars / hour

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

mit-han-lab/hart 14 Oct 2024

To address these challenges, we present the hybrid tokenizer, which decomposes the continuous latents from the autoencoder into two components: discrete tokens representing the big picture and continuous tokens representing the residual components that cannot be represented by the discrete tokens.

Image Generation, Image Reconstruction

64
0.92 stars / hour

Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies

YanjieZe/Improved-3D-Diffusion-Policy 14 Oct 2024

Humanoid robots capable of autonomous operation in diverse environments have long been a goal for roboticists.

Camera Calibration, Point Cloud Segmentation

24
0.88 stars / hour

Fast Feedforward 3D Gaussian Splatting Compression

yihangchen-ee/fcgs 10 Oct 2024

With 3D Gaussian Splatting (3DGS) advancing real-time and high-fidelity rendering for novel view synthesis, storage requirements pose challenges for their widespread adoption.

Novel View Synthesis

70
0.71 stars / hour

Libra: Building Decoupled Vision System on Large Language Models

yifanxu74/libra 16 May 2024

Specifically, we incorporate a routed visual expert with a cross-modal bridge module into a pretrained LLM to route the vision and language flows during attention computing to enable different attention patterns in inner-modal modeling and cross-modal interaction scenarios.

Language Modelling, Large Language Model

121
0.69 stars / hour

SceneCraft: Layout-Guided 3D Scene Generation

orangesodahub/scenecraft 11 Oct 2024

The creation of complex 3D scenes tailored to user specifications has been a tedious and challenging task with traditional 3D modeling tools.

3D Generation, Scene Generation, +1

47
0.68 stars / hour

Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration

ohayonguy/PMRF 1 Oct 2024

Photo-realistic image restoration algorithms are typically evaluated by distortion measures (e.g., PSNR, SSIM) and by perceptual quality measures (e.g., FID, NIQE), where the desire is to attain the lowest possible distortion without compromising on perceptual quality.

Ranked #1 on Blind Face Restoration on CelebA-Test (FID metric)

Blind Face Restoration, Image Colorization, +5

414
0.67 stars / hour