MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views

donydchen/mvsplat360 7 Nov 2024

To evaluate MVSplat360's performance, we introduce a new benchmark using the challenging DL3DV-10K dataset, where MVSplat360 achieves superior visual quality compared to state-of-the-art methods on wide-sweeping or even 360° NVS tasks.

3D Reconstruction, Denoising, +2

39
1.29 stars / hour

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

THUDM/WebRL 4 Nov 2024

Specifically, WebRL incorporates 1) a self-evolving curriculum that generates new tasks from unsuccessful attempts, 2) a robust outcome-supervised reward model (ORM), and 3) adaptive reinforcement learning strategies to ensure consistent improvements.

145
1.24 stars / hour

Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

homebrewltd/ichigo 20 Oct 2024

Large Language Models (LLMs) have revolutionized natural language processing, but their application to speech-based tasks remains challenging due to the complexities of integrating audio and text modalities.

Question Answering, speech-recognition, +1

1,714
1.19 stars / hour

GameGen-X: Interactive Open-world Game Video Generation

gamegen-x/gamegen-x 1 Nov 2024

To realize this vision, we first collected and built an Open-World Video Game Dataset from scratch.

Text-to-Video Generation, Video Generation

134
1.18 stars / hour

Classification Done Right for Vision-Language Pre-Training

x-cls/superclass 5 Nov 2024

Due to the absence of the text encoding as contrastive target, SuperClass does not require a text encoder and does not need to maintain a large batch size as CLIP does.

Classification

72
1.09 stars / hour

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

facebookresearch/optimizers 12 Sep 2023

It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network.

Stochastic Optimization

412
0.92 stars / hour
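
The Distributed Shampoo entry above describes a Kronecker product approximation to full-matrix AdaGrad. As a rough illustration of why that approximation matters, the sketch below compares accumulator sizes for a single m x n weight matrix: a dense full-matrix accumulator versus two Kronecker factors of shape m x m and n x n. The function names and the example layer size are assumptions made for illustration; this is not the repository's API, and it ignores the block partitioning the entry mentions.

```python
def full_adagrad_entries(m: int, n: int) -> int:
    """Entries in a dense (m*n) x (m*n) full-matrix AdaGrad accumulator."""
    d = m * n
    return d * d


def kronecker_factor_entries(m: int, n: int) -> int:
    """Entries in two Kronecker factors: one m x m and one n x n."""
    return m * m + n * n


# Hypothetical layer size, chosen only for illustration.
m, n = 1024, 512
full = full_adagrad_entries(m, n)
kron = kronecker_factor_entries(m, n)

print(f"full-matrix accumulator entries: {full}")
print(f"Kronecker factor entries:        {kron}")
print(f"reduction factor (rounded down): {full // kron}")
```

For this assumed layer the dense accumulator would need 2^38 entries, while the two factors together need about 1.3 million, which is what makes a preconditioner of this kind practical to store per parameter.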

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

youngsheen/SimVQ 4 Nov 2024

However, VQ models are often hindered by the problem of representation collapse in the latent space, which leads to low codebook utilization and limits the scalability of the codebook for large-scale training.

Quantization, Representation Learning

89
0.91 stars / hour
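
The SimVQ entry above refers to low codebook utilization as a symptom of representation collapse. One common way to quantify it, sketched here under assumptions, is the fraction of codebook entries that receive at least one assignment. The helper name and the toy assignment lists are hypothetical and are not taken from the SimVQ repository.

```python
from typing import Iterable


def codebook_utilization(assignments: Iterable[int], codebook_size: int) -> float:
    """Fraction of codebook entries selected at least once.

    `assignments` holds the code index chosen for each encoded vector.
    Indices outside [0, codebook_size) are ignored.
    """
    if codebook_size <= 0:
        raise ValueError("codebook_size must be positive")
    used = {i for i in assignments if 0 <= i < codebook_size}
    return len(used) / codebook_size


# Toy data: a collapsed model keeps picking the same two codes,
# while a healthy one spreads its assignments over all eight.
collapsed = [0, 0, 1, 0, 1, 0, 0, 1]
healthy = [0, 1, 2, 3, 4, 5, 6, 7]

print(codebook_utilization(collapsed, codebook_size=8))
print(codebook_utilization(healthy, codebook_size=8))
```

A value near 1.0 means most codes are in use; a value near 0 indicates the kind of collapse the entry describes.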

AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

THUDM/Android-Lab 31 Oct 2024

It supports both large language models (LLMs) and multimodal models (LMMs) in the same action space.

Benchmarking

114
0.89 stars / hour

DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

shallowdream204/dreamclear 24 Oct 2024

Our second contribution, DreamClear, is a DiT-based image restoration model.

Image Restoration

712
0.83 stars / hour

No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images

cvg/NoPoSplat 31 Oct 2024

We utilize the reconstructed 3D Gaussians for novel view synthesis and pose estimation tasks and propose a two-stage coarse-to-fine pipeline for accurate pose estimation.

3D Reconstruction, Generalizable Novel View Synthesis, +2

430
0.80 stars / hour