PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting

cvlab-kaist/PF3plat 29 Oct 2024

It then introduces lightweight, learnable modules that refine depth and pose estimates from coarse alignments, improving the quality of 3D reconstruction and novel view synthesis.

3D Reconstruction, Monocular Depth Estimation

54 stars, 0.65 stars / hour

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

multimodal-art-projection/AutoKaggle 27 Oct 2024

Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches.

Feature Engineering

60 stars, 0.63 stars / hour

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

fudan-generative-vision/hallo2 10 Oct 2024

To the best of our knowledge, Hallo2, proposed in this paper, is the first method to achieve 4K resolution and generate hour-long, audio-driven portrait image animations enhanced with textual prompts.

4k, Image Animation

3,276 stars, 0.63 stars / hour

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

SWivid/F5-TTS 9 Oct 2024

This flow-step sampling strategy can easily be applied to existing flow-matching-based models without retraining.

Denoising, Text to Speech

6,261 stars, 0.62 stars / hour
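The snippet does not say what the flow-step sampling strategy is. The sketch below assumes the "Sway Sampling" formulation described in the F5-TTS paper, which remaps uniformly spaced flow steps u in [0, 1] with f(u; s) = u + s·(cos(πu/2) − 1 + u), where s < 0 concentrates steps near the start of the flow. The function names `sway_sample` and `sway_schedule` and the default s = -1 are hypothetical choices for illustration, not taken from the F5-TTS repository.

```python
import math


def sway_sample(u: float, s: float = -1.0) -> float:
    """Remap a uniform flow step u in [0, 1] with the sway function
    f(u; s) = u + s * (cos(pi/2 * u) - 1 + u) (assumed formulation)."""
    # For s in [-1, 2/(pi - 2)] the derivative of f stays non-negative on [0, 1],
    # so the remapped steps keep their order.
    if not -1.0 <= s <= 2.0 / (math.pi - 2.0):
        raise ValueError("s must lie in [-1, 2/(pi - 2)] to keep f monotonic")
    return u + s * (math.cos(math.pi / 2.0 * u) - 1.0 + u)


def sway_schedule(n_steps: int, s: float = -1.0) -> list[float]:
    """Return n_steps + 1 swayed time points running from t = 0 to t = 1."""
    # Start from a uniform grid, then bend it; s < 0 packs steps near t = 0.
    return [sway_sample(i / n_steps, s) for i in range(n_steps + 1)]


if __name__ == "__main__":
    print([round(t, 3) for t in sway_schedule(8, s=-1.0)])
```

With s = -1 the map reduces to 1 − cos(πu/2), so more of the schedule falls at small t. The endpoints 0 and 1 stay fixed, which is why a sampler of this kind only changes inference-time step placement and needs no retraining.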

CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation

cuzyoung/crossearth 30 Oct 2024

The field of Remote Sensing Domain Generalization (RSDG) has emerged as a critical and valuable research frontier, focusing on developing models that generalize effectively across diverse scenarios.

Domain Generalization, Segmentation

36 stars, 0.60 stars / hour

Tora: Trajectory-oriented Diffusion Transformer for Video Generation

alibaba/Tora 31 Jul 2024

Its trajectory extractor (TE) encodes arbitrary trajectories into hierarchical spacetime motion patches using a 3D video compression network.

Video Compression, Video Generation

506 stars, 0.59 stars / hour

Agent S: An Open Agentic Framework that Uses Computers Like a Human

simular-ai/agent-s 10 Oct 2024

We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks.

AI Agent

464 stars, 0.58 stars / hour

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

THUDM/LongCite 4 Sep 2024

Although current long-context large language models (LLMs) answer user questions over extensive text impressively, their responses lack citations, which makes verification difficult and, given their potential to hallucinate, raises concerns about their trustworthiness.

Question Answering, Sentence

389 stars, 0.55 stars / hour

DepthSplat: Connecting Gaussian Splatting and Depth

cvg/depthsplat 17 Oct 2024

Gaussian splatting and single/multi-view depth estimation are typically studied in isolation.

Depth Estimation, Novel View Synthesis

479 stars, 0.54 stars / hour

SMITE: Segment Me In TimE

alimohammadiamirhossein/smite 24 Oct 2024

Segmenting an object in a video presents significant challenges.

Segmentation, Video Object Segmentation

101 stars, 0.53 stars / hour