StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

picsart-ai-research/streamingt2v 21 Mar 2024

To overcome the limitations of existing approaches to long video generation, we introduce StreamingT2V, an autoregressive approach for generating long videos of 80, 240, 600, 1200 or more frames with smooth transitions.

Text-to-Video Generation, Video Generation

514 stars (0.79 stars / hour)

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

NVlabs/FoundationPose 13 Dec 2023

We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups.

3D Object Detection, 3D Object Tracking

720 stars (0.74 stars / hour)

SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System

bytedance/schurvins 4 Dec 2023

To this end, we propose a novel filter-based VINS framework named SchurVINS, which can guarantee both high accuracy, by building a complete residual model, and low computational complexity, by exploiting the Schur complement.

Computational Efficiency

78 stars (0.71 stars / hour)

Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

haozheliu-st/t-gate 3 Apr 2024

This study explores the role of cross-attention during inference in text-conditional diffusion models.

109 stars (0.70 stars / hour)

Advancing LLM Reasoning Generalists with Preference Trees

openbmb/eurus 2 Apr 2024

We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning.

Benchmarking, Code Generation

85 stars (0.53 stars / hour)

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

hehao13/cameractrl 2 Apr 2024

Controllability plays a crucial role in video generation since it allows users to create desired content.

Text-to-Video Generation, Video Generation

150 stars (0.49 stars / hour)

ChangeMamba: Remote Sensing Change Detection with Spatio-Temporal State Space Model

chenhongruixuan/mambacd 4 Apr 2024

For the change decoder, which is used in all three architectures, we propose three spatio-temporal relationship modeling mechanisms. These combine naturally with the Mamba architecture and fully exploit its properties to achieve spatio-temporal interaction among multi-temporal features, thereby obtaining accurate change information.

Attribute, Change Detection

64 stars (0.49 stars / hour)

Position Paper: What Can Large Language Models Tell Us about Time Series Analysis

kimmeen/time-llm 5 Feb 2024

Time series analysis is essential for comprehending the complexities inherent in various real-world systems and applications.

Decision Making, Position

591 stars (0.47 stars / hour)

UniTable: Towards a Unified Framework for Table Structure Recognition via Self-Supervised Pretraining

poloclub/unitable 7 Mar 2024

Tables convey factual and quantitative data with implicit conventions created by humans that are often challenging for machines to parse.

Language Modelling

67 stars (0.46 stars / hour)

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

Xiaojiu-z/SSR_Encoder 26 Dec 2023

Recent advancements in subject-driven image generation have led to zero-shot generation, yet precise selection and focus on crucial subject representations remain challenging.

Image Generation

45 stars (0.44 stars / hour)