Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Beomi/InfiniTransformer 10 Apr 2024

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation.

Book summarization Language Modelling

168
0.75 stars / hour

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

picsart-ai-research/streamingt2v 21 Mar 2024

To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions.

Text-to-Video Generation Video Generation

848
0.75 stars / hour

SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap

soccernet/sn-gamestate 17 Apr 2024

This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top-view of the pitch (i.e., a minimap).

Camera Calibration

59
0.66 stars / hour

Learning with 3D rotations, a hitchhiker's guide to SO(3)

martius-lab/hitchhiking-rotations 17 Apr 2024

Many settings in machine learning require the selection of a rotation representation.

38
0.63 stars / hour

InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

instantstyle/instantstyle 3 Apr 2024

Tuning-free diffusion-based models have demonstrated significant potential in the realm of image personalization and customization.

Text-to-Image Generation

1,086
0.54 stars / hour

Probing the 3D Awareness of Visual Foundation Models

mbanani/probe3d 12 Apr 2024

Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also represent their 3D structure.

177
0.54 stars / hour

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

servicenow/browsergym 12 Mar 2024

We study the use of large language model-based agents for interacting with software via web browsers.

Language Modelling Large Language Model

104
0.51 stars / hour

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

pku-yuangroup/magictime 7 Apr 2024

Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions.

Text-to-Video Generation Video Generation

1,019
0.51 stars / hour

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

zju3dv/efficientloftr 7 Mar 2024

Furthermore, we find spatial variance exists in LoFTR's fine correlation module, which is adverse to matching accuracy.

3D Reconstruction Image Retrieval

330
0.45 stars / hour