Trending Research

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Beomi/InfiniTransformer • • 10 Apr 2024

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation.

Book summarization Language Modelling +1

183

0.75 stars / hour

Paper
Code

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

picsart-ai-research/streamingt2v • • 21 Mar 2024

To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions.

Text-to-Video Generation Video Generation

870

0.75 stars / hour

Paper
Code

SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap

soccernet/sn-gamestate • • 17 Apr 2024

This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top-view of the pitch, (i. e. a minimap).

Ranked #1 on Game State Reconstruction on SoccerNet-GSR

Camera Calibration Game State Reconstruction

0.66 stars / hour

Paper
Code

Learning with 3D rotations, a hitchhiker's guide to SO(3)

martius-lab/hitchhiking-rotations • • 17 Apr 2024

Many settings in machine learning require the selection of a rotation representation.

0.63 stars / hour

Paper
Code

InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

instantstyle/instantstyle • • 3 Apr 2024

Tuning-free diffusion-based models have demonstrated significant potential in the realm of image personalization and customization.

Text-to-Image Generation

1,104

0.54 stars / hour

Paper
Code

Probing the 3D Awareness of Visual Foundation Models

mbanani/probe3d • • 12 Apr 2024

Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also represent their 3D structure?

178

0.54 stars / hour

Paper
Code

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

servicenow/browsergym • 12 Mar 2024

We study the use of large language model-based agents for interacting with software via web browsers.

Language Modelling Large Language Model

107

0.51 stars / hour

Paper
Code

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

pku-yuangroup/magictime • • 7 Apr 2024

Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions.

Text-to-Video Generation Video Generation

1,028

0.51 stars / hour

Paper
Code