World Model on Million-Length Video And Language With RingAttention

LargeWorldModel/LWM 13 Feb 2024

This work paves the way for training on massive datasets of long video and language to develop an understanding of both human knowledge and the multimodal world, as well as broader capabilities.

Video Understanding

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.


Revisiting Feature Prediction for Learning Visual Representations from Video

facebookresearch/jepa arXiv preprint 2024

This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision.

Scalable Diffusion Models with Transformers

facebookresearch/DiT ICCV 2023

We explore a new class of diffusion models based on the transformer architecture.

Image Generation

YOLO-World: Real-Time Open-Vocabulary Object Detection

ailab-cvc/yolo-world 30 Jan 2024

The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools.

Instance Segmentation, Language Modelling, +4

DoRA: Weight-Decomposed Low-Rank Adaptation

catid/dora 14 Feb 2024

By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead.
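As a rough illustration of the "no additional inference overhead" claim: the DoRA paper re-parameterises a fine-tuned weight as a learned per-column magnitude times a direction, where the direction is the frozen weight plus a LoRA-style low-rank update, normalised column by column. The result can be folded back into a single matrix once training is done. The sketch below is a minimal pure-Python rendering of that merge under those assumptions. The helper names (`matmul`, `column_norms`, `dora_merge`) and the toy values are hypothetical and are not taken from the catid/dora repository.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def column_norms(w):
    """Euclidean norm of each column of w."""
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]

def dora_merge(w0, b, a, m):
    """Fold magnitude m and low-rank update b @ a into one weight matrix.

    Computes m * (w0 + b @ a) / ||w0 + b @ a||, with the norm taken per column.
    """
    delta = matmul(b, a)
    v = [[w0[i][j] + delta[i][j] for j in range(len(w0[0]))]
         for i in range(len(w0))]
    norms = column_norms(v)
    return [[m[j] * v[i][j] / norms[j] for j in range(len(norms))]
            for i in range(len(v))]

# Toy 2x2 frozen weight with a rank-1 update (illustrative values only).
w0 = [[3.0, 0.0], [4.0, 2.0]]
b = [[0.0], [0.0]]       # B starts at zero, so the update is initially a no-op
a = [[1.0, 1.0]]
m = column_norms(w0)     # magnitudes initialised to the column norms of w0

merged = dora_merge(w0, b, a, m)  # equals w0 at initialisation
```

Because `merged` has the same shape as `w0`, a layer using it at inference does the same single matrix multiply as the original model. After a non-zero update, the column norms of the merged weight are set entirely by `m`.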

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

datadreamer-dev/datadreamer 16 Feb 2024

The rapid rise to prominence of large language models (LLMs), together with the unique challenges that arise when using them, has had immediate adverse impacts on open science and on the reproducibility of work that uses them.

Synthetic Data Generation

OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning

rui-ye/openfedllm 10 Feb 2024

Trained on massive publicly available data, large language models (LLMs) have demonstrated tremendous success across various fields.

Federated Learning, Instruction Following, +1

Generative Representational Instruction Tuning

contextualai/gritlm 15 Feb 2024

Notably, we find that GRIT (generative representational instruction tuning) matches training on only generative or only embedding data, so the two can be unified with no loss in performance.

Language Modelling, Large Language Model, +1

GraphCast: Learning skillful medium-range global weather forecasting

google-deepmind/graphcast 24 Dec 2022

Global medium-range weather forecasting is critical to decision-making across many social and economic domains.

Decision Making, Weather Forecasting

