YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

wongkinyiu/yolov9 21 Feb 2024

It can be used to obtain complete information, so that train-from-scratch models can achieve better results than state-of-the-art models pre-trained using large datasets. The comparison results are shown in Figure 1.

Object Detection

3,222
10.11 stars / hour

Scalable Diffusion Models with Transformers

facebookresearch/DiT ICCV 2023

We explore a new class of diffusion models based on the transformer architecture.

Image Generation

3,753
2.93 stars / hour

Neural Network Diffusion

nus-hpc-ai-lab/neural-network-diffusion 20 Feb 2024

The autoencoder extracts latent representations of a subset of the trained network parameters.

Video Generation

335
2.07 stars / hour

GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

GaussianObject/GaussianObject 15 Feb 2024

Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined.

Neural Rendering, Object

430
1.96 stars / hour

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

2,067
1.95 stars / hour

Revisiting Feature Prediction for Learning Visual Representations from Video

facebookresearch/jepa arXiv preprint 2024

This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision.

1,556
1.81 stars / hour

FiT: Flexible Vision Transformer for Diffusion Model

whlzy/fit 19 Feb 2024

Nature is infinitely resolution-free.

Image Cropping

202
1.50 stars / hour

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

willisma/sit 16 Jan 2024

We present Scalable Interpolant Transformers (SiT), a family of generative models built on the backbone of Diffusion Transformers (DiT).

Image Generation

348
1.43 stars / hour

World Model on Million-Length Video And Language With RingAttention

LargeWorldModel/LWM 13 Feb 2024

This work paves the way for training on massive datasets of long video and language to develop understanding of both human knowledge and the multimodal world, and broader capabilities.

Video Understanding

5,978
1.30 stars / hour

YOLO-World: Real-Time Open-Vocabulary Object Detection

ailab-cvc/yolo-world 30 Jan 2024

The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools.

Instance Segmentation, Language Modelling, +4 more

2,265
1.11 stars / hour