YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

wongkinyiu/yolov9 21 Feb 2024

It can be used to obtain complete information, so that train-from-scratch models can achieve better results than state-of-the-art models pre-trained using large datasets. The comparison results are shown in Figure 1.

Object Detection

Scalable Diffusion Models with Transformers

facebookresearch/DiT ICCV 2023

We explore a new class of diffusion models based on the transformer architecture.

Image Generation

Neural Network Diffusion

nus-hpc-ai-lab/neural-network-diffusion 20 Feb 2024

The autoencoder extracts latent representations of a subset of the trained network parameters.

Video Generation

Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases

eladlev/autoprompt 5 Feb 2024

Recent studies have demonstrated the capabilities of LLMs to automatically conduct prompt engineering by employing a meta-prompt that incorporates the outcomes of the last trials and proposes an improved prompt.

Prompt Engineering

GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

GaussianObject/GaussianObject 15 Feb 2024

Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined.

Neural Rendering Object

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.


Cleaner Pretraining Corpus Curation with Neural Web Scraping

openmatch/neuscraper 22 Feb 2024

The web contains large-scale, diverse, and abundant information to satisfy the information-seeking needs of humans.

Language Modelling

FiT: Flexible Vision Transformer for Diffusion Model

whlzy/fit 19 Feb 2024

Nature is infinitely resolution-free.

Image Cropping

YOLO-World: Real-Time Open-Vocabulary Object Detection

ailab-cvc/yolo-world 30 Jan 2024

The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools.

Instance Segmentation Language Modelling

Revisiting Feature Prediction for Learning Visual Representations from Video

facebookresearch/jepa arXiv preprint 2024

This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision.

