YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

wongkinyiu/yolov9 21 Feb 2024

It can be used to obtain complete information, so that train-from-scratch models can achieve better results than state-of-the-art models pre-trained using large datasets. The comparison results are shown in Figure 1.

Object Detection

5,631
12.01 stars / hour

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

sally-sh/vsp-llm 23 Feb 2024

In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements.

Speech Recognition, Translation

193
3.66 stars / hour

Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases

eladlev/autoprompt 5 Feb 2024

Recent studies have demonstrated the capabilities of LLMs to automatically conduct prompt engineering by employing a meta-prompt that incorporates the outcomes of the last trials and proposes an improved prompt.

Prompt Engineering

756
2.20 stars / hour

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

2,287
1.46 stars / hour

Neural Network Diffusion

nus-hpc-ai-lab/neural-network-diffusion 20 Feb 2024

The autoencoder extracts latent representations of a subset of the trained network parameters.

Video Generation

464
1.33 stars / hour

Scalable Diffusion Models with Transformers

facebookresearch/DiT ICCV 2023

We explore a new class of diffusion models based on the transformer architecture.

Image Generation

4,043
1.23 stars / hour

Vectorized and performance-portable Quicksort

google/highway 12 May 2022

Recent works showed that implementations of Quicksort using vector CPU instructions can outperform the non-vectorized algorithms in widespread use.

3,498
1.12 stars / hour

Cleaner Pretraining Corpus Curation with Neural Web Scraping

openmatch/neuscraper 22 Feb 2024

The web contains large-scale, diverse, and abundant information to satisfy the information-seeking needs of humans.

Language Modelling

132
1.03 stars / hour

AlphaFold Meets Flow Matching for Generating Protein Ensembles

bjing2016/alphaflow 7 Feb 2024

When trained and evaluated on the PDB, our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling.

155
1.00 stars / hour

World Model on Million-Length Video And Language With RingAttention

LargeWorldModel/LWM 13 Feb 2024

This work paves the way for training on massive datasets of long video and language to develop an understanding of both human knowledge and the multimodal world, as well as broader capabilities.

Video Understanding

6,166
0.70 stars / hour