UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

3,869
0.39 stars / hour

ReFT: Representation Finetuning for Language Models

stanfordnlp/pyreft 4 Apr 2024

LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Arithmetic Reasoning

549
0.39 stars / hour

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

nlpxucan/wizardlm 18 Aug 2023

Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model.

Ranked #49 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +2

8,776
0.36 stars / hour

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

scutzzj/aniportrait 26 Mar 2024

In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image.

Face Reenactment

3,513
0.36 stars / hour

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

yuyq96/texthawk 14 Apr 2024

We conduct extensive experiments on both general and document-oriented MLLM benchmarks, and show that TextHawk outperforms the state-of-the-art methods, demonstrating its effectiveness and superiority in fine-grained document perception and general abilities.

19
0.35 stars / hour

Bilateral Reference for High-Resolution Dichotomous Image Segmentation

zhengpeng7/birefnet 7 Jan 2024

It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).

Ranked #1 on RGB Salient Object Detection on HRSOD (using extra training data)

Camouflaged Object Segmentation Dichotomous Image Segmentation +3

143
0.35 stars / hour

PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models

3DAgentWorld/Toolkit-for-Prompt-Compression 26 Mar 2024

Prompt compression is an innovative method for efficiently condensing input prompts while preserving essential information.

Code Completion Few-Shot Learning +2

130
0.33 stars / hour

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

showlab/show-1 27 Sep 2023

In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.

Text-to-Video Generation Video Alignment +1

830
0.33 stars / hour

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

boheumd/MA-LMM 8 Apr 2024

However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding.

Question Answering Video Captioning +4

100
0.32 stars / hour

Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

KU-CVLAB/Perturbed-Attention-Guidance 26 Mar 2024

These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration.

Deblurring Denoising +2

156
0.31 stars / hour