Trending Research

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO • 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

3,835

0.39 stars / hour


ReFT: Representation Finetuning for Language Models

stanfordnlp/pyreft • 4 Apr 2024

LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Arithmetic Reasoning

539

0.39 stars / hour


WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

nlpxucan/wizardlm • 18 Aug 2023

Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model.

Ranked #49 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +2

8,776

0.36 stars / hour


AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

scutzzj/aniportrait • 26 Mar 2024

In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image.

Face Reenactment

3,493

0.36 stars / hour


TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

yuyq96/texthawk • 14 Apr 2024

We conduct extensive experiments on both general and document-oriented MLLM benchmarks, and show that TextHawk outperforms the state-of-the-art methods, demonstrating its effectiveness and superiority in fine-grained document perception and general abilities.

0.35 stars / hour


Bilateral Reference for High-Resolution Dichotomous Image Segmentation

zhengpeng7/birefnet • 7 Jan 2024

BiRefNet comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).

Ranked #1 on RGB Salient Object Detection on HRSOD (using extra training data)

Camouflaged Object Segmentation Dichotomous Image Segmentation +3

139

0.35 stars / hour


PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models

3DAgentWorld/Toolkit-for-Prompt-Compression • 26 Mar 2024

Prompt compression is an innovative method for efficiently condensing input prompts while preserving essential information.

Code Completion Few-Shot Learning +2

124

0.33 stars / hour


Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

showlab/show-1 • 27 Sep 2023

In this paper, we are the first to propose a hybrid model, dubbed as Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.

Ranked #2 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation Video Alignment +1

809

0.33 stars / hour


MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

boheumd/MA-LMM • 8 Apr 2024

Existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding.

Ranked #1 on Video Classification on COIN

Question Answering Video Captioning +4

0.32 stars / hour


Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

KU-CVLAB/Perturbed-Attention-Guidance • 26 Mar 2024

Sampling guidance techniques such as classifier guidance (CG) and classifier-free guidance (CFG) are often not applicable in unconditional generation or in various downstream tasks such as image restoration.

Deblurring Denoising +2

151

0.31 stars / hour
