Trending Research

SnapKV: LLM Knows What You are Looking for Before Generation

fasterdecoding/snapkv • 22 Apr 2024

Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to baseline when processing inputs of 16K tokens.

16k

102

0.44 stars / hour

PromptBench: A Unified Library for Evaluation of Large Language Models

microsoft/promptbench • 13 Dec 2023

The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks.

Prompt Engineering

2,068

0.44 stars / hour

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

jishengpeng/TextrolSpeech • 28 Aug 2023

The dataset comprises 236,220 pairs of style prompt in natural text descriptions with five style factors and corresponding speech samples.

Language Modelling

0.43 stars / hour

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

tencentarc/brushnet • 11 Mar 2024

Image inpainting, the process of restoring corrupted images, has seen significant advancements with the advent of diffusion models (DMs).

Image Inpainting

945

0.43 stars / hour

Generative Agents: Interactive Simulacra of Human Behavior

a16z-infra/ai-town • 7 Apr 2023

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools.

Language Modelling, Large Language Model

6,653

0.42 stars / hour

TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

saidwivedi/TokenHMR • 25 Apr 2024

We address the problem of regressing 3D human pose and shape from a single image, with a focus on 3D accuracy.

Human Mesh Recovery, valid

0.41 stars / hour

EasySpider: A No-Code Visual System for Crawling the Web

NaiboWang/EasySpider • ACM The Web Conference 2023

As such, web-crawling is an essential tool for both computational and non-computational scientists to conduct research.

Data Integration, Marketing

22,635

0.41 stars / hour

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO • 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

4,310

0.41 stars / hour

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

showlab/show-1 • 27 Sep 2023

In this paper, we are the first to propose a hybrid model, dubbed as Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.

Ranked #2 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation, Video Alignment, +1 more

1,058

0.40 stars / hour

GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting

ku-cvlab/gaussiantalker • 24 Apr 2024

A key insight is to encode the 3D Gaussian attributes into a shared implicit feature representation, where it is merged with audio features to manipulate each Gaussian attribute.

Attribute

0.39 stars / hour
