Trending Research

DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

zzzhang-jx/docres • • 7 May 2024

This underscores the potential of DocRes across a broader spectrum of document image restoration tasks.

1.21 stars / hour

Paper
Code

CLLMs: Consistency Large Language Models

hao-ai-lab/Consistency_LLM • • 28 Feb 2024

Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM inference as it breaks the sequential nature of the LLM decoding process and transforms it into parallelizable computation.

185

1.17 stars / hour

Paper
Code

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

tothebeginning/pulid • • 24 Apr 2024

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation.

Text-to-Image Generation

596

1.08 stars / hour

Paper
Code

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

x-lance/anitalker • • 6 May 2024

The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait.

Metric Learning Self-Supervised Learning

1.04 stars / hour

Paper
Code

VILA: On Pre-training for Visual Language Models

efficient-large-model/vila • • 12 Dec 2023

Visual language models (VLMs) rapidly progressed with the recent success of large language models.

Ranked #24 on Visual Question Answering on MM-Vet

In-Context Learning Language Modelling +2

559

0.93 stars / hour

Paper
Code

AgentScope: A Flexible yet Robust Multi-Agent Platform

modelscope/agentscope • 21 Feb 2024

With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications.

Multi-agent Integration

1,413

0.92 stars / hour

Paper
Code

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl • • 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1. 5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

Ranked #6 on Visual Question Answering on MM-Vet

4k Language Modelling +3

1,671

0.72 stars / hour

Paper
Code

AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

thudm/autowebglm • 4 Apr 2024

Large language models (LLMs) have fueled many intelligent agent tasks, such as web navigation -- but most existing agents perform far from satisfying in real-world webpages due to three factors: (1) the versatility of actions on webpages, (2) HTML text exceeding model processing capacity, and (3) the complexity of decision-making due to the open-domain nature of web.

Decision Making Language Modelling +1

438

0.71 stars / hour

Paper
Code

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

mcgill-nlp/llm2vec • • 9 Apr 2024

We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB).

Contrastive Learning Decoder

554

0.66 stars / hour

Paper
Code

AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One

nvlabs/radio • • 10 Dec 2023

A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks.

Benchmarking object-detection +2

314

0.62 stars / hour

Paper
Code