DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

zzzhang-jx/docres 7 May 2024

This underscores the potential of DocRes across a broader spectrum of document image restoration tasks.

Binarization Deblurring +3

57
1.21 stars / hour

CLLMs: Consistency Large Language Models

hao-ai-lab/Consistency_LLM 28 Feb 2024

Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM inference because they break the sequential nature of the LLM decoding process and transform it into parallelizable computation.

185
1.17 stars / hour

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

tothebeginning/pulid 24 Apr 2024

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation.

Text-to-Image Generation

596
1.08 stars / hour

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

x-lance/anitalker 6 May 2024

The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait.

Metric Learning Self-Supervised Learning

78
1.04 stars / hour

VILA: On Pre-training for Visual Language Models

efficient-large-model/vila 12 Dec 2023

Visual language models (VLMs) have progressed rapidly with the recent success of large language models.

In-Context Learning Language Modelling +2

559
0.93 stars / hour

AgentScope: A Flexible yet Robust Multi-Agent Platform

modelscope/agentscope 21 Feb 2024

With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications.

Multi-agent Integration

1,413
0.92 stars / hour

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

4k Language Modelling +3

1,671
0.72 stars / hour

AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

thudm/autowebglm 4 Apr 2024

Large language models (LLMs) have fueled many intelligent agent tasks, such as web navigation, but most existing agents perform far from satisfactorily on real-world webpages due to three factors: (1) the versatility of actions on webpages, (2) HTML text exceeding model processing capacity, and (3) the complexity of decision-making due to the open-domain nature of the web.

Decision Making Language Modelling +1

438
0.71 stars / hour

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

mcgill-nlp/llm2vec 9 Apr 2024

We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB).

Contrastive Learning Decoder

554
0.66 stars / hour

AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One

nvlabs/radio 10 Dec 2023

A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks.

Benchmarking object-detection +2

314
0.62 stars / hour