Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs).
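For context, the Kolmogorov-Arnold representation theorem says that any continuous multivariate function on a bounded domain can be written as a finite composition of continuous univariate functions and addition. A standard statement of the theorem (background, not taken from this abstract) is:

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

where the $\phi_{q,p}$ and $\Phi_q$ are continuous univariate functions; KANs replace the fixed activations of MLPs with learnable univariate functions in this spirit.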
This module converts the generated image sequence into videos with smooth transitions and consistent subjects, and is significantly more stable than modules based solely on latent spaces, especially for long video generation.
Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs.
A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks.
Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
Visual language models (VLMs) have progressed rapidly with the recent success of large language models.
We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation.
Large Language Models (LLMs) have catalyzed significant advancements in Natural Language Processing (NLP), yet they encounter challenges such as hallucination and the need for domain-specific knowledge.
However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory.
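The 700 MB figure is consistent with a back-of-envelope estimate. The per-primitive layout below is an assumption based on a common 3D Gaussian Splatting parameterization (position, scale, rotation quaternion, opacity, degree-3 spherical-harmonic color), not a detail stated in this abstract:

```python
# Hypothetical float32 layout per 3D Gaussian primitive:
#   position (3) + scale (3) + rotation quaternion (4)
#   + opacity (1) + degree-3 SH color coefficients (16 * 3 = 48)
floats_per_gaussian = 3 + 3 + 4 + 1 + 48   # 59 floats
bytes_per_gaussian = floats_per_gaussian * 4  # float32 = 4 bytes

num_gaussians = 3_000_000
total_mb = num_gaussians * bytes_per_gaussian / 1e6  # decimal megabytes
print(f"{total_mb:.0f} MB")  # → 708 MB, in line with "over 700 MB"
```

Different implementations store extra buffers (gradients, optimizer state), so the on-disk or in-memory footprint can be higher still.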
Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.