Trending Research

SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

IDKiro/sdxs • 25 Mar 2024

Recent advancements in diffusion models have positioned them at the forefront of image generation.

Image-to-Image Translation, Text-to-Image Generation

465

0.47 stars / hour

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

bradyfu/awesome-multimodal-large-language-models • 19 Dec 2023

Multi-modal Large Language Models (MLLMs) endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.

Visual Reasoning

8,673

0.46 stars / hour

HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach

airi-institute/hairfastgan • 1 Apr 2024

Our paper addresses the complex task of transferring a hairstyle from a reference image to an input photo for virtual hair try-on.

166

0.45 stars / hour

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

InternLM/InternLM-XComposer • 21 Nov 2023

In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data.

Ranked #1 on visual instruction following on LLaVA-Bench

Descriptive, visual instruction following +2

1,519

0.44 stars / hour

Policy-Guided Diffusion

emptyjackson/policy-guided-diffusion • 9 Apr 2024

Our approach provides an effective alternative to autoregressive offline world models, opening the door to the controllable generation of synthetic training data.

0.43 stars / hour

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

picsart-ai-research/streamingt2v • 21 Mar 2024

To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions.

Text-to-Video Generation, Video Generation

657

0.43 stars / hour

A Light CNN for Deep Face Representation with Noisy Labels

AlfredXiangWu/LightCNN • 9 Nov 2015

This paper presents a Light CNN framework to learn a compact embedding on the large-scale face data with massive noisy labels.

Ranked #2 on Age-Invariant Face Recognition on CAFR

Face Identification, Face Recognition +2

799

0.43 stars / hour

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

internlm/internlm-xcomposer • 9 Apr 2024

The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution.

Ranked #11 on Visual Question Answering on MM-Vet

4k, Language Modelling +1

1,520

0.43 stars / hour

Chronos: Learning the Language of Time Series

amazon-science/chronos-forecasting • 12 Mar 2024

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models.

Gaussian Processes, Language Modelling +2

1,533

0.39 stars / hour

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

myshell-ai/jetmoe • 11 Apr 2024

Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence.

843

0.39 stars / hour
