SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

IDKiro/sdxs 25 Mar 2024

Recent advancements in diffusion models have positioned them at the forefront of image generation.

Image-to-Image Translation, Text-to-Image Generation

465
0.47 stars / hour

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

bradyfu/awesome-multimodal-large-language-models 19 Dec 2023

Multi-modal Large Language Models (MLLMs) endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.

Visual Reasoning

8,673
0.46 stars / hour

HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach

airi-institute/hairfastgan 1 Apr 2024

Our paper addresses the complex task of transferring a hairstyle from a reference image to an input photo for virtual hair try-on.

166
0.45 stars / hour

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

InternLM/InternLM-XComposer 21 Nov 2023

In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data.

Descriptive, visual instruction following +2

1,519
0.44 stars / hour

Policy-Guided Diffusion

emptyjackson/policy-guided-diffusion 9 Apr 2024

Our approach provides an effective alternative to autoregressive offline world models, opening the door to the controllable generation of synthetic training data.

54
0.43 stars / hour

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

picsart-ai-research/streamingt2v 21 Mar 2024

To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions.

Text-to-Video Generation, Video Generation

657
0.43 stars / hour

A Light CNN for Deep Face Representation with Noisy Labels

AlfredXiangWu/LightCNN 9 Nov 2015

This paper presents a Light CNN framework to learn a compact embedding on the large-scale face data with massive noisy labels.

Face Identification, Face Recognition +2

799
0.43 stars / hour

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

internlm/internlm-xcomposer 9 Apr 2024

The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution.

4k, Language Modelling +1

1,520
0.43 stars / hour

Chronos: Learning the Language of Time Series

amazon-science/chronos-forecasting 12 Mar 2024

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models.

Gaussian Processes, Language Modelling +2

1,533
0.39 stars / hour

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

myshell-ai/jetmoe 11 Apr 2024

Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence.

843
0.39 stars / hour