OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

levihsu/ootdiffusion 4 Mar 2024

We present OOTDiffusion, a novel network architecture for realistic and controllable image-based virtual try-on (VTON).

Denoising Image Generation +1

3,991
0.37 stars / hour

APISR: Anime Production Inspired Real-World Anime Super-Resolution

kiteretsu77/apisr 3 Mar 2024

In addition, we identify two anime-specific challenges of distorted and faint hand-drawn lines and unwanted color artifacts.

Super-Resolution

345
0.36 stars / hour

Scalable Optimal Transport Methods in Machine Learning: A Contemporary Survey

abdelwahed/ot_for_big_data 8 May 2023

This paper is about where and how optimal transport is used in machine learning with a focus on the question of scalable optimal transport.

62
0.36 stars / hour

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

squeezeailab/llm2llm 22 Mar 2024

LLM2LLM (1) fine-tunes a baseline student LLM on the initial seed data, (2) evaluates and extracts data points that the model gets wrong, and (3) uses a teacher LLM to generate synthetic data based on these incorrect data points, which are then added back into the training data.

Data Augmentation GSM8K +1

46
0.35 stars / hour
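
The three LLM2LLM steps listed above can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: the names `llm2llm_loop`, `fine_tune`, `teacher_generate`, and the toy stand-ins are hypothetical, and the sketch assumes that the student is evaluated on the seed examples and that the loop runs for a fixed maximum number of iterations.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str]          # (question, answer)
Student = Callable[[str], str]     # maps a question to a predicted answer


def llm2llm_loop(seed: List[Example],
                 fine_tune: Callable[[List[Example]], Student],
                 teacher_generate: Callable[[List[Example]], List[Example]],
                 n_iters: int = 3) -> List[Example]:
    """Hypothetical sketch of the iterative data enhancement loop."""
    train = list(seed)
    for _ in range(n_iters):
        # (1) fine-tune a student on the current training data
        student = fine_tune(train)
        # (2) collect the seed examples the student gets wrong (assumption:
        #     evaluation is done on the seed data)
        wrong = [(q, a) for q, a in seed if student(q) != a]
        if not wrong:
            break
        # (3) ask the teacher for synthetic examples based on the failures
        #     and add them back into the training data
        train.extend(teacher_generate(wrong))
    return train


# Toy stand-ins so the sketch runs without any model (purely illustrative).
def toy_fine_tune(train: List[Example]) -> Student:
    # The toy student only "learns" answers it has seen at least twice.
    counts = Counter(a for _, a in train)
    memory = {q: a for q, a in train if counts[a] >= 2}
    return lambda q: memory.get(q, "")


def toy_teacher(wrong: List[Example]) -> List[Example]:
    # The toy teacher rephrases each failed question, keeping its answer.
    return [(q + " (rephrased)", a) for q, a in wrong]


seed = [("2+2", "4"), ("3+3", "6")]
augmented = llm2llm_loop(seed, toy_fine_tune, toy_teacher)
print(len(augmented))
```

In real use, `fine_tune` and `teacher_generate` would wrap actual training and generation calls; the loop structure stays the same.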

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

jiaweizzhao/galore 6 Mar 2024

Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures with C4 dataset with up to 19.7B tokens, and on fine-tuning RoBERTa on GLUE tasks.

950
0.35 stars / hour

PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

zamling/psalm 21 Mar 2024

PSALM is a powerful extension of the Large Multi-modal Model (LMM) to address the segmentation task challenges.

Generalized Referring Expression Segmentation Image Segmentation +5

48
0.35 stars / hour

Towards Generalizable Tumor Synthesis

mrgiovanni/difftumor 29 Feb 2024

Tumor synthesis enables the creation of artificial tumors in medical images, facilitating the training of AI models for tumor detection and segmentation.

Computed Tomography (CT)

47
0.35 stars / hour

TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document

yuliang-liu/monkey 7 Mar 2024

We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.

document understanding Key Information Extraction +4

1,228
0.34 stars / hour

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

state-spaces/mamba 1 Dec 2023

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module.

2D Pose Estimation Common Sense Reasoning +2

8,199
0.34 stars / hour

WavCraft: Audio Editing and Generation with Natural Language Prompts

jinhualiang/wavcraft 14 Mar 2024

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

49
0.33 stars / hour