ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing

poloclub/clickdiffusion 5 Apr 2024

We demonstrate that by serializing both an image and a multi-modal instruction into a textual representation it is possible to leverage LLMs to perform precise transformations of the layout and appearance of an image.

Image Manipulation

44
0.33 stars / hour
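To illustrate the ClickDiffusion idea of turning an image and a click-based instruction into text an LLM can read, here is a minimal sketch. It assumes the image has already been reduced to labeled bounding boxes. The `Box` layout format, the prompt wording, and the helpers `object_at` and `build_prompt` are hypothetical and are not taken from the poloclub/clickdiffusion repository. The click is resolved to the box it falls in, so a deictic instruction such as "move this" names a concrete object in the serialized prompt.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    """A labeled object box in pixel coordinates (hypothetical layout format)."""
    label: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def object_at(boxes: List[Box], click: Tuple[int, int]) -> Optional[Box]:
    """Return the first box containing the click point, or None."""
    cx, cy = click
    for b in boxes:
        if b.x <= cx < b.x + b.w and b.y <= cy < b.y + b.h:
            return b
    return None


def build_prompt(boxes: List[Box], instruction: str, click: Tuple[int, int]) -> str:
    """Serialize the layout and the click-grounded instruction as plain text."""
    layout = "\n".join(f"{b.label}: [{b.x}, {b.y}, {b.w}, {b.h}]" for b in boxes)
    target = object_at(boxes, click)
    where = f"{target.label} at {list(click)}" if target else f"point {list(click)}"
    return (
        "Image layout (label: [x, y, w, h]):\n"
        f"{layout}\n"
        f"Instruction: {instruction} (clicked: {where})\n"
        "Updated layout:"
    )


if __name__ == "__main__":
    scene = [Box("dog", 40, 120, 200, 150), Box("cat", 300, 110, 120, 140)]
    print(build_prompt(scene, "Move this to the left edge", (340, 180)))
```

The LLM would then be asked to continue after "Updated layout:" with edited boxes, which a layout-conditioned image generator could render. That last stage is outside this sketch.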

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

RLHFlow/RLHF-Reward-Modeling 13 Apr 2023

Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discards those that exhibit undesired behavior, and then enhances the model by fine-tuning on these filtered samples.

Ethics

195
0.33 stars / hour
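The RAFT selection step described above (sample, score with a reward model, keep the best, fine-tune on what is kept) can be sketched as follows. This is a minimal illustration under stated assumptions: `k` responses per prompt, keep the `top_n` highest-reward ones. The parameter names, the toy generator and reward, and the commented-out `fine_tune` call are hypothetical and do not come from the RLHFlow repository.

```python
import itertools
from typing import Callable, List, Sequence, Tuple


def raft_select(
    prompts: Sequence[str],
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    k: int = 8,
    top_n: int = 1,
) -> List[Tuple[str, str]]:
    """Sample k responses per prompt, rank them by reward, keep the top_n."""
    selected = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(k)]
        # Highest reward first; lower-ranked samples are discarded.
        ranked = sorted(candidates, key=lambda r: reward(prompt, r), reverse=True)
        selected.extend((prompt, r) for r in ranked[:top_n])
    return selected


# Toy stand-ins: a "model" that appends 0-3 exclamation marks in turn,
# and a "reward model" that prefers longer responses.
_counter = itertools.count()


def toy_generate(prompt: str) -> str:
    return prompt + "!" * (next(_counter) % 4)


def toy_reward(prompt: str, response: str) -> float:
    return float(len(response))


if __name__ == "__main__":
    batch = raft_select(["hi"], toy_generate, toy_reward, k=4, top_n=1)
    print(batch)  # [('hi', 'hi!!!')]
    # fine_tune(model, batch)  # hypothetical: supervised fine-tuning on kept pairs
```

Because only the filtered pairs reach the fine-tuning step, the reward model shapes the training data without any policy-gradient update. Repeating the loop with the fine-tuned model gives an iterative version.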

CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution

i2-multimedia-lab/cdformer 13 May 2024

Existing Blind Image Super-Resolution (BSR) methods focus on estimating either kernel or degradation information, but have long overlooked the essential content details.

Image Super-Resolution

55
0.32 stars / hour

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

hitsz-tmg/umoe-scaling-unified-multimodal-llms 18 May 2024

Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities.

48
0.29 stars / hour

GIVT: Generative Infinite-Vocabulary Transformers

google-research/big_vision 4 Dec 2023

We introduce generative infinite-vocabulary transformers (GIVT) which generate vector sequences with real-valued entries, instead of discrete tokens from a finite vocabulary.

Conditional Image Generation Decoder +2

1,772
0.29 stars / hour
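To make the GIVT contrast between discrete tokens and real-valued vectors concrete, the sketch below places standard categorical sampling next to sampling a vector from a diagonal Gaussian mixture. As we understand the GIVT paper, its output head predicts mixture parameters in place of vocabulary logits. The diagonal-covariance and log-std parameterization, the shapes, and the function names here are our own assumptions and are not code from google-research/big_vision.

```python
import math
import random
from typing import List, Sequence


def sample_categorical(logits: Sequence[float], rng: random.Random) -> int:
    """Standard discrete-token sampling: softmax over a finite vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - m) for z in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]


def sample_vector_gmm(
    mix_logits: Sequence[float],
    means: Sequence[Sequence[float]],
    log_stds: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[float]:
    """Sample a real-valued vector from a diagonal Gaussian mixture.

    mix_logits has K entries; means and log_stds are K x d.
    """
    k = sample_categorical(mix_logits, rng)  # choose a mixture component
    # Draw each coordinate independently from the chosen component.
    return [rng.gauss(mu, math.exp(ls)) for mu, ls in zip(means[k], log_stds[k])]


if __name__ == "__main__":
    rng = random.Random(0)
    # Pretend a transformer head emitted these parameters for one position (K=2, d=3).
    mix_logits = [0.3, -0.2]
    means = [[0.0, 1.0, -1.0], [2.0, 2.0, 2.0]]
    log_stds = [[-1.0, -1.0, -1.0], [-0.5, -0.5, -0.5]]
    print(sample_vector_gmm(mix_logits, means, log_stds, rng))
```

The only discrete choice left is which mixture component to use. The emitted element itself is a continuous vector, so no finite codebook is needed.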

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

assafelovic/gpt-researcher 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

Retrieval

11,095
0.28 stars / hour

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

om-ai-lab/OmDet 11 Mar 2024

End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities.

Object object-detection +2

60
0.28 stars / hour

Improving Diffusion Models for Virtual Try-on

yisol/IDM-VTON 8 Mar 2024

Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.

Virtual Try-on

2,513
0.27 stars / hour

Kolmogorov-Arnold Networks are Radial Basis Function Networks

ZiyaoLi/fast-kan 10 May 2024

This short paper is a fast proof-of-concept that the 3-order B-splines used in Kolmogorov-Arnold Networks (KANs) can be well approximated by Gaussian radial basis functions.

191
0.27 stars / hour
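A quick numerical check of the fast-kan claim: the snippet compares one cardinal cubic B-spline basis function (reading "3-order" as cubic, as in the KAN default k = 3) with a Gaussian whose variance matches the spline's. That variance is 1/3, because a cubic B-spline with unit knot spacing is four convolved unit boxes, each with variance 1/12. The moment-matching choice is ours for illustration. It is not taken from the fast-kan repository, which may set its Gaussian widths differently.

```python
import math


def cubic_bspline(x: float) -> float:
    """Cardinal cubic B-spline with unit knot spacing, centred at 0."""
    ax = abs(x)
    if ax < 1:
        return 2 / 3 - ax**2 + ax**3 / 2
    if ax < 2:
        return (2 - ax) ** 3 / 6
    return 0.0


def gaussian_rbf(x: float, sigma2: float = 1 / 3) -> float:
    """Gaussian density whose variance matches the cubic B-spline (4 * 1/12)."""
    return math.exp(-x * x / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


# Evaluate both curves on a grid covering the spline's support [-2, 2] and beyond.
grid = [i / 100 for i in range(-300, 301)]
max_err = max(abs(cubic_bspline(x) - gaussian_rbf(x)) for x in grid)
print(f"max |B3 - Gaussian| = {max_err:.4f}")
```

It prints a maximum gap of about 0.024, reached at x = 0, which is under 4% of the spline's peak value of 2/3. In a KAN layer, learned weights on a whole grid of such basis functions would absorb part of this mismatch.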

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

alibaba-damo-academy/FunASR 28 Nov 2021

In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with a power set.

Action Detection Activity Detection +2

3,836
0.26 stars / hour
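The power-set encoding mentioned in the diarization entry turns each frame's set of active speakers (a multi-label target) into a single class index, so diarization can be trained as ordinary single-label classification. Below is a minimal sketch. The `max_overlap` cap on how many speakers may overlap is our assumption for keeping the class count small (a full power set over N speakers has 2^N classes). The paper's exact class set and the helper names here are not taken from FunASR.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set


def build_power_set(num_speakers: int, max_overlap: int) -> List[FrozenSet[int]]:
    """Enumerate speaker subsets with at most max_overlap members.

    The position of each subset in the list is its single class label.
    """
    classes: List[FrozenSet[int]] = []
    for size in range(max_overlap + 1):
        classes.extend(frozenset(c) for c in combinations(range(num_speakers), size))
    return classes


def encode(frames: Sequence[Set[int]], classes: List[FrozenSet[int]]) -> List[int]:
    """Map each frame's set of active speakers to one class index."""
    index = {s: i for i, s in enumerate(classes)}
    return [index[frozenset(active)] for active in frames]


def decode(labels: Sequence[int], classes: List[FrozenSet[int]]) -> List[Set[int]]:
    """Invert encode: class index back to the set of active speakers."""
    return [set(classes[i]) for i in labels]


if __name__ == "__main__":
    classes = build_power_set(num_speakers=4, max_overlap=2)
    frames = [set(), {0}, {0, 2}, {3}]  # silence, one speaker, overlap, another speaker
    labels = encode(frames, classes)
    print(len(classes), labels)  # 11 [0, 1, 6, 4]
```

With 4 speakers and at most 2 overlapping, there are 1 + 4 + 6 = 11 classes. An overlapped frame such as {0, 2} becomes one label (6) rather than two independent binary targets.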