LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

hiyouga/llama-factory 20 Mar 2024

Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.

Language Modelling Text Generation

14,901
0.58 stars / hour

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

hiyouga/llama-efficient-tuning 4 Aug 2023

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.

Abstractive Text Summarization Language Modelling +5

14,921
0.57 stars / hour

OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

kongzhecn/omg 16 Mar 2024

We also observe that the initiation denoising timestep for noise blending is the key to identity preservation and layout.

Denoising Text-to-Image Generation

443
0.57 stars / hour

Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

KU-CVLAB/Perturbed-Attention-Guidance 26 Mar 2024

These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration.

Deblurring Denoising +2

74
0.57 stars / hour

SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series

badripatro/simba 22 Mar 2024

Transformers have widely adopted attention networks for sequence mixing and MLPs for channel mixing, playing a pivotal role in achieving breakthroughs across domains.

Inductive Bias Time Series +1

71
0.55 stars / hour

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

3,338
0.54 stars / hour

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

justimyhxu/grm 21 Mar 2024

We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s.

3D Reconstruction Image to 3D +1

184
0.54 stars / hour

ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

ianarawjo/ChainForge 17 Sep 2023

Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses.

Model Selection Prompt Engineering +1

1,823
0.53 stars / hour

Arcee's MergeKit: A Toolkit for Merging Large Language Models

cg123/mergekit 20 Mar 2024

The rapid expansion of the open-source language model landscape presents an opportunity to merge the competencies of these model checkpoints by combining their parameters.

Language Modelling Multi-Task Learning

2,920
0.52 stars / hour

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

opengvlab/internvideo2 22 Mar 2024

We introduce InternVideo2, a new video foundation model (ViFM) that achieves state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.

Ranked #1 on Audio Classification on ESC-50 (using extra training data)

Action Classification Action Recognition +12

97
0.49 stars / hour