Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

pku-alignment/align-anything 20 Dec 2024

In this work, we make the first attempt to fine-tune all-modality models (i.e., models whose input and output can be of any modality, also called any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring that their behavior aligns with human intentions.

Instruction Following

618
0.84 stars / hour

Search-o1: Agentic Search-Enhanced Large Reasoning Models

sunnynexus/search-o1 9 Jan 2025

To address this limitation, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents.

Code Generation +4

452
0.70 stars / hour

PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides

icip-cas/pptagent 7 Jan 2025

Automatically generating presentations from documents is a challenging task that requires balancing content quality, visual design, and structural coherence.

328
0.70 stars / hour

LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync

bytedance/LatentSync 12 Dec 2024

Since we did not change the overall training framework of SyncNet, our experience can also be applied to other lip sync and audio-driven portrait animation methods that utilize SyncNet.

Portrait Animation

2,009
0.69 stars / hour

KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation

openspg/kag 10 Sep 2024

The recently developed retrieval-augmented generation (RAG) technology has enabled the efficient construction of domain-specific applications.

Knowledge Graphs, Question Answering +2

4,523
0.69 stars / hour

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

ybybzhang/framepainter 14 Jan 2025

We highlight the effectiveness and efficiency of FramePainter across various editing signals: it dominantly outperforms previous state-of-the-art methods with far less training data, achieving highly seamless and coherent editing of images, e.g., automatically adjusting the reflection of the cup.

Image to Video Generation

277
0.58 stars / hour

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

akariasai/openscholar 21 Nov 2024

Scientific progress depends on researchers' ability to synthesize the growing body of literature.

Retrieval

589
0.57 stars / hour

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

automl/tabpfn 5 Jul 2022

We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning and is competitive with state-of-the-art classification methods.

AutoML, Bayesian Inference +5

2,229
0.50 stars / hour

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

om-ai-lab/OmAgent 24 Jun 2024

Recent advancements in Large Language Models (LLMs) have expanded their capabilities to multimodal contexts, including comprehensive video understanding.

AI Agent, Video Understanding

1,288
0.49 stars / hour

SVFR: A Unified Framework for Generalized Video Face Restoration

wangzhiyaoo/svfr 2 Jan 2025

In this paper, we propose a novel approach for the Generalized Video Face Restoration (GVFR) task, which integrates video BFR, inpainting, and colorization tasks that we empirically show to benefit each other.

Colorization, Representation Learning

607
0.46 stars / hour