SEED-Story: Multimodal Long Story Generation with Large Language Model

tencentarc/seed-story 11 Jul 2024

We further propose a multimodal attention sink mechanism to enable the generation of stories with up to 25 sequences (only 10 for training) in a highly efficient autoregressive manner.

Image Generation, Language Modelling, +3

320
2.52 stars / hour

Grounding Image Matching in 3D with MASt3R

naver/mast3r 14 Jun 2024

Image Matching is a core component of all best-performing algorithms and pipelines in 3D vision.

3D Reconstruction

357
1.91 stars / hour

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

nvlabs/mambavision 10 Jul 2024

We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications.

Image Classification, Instance Segmentation, +3

325
1.74 stars / hour

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

stanford-oval/storm 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

Retrieval

6,284
1.71 stars / hour

Cradle: Empowering Foundation Agents Towards General Computer Control

baai-agents/cradle 5 Mar 2024

To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through the most unified and standardized interface, i.e., using screenshots as input and keyboard and mouse actions as output.

Efficient Exploration

998
1.69 stars / hour

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

funaudiollm/cosyvoice 4 Jul 2024

This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs).

Emotion Recognition, Event Detection, +6

1,977
1.55 stars / hour

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

openbmb/ioa 9 Jul 2024

The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents.

162
1.29 stars / hour

RouteLLM: Learning to Route LLMs with Preference Data

lm-sys/routellm 26 Jun 2024

Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost.

Data Augmentation, Transfer Learning

1,867
1.18 stars / hour

OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training

PrimeIntellect-ai/OpenDiLoCo 10 Jul 2024

OpenDiLoCo is an open-source implementation and replication of the Distributed Low-Communication (DiLoCo) training method for large language models.

133
1.15 stars / hour

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

KwaiVGI/LivePortrait 3 Jul 2024

Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability.

Computational Efficiency, Face Reenactment, +3

6,449
1.06 stars / hour