CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

shi-labs/cumo 9 May 2024

Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks.

Ranked #1 on Visual Question Answering on MMBench (GPT-3.5 score metric)

Image Captioning visual instruction following +1

51
0.49 stars / hour

WavCraft: Audio Editing and Generation with Large Language Models

jinhualiang/wavcraft 14 Mar 2024

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

330
0.47 stars / hour

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

4k Language Modelling +3

1,724
0.45 stars / hour

VILA: On Pre-training for Visual Language Models

efficient-large-model/vila 12 Dec 2023

Visual language models (VLMs) have progressed rapidly with the recent success of large language models.

In-Context Learning Language Modelling +2

586
0.42 stars / hour

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

prometheus-eval/prometheus-eval 2 May 2024

Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs.

Language Modelling

452
0.41 stars / hour

Dynamic in Static: Hybrid Visual Correspondence for Self-Supervised Video Object Segmentation

nust-machine-intelligence-laboratory/hvc 21 Apr 2024

To achieve this objective, we present a unified self-supervised approach to learn visual representations of static-dynamic feature similarity.

Semantic Segmentation Video Object Segmentation +1

78
0.40 stars / hour

Make Your LLM Fully Utilize the Context

hsiehjackson/ruler 25 Apr 2024

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, a problem known as the lost-in-the-middle challenge.

4k Information Retrieval +1

149
0.39 stars / hour

AIOS: LLM Agent Operating System

agiresearch/aios 25 Mar 2024

Inspired by these challenges, this paper presents AIOS, an LLM agent operating system, which embeds a large language model into the operating system (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI.

Language Modelling Large Language Model +1

2,667
0.39 stars / hour

QLoRA: Efficient Finetuning of Quantized LLMs

internlm/xtuner NeurIPS 2023

Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU.

Chatbot Instruction Following +2

2,617
0.39 stars / hour

AlphaMath Almost Zero: Process Supervision without Process

MARIO-Math-Reasoning/Super_MARIO 6 May 2024

We proceed to train a step-level value model designed to improve the LLM's inference process in mathematical domains.

Mathematical Reasoning

66
0.38 stars / hour