Recent advances in Multimodal Large Language Models (MLLMs) have focused primarily on scaling: increasing the volume of text-image pair data and enhancing the underlying LLMs to improve performance on multimodal tasks.
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.
Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.
Visual language models (VLMs) have progressed rapidly with the recent success of large language models.
Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs.
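This LLM-as-a-judge pattern can be sketched in a few lines. The snippet below is illustrative only: `call_judge` is a hypothetical stand-in for a call to a proprietary model such as GPT-4, and the prompt template is an assumption, not a prompt from any of these papers. The two-pass, order-swapped comparison is a common mitigation for the position bias such judges exhibit.

```python
# Minimal sketch of pairwise LLM-as-a-judge evaluation.
# `call_judge` is a hypothetical callable standing in for a
# proprietary judge model (e.g. a GPT-4 API call).

JUDGE_TEMPLATE = (
    "You are an impartial judge. Compare the two responses to the user "
    "question and reply with 'A' or 'B' for the better one.\n\n"
    "Question: {question}\n\nResponse A: {a}\n\nResponse B: {b}\n"
)

def build_judge_prompt(question: str, a: str, b: str) -> str:
    """Fill the pairwise-comparison template sent to the judge LM."""
    return JUDGE_TEMPLATE.format(question=question, a=a, b=b)

def judge_pair(question, a, b, call_judge):
    """Query the judge twice with swapped order to reduce position bias."""
    first = call_judge(build_judge_prompt(question, a, b))
    second = call_judge(build_judge_prompt(question, b, a))
    if first == "A" and second == "B":
        return "A"
    if first == "B" and second == "A":
        return "B"
    return "tie"  # verdicts disagree across orderings
```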
To achieve this objective, we present a unified self-supervised approach to learn visual representations of static-dynamic feature similarity.
While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, a problem known as the lost-in-the-middle challenge.
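The lost-in-the-middle effect is typically probed by placing a key fact ("needle") at varying depths of a long filler context and measuring retrieval accuracy as a function of position. The helper below only constructs such probes; the model call and scoring are left out, and all names are illustrative rather than taken from any specific paper.

```python
# Sketch of a lost-in-the-middle probe builder: insert a needle
# sentence at a chosen relative depth of a filler context.

def build_probe(needle: str, filler_sentences: list, depth: float) -> str:
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end)."""
    pos = round(depth * len(filler_sentences))
    parts = filler_sentences[:pos] + [needle] + filler_sentences[pos:]
    return " ".join(parts)
```

Sweeping `depth` over, say, `[0.0, 0.25, 0.5, 0.75, 1.0]` and asking the model to recover the needle from each probe yields the characteristic U-shaped accuracy curve: strong at the edges, weak in the middle.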
Motivated by these challenges, this paper presents AIOS, an LLM agent operating system, which embeds large language models into the operating system (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI.
Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU.
We proceed to train a step-level value model designed to improve the LLM's inference process in mathematical domains.