Trending Research

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

FoundationVision/Groma • 19 Apr 2024

We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability.

Language Modelling, Large Language Model +2

340

0.58 stars / hour

OpenVoice: Versatile Instant Voice Cloning

myshell-ai/openvoice • 3 Dec 2023

The voice styles are not directly copied from and constrained by the style of the reference speaker.

Voice Cloning

23,951

0.53 stars / hour

AgentScope: A Flexible yet Robust Multi-Agent Platform

modelscope/agentscope • 21 Feb 2024

With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications.

1,226

0.53 stars / hour

QLoRA: Efficient Finetuning of Quantized LLMs

internlm/xtuner • NeurIPS 2023

Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU.

Chatbot, Instruction Following +2

2,445

0.52 stars / hour

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

snap-stanford/stark • 19 Apr 2024

Answering real-world user queries, such as product search, often requires accurate retrieval of information from semi-structured knowledge bases or databases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information.

Benchmarking, Retrieval

184

0.50 stars / hour

SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap

soccernet/sn-gamestate • 17 Apr 2024

This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top-view of the pitch (i.e., a minimap).

Ranked #1 on Game State Reconstruction on SoccerNet-GSR

Camera Calibration, Game State Reconstruction

128

0.50 stars / hour

MolTC: Towards Molecular Relational Modeling In Language Models

MangoKiller/MolTC • 6 Feb 2024

Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research.

Relational Reasoning

175

0.46 stars / hour

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO • 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

4,430

0.46 stars / hour

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

McGill-NLP/webllama • 8 Feb 2024

We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion.

Ranked #1 on Conversational Web Navigation on WebLINX

Conversational Web Navigation, Text Generation +1

1,073

0.45 stars / hour

Make Your LLM Fully Utilize the Context

microsoft/FILM • 25 Apr 2024

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge.

4k, Information Retrieval +1

191

0.44 stars / hour
